
Transition Based Dependency Parsing with Deep Learning

Omer Kırnap

Koc University

okirnap@ku.edu.tr

September 27, 2018

Omer Kırnap (Koc University) MSc Thesis September 27 2018 1 123

Overview

1. Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing
2. Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models
3. Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser
4. Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning
5. Conclusion
6. Future Work & Discussions


1 Introduction


Introduction

What is dependency parsing?

Dependency parsing aims to detect word relations by finding the tree structure of a sentence, inspired by dependency grammar.

Figure: Dependency annotations for the sentence "Economic news had little effect on financial markets."

1. Figure from S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Why do we need dependency parsing?

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2. Figure from http://www.phontron.com/slides/nlp-programming-en-11-depend.pdf

Introduction

Dependency Parsing Categorization

Grammar-based: relying on a formal grammar defining a formal language, and asking whether a given input sentence is in the language defined by the grammar.

Data-driven: making essential use of machine learning from linguistic data in order to parse new sentences.

3. From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack-based algorithms to build the dependency tree with incremental steps in linear time.

4. From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Transition Based Dependency Parsing

Transition System: an abstract machine with a set of configurations (states) and transitions. We use the ArcHybrid transition system [Kuhlmann et al., 2011].

Configurations (σ, β, A):
- σ: stack of tree fragments, initially empty
- β: buffer of words, initially containing the whole sentence
- A: set of dependency arcs (head, relation, modifier), initially empty

Transitions:
- shift(σ, b|β, A) = (σ|b, β, A)
- left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
- right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

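The three ArcHybrid transitions above can be sketched directly in code. This is an illustrative pure-Python rendering; the names `Config`, `shift`, `left`, and `right` are mine, not from the thesis implementation.

```python
from collections import namedtuple

# A configuration (σ, β, A): stack, buffer, and arc set (head, relation, dependent)
Config = namedtuple("Config", "stack buffer arcs")

def shift(c):
    # shift(σ, b|β, A) = (σ|b, β, A): move the buffer front onto the stack
    return Config(c.stack + [c.buffer[0]], c.buffer[1:], c.arcs)

def left(c, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): buffer front heads the stack top
    return Config(c.stack[:-1], c.buffer, c.arcs | {(c.buffer[0], d, c.stack[-1])})

def right(c, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): second stack item heads the top
    return Config(c.stack[:-1], c.buffer, c.arcs | {(c.stack[-2], d, c.stack[-1])})

# Parse "Economic news had": Economic <-ATT- news <-SBJ- had
c = Config([], ["Economic", "news", "had"], set())
c = left(shift(c), "ATT")   # "news" becomes the head of "Economic"
c = left(shift(c), "SBJ")   # "had" becomes the head of "news"
c = shift(c)                # "had" remains on the stack as the root
```

Note how each transition is a pure function from configuration to configuration, which is what makes the greedy, linear-time parsing loop possible.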

An example parsing of a sentence


Problem Definition

Find a model that learns to decide the correct transition from the current state.


2 Related Work


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).


Related Work

Solution: use dense embeddings for input features.

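The contrast with one-hot inputs is easy to see in code: a dense lookup costs O(dim) per feature regardless of vocabulary size. The vocabulary, dimension, and table below are toy assumptions for illustration, not values from the thesis.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary and embedding dimension (illustrative only)
vocab = ["economic", "news", "had", "little", "effect"]
dim = 5
table = {w: [random.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}

def embed(word):
    # Dense lookup: cost depends only on dim, not on |vocab|,
    # unlike a one-hot input layer whose weights scale with vocabulary size.
    return table[word]

# A parser-state feature vector: concatenate a few feature embeddings
features = embed("news") + embed("had")
```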

Overview

1. Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing
2. Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models
3. Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser
4. Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning
5. Conclusion
6. Future Work & Discussions


3 Model


Model Overview

Two shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
- Koc-University team, with the MLP Parser using context embeddings

CoNLL18
- KParse team, with the Tree-stack LSTM Parser using context and morph-feat embeddings


a Language Model


Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

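The two LM components can be sketched in pure Python. This is a toy rendering under stated assumptions: hidden size, initialization, and the LSTM cell details below are mine, not the thesis values; only the structure (character LSTM → word vector, word BiLSTM → context vector) follows the slides.

```python
import math
import random

random.seed(0)
H = 4  # toy hidden size (an assumption, not the thesis value)

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def lstm_step(x, h, c, W):
    # One LSTM step; W maps [x; h] to the 4 gate pre-activations (i, f, o, g)
    z = [sum(w * v for w, v in zip(row, x + h)) for row in W]
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i, f, o, g = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
    c = [sig(fj) * cj + sig(ij) * math.tanh(gj) for ij, fj, gj, cj in zip(i, f, g, c)]
    h = [sig(oj) * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def run_lstm(xs, W):
    h, c, hs = [0.0] * H, [0.0] * H, []
    for x in xs:
        h, c = lstm_step(x, h, c, W)
        hs.append(h)
    return hs

CHARS = {ch: [random.uniform(-0.5, 0.5) for _ in range(H)]
         for ch in "abcdefghijklmnopqrstuvwxyz"}
W_char = rand_mat(4 * H, 2 * H)                          # character LSTM
W_fwd, W_bwd = rand_mat(4 * H, 2 * H), rand_mat(4 * H, 2 * H)  # word BiLSTM

def word_vector(word):
    # Character-based LSTM: the final hidden state is the word vector
    return run_lstm([CHARS[ch] for ch in word], W_char)[-1]

def context_vectors(sentence):
    # Word-based BiLSTM: concatenated forward/backward states are context vectors
    vs = [word_vector(w) for w in sentence]
    fwd = run_lstm(vs, W_fwd)
    bwd = run_lstm(vs[::-1], W_bwd)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```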

Language Model - Word vectors

Character-based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al., 2017.

Language Model - Context Vectors

Word-based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al., 2017.

b MLP Parser (CoNLL17)


MLP Parser

The MLP Parser consists of 4 components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al., 2017.

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

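The decision module can be sketched as a one-hidden-layer MLP that scores the three transitions and greedily picks the best legal one. The tiny hand-picked weights below are purely illustrative assumptions, not learned parameters from the thesis.

```python
import math

TRANSITIONS = ["shift", "left", "right"]

def mlp_scores(x, W1, b1, W2, b2):
    # One hidden tanh layer followed by a linear output layer
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + bi) for row, bi in zip(W1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W2, b2)]

def decide(x, W1, b1, W2, b2, valid):
    # Greedy choice among the transitions legal in the current configuration
    scores = mlp_scores(x, W1, b1, W2, b2)
    return max(valid, key=lambda t: scores[TRANSITIONS.index(t)])

# Tiny hand-picked weights, purely for illustration
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
choice = decide([0.5, -0.5], W1, b1, W2, b2, valid={"shift", "left", "right"})
```

Masking out illegal transitions (the `valid` set) matters in practice: for example, `left` and `right` are impossible on an empty stack.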

Experiments & Dataset (MLP), CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
- 17 universal part-of-speech tags
- 37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example ("Economic news had"):
- Gold tree (ATT, SBJ): LAS = 1
- Pred 1 (PRED, OBJ): LAS = 0
- Pred 2 (ATT, OBJ): LAS = (1/2) · 100 = 50

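The metric is straightforward to compute. A minimal sketch, using the slide's example; representing a tree as a word → (head, label) mapping is my simplification.

```python
def las(gold, pred):
    # Labeled Attachment Score: percentage of words assigned both the
    # correct head and the correct dependency label.
    correct = sum(1 for w, arc in gold.items() if pred.get(w) == arc)
    return 100.0 * correct / len(gold)

gold  = {"Economic": ("news", "ATT"),  "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "PRED"), "news": ("had", "OBJ")}  # both arcs wrong
pred2 = {"Economic": ("news", "ATT"),  "news": ("had", "OBJ")}  # one of two right
```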

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers.5

5. Source: CoNLL17 official results page.

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context and Word Embeddings

Context vectors provide an independent contribution on top of POS tags.

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors.

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors.

Issues with MLP

However:
- Choosing the correct state representation of the parser remains critical.
- We are unable to represent the whole parsing history with feature extraction.

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.


c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM; the head word's embedding is modified with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al., 2013].

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (β-LSTM and σ-LSTM over the buffer and stack, an Action-LSTM, and a t-RNN; their outputs are concatenated and fed to an MLP).

We propose the Tree-stack LSTM model with 4 components:
- β-LSTM
- σ-LSTM
- Action-LSTM
- Tree-RNN

Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:
- Character-based LSTM's word vectors
- Word-based BiLSTM's context vectors
- Part-of-speech (POS) vectors
- Morph-feat vectors

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

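A UD-style feature string like the example above can be turned into one vector by splitting it into `Feature=Value` pairs and combining their embeddings. A minimal sketch: the dimension is a toy assumption, and summing the per-pair embeddings is my assumption about the composition, which the slides do not fix.

```python
import random

random.seed(0)
DIM = 4      # toy embedding size (an assumption, not the thesis value)
TABLE = {}   # one embedding per "Feature=Value" pair, created lazily

def morph_feat_vector(feats):
    # Split "Case=Nom|Gender=Neut|..." into feature=value pairs and sum
    # their embeddings into a single morph-feat vector.
    total = [0.0] * DIM
    for kv in feats.split("|"):
        if kv not in TABLE:
            TABLE[kv] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
        total = [t + e for t, e in zip(total, TABLE[kv])]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```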

Tree-stack LSTM

Model components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM


β-LSTM

Figure: Buffer's β-LSTM over words wi, wi+1, wi+2.

σ-LSTM


σ-LSTM

Figure: Stack's σ-LSTM over si, si+1, si+2.

Action-LSTM


Action-LSTM

Figure: Action-LSTM.

How are the components of the tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependency relation, and dependent word.

whead_new = tanh(Wrnn · [whead_old; dl; wdep] + brnn)   (1)
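Equation (1) is a single tanh layer over the concatenation of the old head embedding, the relation embedding, and the dependent embedding. A minimal sketch; the dimension and random weights are assumptions for illustration.

```python
import math
import random

random.seed(0)
D = 4  # toy embedding size; real dimensions are not taken from the thesis

W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head_old, d_l, w_dep):
    # Eq. (1): w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)
    x = w_head_old + d_l + w_dep
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bi)
            for row, bi in zip(W_rnn, b_rnn)]

new_head = t_rnn([0.1] * D, [0.2] * D, [0.3] * D)
```

Because the output has the same dimension as a word embedding, the new head vector can be fed straight back into the σ-LSTM or β-LSTM, which is what connects the t-RNN to the other components.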

Tree-RNN with:
1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.

Final overview of Tree-stack LSTM

Figure: Final overview of the Tree-stack LSTM (β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP).

Overview

1. Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing
2. Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models
3. Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser
4. Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning
5. Conclusion
6. Future Work & Discussions


4 Results & Comparisons


Results amp Comparisons

Dataset

CoNLL17
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. train/test split change; 2. annotation.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP).

Only Action LSTM

Figure: Only action LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does the morphological feature embedding provide?

Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
- Languages having less than 20k tokens
- Languages having more than 20k, less than 50k tokens
- Languages having more than 50k, less than 100k tokens
- Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.02           79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.68           204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.87           417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves during training.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

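The two training regimes differ only in which move the parser *follows* after computing the loss. The sketch below is a deliberately simplified illustration: `model_predict` is a random stand-in for the parser's scorer, and it reuses the gold move as supervision, whereas a real dynamic oracle recomputes the cost-optimal move from the current, possibly-wrong configuration.

```python
import random

random.seed(0)
gold = ["shift", "shift", "left", "shift", "right"]  # hypothetical gold moves

def model_predict(step):
    # Stand-in for the parser's scorer (random here; a real model scores transitions)
    return random.choice(["shift", "left", "right"])

def training_pass(dynamic):
    followed, supervised = [], []
    for step, gold_move in enumerate(gold):
        supervised.append(gold_move)  # log p of the gold move is maximized either way
        # Static: follow the gold move. Dynamic: follow the model's prediction,
        # so training visits the states the parser will actually reach at test time.
        followed.append(model_predict(step) if dynamic else gold_move)
    return followed, supervised

static_path, _ = training_pass(dynamic=False)
dynamic_path, _ = training_pass(dynamic=True)
```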

Static vs Dynamic Oracle Training

Figure: Results are very close for training sizes of less than 20k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sizes between 20k and 50k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sizes of more than 50k tokens.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

Transition based parsers can only build projective trees.6

6. Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language     Projectivity %  Best (LAS)  Our (LAS)
grc perseus  90.70           79.39       55.03 (20)
eu bdt       95.13           84.22       74.13 (17)
hu szeged    97.80           82.66       68.18 (14)
da ddt       98.26           86.28       76.40 (17)
en gum       99.60           85.05       76.44 (15)
gl treegal   100             74.25       70.45 (10)
gl ctg       100             82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7. From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 2 123

1 Introduction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123

Introduction

What is dependency parsing

Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar

Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo

1

1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System: An abstract machine with a set of configurations (states) and transitions. We use the ArcHybrid transition system [Kuhlmann et al., 2011].

Configurations (σ, β, A):
• σ: stack of tree fragments, initially empty
• β: buffer of words, initially containing the whole sentence
• A: set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
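The three transitions above can be sketched directly in code. A minimal Python sketch of the ArcHybrid system follows; the `Config` class and the label strings are illustrative, not the thesis implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """Arc-hybrid configuration (σ, β, A)."""
    stack: list                              # σ: top of stack is the last element
    buffer: list                             # β: front of buffer is the first element
    arcs: set = field(default_factory=set)   # A: (head, relation, modifier) triples

def shift(c):
    """shift(σ, b|β, A) = (σ|b, β, A): move the front of the buffer onto the stack."""
    c.stack.append(c.buffer.pop(0))

def left(c, d):
    """left_d(σ|s, b|β, A): pop s, attach it to the front of the buffer with label d."""
    s = c.stack.pop()
    c.arcs.add((c.buffer[0], d, s))

def right(c, d):
    """right_d(σ|s|t, β, A): pop t, attach it to the next stack element s with label d."""
    t = c.stack.pop()
    c.arcs.add((c.stack[-1], d, t))
```

For the buffer [1, 2, 3], the sequence shift, left("nsubj"), shift, shift, right("obj") leaves word 2 on the stack with arcs {(2, "nsubj", 1), (2, "obj", 3)}.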


An example parsing of a sentence

[Figures: step-by-step arc-hybrid parse of an example sentence]

Problem Definition

Find a model that learns to decide the correct transition from the current state

2 Related Work


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Related Work

Solution: using dense embeddings for input features

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

3 Model


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


a Language Model


Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
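As a sketch of how the two components fit together, here is a minimal NumPy LSTM cell with random weights and tiny illustrative dimensions; the real model's sizes, training, and details live in the thesis, not here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_lstm(in_dim, hidden):
    """Random LSTM parameters: W maps [x; h] to the 4 gate pre-activations."""
    return rng.normal(scale=0.1, size=(4 * hidden, in_dim + hidden)), np.zeros(4 * hidden)

def lstm_run(xs, W, b, hidden):
    """Run an LSTM over a sequence, returning the hidden state at every step."""
    h, c, hs = np.zeros(hidden), np.zeros(hidden), []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f = sigmoid(z[:hidden]), sigmoid(z[hidden:2 * hidden])
        o, g = sigmoid(z[2 * hidden:3 * hidden]), np.tanh(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return hs

CDIM, WDIM = 8, 16                       # character / word vector sizes (illustrative)
char_W, char_b = make_lstm(CDIM, WDIM)
fwd_W, fwd_b = make_lstm(WDIM, WDIM)
bwd_W, bwd_b = make_lstm(WDIM, WDIM)
char_emb = {}

def word_vector(word):
    """Character based LSTM: the final hidden state over the word's characters."""
    chars = [char_emb.setdefault(ch, rng.normal(size=CDIM)) for ch in word]
    return lstm_run(chars, char_W, char_b, WDIM)[-1]

def context_vectors(sentence):
    """Word based BiLSTM: concatenation of forward and backward states per word."""
    ws = [word_vector(w) for w in sentence]
    fwd = lstm_run(ws, fwd_W, fwd_b, WDIM)
    bwd = lstm_run(ws[::-1], bwd_W, bwd_b, WDIM)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each word thus gets a character-derived word vector and, through the BiLSTM, a context vector that depends on the whole sentence.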


Language Model - Word vectors

Character based LSTM generates word Vectors

Figure: Character LSTM, from Kırnap et al. 2017

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure: Word BiLSTM, from Kırnap et al. 2017

b MLP Parser (CoNLL17)


MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition
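A minimal sketch of such a decision module, with random weights and made-up dimensions (the real parser's sizes and feature set are in the thesis); the argmax is restricted to the moves that are valid in the current configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
F, H = 12, 16                             # feature / hidden sizes (illustrative)
MOVES = ["shift", "left", "right"]
W1, b1 = rng.normal(scale=0.1, size=(H, F)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(len(MOVES), H)), np.zeros(len(MOVES))

def next_transition(features, valid):
    """One hidden ReLU layer over the extracted state features,
    then argmax over the currently valid transitions."""
    h = np.maximum(0.0, W1 @ features + b1)
    scores = W2 @ h + b2
    return max((m for m in MOVES if valid[m]),
               key=lambda m: scores[MOVES.index(m)])
```

If only shift is valid (e.g. the stack is empty), the module must return shift regardless of the scores.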


Experiments & Dataset (MLP), CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

Gold tree (LAS = 1): had -SBJ-> news -ATT-> Economic

Pred 1 (LAS = 0): had -PRED-> news -OBJ-> Economic

Pred 2 (LAS = (1/2)·100 = 50): had -OBJ-> news -ATT-> Economic
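The metric itself is a one-liner. A sketch, where `gold` and `pred` map each word to its (head, label) pair:

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted
    head AND dependency label both match the gold tree."""
    correct = sum(pred[w] == gold[w] for w in gold)
    return correct / len(gold)

# The example above: gold tree vs. Pred 2 (one of two words fully correct).
gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}
```

Here `las(gold, pred2)` gives 0.5, matching Pred 2 on the slide.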


Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2


Context vectors provide an independent contribution on top of POS tags


Our BiLSTM language model word vectors perform better than FB vectors


Both POS tags and context vectors have significant contributions on top of word vectors

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

[Figure: Tree-stack LSTM — the σ-, β- and Action-LSTM states and the t-RNN head representations are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors


Input Representation

Morph-feat Vectors

Example (word "It"): Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure: Morph-feat Embeddings
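One plausible way to embed such a FEATS string is to split it on "|" and pool per-feature embeddings; the summation pooling and the dimension here are assumptions for illustration, not the thesis recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
feat_emb = {}            # one embedding per "Key=Value" feature, created lazily

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing'.
    '_' marks a word without morphological features."""
    v = np.zeros(DIM)
    if feats == "_":
        return v
    for f in feats.split("|"):
        if f not in feat_emb:
            feat_emb[f] = rng.normal(scale=0.1, size=DIM)
        v += feat_emb[f]
    return v
```

This yields a fixed-size vector regardless of how many features a word carries, which is what the concatenated input representation needs.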


Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM

[Figure: Tree-stack LSTM architecture]

β-LSTM

Figure: Buffer's β-LSTM over the word representations wi, wi+1, wi+2

σ-LSTM

[Figure: Tree-stack LSTM architecture]

σ-LSTM

Figure: Stack's σ-LSTM over the stack elements si, si+1, si+2

Action-LSTM

[Figure: Tree-stack LSTM architecture]

Action-LSTM

Figure: Action-LSTM

How are the components of tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combining the dependent word, dependency relation and head word embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
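Equation (1) in code, with random weights and a tiny illustrative dimension (the actual sizes are a model hyperparameter):

```python
import numpy as np

D = 4                                   # shared embedding size (illustrative)
rng = np.random.default_rng(1)
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head, d_rel, w_dep):
    """Eq. (1): new head embedding from the old head embedding,
    the dependency-relation embedding and the dependent embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_rel, w_dep]) + b_rnn)
```

Because of the tanh, the composed head embedding stays bounded no matter how many dependents are folded in.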


Tree-RNN with

1. Left Transition
2. Right Transition

Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture]

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP)

Only Action LSTM

Figure: Only action LSTM

Only β-LSTM

Figure: Only β-LSTM

Only σ-LSTM

Figure: Only σ-LSTM

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture]

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more
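The grouping above amounts to a simple threshold function; a trivial sketch (the exact boundary handling for treebanks right at a threshold is an assumption):

```python
def size_group(n_tokens):
    """Assign a treebank to one of the four experimental groups
    by its number of training tokens."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```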


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.6             18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.7          75.8             75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases the log-probability of the gold moves is maximized
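A sketch of the difference; the exploration probability is an assumption, and in both regimes the training loss still maximizes the log-probability of the gold move:

```python
import random

def followed_transitions(gold_moves, model_moves, dynamic, explore=0.1, seed=0):
    """Which move the parser actually executes at each training step.
    Static oracle: always the gold move. Dynamic oracle: with some
    probability, the model's own (possibly wrong) prediction, so that
    training states look more like the states seen at test time."""
    rnd = random.Random(seed)
    out = []
    for gold, pred in zip(gold_moves, model_moves):
        if dynamic and rnd.random() < explore:
            out.append(pred)     # explore the model's prediction
        else:
            out.append(gold)     # follow the gold move
    return out
```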

[Figure: Tree-stack LSTM architecture]

Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for between 20k and 50k training tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for more than 50k training tokens

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Projectivity

Transition based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
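Whether a tree is projective can be checked by testing every pair of arcs for crossing; a sketch, where heads are 1-based word positions and 0 is the artificial root:

```python
def is_projective(heads):
    """heads[i] is the head position of word i+1 (0 = root).
    The tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for (c, d) in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:   # strictly interleaved = crossing
                return False
    return True
```

The quadratic pair check is fine for sentence-length inputs; linear-time checks exist but are not needed here.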


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion: we introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

1 Introduction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123

Introduction

What is dependency parsing

Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar

Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo

1

1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings with two components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

A character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

A word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Gold tree: Economic news had, with arcs ATT (Economic ← news) and SBJ (news ← had): LAS 1

Pred 1: arcs OBJ and PRED: LAS 0

Pred 2: arcs ATT and OBJ: LAS (1/2) × 100 = 50
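The LAS computation above can be written as a small function. This is an illustrative sketch (the gold and predicted trees are represented as word → (head, label) maps), not the official evaluation script.

```python
# Labeled Attachment Score: fraction of words whose predicted head AND
# dependency label both match the gold tree (illustrative sketch).

def las(gold, pred):
    # gold, pred: dicts mapping each word to its (head, label) pair
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return 100.0 * correct / len(gold)

gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}
print(las(gold, pred))  # 50.0  (one of two words has both head and label right)
```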

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors make significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

the character-based LSTM's word vectors

the word-based BiLSTM's context vectors

part-of-speech (POS) vectors

morph-feat vectors
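The four-way concatenation above can be sketched as follows; the dimensions here are illustrative assumptions, not the thesis hyperparameters.

```python
import numpy as np

# Sketch of the parser's input word representation: four component
# vectors are simply concatenated (all sizes below are assumptions).

word_vec    = np.zeros(350)   # from the character-based LSTM
context_vec = np.zeros(300)   # from the word-based BiLSTM
pos_vec     = np.zeros(128)   # learned POS-tag embedding
morph_vec   = np.zeros(128)   # learned morph-feat embedding

x = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
print(x.shape)  # (906,)
```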

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
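One way a UD feature string like the one in the figure could be mapped to a single morph-feat embedding is to look up a learned vector per Feature=Value pair and combine them. This is an illustrative sketch: the averaging scheme, dimension, and lazily grown table are assumptions, not necessarily the thesis model.

```python
import numpy as np

feat_dim = 64
rng = np.random.default_rng(0)
feat_table = {}   # lazily grown embedding table: one vector per Feature=Value

def morph_feat_vector(feats):
    """Split 'Case=Nom|Gender=Neut|...' on '|' and average the feature vectors."""
    vecs = []
    for f in feats.split("|"):
        if f not in feat_table:
            feat_table[f] = rng.normal(size=feat_dim)
        vecs.append(feat_table[f])
    return np.mean(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (64,)
```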

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head^new = tanh(W_rnn · [w_head^old; d_l; w_dep] + b_rnn)   (1)
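Equation (1) composes the old head embedding, the dependency-relation embedding d_l, and the dependent embedding into a new head embedding. A direct rendering in code (the dimension and initialization are illustrative assumptions):

```python
import numpy as np

d = 100                                   # embedding size (assumed)
W_rnn = np.random.randn(d, 3 * d) * 0.01  # t-RNN weight matrix
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    # Concatenate [w_head_old; d_l; w_dep] and apply the affine + tanh of Eq. (1)
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = t_rnn(np.ones(d), np.ones(d), np.ones(d))
print(w_new.shape)  # (100,)
```

The tanh keeps the composed head embedding in the same range as the other embeddings, so it can be pushed back onto the stack LSTM unchanged.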

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What do morphological feature embeddings provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.60           18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.70        75.80           75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121064
bg_btb      84.53        84.55           124336
en_ewt      75.77        75.682          204585
ar_padt     68.02        68.14           223881
de_gsd      71.59        71.32           263804
ca_ancora   85.89        85.874          417587
es_ancora   84.99        84.78           444617
cs_cac      83.57        83.63           472608
cs_pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: training transitions follow the gold moves
Dynamic oracle: training transitions follow the predicted moves

In both cases the log-probability of the gold moves is maximized
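The two regimes can be contrasted in a single schematic training loop. The toy state and model classes below are illustrative stand-ins (the real states are ArcHybrid configurations and the model is the tree-stack LSTM); the point is that only the executed move differs, never the scored one.

```python
import random

class ToyState:
    """Stand-in for a parser configuration: just counts remaining steps."""
    def __init__(self, n): self.left = n
    def is_final(self): return self.left == 0

def gold_move(state): return "shift"               # stand-in oracle

class ToyModel:
    def log_prob(self, state, move): return -0.1   # stand-in gold-move score
    def predict(self, state): return random.choice(["shift", "left"])

def apply_move(state, move):
    state.left -= 1                                # stand-in transition
    return state

def train_sentence(model, state, dynamic=False):
    loss = 0.0
    while not state.is_final():
        gold = gold_move(state)
        loss -= model.log_prob(state, gold)        # always maximize log p(gold)
        move = model.predict(state) if dynamic else gold
        state = apply_move(state, move)            # static: follow gold; dynamic: follow prediction
    return loss

print(round(train_sentence(ToyModel(), ToyState(5)), 2))  # 0.5
```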

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for languages with fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for languages with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for languages with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language     Projectivity %  Best (LAS)  Ours (LAS)
grc_perseus  90.70           79.39       55.03 (20)
eu_bdt       95.13           84.22       74.13 (17)
hu_szeged    97.80           82.66       68.18 (14)
da_ddt       98.26           86.28       76.40 (17)
en_gum       99.60           85.05       76.44 (15)
gl_treegal   100             74.25       70.45 (10)
gl_ctg       100             82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Introduction

What is dependency parsing

Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar

Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo

1

1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Issues with MLP

However

Choosing the correct state of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview: β-LSTM over the buffer, σ-LSTM over the stack, Action-LSTM over past transitions, and a t-RNN combining head word, dependent word and dependency relation; their outputs are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Input Representation

Morph-feat Vectors

Example: the word "It" with features Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure Morph-feat Embeddings
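One plausible way to turn a Feats string into a single morph-feat vector and assemble the concatenated word representation is sketched below; the per-feature summing, the lazily created random embeddings, and all dimensions are assumptions for illustration, not the thesis specification.

```python
import random

random.seed(2)
MDIM = 8  # morph-feat embedding size (illustrative)
morph_table = {}

def embed(key):
    # lazily create a random embedding per feature (toy initialization)
    if key not in morph_table:
        morph_table[key] = [random.uniform(-0.5, 0.5) for _ in range(MDIM)]
    return morph_table[key]

def morph_feat_vector(feats):
    # e.g. "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
    vec = [0.0] * MDIM
    for feat in feats.split("|"):
        vec = [a + b for a, b in zip(vec, embed(feat))]
    return vec

def word_representation(word_vec, context_vec, pos_vec, feats):
    # concatenation of the four inputs listed above
    return word_vec + context_vec + pos_vec + morph_feat_vector(feats)

rep = word_representation([0.1] * 16, [0.2] * 32, [0.3] * 4,
                          "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```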


Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM

[Figure: Tree-stack LSTM overview with the β-LSTM (buffer) component highlighted]

β-LSTM

Figure Buffer's β-LSTM reading the buffer words wi, wi+1, wi+2

σ-LSTM

[Figure: Tree-stack LSTM overview with the σ-LSTM (stack) component highlighted]

σ-LSTM

Figure Stack's σ-LSTM reading the stack items si, si+1, si+2

Action-LSTM

[Figure: Tree-stack LSTM overview with the Action-LSTM component highlighted]

Action-LSTM

Figure Action-LSTM

How are the components of tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

Figure t-RNN combining the dependent word, the dependency relation and the head word

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
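Equation (1) as a runnable sketch (the embedding size D and the weights are illustrative):

```python
import math
import random

random.seed(3)
D = 4  # embedding size (illustrative)

def t_rnn(w_head, d_rel, w_dep, W, b):
    # new head embedding = tanh(W_rnn * [head; relation; dependent] + b_rnn), Eq. (1)
    x = w_head + d_rel + w_dep
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D
new_head = t_rnn([0.1] * D, [0.2] * D, [0.3] * D, W_rnn, b_rnn)
```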


Tree-RNN with

1. Left Transition
2. Right Transition

Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready for the next transition

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready for the next transition
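The left and right transitions, together with shift, can be written as plain operations on the symbolic state (σ, β, A) over word indices; the embeddings and LSTM updates are left out of this sketch.

```python
def shift(stack, buf, arcs):
    # shift(σ, b|β, A) = (σ|b, β, A)
    return stack + [buf[0]], buf[1:], arcs

def left(stack, buf, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): buffer front b heads stack top s
    return stack[:-1], buf, arcs | {(buf[0], d, stack[-1])}

def right(stack, buf, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): s heads the popped top t
    return stack[:-1], buf, arcs | {(stack[-2], d, stack[-1])}

# "Economic(1) news(2) had(3)": attach 1 under 2, then 2 under 3
state = ([], [1, 2, 3], set())
state = shift(*state)          # σ=[1], β=[2, 3]
state = left(*state, "ATT")    # adds (2, ATT, 1)
state = shift(*state)          # σ=[2], β=[3]
state = left(*state, "SBJ")    # adds (3, SBJ, 2)
stack, buf, arcs = state
```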


Final overview of Tree-stack LSTM

[Figure: final overview of Tree-stack LSTM: the β-LSTM, σ-LSTM, Action-LSTM and t-RNN outputs are concatenated and fed to an MLP]

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure Initial model (MLP)

Only Action LSTM

Figure Only action LSTM

Only β-LSTM

Figure Only β-LSTM

Only σ-LSTM

Figure Only σ-LSTM

Ablation Analysis Results

Lang Code   MLP     Only Action  Only-β  Only-σ
hu szeged   66.21   66.87        66.94   67.03
sv lines    71.12   72.05        72.17   72.45
tr imst     57.12   56.87        57.02   57.12
ar padt     67.83   66.67        66.89   66.92
cs cac      83.89   82.23        83.13   83.17
en ewt      75.54   75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Ablation of t-RNN

[Figure: Tree-stack LSTM overview with the t-RNN component highlighted]

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv lines    71.12   72.05   72.17   72.45   74.04      75.46
tr imst     57.12   56.87   57.02   57.12   58.12      58.75
ar padt     67.83   66.67   66.89   66.92   68.04      68.14
cs cac      83.89   82.23   83.13   83.17   82.89      83.57
en ewt      75.54   75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

What does Morphological Feature Embedding provide


Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more
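The split can be expressed as a small bucketing function (the bucket names and exact boundary handling are my own, for illustration):

```python
def size_bucket(n_tokens):
    # the four groups used in the experiments, by training-token count
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

buckets = {lang: size_bucket(n) for lang, n in
           [("no nynorsklia", 3583), ("sv lines", 48325),
            ("eu bdt", 72974), ("cs pdt", 1173282)]}
```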


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia   51.13        53.33           3583
ru taiga        58.32        60.55           10479
sme giella      52.78        53.39           16385
la perseus      49.93        51.6            18184
ug udt          52.78        53.39           19262
sl sst          46.72        48.77           19473
hu szeged       66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases the log-probability of the gold moves is maximized
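The two regimes can be contrasted with a toy sketch. The scorer, state update and parameters below are stand-ins for the parser network; the loops only show the one real difference, namely which move advances the state, while the loss stays the negative log-probability of the gold move.

```python
import math
import random

random.seed(4)
MOVES = ["shift", "left", "right"]

def log_probs(state, params):
    # toy scorer: softmax over made-up per-state scores (stand-in for the parser)
    scores = [params[(m, state % 3)] for m in MOVES]
    z = math.log(sum(math.exp(s) for s in scores))
    return {m: s - z for m, s in zip(MOVES, scores)}

def oracle_loss(gold_moves, params, dynamic):
    # accumulate -log p(gold move); advance by the gold move (static oracle)
    # or by the model's predicted move (dynamic oracle)
    loss, state = 0.0, 0
    for gold in gold_moves:
        lp = log_probs(state, params)
        loss -= lp[gold]
        taken = max(lp, key=lp.get) if dynamic else gold
        state += 1 if taken == "shift" else 2  # toy state update
    return loss

params = {(m, k): random.uniform(-1, 1) for m in MOVES for k in range(3)}
gold_moves = ["shift", "shift", "left", "shift", "right"]
static_loss = oracle_loss(gold_moves, params, dynamic=False)
dynamic_loss = oracle_loss(gold_moves, params, dynamic=True)
```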

[Figure: Tree-stack LSTM architecture]

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens


Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1), (2), (3) and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
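Projectivity can be checked by testing the arcs for crossings; a small sketch over head indices (words numbered from 1, head 0 denoting the root):

```python
def is_projective(heads):
    # heads[i] is the head of word i + 1 (0 denotes an artificial root at position 0)
    arcs = [tuple(sorted((h, i + 1))) for i, h in enumerate(heads)]
    # a tree is non-projective iff two arcs cross: a < c < b < d
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)
```

For example, heads [2, 3, 0] ("Economic news had" rooted at "had") is projective, while a four-word sentence whose arcs span 1-3 and 2-4 is not.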


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal    100           74.25       70.45 (10)
gl ctg        100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion: we introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Diagram: Tree-stack LSTM — head word, dependent word and dependency relation feed the t-RNN; the t-RNN, β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated (Concat) and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
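A minimal numpy sketch of the decision step described by this diagram. Dimensions, weight names and the 3-way transition set are illustrative stand-ins, not the thesis configuration:

```python
import numpy as np

# Illustrative sketch: the β-LSTM, σ-LSTM and Action-LSTM summaries
# are concatenated and scored by an MLP over the transitions.
rng = np.random.default_rng(0)
H = 8                 # hidden size of each component LSTM (stand-in)
N_TRANS = 3           # shift, left, right

h_beta, h_sigma, h_action = (rng.normal(size=H) for _ in range(3))

W1 = rng.normal(size=(16, 3 * H)) * 0.1
b1 = np.zeros(16)
W2 = rng.normal(size=(N_TRANS, 16)) * 0.1
b2 = np.zeros(N_TRANS)

def decide(h_beta, h_sigma, h_action):
    x = np.concatenate([h_beta, h_sigma, h_action])   # the "Concat" box
    hidden = np.tanh(W1 @ x + b1)                     # the "MLP" box
    return W2 @ hidden + b2                           # transition scores

scores = decide(h_beta, h_sigma, h_action)
print(scores.shape)  # (3,)
```

The highest-scoring valid transition would then be applied to the parser state.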

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
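The concatenation itself is simple; a sketch with illustrative dimensions (the real vectors come from trained networks):

```python
import numpy as np

# Illustrative sizes only; the actual vectors are produced by the
# character LSTM, the word BiLSTM, and the POS / morph-feat lookups.
D_WORD, D_CTX, D_POS, D_FEAT = 4, 4, 2, 2

def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    # No hand-built feature templates: the parser input is just the
    # concatenation of the four learned vectors.
    return np.concatenate([word_vec, context_vec, pos_vec, feat_vec])

x = word_representation(np.ones(D_WORD), np.ones(D_CTX),
                        np.zeros(D_POS), np.zeros(D_FEAT))
print(x.shape)  # (12,)
```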

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
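One simple way to embed such a feature string — shown here as a hypothetical sketch, not the exact composition used in the thesis — is to look up a vector per Key=Value pair and sum them:

```python
import numpy as np

# Hypothetical composition: one embedding per "Key=Value" pair, summed.
D = 4
rng = np.random.default_rng(1)
feat_table = {}   # embedding per pair, created on first use

def morph_feat_vector(feat_string):
    vec = np.zeros(D)
    for pair in feat_string.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=D)
        vec += feat_table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape, len(feat_table))  # (4,) 5
```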

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM: an LSTM over buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM: an LSTM over stack words s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM: an LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN: combines the head word, dependent word and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
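Equation (1) in code, with illustrative dimensions and randomly initialized W_rnn and b_rnn:

```python
import numpy as np

# Equation (1) with stand-in sizes and random parameters.
D = 4
rng = np.random.default_rng(2)
W_rnn = rng.normal(size=(D, 3 * D)) * 0.1
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    # New head embedding from the old head embedding, the dependency
    # relation embedding d_l and the dependent embedding.
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_new = t_rnn(np.ones(D), np.ones(D), np.ones(D))
print(w_new.shape)  # (4,)
```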

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready to decide the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready to decide the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
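The shift/left/right rules walked through above can be sketched as a tiny arc-hybrid state machine (illustrative Python, not the thesis code; a state is (stack, buffer, arcs), and arcs hold (head, relation, dependent) triples):

```python
# Minimal arc-hybrid state machine matching the rules above.

def shift(stack, buffer, arcs):
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d: top of the stack becomes a dependent of the buffer front.
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs + [(b, d, s)]

def right(stack, buffer, arcs, d):
    # right_d: top of the stack becomes a dependent of the item below it.
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs + [(s, d, t)]

state = ([], ["news", "had"], [])
state = shift(*state)             # stack: [news]  buffer: [had]
state = left(*state, d="nsubj")   # news becomes a dependent of had
stack, buffer, arcs = state
print(arcs)  # [('had', 'nsubj', 'news')]
```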

Final overview of Tree-stack LSTM

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure Initial model: MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table Comparison between MLP and “Only” models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga | 58.32 | 60.55 | 10479
sme giella | 52.78 | 53.39 | 16385
la perseus | 49.93 | 51.6 | 18184
ug udt | 52.78 | 53.39 | 19262
sl sst | 46.72 | 48.77 | 19473
hu szeged | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48325
fr sequoia | 84.36 | 82.17 | 50543
en gum | 76.44 | 75.34 | 53686
ko gsd | 73.74 | 72.54 | 56687
eu bdt | 74.55 | 73.32 | 72974
nl lassymal | 76.7 | 75.8 | 75134
gl ctg | 79.02 | 79.018 | 79327
lv lvtb | 72.33 | 72.24 | 80666
id gsd | 75.76 | 73.97 | 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of the gold moves is maximized

[Diagram: Tree-stack LSTM architecture — t-RNN, β-LSTM, σ-LSTM, Action-LSTM, Concat, MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
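The contrast can be sketched in a toy training step (illustrative only; `oracle`, `model_pick` and `apply_move` are stand-ins, and the real loss is the negative log-probability of the gold move in both regimes — the difference is which states the parser visits while training):

```python
import random

random.seed(0)

# Toy contrast between static and dynamic oracle training.
def train_step(state, oracle, model_pick, apply_move, dynamic):
    gold = oracle(state)                  # -log p(gold|state) computed here
    # Static: follow the gold move. Dynamic: follow the model's own move,
    # so training also visits states the parser reaches by its mistakes.
    move = model_pick(state) if dynamic else gold
    return apply_move(state, move), gold

oracle = lambda s: "shift"                          # stand-in gold oracle
model_pick = lambda s: random.choice(["shift", "left"])
apply_move = lambda s, m: s + [m]                   # stand-in state update

static_state, gold = train_step([], oracle, model_pick, apply_move,
                                dynamic=False)
print(static_state)  # ['shift'] — static training always follows gold moves
```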

Static vs Dynamic Oracle Training

Figure Results are very close for languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for languages with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for languages with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
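As an illustration of this constraint (not code from the thesis): a tree is projective iff no two of its arcs cross when drawn above the sentence. A small check over (head, dependent) position pairs:

```python
# Arcs are (head, dependent) word positions; 0 stands for the root.

def is_projective(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    for (l1, r1) in spans:
        for (l2, r2) in spans:
            if l1 < l2 < r1 < r2:   # the two arcs cross
                return False
    return True

print(is_projective([(2, 1), (0, 2), (2, 3)]))  # True: no crossing arcs
print(is_projective([(1, 3), (2, 4)]))          # False: 1-3 crosses 2-4
```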

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced “Context, Word and Morph-feat” embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
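The shift/left/right formulas used throughout can be sketched as plain state updates. A minimal arc-hybrid sketch (the tuple layout and word-string arcs are illustrative choices):

```python
def shift(stack, buffer, arcs):
    # shift(sigma, b|beta, A) = (sigma|b, beta, A)
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d(sigma|s, b|beta, A) = (sigma, b|beta, A U {(b, d, s)})
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    # right_d(sigma|s|t, beta, A) = (sigma|s, beta, A U {(s, d, t)})
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# attach both modifiers of "economic news had ..." step by step
state = ([], ["economic", "news", "had"], set())
state = shift(*state)
state = left(*state, "amod")    # news --amod--> economic
state = shift(*state)
state = left(*state, "nsubj")   # had --nsubj--> news
```

Each transition either moves a word or pops the stack while adding one arc, which is why a sentence of n words is parsed in a linear number of steps.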

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview — the σ-LSTM, β-LSTM, and Action-LSTM summaries are concatenated with t-RNN head embeddings and fed to an MLP]


4. Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between CoNLL17 and CoNLL18: (1) train/test split changes; (2) annotation changes.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP)

Only Action LSTM

Figure: Only Action-LSTM

Only β-LSTM

Figure: Only β-LSTM

Only σ-LSTM

Figure: Only σ-LSTM

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

[Figure: Tree-stack LSTM overview with the t-RNN highlighted]

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves.
Dynamic oracle: transitions using predicted moves.

In both cases, the log-probability of gold moves is maximized.
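The two regimes differ only in which state sequence the trainer visits. A schematic sketch, where `ToyState`, `ToyModel`, and the constant oracle are placeholder stubs (not the thesis code):

```python
import math

class ToyState:
    """Stub parser state: any move applies, parsing ends after 3 moves."""
    def __init__(self, n=0):
        self.n = n
    def is_final(self):
        return self.n >= 3
    def apply(self, move):
        return ToyState(self.n + 1)

class ToyModel:
    """Stub scorer returning fixed transition probabilities."""
    def predict(self, state):
        return {"shift": 0.6, "left": 0.3, "right": 0.1}

def train_sentence(state, oracle, model, dynamic=False):
    """Static oracle: visit states reached by gold moves.
    Dynamic oracle: visit states reached by the model's own predictions.
    Either way, the loss maximizes log p(gold move) at each visited state."""
    loss = 0.0
    while not state.is_final():
        gold = oracle(state)               # correct move for this state
        probs = model.predict(state)
        loss -= math.log(probs[gold])      # -log p(gold)
        move = max(probs, key=probs.get) if dynamic else gold
        state = state.apply(move)
    return loss

loss = train_sentence(ToyState(), lambda s: "shift", ToyModel(), dynamic=True)
```

With dynamic training the model sees states its own mistakes produce, so the oracle must be able to name the best gold move from any state, not just gold states.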


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition-based parsers can only build projective trees.

(Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf)
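A tree is projective when no two dependency arcs cross. A small check of that property; the 1-based head-array encoding (heads[i] is the head of word i+1, 0 is the root) is an assumption for illustration:

```python
def is_projective(heads):
    """heads[i] is the head index of word i+1 (0 denotes the root).
    Returns False if any two dependency arcs cross each other."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # crossing: exactly one endpoint lies strictly inside the other arc
            if a < c < b < d or c < a < d < b:
                return False
    return True

projective = is_projective([2, 0, 2])         # simple chain around word 2
crossing = is_projective([3, 4, 0, 1])        # arcs (1,3) and (2,4) cross
```

A greedy shift/left/right parser can never produce the second tree, which is why highly non-projective treebanks are harder for transition-based systems.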

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. (From the official results page and our projectivity table.)

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions use gold moves.
Dynamic oracle: transitions use predicted moves.

In both cases, the log-probability of the gold moves is maximized.
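The difference between the two regimes can be sketched as follows (a toy illustration: `oracle` and `model` are stand-in functions, not the thesis components):

```python
import random
random.seed(0)

def collect_training_pairs(oracle, model, n_steps, dynamic):
    """Gather (state, gold_move) training pairs for one sentence.
    oracle(state): gold move from this state; model(state): predicted move.
    In both regimes the loss maximizes log p(gold_move | state)."""
    state, pairs = [], []
    for _ in range(n_steps):
        gold = oracle(state)
        pairs.append((tuple(state), gold))
        # static oracle: advance with the gold move;
        # dynamic oracle: advance with the model's (possibly wrong) move
        state.append(model(state) if dynamic else gold)
    return pairs

oracle = lambda state: "shift"                        # toy stand-in oracle
model = lambda state: random.choice(["shift", "left", "right"])
static_pairs = collect_training_pairs(oracle, model, 3, dynamic=False)
dynamic_pairs = collect_training_pairs(oracle, model, 3, dynamic=True)
# static training only visits states reachable by gold moves; dynamic
# training also visits states reached after predicted (possibly wrong) moves
```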

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
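The projectivity constraint — no two dependency arcs may cross — can be checked with a short sketch (the `heads` encoding, head index per word with 0 for ROOT, is an assumption for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of word i (0 = ROOT); heads[0] is a dummy entry.
    A tree is projective iff no two arcs (drawn above the sentence) cross."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i > 0]
    # arcs (a, b) and (c, d) cross exactly when a < c < b < d
    return not any(a < c < b < d for (a, b) in arcs for (c, d) in arcs)

is_projective([0, 2, 0, 2])     # "news had effect"-style chain: projective
is_projective([0, 3, 4, 0, 2])  # arcs (1,3) and (2,4) cross: non-projective
```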

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion, we introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Introduction

Transition Based Dependency Parsing

Transition System: An abstract machine with a set of configurations (states) and transitions. We use the ArcHybrid transition system [Kuhlmann et al 2011].

Configurations (σ, β, A):
• σ: Stack of tree fragments, initially empty
• β: Buffer of words, initially containing the whole sentence
• A: Set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
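In code, the three transitions can be sketched as follows (a minimal illustration with list/set configurations and a toy 4-word parse; not the thesis implementation):

```python
# Configurations are (sigma, beta, A): sigma/beta are lists of word
# indices, A is a set of (head, deprel, dependent) arcs.

def shift(sigma, beta, A):
    # shift(σ, b|β, A) = (σ|b, β, A)
    return sigma + [beta[0]], beta[1:], A

def left(sigma, beta, A, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    return sigma[:-1], beta, A | {(beta[0], d, sigma[-1])}

def right(sigma, beta, A, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    return sigma[:-1], beta, A | {(sigma[-2], d, sigma[-1])}

# Toy parse of "Economic news had effect" (words indexed 1..4):
cfg = ([], [1, 2, 3, 4], set())
cfg = shift(*cfg)            # σ=[1]
cfg = left(*cfg, "amod")     # arc: news(2) -> Economic(1)
cfg = shift(*cfg)            # σ=[2]
cfg = left(*cfg, "nsubj")    # arc: had(3) -> news(2)
cfg = shift(*cfg)            # σ=[3]
cfg = shift(*cfg)            # σ=[3,4]
cfg = right(*cfg, "obj")     # arc: had(3) -> effect(4); root word 3 remains
```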

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123


Problem Definition

Find a model that learns to decide the correct transition from the current state.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Use dense embeddings for input features.
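A back-of-the-envelope sketch of why dense embeddings shrink the first layer, with made-up vocabulary and layer sizes:

```python
# First-layer parameter counts scale with input dimensionality.
# All sizes below are illustrative assumptions.
vocab_size, n_pos, hidden = 50_000, 17, 200
one_hot_input = vocab_size + n_pos        # sparse indicator features
emb_dim = 64
dense_input = emb_dim + emb_dim           # word + POS embeddings instead

one_hot_params = one_hot_input * hidden   # hidden layer sees vocab-sized input
dense_params = dense_input * hidden       # hidden layer sees only emb_dim-sized
# input (the vocab-sized table moves into a cheap embedding lookup)
```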

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example over "Economic news had":
Gold tree, with arcs ATT (Economic ← news) and SBJ (news ← had): LAS 1
Pred 1, with arcs OBJ and PRED: LAS 0
Pred 2, with arcs ATT and OBJ: LAS = (1/2) · 100 = 50
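The metric can be sketched in code (a toy illustration mirroring the slide's example; the word-to-arc dictionaries are an assumed representation, not the evaluation script):

```python
# LAS: a word counts as correct only if BOTH its head and its
# dependency label match the gold tree.

def las(gold, pred):
    """gold, pred: dicts mapping word -> (head, label)."""
    hits = sum(pred.get(w) == arc for w, arc in gold.items())
    return 100.0 * hits / len(gold)

gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one right
# las(gold, pred1) -> 0.0, las(gold, pred2) -> 50.0
```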

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. The head word's embedding is modified with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only use word2vec embeddings [Mikolov et al 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
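A sketch of turning a morph-feat string like the one above into a vector (the per-feature lookup table and the summation are illustrative assumptions; the thesis may combine features differently):

```python
import numpy as np
np.random.seed(0)

dim = 8                                   # toy size; real sizes differ
features = ["Case=Nom", "Gender=Neut", "Number=Sing",
            "Person=3", "PronType=Prs"]
emb = {f: np.random.randn(dim) for f in features}   # stand-in lookup table

def morph_feat_vector(feat_string):
    """Split 'Case=Nom|Gender=Neut|...' on '|' and combine the per-feature
    embeddings; summation here is an assumption made for illustration."""
    vecs = [emb[f] for f in feat_string.split("|") if f in emb]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```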

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM over upcoming words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM over stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM over the sequence of past transitions

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN combining the head word, dependency relation and dependent word

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
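Equation (1) can be sketched numerically (toy sizes; `W_rnn` and `b_rnn` stand in for parameters that would be learned by the parser):

```python
import numpy as np
np.random.seed(0)

d = 4                                    # toy embedding size (assumption)
W_rnn = 0.1 * np.random.randn(d, 3 * d)  # learned in the real parser
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): compose head, relation and dependent embeddings; the new
    head embedding has the same size as the old one."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_new = t_rnn(np.ones(d), np.zeros(d), -np.ones(d))
```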

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with:
1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM overview: t-RNN composes head word, dependent word and dependency relation; LSTM outputs are concatenated and fed to the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized

Figure: Tree-stack LSTM architecture (t-RNN over head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs concatenated into an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training-token counts below 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training-token counts between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training-token counts above 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language        (1)            (2)     (3)     (4)
af afribooms    not provided   75.46   77.43   78.12
kk ktb          20.19          22.31   21.96   23.86
bxr bdt          7.64           9.76    9.93    8.98
kmr mg          20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
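To make "projective" concrete: a tree is projective if no two of its arcs cross. A minimal checker (my own sketch, not from the thesis; `heads[i]` is the 1-based head of token i+1, with 0 for the root):

```python
def is_projective(heads):
    """heads[i] is the head (1-based, 0 = root) of token i+1.
    Return True iff no two arcs (including the root arc) cross."""
    arcs = [(heads[i], i + 1) for i in range(len(heads))]
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            # two arcs cross if exactly one endpoint of the second
            # lies strictly inside the span of the first
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True
```

For example, the chain `[2, 0, 2]` (tokens 1 and 3 attach to token 2) is projective, while `[3, 4, 0, 2]` contains crossing arcs (3,1) and (2,4) and is not.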

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus    90.7          79.39        55.03 (20)
eu bdt         95.13         84.22        74.13 (17)
hu szeged      97.8          82.66        68.18 (14)
da ddt         98.26         86.28        76.40 (17)
en gum         99.6          85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

A character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had ..."
Gold tree (ATT, SBJ): LAS = 1
Pred 1 (PRED, OBJ): LAS = 0
Pred 2 (OBJ, ATT): LAS = (1/2) · 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
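The LAS computation on the slide can be written directly (a sketch; each tree is represented as a list of (head, label) pairs, one per word):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold tree."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Slide example "Economic news (had)":
# gold arcs: Economic -ATT-> news, news -SBJ-> had
gold  = [(2, "ATT"), (3, "SBJ")]
pred1 = [(2, "OBJ"), (3, "PRED")]   # both wrong  -> LAS 0
pred2 = [(2, "ATT"), (3, "OBJ")]    # one of two  -> LAS 50
```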

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser-state features still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (t-RNN over head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs concatenated into an MLP)

We propose the Tree-stack LSTM model with 4 components:
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
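A sketch of how a morph-feat vector could be built from the UD feature string and concatenated into the word representation. The lookup table and the choice to sum per-feature embeddings are my assumptions for illustration, not necessarily the thesis's exact scheme:

```python
def morph_feat_vector(feats, table, dim):
    """Map a UD feature string like 'Case=Nom|Number=Sing' to a single
    vector; here (an assumption) by summing per-feature embeddings.
    Unknown features fall back to a zero vector."""
    total = [0.0] * dim
    for f in feats.split("|"):
        for i, v in enumerate(table.get(f, [0.0] * dim)):
            total[i] += v
    return total

def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four ingredients listed on the slide:
    char-LSTM word vector, BiLSTM context vector, POS vector, morph-feat vector."""
    return word_vec + context_vec + pos_vec + feat_vec

# Toy 2-d feature embeddings (hypothetical values)
table = {"Case=Nom": [1.0, 0.0], "Number=Sing": [0.0, 1.0]}
rep = word_representation([0.1], [0.2], [0.3],
                          morph_feat_vector("Case=Nom|Number=Sing", table, 2))
# rep == [0.1, 0.2, 0.3, 1.0, 1.0]
```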

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM diagram with the β-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM diagram with the σ-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM diagram with the Action-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
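Equation (1) in pure Python (a sketch; W_rnn is given as a list of rows, and [· ; · ; ·] is vector concatenation):

```python
import math

def t_rnn(w_head_old, d_rel, w_dep, W_rnn, b_rnn):
    """New head embedding: tanh(W_rnn @ [w_head_old; d_rel; w_dep] + b_rnn)."""
    x = w_head_old + d_rel + w_dep   # concatenation of the three inputs
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# 1-d toy: W_rnn picks out only the old head embedding
out = t_rnn([1.0], [0.0], [0.0], W_rnn=[[1.0, 0.0, 0.0]], b_rnn=[0.0])
# out == [math.tanh(1.0)]
```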

Tree-RNN with:
1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to make the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to make the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
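The two transitions above can be written as list operations (a sketch; tokens are integer ids, `arcs` holds (head, label, dependent) triples in the slide's (b, d, s) / (s, d, t) notation):

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack; it becomes a d-dependent of buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t from the stack; it becomes a d-dependent of s below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run (hypothetical token ids and labels)
stack, buffer, arcs = [0, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "nsubj")    # adds (3, "nsubj", 2), stack -> [0]
stack2 = [0, 1, 3]
right_arc(stack2, buffer, arcs, "obj")    # adds (1, "obj", 3), stack2 -> [0, 1]
```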

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview (t-RNN over head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs concatenated into an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank was improved, the older parser is handicapped

2. If the training-test split changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the Action-LSTM feeding the decision module

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only the β-LSTM feeding the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only the σ-LSTM feeding the MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM diagram with the t-RNN highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis:

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of the dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123




Problem Definition

Find a model that learns to decide the correct transition from the current parser state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
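The scaling argument above can be made concrete with a toy sketch (all sizes and names here are illustrative, not taken from the thesis): explicit one-hot feature conjunctions multiply the input dimension, while concatenated dense embeddings keep it small and let the network learn conjunctions itself.

```python
# Toy illustration (hypothetical sizes): a parser state represented either
# via one-hot feature conjunctions or via concatenated dense embeddings.
import random

n_words, n_pos = 20000, 17

# Explicit pairwise conjunction of (word, pos): the one-hot input
# dimension grows multiplicatively.
conjunction_dim = n_words * n_pos
print(conjunction_dim)  # 340000 input dimensions for a single feature pair

# Dense alternative: each atomic feature gets a small embedding and the
# network input is just their concatenation.
d_word, d_pos = 100, 16
random.seed(0)
word_emb = {i: [random.gauss(0, 1) for _ in range(d_word)] for i in range(3)}
pos_emb = {i: [random.gauss(0, 1) for _ in range(d_pos)] for i in range(3)}

def dense_input(word_id, pos_id):
    """Concatenate dense embeddings; the hidden layer learns conjunctions."""
    return word_emb[word_id] + pos_emb[pos_id]

x = dense_input(1, 2)
print(len(x))  # 116 input dimensions instead of 340000
```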

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
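The two LM components can be sketched structurally in a few lines. This is a simplified, dependency-free stand-in, not the thesis implementation: mean-pooling of character embeddings plays the role of the character LSTM, and forward/backward running summaries play the role of the word-level BiLSTM.

```python
# Minimal stand-in for the LM components (illustrative only): the real
# system uses a character LSTM and a word-level BiLSTM; here mean-pooling
# and cumulative sums play their structural roles.
import random

random.seed(0)
DIM = 8
char_emb = {c: [random.gauss(0, 1) for _ in range(DIM)]
            for c in "abcdefghijklmnopqrstuvwxyz"}

def word_vector(word):
    """Word vector built from characters (stand-in for the char LSTM)."""
    vecs = [char_emb[c] for c in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def context_vectors(words):
    """One context vector per word: forward plus backward running
    summaries (stand-in for the word-based BiLSTM)."""
    wv = [word_vector(w) for w in words]
    fwd, acc = [], [0.0] * DIM
    for v in wv:                        # left-to-right summary
        acc = [a + x for a, x in zip(acc, v)]
        fwd.append(acc)
    bwd, acc = [], [0.0] * DIM
    for v in reversed(wv):              # right-to-left summary
        acc = [a + x for a, x in zip(acc, v)]
        bwd.append(acc)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]   # concatenation: 2*DIM dims

ctx = context_vectors(["economic", "news", "had"])
print(len(ctx), len(ctx[0]))  # 3 16
```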

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
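The decision module above can be sketched as a single MLP forward pass over the concatenated state features (all dimensions, weights, and the transition set here are illustrative assumptions, not the thesis configuration):

```python
# Sketch of the decision module: an MLP maps the extracted state features
# to scores over transitions; the highest-scoring transition is chosen.
import math
import random

random.seed(1)
TRANSITIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]
IN_DIM, HID = 12, 6   # illustrative sizes

W1 = [[random.gauss(0, 0.5) for _ in range(IN_DIM)] for _ in range(HID)]
b1 = [0.0] * HID
W2 = [[random.gauss(0, 0.5) for _ in range(HID)] for _ in range(len(TRANSITIONS))]
b2 = [0.0] * len(TRANSITIONS)

def matvec(W, x, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def decide(state_features):
    """One forward pass: tanh hidden layer, then transition scores."""
    h = [math.tanh(v) for v in matvec(W1, state_features, b1)]
    scores = matvec(W2, h, b2)
    return TRANSITIONS[max(range(len(scores)), key=scores.__getitem__)]

features = [random.gauss(0, 1) for _ in range(IN_DIM)]
print(decide(features))  # one of SHIFT / LEFT-ARC / RIGHT-ARC
```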

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

[Figure: three dependency trees over "Economic news had ...": the gold tree with arcs ATT and SBJ (LAS 1); Prediction 1 with arcs OBJ and PRED, both wrong (LAS 0); Prediction 2 with arcs ATT and OBJ, one of two correct (LAS = (1/2) · 100 = 50%)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
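The metric above can be computed directly; the example mirrors the slide's "one of two words correct" case (head indices here are illustrative):

```python
# LAS: a word counts as correct only if BOTH its head and its dependency
# label match the gold tree.
def las(gold, pred):
    """gold/pred: lists of (head_index, label) per word. Returns percent."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news": a prediction getting one of two attachments right
gold = [(2, "ATT"), (3, "SBJ")]
pred = [(2, "ATT"), (3, "OBJ")]
print(las(gold, pred))  # 50.0
```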

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers.5

5Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c) features (LAS):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags.

Our BiLSTM language-model word vectors perform better than the FB (Facebook) vectors.

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct features of the parser state remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123


c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM, and Action-LSTM outputs together with t-RNN head embeddings are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview with the β-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM — an LSTM running over the buffer word vectors wi, wi+1, wi+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview with the σ-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM — an LSTM running over the stack word vectors si, si+1, si+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview with the Action-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM — an LSTM running over the embeddings of past transitions]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the dependent word, dependency relation, and head word into a new head embedding]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
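Equation (1) maps directly to code. The dimensions below are illustrative; the essential point is that the new head embedding has the same size as the old one, so composition can be applied repeatedly as the tree grows:

```python
# Equation (1): new head embedding as a tanh-transformed affine map of
# the concatenation [w_head_old ; d_l ; w_dep].
import math
import random

random.seed(0)
D = 4                       # embedding size (illustrative)
IN = 3 * D                  # concatenated input size
W_rnn = [[random.gauss(0, 0.3) for _ in range(IN)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head_old, d_l, w_dep):
    x = w_head_old + d_l + w_dep            # concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

head = [0.1] * D
rel = [0.2] * D             # dependency-relation embedding
dep = [0.3] * D
new_head = t_rnn(head, rel, dep)
print(len(new_head))  # 4, same size as the old head embedding
```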

Tree-RNN with:
1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure sequence: (1) each embedding is initialized by concatenating POS, language, and morph-feat embeddings; (2) the stack's top LSTM is reduced; (3) the t-RNN computes the new head embedding; (4) the β-LSTM recomputes its hidden state from the new input; (5) the tree-stack LSTM is ready to predict the next transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure sequence: (1) each embedding is initialized by concatenating POS, language, and morph-feat embeddings; (2) the stack's top LSTM is reduced; (3) the t-RNN computes the new head embedding; (4) the σ-LSTM recomputes its hidden state from the new input; (5) the tree-stack LSTM is ready to predict the next transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Final overview of the Tree-stack LSTM: σ-LSTM, β-LSTM, and Action-LSTM outputs and t-RNN head embeddings are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
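The left/right transition rules defined above can be executed literally on a small configuration (stack, buffer, arcs). The word indices and relation names below are illustrative:

```python
# Left/right transitions as defined on the slides, applied to a
# configuration (stack, buffer, arcs).
def left(stack, buffer, arcs, d):
    """left_d: stack top s becomes a dependent of buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))       # (head b, relation d, dependent s)

def right(stack, buffer, arcs, d):
    """right_d: stack top t becomes a dependent of the element s below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))       # (head s, relation d, dependent t)

def shift(stack, buffer):
    stack.append(buffer.pop(0))

stack, buffer, arcs = [], [1, 2, 3], set()   # word indices (illustrative)
shift(stack, buffer)                  # stack=[1]    buffer=[2,3]
left(stack, buffer, arcs, "nsubj")    # adds arc (2, nsubj, 1)
shift(stack, buffer)                  # stack=[2]    buffer=[3]
shift(stack, buffer)                  # stack=[2,3]  buffer=[]
right(stack, buffer, arcs, "obj")     # adds arc (2, obj, 3)
print(sorted(arcs))  # [(2, 'nsubj', 1), (2, 'obj', 3)]
```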

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition-based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers).

Differences from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

[Figure: ablation model using only the Action-LSTM]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

[Figure: ablation model using only the β-LSTM feeding the MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

[Figure: ablation model using only the σ-LSTM feeding the MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview with the t-RNN component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code     Morph-Feats  no Morph-Feats  # of tokens
sv lines      72.18        74.81           48325
fr sequoia    84.36        82.17           50543
en gum        76.44        75.34           53686
ko gsd        73.74        72.54           56687
eu bdt        74.55        73.32           72974
nl lassysmall 76.7         75.8            75134
gl ctg        79.02        79.018          79327
lv lvtb       72.33        72.24           80666
id gsd        75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions are taken using gold moves. Dynamic oracle: transitions are taken using predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM overview; the same architecture is trained under either oracle]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
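The difference between the two regimes can be sketched in one training step (schematic, not the thesis code): both compute the loss as the negative log-probability of the gold move, but the next parser state follows the gold move under a static oracle and the model's own prediction under a dynamic one.

```python
# Static vs dynamic oracle: same loss, different choice of next move.
import math

def log_softmax(scores):
    m = max(scores)
    z = math.log(sum(math.exp(s - m) for s in scores)) + m
    return [s - z for s in scores]

def training_step(scores, gold_index, oracle):
    logp = log_softmax(scores)
    loss = -logp[gold_index]                  # maximize gold log-probability
    if oracle == "static":
        next_move = gold_index                # follow the gold move
    else:                                     # "dynamic"
        next_move = max(range(len(scores)), key=scores.__getitem__)
    return loss, next_move

scores, gold = [0.2, 1.5, -0.3], 0
print(training_step(scores, gold, "static")[1])   # 0 (gold move)
print(training_step(scores, gold, "dynamic")[1])  # 1 (model's own argmax)
```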

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch on limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees.6

6Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
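Projectivity can be checked directly on a head array with the standard crossing-arcs test (a sketch; word indices are 1-based, 0 denotes the root):

```python
# A dependency tree is projective iff no two arcs cross.
def is_projective(heads):
    """heads[i-1] is the head of word i (0 = root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:     # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True: simple nested tree
print(is_projective([3, 4, 0, 3]))  # False: arcs 1->3 and 2->4 cross
```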

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had ..."

Gold tree (arcs SBJ, ATT): LAS = 1
Pred 1 (arcs PRED, OBJ): LAS = 0
Pred 2 (arcs OBJ, ATT): LAS = (1/2) · 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
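The LAS metric above is straightforward to compute. A minimal sketch, where gold and predicted trees are given as per-word (head, label) pairs; the toy arcs are illustrative, not the slide's exact trees:

```python
def las(gold, pred):
    """Labeled Attachment Score: the fraction of words whose predicted
    (head, label) pair exactly matches the gold tree."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Toy 3-word sentence; each entry is (head index, dependency label),
# with head 0 meaning the root.
gold = [(2, "ATT"), (3, "SBJ"), (0, "PRED")]
pred = [(2, "ATT"), (3, "OBJ"), (0, "PRED")]   # one label wrong
print(round(las(gold, pred), 2))  # 0.67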

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct features to describe the parser state remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
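One plausible way to turn a UD feature string like the one above into a single morph-feat vector is to sum per-feature embeddings; the slides do not spell out the exact composition, so treat the summing scheme (and the toy dimension) as assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                 # embedding size, chosen arbitrarily
feat_emb = {}           # one embedding per key=value pair, grown on demand

def morph_feat_vector(feats):
    """Compose a single vector from a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'.
    Summing per-feature embeddings handles the variable number of
    features per word; '_' is the UD convention for 'no features'."""
    v = np.zeros(DIM)
    if feats == "_":
        return v
    for pair in feats.split("|"):
        if pair not in feat_emb:
            feat_emb[pair] = rng.normal(scale=0.1, size=DIM)
        v += feat_emb[pair]
    return v

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (8,)
```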

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi  wi+1  wi+2

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn ∗ [whead old; dl; wdep] + brnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
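Equation (1) translates directly into code. A sketch with an arbitrary embedding size and random, untrained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                            # embedding size, chosen arbitrarily
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))   # untrained weights
b_rnn = np.zeros(D)

def trnn(w_head_old, d_label, w_dep):
    # Equation (1): the head embedding is updated from the old head
    # embedding, the dependency-label embedding and the dependent
    # embedding, concatenated and passed through tanh.
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = trnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
print(new_head.shape)  # (8,)
```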

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head  Dependent

Figure Each embedding initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
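The left and right transition formulas above can be transcribed directly. A sketch operating on word indices, with arcs stored as (head, label, dependent); preconditions and root handling are omitted for brevity:

```python
def shift(stack, buffer):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # leftd(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s)):
    # the buffer front b becomes the head of the popped stack top s.
    s = stack.pop()
    arcs.append((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # rightd(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t)):
    # the popped stack top t is attached to the element s below it.
    t = stack.pop()
    arcs.append((stack[-1], d, t))

# Toy run over word indices 1..3, with 0 acting as the root:
stack, buffer, arcs = [0], [1, 2, 3], []
shift(stack, buffer)                  # σ=[0,1]   β=[2,3]
left_arc(stack, buffer, arcs, "SBJ")  # word 2 heads word 1
shift(stack, buffer)                  # σ=[0,2]   β=[3]
shift(stack, buffer)                  # σ=[0,2,3] β=[]
right_arc(stack, buffer, arcs, "OBJ") # word 2 heads word 3
print(arcs)  # [(2, 'SBJ', 1), (2, 'OBJ', 3)]
```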

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang code   MLP     Only Action  Only-β  Only-σ
hu szeged   66.21   66.87        66.94   67.03
sv lines    71.12   72.05        72.17   72.45
tr imst     57.12   56.87        57.02   57.12
ar padt     67.83   66.67        66.89   66.92
cs cac      83.89   82.23        83.13   83.17
en ewt      75.54   75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
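The distinction can be sketched abstractly: both modes maximize the log-probability of the gold move, but they differ in which move is executed to produce the next training state. The toy move sequences below are illustrative, not from the thesis experiments.

```python
def training_trajectory(gold_moves, model_moves, mode):
    """Return the transition sequence actually executed during training.
    Static oracle: always execute the gold move, so the parser only ever
    sees states on the gold derivation. Dynamic oracle: execute the
    model's predicted move, so training also visits states that follow
    the model's own mistakes. In both modes the loss maximizes the
    log-probability of the gold move at each visited state."""
    return [g if mode == "static" else p
            for g, p in zip(gold_moves, model_moves)]

gold = ["SHIFT", "LEFT", "SHIFT", "RIGHT"]
pred = ["SHIFT", "SHIFT", "LEFT", "RIGHT"]   # the model disagrees at steps 2-3
print(training_trajectory(gold, pred, "static"))   # follows gold
print(training_trajectory(gold, pred, "dynamic"))  # follows the model
```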

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
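Projectivity is easy to check: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch, where heads[i] is the head of word i+1 and 0 is the root:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1; head 0 is the root. A tree is
    projective iff no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, i + 1), max(h, i + 1)) for i, h in enumerate(heads)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:       # strictly interleaved endpoints
                return False
    return True

print(is_projective([2, 0, 2]))     # True:  1<-2, root->2, 2->3
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```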

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
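Equation (1) can be sketched directly with toy parameters (W_rnn and b_rnn are random here, and D is an illustrative dimension, not the thesis's actual size):

```python
# Sketch of the t-RNN composition in Eq. (1): the new head embedding is a
# tanh-squashed affine function of [old head; relation; dependent].
import math, random

random.seed(1)
D = 8                                              # toy embedding size

W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(head_old, rel, dep):
    x = head_old + rel + dep                       # concatenation [h; d_l; w_dep]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

head_new = t_rnn([0.1] * D, [0.2] * D, [0.3] * D)
assert len(head_new) == D
```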

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
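The left and right transitions above can be sketched on plain Python lists; an arc (h, d, c) records that head h governs child c with label d (the words and labels are illustrative):

```python
# Sketch of the two transitions shown above, on plain lists.
# left_d(σ|s, b|β, A)  = (σ, b|β, A ∪ {(b, d, s)}):
#   pop s from the stack, attach it to the front of the buffer b.
# right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
#   pop t from the stack, attach it to the item s below it.
def left(stack, buffer, arcs, label):
    s = stack.pop()
    arcs.append((buffer[0], label, s))

def right(stack, buffer, arcs, label):
    t = stack.pop()
    arcs.append((stack[-1], label, t))

stack, buffer, arcs = ["ROOT", "news"], ["had", "effect"], []
left(stack, buffer, arcs, "nsubj")        # "had" becomes head of "news"
assert arcs == [("had", "nsubj", "news")] and stack == ["ROOT"]
```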

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code (tokens)   MLP     Tree-stack
ru taiga (10k)       58.89   60.55
hu szeged (20k)      66.21   68.18
tr imst (50k)        56.78   58.75
ar padt (120k)       67.83   68.14
en ewt (205k)        74.87   75.77
cs cac (473k)        83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
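The contrast can be sketched as a toy training step: both oracles maximize log p of the gold move, but only the dynamic oracle follows the model's own prediction during training (the model here is a random stand-in, not the thesis parser):

```python
# Sketch: static vs dynamic oracle. The loss is the same negative
# log-probability of the gold move in both cases; the difference is which
# move is actually executed to reach the next parser state.
import math, random

random.seed(2)
MOVES = ["shift", "left", "right"]

def model_probs(state):                    # toy stand-in for the real parser
    ps = [random.random() for _ in MOVES]
    z = sum(ps)
    return [p / z for p in ps]

def training_step(state, gold_move, dynamic):
    probs = model_probs(state)
    loss = -math.log(probs[MOVES.index(gold_move)])        # same either way
    # static: follow the gold move; dynamic: follow the model's prediction
    followed = MOVES[probs.index(max(probs))] if dynamic else gold_move
    return loss, followed

loss, followed = training_step(state=0, gold_move="shift", dynamic=False)
assert followed == "shift"                 # static oracle always follows gold
```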

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
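Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch, with heads[i] giving the head index of word i+1 (0 = artificial root):

```python
# Sketch: a dependency tree is projective iff no two arcs cross, i.e. no
# pair of arc spans is strictly interleaved.
def is_projective(heads):
    # heads[d-1] is the head of word d (words numbered from 1, root is 0)
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:          # strictly interleaved -> crossing
                return False
    return True

assert is_projective([2, 0, 2])            # simple projective tree
assert not is_projective([0, 4, 1, 2])     # arcs 1->3 and 2->4 cross
```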

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
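The decision module can be sketched as a one-hidden-layer MLP over the extracted state features; all sizes and parameters here are illustrative, not the thesis's actual configuration:

```python
# Sketch: an MLP maps the extracted state features to one score per
# transition; the highest-scoring transition is chosen.
import math, random

random.seed(3)

def layer(n_in, n_out):
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def mlp(features, W1, W2):
    h = [math.tanh(sum(w * f for w, f in zip(row, features))) for row in W1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

TRANSITIONS = ["shift", "left-arc", "right-arc"]
W1, W2 = layer(20, 10), layer(10, len(TRANSITIONS))   # toy sizes

scores = mlp([0.1] * 20, W1, W2)
best = TRANSITIONS[scores.index(max(scores))]
assert best in TRANSITIONS
```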

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example sentence: "Economic news had ..."
Gold tree (arcs SBJ, ATT): LAS = 1
Pred 1 (arcs PRED, OBJ): LAS = 0
Pred 2 (arcs OBJ, ATT): LAS = (1/2) * 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
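The LAS computation above can be sketched directly; mirroring the slide's example, one correct word out of two gives 50% (the head/label values are illustrative):

```python
# Sketch: LAS counts a word as correct only when both its predicted head
# and its predicted dependency label match the gold annotation.
def las(gold, pred):
    # gold/pred: one (head, label) pair per word
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

gold = [("had", "SBJ"), ("had", "ATT")]
pred = [("had", "OBJ"), ("had", "ATT")]   # one label wrong out of two words
assert las(gold, pred) == 50.0
```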

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture with the t-RNN highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
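The difference between the two training regimes can be sketched as a single loop: both accumulate the negative log-probability of the gold move, and they differ only in which move is actually applied to advance the state. The `ToyState`, `score_moves`, and `gold_oracle` interfaces below are hypothetical stand-ins, not the thesis implementation.

```python
import random

class ToyState:
    """Minimal stand-in parser state: finishes after n moves."""
    def __init__(self, n):
        self.n = n
    def is_final(self):
        return self.n == 0
    def apply(self, move):
        return ToyState(self.n - 1)

def train_sentence(score_moves, state, gold_oracle, dynamic=False, explore=0.1):
    """One pass over a sentence; returns the NLL loss of the gold moves."""
    loss = 0.0
    while not state.is_final():
        gold = gold_oracle(state)          # best (reachable) gold move
        scores = score_moves(state)        # move -> log-probability
        loss -= scores[gold]               # maximize log p of the gold move
        if dynamic and random.random() < explore:
            move = max(scores, key=scores.get)  # follow the model's prediction
        else:
            move = gold                         # follow the gold move (static)
        state = state.apply(move)
    return loss

# static-oracle pass over a toy 3-move "sentence"
loss = train_sentence(lambda s: {"shift": -0.1, "left": -2.3},
                      ToyState(3), lambda s: "shift")
```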

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees. 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
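A tree is projective iff no two dependency arcs cross. A minimal check over a head array (1-based word positions, 0 for the root) could look like this; the quadratic scan is for clarity, not efficiency:

```python
def is_projective(heads):
    """heads[i] = head of word i+1 (positions are 1-based; 0 is the root).
    Returns True iff no two arcs strictly interleave (i.e., cross)."""
    # each arc as an interval (min(head, dep), max(head, dep))
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:   # strictly interleaved spans
                return False
    return True
```

For "Economic news had" with heads [2, 3, 0] the tree is projective; a 4-word tree with heads [3, 4, 0, 3] has crossing arcs (1,3) and (2,4) and is not.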

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Use dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
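The two components can be sketched as follows. Plain tanh-RNN cells stand in for the actual LSTM cells, and all weights, dimensions, and character features are toy values, not the trained language model:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                                          # toy hidden size
Wc = rng.normal(scale=0.1, size=(H, H + 1))     # char-level RNN weights

def word_vector(word):
    """Char-level RNN over the word's characters; the final hidden
    state serves as the word embedding."""
    h = np.zeros(H)
    for ch in word:
        x = np.array([ord(ch) / 128.0])         # toy per-character feature
        h = np.tanh(Wc @ np.concatenate([h, x]))
    return h

Wf = rng.normal(scale=0.1, size=(H, 2 * H))     # forward word-level RNN
Wb = rng.normal(scale=0.1, size=(H, 2 * H))     # backward word-level RNN

def context_vectors(words):
    """Word-level BiRNN over word vectors; concatenating the forward and
    backward hidden states gives each word's context embedding."""
    vs = [word_vector(w) for w in words]
    fwd, h = [], np.zeros(H)
    for v in vs:                                # left-to-right pass
        h = np.tanh(Wf @ np.concatenate([h, v]))
        fwd.append(h)
    bwd, h = [], np.zeros(H)
    for v in reversed(vs):                      # right-to-left pass
        h = np.tanh(Wb @ np.concatenate([h, v]))
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

ctx = context_vectors(["Economic", "news", "had"])
```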

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
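A minimal sketch of such a decision module: one hidden layer over the concatenated state features, then a softmax over the candidate transitions. The layer sizes and the three-move inventory are illustrative, not the thesis configuration:

```python
import numpy as np

def mlp_decide(features, W1, b1, W2, b2, transitions):
    """Score transitions from extracted state features; the argmax of the
    softmax is taken as the parser's next move."""
    h = np.tanh(W1 @ features + b1)      # hidden layer
    logits = W2 @ h + b2                 # one logit per transition
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax
    return transitions[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
feats = rng.normal(size=20)              # toy extracted state features
W1, b1 = rng.normal(size=(32, 20)), np.zeros(32)
W2, b2 = rng.normal(size=(3, 32)), np.zeros(3)
move, probs = mlp_decide(feats, W1, b1, W2, b2, ["shift", "left", "right"])
```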

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

[Figure] Example with "Economic news had": the gold tree has arcs ATT and SBJ. Pred 1 (PRED, OBJ) gets neither word right: LAS = 0. Pred 2 (OBJ, ATT) gets one of the two words right: LAS = (1/2) × 100 = 50.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
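The metric itself is straightforward to compute once gold and predicted (head, label) pairs are aligned per word; a small sketch with the hypothetical three-word example:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted head
    AND dependency label both match the gold tree.
    gold, pred: lists of (head, label) tuples, one per word."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# "Economic news had": heads are word indices (0 = root); labels illustrative
gold = [(2, "ATT"), (3, "SBJ"), (0, "ROOT")]
pred = [(2, "OBJ"), (3, "SBJ"), (0, "ROOT")]   # one label wrong -> 2/3 correct
```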

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):


Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than the FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the right features of the parser state still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture overview]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
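One plausible way to embed such a UD FEATS string is one embedding per Key=Value pair. Whether the thesis sums or otherwise composes these per-feature embeddings is not shown on the slide, so summing is an assumption here, and all dimensions and table entries are illustrative:

```python
import numpy as np

DIM = 8                        # toy embedding size
rng = np.random.default_rng(0)
feat_table = {}                # one embedding per "Key=Value" feature

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'
    by summing one learned vector per Key=Value pair."""
    vec = np.zeros(DIM)
    if feats == "_":           # UD marks 'no features' with an underscore
        return vec
    for kv in feats.split("|"):
        if kv not in feat_table:
            feat_table[kv] = rng.normal(scale=0.1, size=DIM)
        vec += feat_table[kv]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```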

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture with the β-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM runs over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM runs over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
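Equation (1) can be sketched directly: concatenate the old head embedding, the dependency-label embedding d_l, and the dependent embedding, then apply an affine map and tanh. The toy dimensions and random weights below are illustrative only:

```python
import numpy as np

def trnn_update(w_head, d_label, w_dep, W_rnn, b_rnn):
    """Equation (1): w_head_new = tanh(W_rnn @ [w_head; d_label; w_dep] + b_rnn).
    The output replaces the head word's embedding after an arc is added."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

# toy sizes: head/dependent embeddings of size 4, label embedding of size 2
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 10))   # output size = head embedding size
b = np.zeros(4)
new_head = trnn_update(rng.normal(size=4), rng.normal(size=2),
                       rng.normal(size=4), W, b)
```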

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
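The left_d and right_d transitions above (arc-hybrid style, as in Kuhlmann et al. 2011) can be sketched as plain list operations on the state (stack σ, buffer β, arc set A). Integer word positions and the helper names are illustrative:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)})"""
    s = stack.pop()                  # s leaves the stack ...
    arcs.add((buffer[0], d, s))      # ... attached to the buffer front b

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)})"""
    t = stack.pop()                  # t leaves the stack ...
    arcs.add((stack[-1], d, t))      # ... attached to the new stack top s

# left: word 2 becomes a dependent of buffer-front word 3
stack, buffer, arcs = [1, 2], [3], set()
left_arc(stack, buffer, arcs, "amod")

# right: word 1 becomes a dependent of word 0 (the root) below it
stack2, arcs2 = [0, 1], set()
right_arc(stack2, [], arcs2, "root")
```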

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123


Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions


Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
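The character-based half of the LM can be sketched as follows. This is a deliberately simplified stand-in — a plain tanh RNN with fixed mixing weights and random character embeddings — that only shows the shape of the computation; the thesis uses a trained character LSTM.

```python
import math
import random

random.seed(0)
D = 5          # word-vector dimension (illustrative)
char_emb = {}  # lazily created embedding per character

def char_rnn_word_vector(word):
    """Run a simple recurrent cell over the characters of `word`;
    the final hidden state serves as the word vector (the real model
    uses an LSTM cell instead of this fixed-weight tanh update)."""
    h = [0.0] * D
    for ch in word:
        x = char_emb.setdefault(ch, [random.uniform(-1, 1) for _ in range(D)])
        h = [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, x)]
    return h

v = char_rnn_word_vector("news")
print(len(v))  # 5
```

Because the vector is built from characters, unseen words still get a representation — one motivation for the character-based component.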

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
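The decision module can be sketched as a one-hidden-layer MLP producing a softmax over transitions. This is an illustrative stand-in, not the thesis's trained model: the dimensions, weights, and move inventory here are made up.

```python
import math
import random

random.seed(3)

def mlp_decide(features, n_moves=4):
    """Sketch of the decision module: one hidden tanh layer over the
    concatenated feature vector, then a softmax over parser moves.
    Dimensions are illustrative and weights are random."""
    H = 8
    W1 = [[random.uniform(-0.1, 0.1) for _ in features] for _ in range(H)]
    W2 = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(n_moves)]
    h = [math.tanh(sum(w * f for w, f in zip(row, features))) for row in W1]
    scores = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    z = [math.exp(s) for s in scores]
    total = sum(z)
    return [p / total for p in z]

probs = mlp_decide([0.1, -0.2, 0.3])
print(round(sum(probs), 6))  # 1.0
```

In the real parser the input would be the concatenated feature-extractor output and the argmax of the returned distribution would be the next transition.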

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
17 universal part-of-speech tags
37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

[Figure: LAS example for "Economic news had". Gold tree: Economic -ATT-> news, news -SBJ-> had. Pred 1 (OBJ, PRED labels): LAS 0. Pred 2 (ATT correct, OBJ wrong): LAS = (1/2) · 100 = 50]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
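The LAS computation on the slide above can be made concrete with a small helper. This is hypothetical illustration code mirroring the slide's example; only the two dependent words of "Economic news had" are scored.

```python
def las(gold, pred):
    """Labeled Attachment Score: the fraction of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# "Economic news had": (head index, dependency label) per dependent word
gold  = [(1, "ATT"), (2, "SBJ")]   # Economic -> news, news -> had
pred1 = [(1, "OBJ"), (2, "PRED")]  # both labels wrong
pred2 = [(1, "ATT"), (2, "OBJ")]   # one of two words fully correct

print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 0.5
```

Multiplying by 100 gives the percentage form used in the result tables.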

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5. Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

[Figure: Morph-feat embeddings for the word "It" with features Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
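A morph-feat vector can be sketched as one embedding per Key=Value pair from the UD feature string, combined into a single vector. This is a minimal sketch: the per-feature embeddings here are random, and summation is an assumption — the thesis may combine the feature embeddings differently.

```python
import random

random.seed(0)
DIM = 4
feat_emb = {}  # lazily created embedding per "Key=Value" feature

def embed_feats(feat_string):
    """Map a UD feature string like 'Case=Nom|Gender=Neut|Number=Sing'
    to one vector by summing an embedding per Key=Value pair
    (combination by summation is an illustrative assumption)."""
    vec = [0.0] * DIM
    if feat_string == "_":  # CoNLL-U convention for "no features"
        return vec
    for feat in feat_string.split("|"):
        if feat not in feat_emb:
            feat_emb[feat] = [random.uniform(-1, 1) for _ in range(DIM)]
        vec = [a + b for a, b in zip(vec, feat_emb[feat])]
    return vec

v = embed_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(len(v))  # 4
```

The resulting vector is then concatenated with the word, context, and POS vectors listed above.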

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM running over words w_i, w_{i+1}, w_{i+2}]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM running over stack elements s_i, s_{i+1}, s_{i+2}]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM running over the transition history]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependency relation, and dependent word embeddings]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
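Equation (1) can be written out directly: concatenate the old head embedding, the dependency-label embedding, and the dependent embedding, apply a linear map, and squash with tanh. The dimensions and random weights below are illustrative only.

```python
import math
import random

random.seed(1)
D = 3  # embedding dimension (illustrative)
# W_rnn maps the concatenation [head; label; dependent] (3*D) back to D
W_rnn = [[random.uniform(-0.1, 0.1) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head, d_label, w_dep):
    """Eq. (1): new head = tanh(W_rnn · [w_head; d_label; w_dep] + b_rnn)."""
    x = w_head + d_label + w_dep  # list concatenation = vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = t_rnn([0.2] * D, [0.1] * D, [-0.3] * D)
print(len(new_head))  # 3
```

The output replaces the head word's embedding, so repeated attachments compose recursively along the tree.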

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: left transition — each embedding is initialized by concatenating POS, language, and morph-feat embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: the stack's top LSTM is reduced]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: t-RNN calculates the new head embedding]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: β-LSTM recalculates its hidden state based on the new input]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Tree-stack LSTM is ready to give a new transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: right transition — each embedding is initialized by concatenating POS, language, and morph-feat embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: the stack's top LSTM is reduced]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: t-RNN calculates the new head embedding]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: σ-LSTM recalculates its hidden state from the new input]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Tree-stack LSTM is ready to give a new transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
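The left and right transition definitions on these slides can be read as updates on a (stack, buffer, arc set) triple. The sketch below follows those formulas — the buffer front heads the stack top in a left arc, and the stack top attaches to the element below it in a right arc — but the in-place list representation and word indices are illustrative choices, not the thesis's implementation.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s."""
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t is attached to the element s below it."""
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

stack, buffer, arcs = [0, 2], [3, 4], set()  # word indices
left_arc(stack, buffer, arcs, "nsubj")
print(stack, arcs)  # [0] {(3, 'nsubj', 2)}
```

Each arc triple is (head, label, dependent); a shift transition (not shown) would move the buffer front onto the stack.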

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split changes 2. Annotation changes

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

[Figure: Only Action-LSTM model]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

[Figure: Only β-LSTM model — β-LSTM output fed to the MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

[Figure: Only σ-LSTM model — σ-LSTM output fed to the MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions use gold moves
Dynamic oracle: transitions use predicted moves

In both cases, the log-probability of the gold moves is maximized

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
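The only operational difference between the two regimes is which move the parser follows to reach the next training state; a minimal sketch:

```python
def choose_followed_move(gold_move, predicted_move, dynamic):
    """Static oracle: always follow the gold move, so training only
    visits gold states. Dynamic oracle: follow the model's predicted
    move, so training also visits states the model reaches on its own.
    In both regimes the loss term maximizes log p(gold_move)."""
    return predicted_move if dynamic else gold_move

print(choose_followed_move("SHIFT", "LEFT-ARC", dynamic=False))  # SHIFT
print(choose_followed_move("SHIFT", "LEFT-ARC", dynamic=True))   # LEFT-ARC
```

The move names here are illustrative labels; the real training loop would apply the chosen move to the parser state before scoring the next step.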

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors, trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
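Projectivity can be checked by testing whether any two dependency arcs cross. The helper below is hypothetical illustration code, with heads[i] giving the head of word i and index 0 reserved for the artificial root.

```python
def is_projective(heads):
    """heads[i] = head index of word i (index 0 is the artificial root).
    A tree is projective iff no two arcs cross when drawn above the
    sentence, i.e. there is no pair of arcs with a1 < a2 < b1 < b2."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads[1:], start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            # one endpoint strictly inside the other arc, one outside
            if a1 < a2 < b1 < b2:
                return False
    return True

print(is_projective([None, 2, 0, 2]))     # True: nested arcs only
print(is_projective([None, 3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```

Non-projective gold trees therefore put an upper bound on the accuracy such a parser can reach, which motivates the comparison on the next slide.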

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7. From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
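The problem statement above can be sketched as a greedy parsing loop: a learned model repeatedly picks the next transition given the current state (stack, buffer, arcs). This is a minimal illustration using arc-hybrid-style transitions; `next_transition` stands in for the learned classifier and is a hypothetical callback, not the thesis code.

```python
def parse(n_words, next_transition):
    """Greedy transition-based parsing: repeatedly ask a model for the
    next transition until the buffer is empty and the stack holds only
    the artificial root (index 0)."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
    while buffer or len(stack) > 1:
        t = next_transition(stack, buffer, arcs)
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT":                  # stack top becomes dependent of buffer front
            arcs.append((buffer[0], stack.pop()))
        else:                              # RIGHT: stack top depends on the word below it
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                            # list of (head, dependent) pairs

# Toy "oracle" for a 2-word right-branching sentence:
oracle = lambda stack, buffer, arcs: "SHIFT" if buffer else "RIGHT"
```

Running `parse(2, oracle)` shifts both words and then reduces them with two RIGHT transitions, yielding the arcs (1, 2) and (0, 1).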

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
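The dimensionality argument can be made concrete. The 17 universal POS tags come from the slides; the 50k vocabulary size and the embedding values are illustrative assumptions.

```python
# One-hot feature conjunctions explode in dimension: crossing an assumed
# 50k-word vocabulary with the 17 universal POS tags already yields
# 850,000 binary dimensions for a single word-tag pairing.
vocab_size, n_pos_tags = 50_000, 17
one_hot_conjunction_dims = vocab_size * n_pos_tags

# Dense embeddings instead map each atomic feature to a short learned
# vector; the network's hidden layer learns the conjunctions.
embeddings = {"news": [0.2, -0.1], "NOUN": [0.5, 0.3]}   # toy values
dense_input = embeddings["news"] + embeddings["NOUN"]     # 4 dims, not 850k
```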

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
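The two-stage language-model pipeline above can be sketched as follows. A plain recurrent fold stands in for the LSTM cells used in the thesis, and `char_emb` is a toy embedding table; the point is the data flow, not the cell internals.

```python
import math

def encode(vectors):
    """Toy recurrent encoder (a stand-in for an LSTM cell): fold a
    sequence of vectors into a final hidden state."""
    h = [0.0] * len(vectors[0])
    for v in vectors:
        h = [math.tanh(a + b) for a, b in zip(h, v)]
    return h

def word_vector(word, char_emb):
    """Character-based encoder: build a word vector from the word's
    character embeddings."""
    return encode([char_emb[c] for c in word])

def context_vectors(word_vecs):
    """Word-level bidirectional pass: concatenate forward and backward
    hidden states so each word's vector reflects its whole sentence."""
    fwd, h = [], [0.0] * len(word_vecs[0])
    for v in word_vecs:
        h = [math.tanh(a + b) for a, b in zip(h, v)]
        fwd.append(h)
    bwd, h = [], [0.0] * len(word_vecs[0])
    for v in reversed(word_vecs):
        h = [math.tanh(a + b) for a, b in zip(h, v)]
        bwd.append(h)
    return [f + b for f, b in zip(fwd, reversed(bwd))]

char_emb = {"a": [0.1, 0.2], "b": [0.3, 0.4]}   # toy character embeddings
```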

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Gold tree for "Economic news had" (LAS = 1): Economic -ATT-> news, news -SBJ-> had

Pred 1 (labels OBJ and PRED; both arcs wrong): LAS = 0

Pred 2 (ATT correct, OBJ wrong): LAS = (1/2) × 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
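The LAS computation can be sketched as below. This is a minimal illustration over (head, label) pairs, not the official CoNLL evaluation script; the predicted labels mirror the example on the slide.

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted head
    AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# (head, label) per scored word of "Economic news had" ('had' is the root):
gold = [("news", "ATT"), ("had", "SBJ")]
pred1 = [("news", "OBJ"), ("had", "PRED")]   # both arcs mislabeled -> LAS 0
pred2 = [("news", "ATT"), ("had", "OBJ")]    # one of two arcs fully correct -> LAS 0.5
```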

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5. Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings


Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the right representation of the parser state remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on a reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states, plus t-RNN head updates, are concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
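The initialization above is a plain concatenation of the four embedding sources. A minimal sketch (the dimensions below are illustrative, not the thesis' actual sizes):

```python
def initial_word_repr(word_vec, context_vec, pos_vec, morph_feat_vec):
    """Initialize a word's representation by concatenating the four
    embedding sources listed above."""
    return word_vec + context_vec + pos_vec + morph_feat_vec

# Toy dimensions: 3-dim word, 4-dim context, 2-dim POS, 2-dim morph-feat.
repr_ = initial_word_repr([0.1] * 3, [0.2] * 4, [0.3] * 2, [0.4] * 2)
```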

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
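Embedding a UD feature string like the one shown above can be sketched as follows. Summing one vector per `key=value` pair is one plausible composition; the thesis' exact scheme may differ, and the table values are toy numbers.

```python
def morph_feat_vector(feats, table, dim=4):
    """Compose a morph-feat embedding from a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs' by summing
    a learned vector per key=value pair (unknown pairs map to zeros)."""
    vec = [0.0] * dim
    for f in feats.split("|"):
        vec = [a + b for a, b in zip(vec, table.get(f, [0.0] * dim))]
    return vec

# Hypothetical per-feature embedding table:
table = {"Case=Nom": [1, 0, 0, 0], "Number=Sing": [0, 1, 0, 0]}
```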

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture (β-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the upcoming words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack words s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture (Action-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the sequence of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, the dependency relation, and the dependent word

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
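Equation (1) can be sketched directly in code. `W` and `b` stand in for the learned parameters W_rnn and b_rnn and are hypothetical toy values here.

```python
import math

def trnn(w_head, d_rel, w_dep, W, b):
    """Eq. (1): new head embedding = tanh(W · [w_head; d_l; w_dep] + b).
    The inputs are concatenated, then passed through an affine map and tanh."""
    x = w_head + d_rel + w_dep                       # concatenation [w_head; d_l; w_dep]
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Toy dimensions: 2-dim head, 1-dim relation, 2-dim dependent -> 5-dim input.
W_zero = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
W_pick = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]          # picks first and last input
```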

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Left transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Right transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
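The two transition definitions can be sketched on list-based stacks and buffers, with arcs stored as (head, relation, dependent) triples exactly as in A ∪ {(b, d, s)} and A ∪ {(s, d, t)}:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop stack top s and attach it to buffer front b with relation d."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))        # (head, relation, dependent)

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop stack top t and attach it to s, the word just below it."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))
```

After each transition the affected component LSTMs recompute their hidden states, as the slides above illustrate.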

Final overview of Tree-stack LSTM

Figure: Final overview of the Tree-stack LSTM (β-LSTM, σ-LSTM, and Action-LSTM hidden states and t-RNN outputs are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
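The decision step in the overview (Concat then MLP) can be sketched as below. The `score` callable is a hypothetical stand-in for the trained MLP, and the three-action set is a simplification of the full labeled transition inventory.

```python
def decide(beta_h, sigma_h, action_h, score):
    """Decision step: concatenate the components' hidden states and let
    a scoring function (the MLP in the thesis) rank candidate transitions."""
    features = beta_h + sigma_h + action_h        # Concat
    actions = ["SHIFT", "LEFT", "RIGHT"]
    return max(actions, key=lambda a: score(a, features))

# Hypothetical MLP outputs for one state:
mlp_scores = {"SHIFT": 0.1, "LEFT": 0.9, "RIGHT": 0.2}
```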

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu_szeged   66.21   66.87         66.94    67.03
sv_lines    71.12   72.05         72.17    72.45
tr_imst     57.12   56.87         57.02    57.12
ar_padt     67.83   66.67         66.89    66.92
cs_cac      83.89   82.23         83.13    83.17
en_ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture (t-RNN highlighted for ablation)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code             without t-RNN   with t-RNN
no_nynorsklia (3k)    51.78           53.33
ru_taiga (11k)        59.13           60.55
gl_treegal (15k)      69.76           70.45
hu_szeged (20k)       66.12           68.18
sv_lines (49k)        74.04           75.46
tr_imst (50k)         58.12           58.75
ar_padt (120k)        68.04           68.14
en_ewt (204k)         74.87           75.77
cs_cac (473k)         82.89           83.57
cs_pdt (1M)           81.17           81.164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv_lines    71.12   72.05    72.17    74.04    72.17       75.46
tr_imst     57.12   56.87    57.02    57.12    58.12       58.75
ar_padt     67.83   66.67    66.89    66.92    68.04       68.14
cs_cac      83.89   82.23    83.13    83.17    82.89       83.57
en_ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3583
ru_taiga        58.32         60.55            10479
sme_giella      52.78         53.39            16385
la_perseus      49.93         51.6             18184
ug_udt          52.78         53.39            19262
sl_sst          46.72         48.77            19473
hu_szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with training tokens between 50k and 100k:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48325
fr_sequoia      84.36         82.17            50543
en_gum          76.44         75.34            53686
ko_gsd          73.74         72.54            56687
eu_bdt          74.55         73.32            72974
nl_lassysmall   76.7          75.8             75134
gl_ctg          79.02         79.018           79327
lv_lvtb         72.33         72.24            80666
id_gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121064
bg_btb      84.53         84.55            124336
en_ewt      75.77         75.682           204585
ar_padt     68.02         68.14            223881
de_gsd      71.59         71.32            263804
ca_ancora   85.89         85.874           417587
es_ancora   84.99         84.78            444617
cs_cac      83.57         83.63            472608
cs_pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of the gold moves is maximized

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, Concat, MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
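The distinction above fits in a few lines: both regimes supervise toward the gold move, and differ only in which move is executed to produce the next training state. This is a minimal sketch of that control flow, not the training code itself.

```python
def training_step(gold_action, model_action, dynamic):
    """One training decision. Static and dynamic oracle training both
    maximize log p(gold action); they differ only in which action is
    executed to reach the next state."""
    supervise = gold_action                            # loss target: always gold
    execute = model_action if dynamic else gold_action # state update differs
    return supervise, execute
```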

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
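Projectivity has a simple characterization: a tree is projective iff no two arcs cross when drawn above the sentence. A minimal check (quadratic, fine for sentence lengths):

```python
def is_projective(heads):
    """heads[i-1] is the head index of word i (0 = artificial root).
    Projective iff no two arcs cross, i.e. no arc starts strictly inside
    another arc's span and ends strictly outside it."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2
                   for l1, r1 in arcs for l2, r2 in arcs)
```

For example, `[2, 0, 2]` (a simple nested tree) is projective, while `[3, 4, 0, 0]` contains the crossing arcs (1, 3) and (2, 4) and is not.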

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc_perseus   90.7           79.39        55.03 (20)
eu_bdt        95.13          84.22        74.13 (17)
hu_szeged     97.8           82.66        68.18 (14)
da_ddt        98.26          86.28        76.40 (17)
en_gum        99.6           85.05        76.44 (15)
gl_treegal    100            74.25        70.45 (10)
gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Issues with MLP

However

Choosing the correct features of the parser state still remains critical

We are unable to represent the whole parsing history with feature extraction

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (σ-, β- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word and dependency relation)

We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
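A sketch of how these components could be wired together (a plain tanh recurrence stands in for the LSTMs, and all dimensions are made up; in the real model the concatenated summary is fed to an MLP over transitions):

```python
import numpy as np

def rnn_last_hidden(vectors, W, U, b):
    """Toy stand-in for an LSTM: run a tanh recurrence over a
    sequence of vectors and return the final hidden state."""
    h = np.zeros(b.shape[0])
    for x in vectors:
        h = np.tanh(W @ x + U @ h + b)
    return h

def state_vector(sigma, beta, actions, params):
    """Summarize the stack (σ-LSTM), buffer (β-LSTM) and transition
    history (Action-LSTM), then concatenate the three summaries."""
    return np.concatenate([rnn_last_hidden(seq, *params)
                           for seq in (sigma, beta, actions)])

rng = np.random.default_rng(0)
params = (rng.normal(size=(2, 3)), rng.normal(size=(2, 2)), np.zeros(2))
state = state_vector([rng.normal(size=3)] * 2,   # two stack items
                     [rng.normal(size=3)] * 3,   # three buffer items
                     [rng.normal(size=3)],       # one past action
                     params)
```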


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
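The concatenation itself is straightforward; a minimal sketch with made-up dimensions:

```python
import numpy as np

def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    """Build one word representation by concatenating the four input
    vectors; no hand-designed feature templates are involved."""
    return np.concatenate([char_vec, context_vec, pos_vec, morph_vec])

# Illustrative sizes only: 4 + 6 + 3 + 2 = 15 dimensions.
rep = word_representation(np.ones(4), np.zeros(6), np.ones(3), np.zeros(2))
```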


Input Representation

Morph-feat Vectors

Figure: Morph-feat Embeddings (e.g. the FEATS string Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs for the word "It")
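One way such an embedding could be built (a sketch only; summing one learned vector per feature=value pair is an assumption made here, not necessarily the exact composition used in the thesis):

```python
import numpy as np

DIM = 4                         # illustrative embedding size
rng = np.random.default_rng(1)
feat_table = {}                 # "Feature=Value" -> embedding, grown on demand

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Gender=Neut|Number=Sing'
    by summing one vector per feature=value pair; '_' means no features."""
    vec = np.zeros(DIM)
    if feats in ("", "_"):
        return vec
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=DIM)
        vec += feat_table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```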


Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

β-LSTM

Figure: Tree-stack LSTM overview with the β-LSTM highlighted

β-LSTM

Figure: Buffer's β-LSTM running over word representations wi, wi+1, wi+2

σ-LSTM

Figure: Tree-stack LSTM overview with the σ-LSTM highlighted

σ-LSTM

Figure: Stack's σ-LSTM running over stack items si, si+1, si+2

Action-LSTM

Figure: Tree-stack LSTM overview with the Action-LSTM highlighted

Action-LSTM

Figure: Action-LSTM running over past transitions

How are the components of the tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, the dependent word and the dependency relation

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
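Equation (1) translated directly into code (NumPy; the zero weights and vector sizes are illustrative only):

```python
import numpy as np

def t_rnn(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """Compute the new head embedding from the old head embedding, the
    dependency-relation embedding and the dependent embedding, eq. (1)."""
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

# 3-dim word vectors, 2-dim relation vectors: W_rnn maps 8 -> 3 dims.
W_rnn, b_rnn = np.zeros((3, 8)), np.zeros(3)
new_head = t_rnn(np.ones(3), np.ones(2), np.ones(3), W_rnn, b_rnn)
```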


Tree-RNN with

1 Left Transition
2 Right Transition

Left Transition


Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Transitions - Left

leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for a new transition

Right Transition


Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Transitions - Right

rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for a new transition
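Stripped of the embedding updates (t-RNN composition and LSTM recomputation), the two transitions reduce to simple stack/buffer/arc-set operations; a sketch with word ids standing in for words:

```python
def shift(stack, buffer):
    """shift: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    """leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the popped stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, arcs, d):
    """rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the popped top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run over word ids 1..3 ("Economic news had" style):
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer)                  # stack [1], buffer [2, 3]
left_arc(stack, buffer, arcs, "ATT")  # arc (2, ATT, 1)
shift(stack, buffer)                  # stack [2], buffer [3]
shift(stack, buffer)                  # stack [2, 3], buffer []
right_arc(stack, arcs, "OBJ")         # arc (2, OBJ, 3)
```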

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview (σ-, β- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word and dependency relation)

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing
2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models
3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser
4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning
5 Conclusion
6 Future Work & Discussions

4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18:
1 Train/test split change
2 Annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only)

Only Action LSTM

Figure: Only the Action-LSTM added to the MLP

Only β-LSTM

Figure: Only the β-LSTM added to the MLP

Only σ-LSTM

Figure: Only the σ-LSTM added to the MLP

Ablation Analysis Results

Lang Code   MLP     Only Action  Only-β  Only-σ
hu szeged   66.21   66.87        66.94   67.03
sv lines    71.12   72.05        72.17   72.45
tr imst     57.12   56.87        57.02   57.12
ar padt     67.83   66.67        66.89   66.92
cs cac      83.89   82.23        83.13   83.17
en ewt      75.54   75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

Figure: Tree-stack LSTM overview with the t-RNN highlighted

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code             without t-RNN  with t-RNN
no nynorsklia (3k)    51.78          53.33
ru taiga (11k)        59.13          60.55
gl treegal (15k)      69.76          70.45
hu szeged (20k)       66.12          68.18
sv lines (49k)        74.04          75.46
tr imst (50k)         58.12          58.75
ar padt (120k)        68.04          68.14
en ewt (204k)         74.87          75.77
cs cac (473k)         82.89          83.57
cs pdt (1M)           81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv lines    71.12   72.05   72.17   74.04   72.17      75.46
tr imst     57.12   56.87   57.02   57.12   58.12      58.75
ar padt     67.83   66.67   66.89   66.92   68.04      68.14
cs cac      83.89   82.23   83.13   83.17   82.89      83.57
en ewt      75.54   75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training set size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: transitions are made using gold moves
Dynamic oracle: transitions are made using predicted moves

In both cases, the log-probability of the gold moves is maximized
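Schematically (NumPy, with the scoring model abstracted away), the two regimes share the same loss and differ only in which move the parser executes next:

```python
import numpy as np

def oracle_step(scores, gold_move, mode):
    """One training decision. Both regimes maximize log p(gold move);
    they differ in which move is executed to reach the next state:
    the gold move (static) or the model's best move (dynamic)."""
    probs = np.exp(scores - scores.max())   # stable softmax
    probs = probs / probs.sum()
    loss = -np.log(probs[gold_move])
    next_move = gold_move if mode == "static" else int(np.argmax(scores))
    return loss, next_move

scores = np.array([2.0, 0.5, -1.0])  # the model prefers move 0
_, nxt_static = oracle_step(scores, gold_move=1, mode="static")
_, nxt_dynamic = oracle_step(scores, gold_move=1, mode="dynamic")
```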

Figure: Tree-stack LSTM overview (σ-, β- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word and dependency relation)

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with less than 20k training tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with between 20k and 50k training tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with more than 50k training tokens

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Projectivity

Transition based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
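A tree is projective iff no two dependency arcs cross; a small checker (assuming 1-based token positions with head 0 for the root, a convention chosen here for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (tokens 1..n, head 0 = root).
    Return True iff no two dependency arcs cross."""
    arcs = [(min(i + 1, h), max(i + 1, h))
            for i, h in enumerate(heads) if h != 0]
    # Arcs (a, b) and (c, e) cross when a < c < b < e.
    return not any(a < c < b < e for a, b in arcs for c, e in arcs)

# 1->2, 2->3, 3=root: a projective chain.
# 1->3, 2->4, 3=root, 4->3: arcs (1,3) and (2,4) cross.
```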


Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity (%)  Best (LAS)  Ours (LAS)
grc perseus   90.7              79.39       55.03 (20)
eu bdt        95.13             84.22       74.13 (17)
hu szeged     97.8              82.66       68.18 (14)
da ddt        98.26             86.28       76.40 (17)
en gum        99.6              85.05       76.44 (15)
gl treegal    100               74.25       70.45 (10)
gl ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions




MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: the tree-stack LSTM is ready to predict the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
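The left transition above can be sketched directly from the formula: the stack top s becomes a dependent of the buffer front b, and the arc (b, d, s) is recorded. This is a minimal illustration of the transition semantics only (the LSTM/t-RNN updates are omitted); the function name `left_arc` is mine.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    s = stack.pop()   # dependent: top of the stack
    b = buffer[0]     # head: front of the buffer (stays in place)
    arcs.add((b, d, s))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1], [2, 3], set()
stack, buffer, arcs = left_arc(stack, buffer, arcs, "nsubj")
```

Here word 1 is attached to word 2 with label "nsubj", and the buffer is left untouched.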

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: parser state before the right transition — stack and buffer LSTM chains, t-RNN with Head, Dependent, and Dependency Relation inputs.]

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: the tree-stack LSTM is ready to predict the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
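The right transition mirrors the left one: the stack top t is popped and attached to s, the element below it, which stays on the stack as the head. Again this is a sketch of the transition semantics only (LSTM/t-RNN updates omitted); `right_arc` is my own name.

```python
def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    t = stack.pop()   # dependent: top of the stack
    s = stack[-1]     # head: the element below it (remains on the stack)
    arcs.add((s, d, t))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1, 2], [3], set()
stack, buffer, arcs = right_arc(stack, buffer, arcs, "obj")
```

Word 2 becomes an "obj" dependent of word 1; word 1 remains available for further attachments.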

Final overview of Tree-stack LSTM

[Figure: tree-stack LSTM overview — the β-, σ-, and Action-LSTM chains and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer into the MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
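The overview slide can be summarized as a single forward pass: encode the buffer, stack, and action history with recurrent networks, concatenate their final hidden states, and score the possible transitions with an MLP. The sketch below uses plain tanh-RNN cells in place of LSTMs for brevity, random toy weights, and made-up dimensions — it shows the data flow, not the thesis implementation.

```python
import numpy as np

def rnn_encode(xs, W, U, b):
    # plain tanh-RNN cell standing in for an LSTM; returns the final hidden state
    h = np.zeros(U.shape[0])
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
    return h

rng = np.random.default_rng(1)
D, H, A = 6, 5, 3   # embedding dim, hidden dim, number of transitions (all toy)
make = lambda *s: 0.1 * rng.standard_normal(s)

# one encoder per component: β-LSTM (buffer), σ-LSTM (stack), Action-LSTM
params = {n: (make(H, D), make(H, H), make(H)) for n in ("buffer", "stack", "action")}
W_mlp, b_mlp = make(A, 3 * H), make(A)

# a parser state: sequences of embeddings for each component
state = {n: [rng.standard_normal(D) for _ in range(4)]
         for n in ("buffer", "stack", "action")}

features = np.concatenate([rnn_encode(state[n], *params[n])
                           for n in ("buffer", "stack", "action")])
scores = W_mlp @ features + b_mlp   # one score per candidate transition
```

The predicted transition would be `scores.argmax()`, after masking out transitions that are invalid in the current configuration.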

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
  Dependency parsing of 81 treebanks in 49 languages.
  All treebanks use standardized annotation:
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
  Dependency parsing of 82 treebanks in 57 languages.
  All treebanks use standardized annotation:
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1 train/test split change, 2 annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results for the CoNLL17 and CoNLL18 systems, tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped.

2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure: Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code    MLP    Only Action  Only-β  Only-σ
hu szeged    66.21  66.87        66.94   67.03
sv lines     71.12  72.05        72.17   72.45
tr imst      57.12  56.87        57.02   57.12
ar padt      67.83  66.67        66.89   66.92
cs cac       83.89  82.23        83.13   83.17
en ewt       75.54  75.43        75.56   75.67

Table: Comparison between the MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: tree-stack LSTM overview — the β-, σ-, and Action-LSTM chains and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer into the MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
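The four-way split can be expressed as a small bucketing function. A minimal sketch; the function name and labels are mine, and the token counts in the checks come from the tables on the following slides.

```python
def size_group(tokens):
    """Bucket a treebank by its training-token count, following the 4-way split."""
    if tokens < 20_000:
        return "<20k"
    if tokens < 50_000:
        return "20k-50k"
    if tokens < 100_000:
        return "50k-100k"
    return ">=100k"

groups = [size_group(n) for n in (3_583, 56_687, 204_585)]
```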

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81            48325
fr sequoia   84.36        82.17            50543
en gum       76.44        75.34            53686
ko gsd       73.74        72.54            56687
eu bdt       74.55        73.32            72974
nl lassymal  76.7         75.8             75134
gl ctg       79.02        79.018           79327
lv lvtb      72.33        72.24            80666
id gsd       75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability (logp) of the gold moves is maximized.

[Figure: tree-stack LSTM overview — the β-, σ-, and Action-LSTM chains and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer into the MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
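The difference between the two regimes is purely in the control flow: the loss is always the negative log-probability of the gold move, but the next parser state follows either the gold move (static) or the model's prediction (dynamic). A minimal sketch with a hand-made score table; `step` and the transition names are my own.

```python
import math

def step(scores, gold, dynamic):
    """One training step: NLL of the gold move; the followed move depends on the regime."""
    z = sum(math.exp(s) for s in scores.values())
    loss = -math.log(math.exp(scores[gold]) / z)   # -log p(gold) under softmax
    predicted = max(scores, key=scores.get)
    follow = predicted if dynamic else gold        # dynamic oracle follows the model
    return loss, follow

scores = {"shift": 2.0, "left": 0.5, "right": -1.0}
loss_static, next_static = step(scores, gold="left", dynamic=False)
loss_dynamic, next_dynamic = step(scores, gold="left", dynamic=True)
```

With the same scores the two losses are identical; only the state the parser moves to differs, which is what exposes the model to its own mistakes during dynamic-oracle training.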

Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for between 20k and 50k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for more than 50k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with fewer than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
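Strategy (4) amounts to a warm start: initialize the low-resource parser from a pre-trained parser's parameters and then fine-tune. A minimal sketch with plain dicts standing in for parameter stores; the names (`warm_start`, the parameter keys) are hypothetical, not from the thesis.

```python
def warm_start(pretrained, fresh):
    """Copy every matching pre-trained parameter into the new parser
    before fine-tuning on the low-resource treebank."""
    for name, value in pretrained.items():
        if name in fresh:            # only shapes/keys shared by both models
            fresh[name] = value
    return fresh

pretrained = {"lstm.W": [1, 2], "mlp.W": [3]}
fresh = {"lstm.W": [0, 0], "mlp.W": [0], "embed.new_lang": [9]}
fresh = warm_start(pretrained, fresh)
```

Parameters specific to the new language (here `embed.new_lang`) keep their fresh initialization; everything shared starts from the pre-trained values.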

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.6

6Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
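A tree is projective when no two dependency arcs cross. The check below is a minimal sketch (my own helper, not from the thesis): treat every arc as a span and look for strictly interleaved span pairs.

```python
def is_projective(heads):
    """heads[i-1] = head of token i (0 denotes the root; tokens are 1..n).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, e in arcs[i + 1:]:
            if a < c < b < e or c < a < e < b:   # strictly interleaved spans
                return False
    return True

ok = is_projective([2, 0, 2])        # 1<-2->3, no crossing arcs
bad = is_projective([3, 4, 0, 3])    # arcs (1,3) and (2,4) cross
```

Non-projective gold trees put a ceiling on the LAS a purely projective transition system can reach, which is what the table above measures.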

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus   90.7             79.39       55.03 (20)
eu bdt        95.13            84.22       74.13 (17)
hu szeged     97.8             82.66       68.18 (14)
da ddt        98.26            86.28       76.40 (17)
en gum        99.6             85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33            3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens in between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121,064
bg btb     84.53        84.55             124,336
en ewt     75.77        75.682            204,585
ar padt    68.02        68.14             223,881
de gsd     71.59        71.32             263,804
ca ancora  85.89        85.874            417,587
es ancora  84.99        84.78             444,617
cs cac     83.57        83.63             472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions use gold moves.
Dynamic oracle: transitions use predicted moves.

In both cases, the log-probability of gold moves is maximized.

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
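A toy contrast between the two regimes (all names and the state update are illustrative, not the thesis's parser): both maximize the probability of the gold move, but the dynamic oracle advances the parser with its own predicted move, so training visits states the model actually reaches at test time.

```python
def run_epoch(gold_moves, predict, dynamic):
    state, losses = 0, []
    for gold in gold_moves:
        probs = predict(state)                    # move -> probability
        losses.append(-probs[gold])               # stand-in for -log p(gold move)
        chosen = max(probs, key=probs.get) if dynamic else gold
        state += 1 if chosen == "SHIFT" else 0    # toy state update
    return losses

predict = lambda state: {"SHIFT": 0.6, "LEFT": 0.3, "RIGHT": 0.1}
gold = ["SHIFT", "LEFT", "SHIFT"]
static_losses  = run_epoch(gold, predict, dynamic=False)
dynamic_losses = run_epoch(gold, predict, dynamic=True)
print(static_losses == dynamic_losses)   # True: same loss targets here
```

The loss targets the gold move in both runs; only the trajectory of visited states differs once the predictor depends on the state.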

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
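A quick sketch of what projectivity means operationally: a tree is projective iff no two dependency arcs cross when drawn above the sentence. The encoding below (heads[i] is the head of 1-based word i, 0 = ROOT) is an illustrative assumption.

```python
def is_projective(heads):
    # One arc per word, stored as an ordered (left, right) endpoint pair.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:     # arc (c, e) crosses arc (a, b)
                return False
    return True

print(is_projective([2, 0, 2]))     # True: no crossing arcs
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```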

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



Problem Definition

Find a model that learns to decide the correct transition from the current state.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
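As a toy sketch of these two components (a plain recurrent cell stands in for an LSTM, and all dimensions and names are illustrative, not the thesis's implementation): a character-level network summarizes each word's spelling into a word vector, and a word-level bidirectional pass produces one context vector per position.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_summary(seq, W):
    # Stand-in for an LSTM: folds a variable-length sequence into one vector.
    h = np.zeros(W.shape[0])
    for x in seq:
        h = np.tanh(W @ np.concatenate([h, x]))
    return h

char_emb = {c: rng.normal(size=8) for c in "abcdefghijklmnopqrstuvwxyz"}
W_char = rng.normal(size=(16, 16 + 8)) * 0.1    # char-level weights
W_word = rng.normal(size=(16, 16 + 16)) * 0.1   # word-level weights

def word_vector(word):
    # Character-based network: spelling -> fixed-size word vector.
    return rnn_summary([char_emb[c] for c in word], W_char)

def context_vectors(words):
    # Word-based bidirectional pass: forward and backward states per position.
    vecs = [word_vector(w) for w in words]
    fwd, bwd, h = [], [], np.zeros(16)
    for v in vecs:
        h = np.tanh(W_word @ np.concatenate([h, v])); fwd.append(h)
    h = np.zeros(16)
    for v in reversed(vecs):
        h = np.tanh(W_word @ np.concatenate([h, v])); bwd.append(h)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

ctx = context_vectors(["economic", "news", "had"])
print(len(ctx), ctx[0].shape)   # 3 (32,)
```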

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
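A minimal sketch of such a decision module (sizes, feature count, and the transition inventory are all illustrative assumptions, not the thesis's configuration): embeddings of the extracted state features are concatenated and passed through one hidden layer and a softmax over candidate transitions.

```python
import numpy as np

rng = np.random.default_rng(1)
TRANSITIONS = ["SHIFT", "LEFT-SBJ", "RIGHT-OBJ"]   # illustrative subset

def decide(feature_vecs, W1, b1, W2, b2):
    x = np.concatenate(feature_vecs)   # state description: concatenated features
    h = np.tanh(W1 @ x + b1)           # one hidden layer
    s = W2 @ h + b2
    e = np.exp(s - s.max())            # softmax over candidate transitions
    return e / e.sum()

feats = [rng.normal(size=4) for _ in range(3)]  # e.g. stack top, 2nd item, buffer front
W1, b1 = rng.normal(size=(8, 12)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)) * 0.1, np.zeros(3)
p = decide(feats, W1, b1, W2, b2)
print(TRANSITIONS[int(p.argmax())])
```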

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

[Figure: LAS examples for "Economic news had". Gold tree (arcs ATT, SBJ): LAS 1. Pred 1 (PRED, OBJ): LAS 0. Pred 2 (OBJ, ATT): LAS (1/2) · 100]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
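The metric can be written directly; this is a hedged sketch in which the head indices and labels are illustrative:

```python
def las(gold, pred):
    # Fraction of words whose (head, label) pair exactly matches the gold tree.
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# (head_index, label) per word for "Economic news had", ROOT = index 0.
gold  = [(2, "ATT"), (3, "SBJ"), (0, "PRED")]
pred  = [(2, "OBJ"), (3, "SBJ"), (0, "PRED")]  # heads right, one label wrong
score = las(gold, pred)
print(round(score, 2))   # 66.67: 2 of 3 words fully correct
```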

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated except on reduce.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
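A sketch of how a UD FEATS string like the one above could be mapped to a single vector. The pooling by averaging and the dimension are assumptions for illustration, not necessarily the thesis's scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8
pair_emb = {}   # one embedding per "Feature=Value" pair, created on first use

def morph_feat_vector(feats):
    if feats == "_":                   # UD convention for "no features"
        return np.zeros(DIM)
    pairs = feats.split("|")
    for p in pairs:
        pair_emb.setdefault(p, rng.normal(size=DIM))
    # Average the per-pair embeddings into one fixed-size vector.
    return np.mean([pair_emb[p] for p in pairs], axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)   # (8,)
```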

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM reading upcoming words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM reading stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM reading the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, the dependent word, and the dependency relation.

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
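Equation (1) is a single affine map plus tanh over the concatenated head, relation, and dependent vectors; a direct sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
D, R = 6, 3                                   # word / relation embedding sizes
W_rnn = rng.normal(size=(D, D + R + D)) * 0.1
b_rnn = np.zeros(D)

def t_rnn(w_head, d_rel, w_dep):
    # Eq. (1): new head embedding from [head; relation; dependent].
    return np.tanh(W_rnn @ np.concatenate([w_head, d_rel, w_dep]) + b_rnn)

head, dep, rel = rng.normal(size=D), rng.normal(size=D), rng.normal(size=R)
new_head = t_rnn(head, rel, dep)
print(new_head.shape)   # (6,)
```

The output has the same size as a word vector, so the composed head can itself become the input of a later t-RNN application.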

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
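The two transition rules above can be written as pure functions over (stack, buffer, arcs). This is a hedged sketch on toy word indices, not the thesis's parser; the SHIFT step is added only to complete the tiny example.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d: pop stack top s; add arc (b, d, s) with buffer front b as head.
    s = stack[-1]
    return stack[:-1], buffer, arcs | {(buffer[0], d, s)}

def right_arc(stack, buffer, arcs, d):
    # right_d: pop stack top t; add arc (s, d, t) with s below it as head.
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Words: 0 = ROOT, 1 = "news", 2 = "had".
stack, buffer, arcs = [0, 1], [2], set()
stack, buffer, arcs = left_arc(stack, buffer, arcs, "SBJ")   # news <- had
stack, buffer = stack + [buffer[0]], buffer[1:]              # SHIFT "had"
stack, buffer, arcs = right_arc(stack, buffer, arcs, "PRED") # had <- ROOT
print(sorted(arcs))   # [(0, 'PRED', 2), (2, 'SBJ', 1)]
```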

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: β-, σ-, and action-LSTMs with the t-RNN; their states are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition-based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change. 2. Annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity %  Best (LAS)  Our (LAS)
grc perseus  90.7            79.39       55.03 (20)
eu bdt       95.13           84.22       74.13 (17)
hu szeged    97.8            82.66       68.18 (14)
da ddt       98.26           86.28       76.40 (17)
en gum       99.6            85.05       76.44 (15)
gl treegal   100             74.25       70.45 (10)
gl ctg       100             82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention across σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

For "Economic news had":
Gold tree (arcs ATT, SBJ): LAS = 1
Pred 1 (arcs PRED, OBJ): LAS = 0
Pred 2 (arcs OBJ, ATT): LAS = (1/2) · 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
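The metric can be sketched as a comparison of (head, label) pairs (a minimal illustration; the official CoNLL evaluation script additionally aligns system and gold tokenizations):

```python
def las(gold, pred):
    """Labeled Attachment Score: the fraction of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# "Economic news had": one (head, label) pair per dependent word
gold  = [("news", "ATT"), ("had", "SBJ")]   # Economic<-news, news<-had
pred1 = [("had", "PRED"), ("had", "OBJ")]   # both arcs wrong -> LAS 0
pred2 = [("news", "ATT"), ("had", "OBJ")]   # one arc right   -> LAS 0.5
```

Here `las(gold, pred2)` gives 0.5, matching the slide's ½ example.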

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source: CoNLL17 official results page.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated except on reduce transitions.

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure Tree-stack LSTM overview (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
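A morph-feat string like the one above can be mapped to a single vector; the sketch below sums one learned embedding per Key=Value feature (the summing scheme and embedding table are hypothetical, for illustration only):

```python
import numpy as np

def morph_feat_vector(feats, table, dim=8):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing' by summing
    one embedding per Key=Value pair (hypothetical composition scheme)."""
    vec = np.zeros(dim)
    if feats == "_":                  # UD marks 'no features' with '_'
        return vec
    for kv in feats.split("|"):
        if kv not in table:           # lazily create per-feature embeddings
            table[kv] = np.random.randn(dim)
        vec += table[kv]
    return vec
```

The table grows one entry per distinct Key=Value pair seen in training; in a real model these entries would be trained jointly with the parser.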

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Tree-stack LSTM overview (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM over the words wi, wi+1, wi+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Tree-stack LSTM overview (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM over the stack items si, si+1, si+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Tree-stack LSTM overview (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN combining the head word, dependency relation, and dependent word embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
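Equation (1) can be sketched directly; the dimensions below are illustrative, not the thesis's actual sizes:

```python
import numpy as np

def t_rnn(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """Eq. (1): compose a new head embedding from the old head embedding,
    the dependency-relation embedding, and the dependent embedding."""
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

dim, rel_dim = 4, 2                                  # illustrative sizes
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(dim, 2 * dim + rel_dim))    # maps the concat back to dim
b_rnn = np.zeros(dim)
```

Because the output has the same dimension as a word embedding, the composed head can be fed back into the σ-LSTM or β-LSTM in place of the original head word.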

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready for the next transition

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM with all components connected (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank has improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.

Figure Tree-stack LSTM overview (outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
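The static/dynamic distinction can be sketched as a single training step (a simplified illustration; `train_step` and its arguments are hypothetical, and the real model scores moves with the MLP over the concatenated LSTM states):

```python
import math

def train_step(gold_move, model_probs, dynamic=False):
    """One oracle step: the loss term always maximizes log p(gold move);
    the oracles differ only in which move advances the parser state."""
    loss = -math.log(model_probs[gold_move])           # -log p(gold)
    if dynamic:
        taken = max(model_probs, key=model_probs.get)  # follow the model
    else:
        taken = gold_move                              # follow the gold tree
    return loss, taken
```

With a dynamic oracle the parser visits states its own mistakes produce, so at test time it has seen configurations off the gold path; the loss on the gold move is identical in both regimes.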

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017
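The idea can be sketched with a toy numpy LSTM run over a word's characters, where the final hidden state serves as the word vector. This is an illustrative sketch, not the thesis implementation; `TinyLSTM`, the dimensions, and the character embeddings are all assumptions:

```python
import numpy as np

class TinyLSTM:
    """Minimal LSTM cell (illustrative sketch, not the thesis implementation)."""
    def __init__(self, din, dh, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(4 * dh, din + dh))
        self.b = np.zeros(4 * dh)
        self.dh = dh

    def step(self, h, c, x):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        sig = lambda a: 1.0 / (1.0 + np.exp(-a))
        c = sig(f) * c + sig(i) * np.tanh(g)   # cell state update
        h = sig(o) * np.tanh(c)                # hidden state
        return h, c

def word_vector(word, lstm, char_emb):
    """Use the final hidden state over the word's characters as its vector."""
    h = c = np.zeros(lstm.dh)
    for ch in word:
        h, c = lstm.step(h, c, char_emb[ch])
    return h

rng = np.random.default_rng(1)
char_emb = {ch: rng.normal(size=5) for ch in "abcdefghijklmnopqrstuvwxyz"}
lstm = TinyLSTM(din=5, dh=6)
print(word_vector("news", lstm, char_emb).shape)  # (6,)
```

Because the vector is built from characters, unseen words still receive a representation, which matters for morphologically rich languages.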

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had"
Gold tree (arcs ATT, SBJ): LAS = 1
Pred 1 (arcs OBJ, PRED, both wrong): LAS = 0
Pred 2 (arcs ATT, OBJ, one of two correct): LAS = (1/2) · 100
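The metric is simple to compute; a minimal sketch, assuming each word is annotated as a (head index, label) pair:

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted head
    AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for (gh, gl), (ph, pl) in zip(gold, pred)
                  if gh == ph and gl == pl)
    return correct / len(gold)

# "Economic news had" as (head_index, label) per word; 0 = root
gold = [(2, "ATT"), (3, "SBJ"), (0, "ROOT")]
pred = [(2, "ATT"), (3, "OBJ"), (0, "ROOT")]  # one label wrong
print(las(gold, pred))  # 2 of 3 words fully correct
```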

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers

Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
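The concatenation itself is straightforward; a minimal sketch with toy dimensions (the real vectors are much larger, and the sizes here are assumptions):

```python
import numpy as np

# Toy dimensions (assumptions; the thesis uses larger vectors)
word_vec    = np.zeros(3)   # character-based LSTM word vector
context_vec = np.zeros(4)   # word-based BiLSTM context vector
pos_vec     = np.zeros(2)   # part-of-speech embedding
morph_vec   = np.zeros(2)   # morph-feat embedding

# A token's input representation is simply the concatenation of the four
token_repr = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
print(token_repr.shape)  # (11,)
```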

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
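One simple way to embed such a Universal Dependencies feature string is to split it on `|` and combine per-feature embeddings; averaging is used here as an illustrative choice, and the exact composition in the thesis may differ:

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)
feat_emb = {}  # one embedding per "Feature=Value" pair (lazily created)

def morph_feat_vector(feats):
    """Embed a UD morphological feature string such as 'Case=Nom|Number=Sing'
    by averaging per-feature embeddings (illustrative sketch)."""
    vecs = []
    for fv in feats.split("|"):
        if fv not in feat_emb:
            feat_emb[fv] = rng.normal(size=DIM)
        vecs.append(feat_emb[fv])
    return np.mean(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (8,)
```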

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM reading words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composes the head word, dependency relation and dependent word embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)
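Equation (1) can be sketched directly in numpy; the dimension `D` and the random initialization are toy assumptions:

```python
import numpy as np

D = 4  # toy embedding / relation dimension (assumption)
rng = np.random.default_rng(0)
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))  # maps [head; rel; dep] -> new head
b_rnn = np.zeros(D)

def trnn_compose(w_head, d_l, w_dep):
    """Eq. (1): new head embedding from old head, relation and dependent."""
    x = np.concatenate([w_head, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

h = trnn_compose(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
print(h.shape)  # (4,)
```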

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing

2 Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models

3 Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser

4 Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang Code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang Code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang Code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of the gold moves is maximized
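The difference between the two regimes can be sketched as a single training step. This is a toy sketch, not the thesis code; `oracle_move`, `model_move` and `log_prob` are hypothetical names, and the state is reduced to a simple transition history:

```python
def train_step(state, oracle_move, model_move, log_prob, dynamic):
    """One training step (sketch). The loss always maximizes log p of the
    oracle's move; the regimes differ only in which move is executed:
    static follows the oracle, dynamic follows the model's prediction."""
    gold = oracle_move(state)
    loss = -log_prob(state, gold)
    taken = model_move(state) if dynamic else gold
    return loss, state + [taken]  # state as a simple transition history

# Toy setting: the oracle always says "shift", the model predicts "left"
oracle = lambda s: "shift"
model = lambda s: "left"
logp = lambda s, m: -0.5

_, static_state = train_step([], oracle, model, logp, dynamic=False)
_, dynamic_state = train_step([], oracle, model, logp, dynamic=True)
print(static_state, dynamic_state)  # ['shift'] ['left']
```

Following predicted moves exposes the model to its own mistakes during training, which is the motivation for dynamic oracles.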

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees

Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

Source: official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the states of the σ-LSTM, β-LSTM or Action-LSTM may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
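One plausible reading of the figure is that each Key=Value pair in the UD morphology string gets its own embedding, and the per-feature vectors are combined into a single morph-feat vector. Combining by summation is an assumption made here for illustration:

```python
import numpy as np

EMB_DIM = 16
rng = np.random.default_rng(1)
feat_table = {}  # one vector per "Key=Value" pair, grown on first sight

def morph_feat_vector(feats):
    # Combine per-feature embeddings into one morph-feat vector (sum assumed).
    total = np.zeros(EMB_DIM)
    for kv in feats.split("|"):
        if kv not in feat_table:
            feat_table[kv] = rng.normal(size=EMB_DIM)
        total += feat_table[kv]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```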

Tree-stack LSTM

Model Components: 1. β-LSTM 2. σ-LSTM 3. Action-LSTM 4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the items s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over past parser actions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composing the head word, dependent word, and dependency relation into a new head embedding

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
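Eq. (1) is a single recursive composition step. A direct sketch, with an assumed shared dimension D for the word and relation embeddings:

```python
import numpy as np

D = 8  # assumed shared dimension of word and relation embeddings
rng = np.random.default_rng(2)
W_rnn = rng.normal(size=(D, 3 * D)) * 0.1
b_rnn = np.zeros(D)

def trnn(w_head_old, d_l, w_dep):
    # Eq. (1): w_head_new = tanh(W_rnn [w_head_old; d_l; w_dep] + b_rnn)
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_head_new = trnn(np.ones(D), np.ones(D), np.ones(D))
```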

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
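The left_d and right_d formulas on the preceding slides can be mirrored directly on a toy parser state (stack, buffer, arc set). Token indices and relation names here are illustrative:

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s is removed and attached to the buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))
    return stack, buffer, arcs

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t is removed and attached to the next stack item s.
    t = stack.pop()
    arcs.add((stack[-1], d, t))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1], [2, 3], set()
left_arc(stack, buffer, arcs, "nsubj")  # arcs now {(2, 'nsubj', 1)}
```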

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.

In both cases, the log probability of gold moves is maximized

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, and Action-LSTM hidden states, with head/dependent words and the dependency relation composed by the t-RNN, concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
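The two regimes share the same loss and differ only in which move is followed to reach the next state. A toy training loop, where `model_probs` and the list-valued state are illustrative stand-ins for the real parser:

```python
import math

def train_sentence(gold_moves, model_probs, dynamic):
    # Both regimes maximize log p(gold); they differ only in the move
    # that is *followed* to produce the next parser state.
    state, loss = [], 0.0
    for gold in gold_moves:
        probs = model_probs(state)            # distribution over transitions
        loss -= math.log(probs[gold])         # negative log-likelihood of gold
        if dynamic:
            move = max(probs, key=probs.get)  # follow the model's own prediction
        else:
            move = gold                       # static oracle: follow gold moves
        state.append(move)                    # apply the transition (toy state)
    return loss

uniform = lambda state: {"shift": 1/3, "left": 1/3, "right": 1/3}
loss = train_sentence(["shift", "left", "right"], uniform, dynamic=False)
```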

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]

3. Using our own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
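Projectivity can be checked by testing whether any two dependency arcs cross when drawn above the sentence. A small sketch over head indices (0 denotes the artificial root; token indexing and examples are illustrative):

```python
def is_projective(heads):
    # heads[i-1] = head index of token i (1-based tokens, 0 = root).
    # A tree is projective iff no two arcs cross.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # crossing arcs found
                return False
    return True

proj = is_projective([2, 0, 2, 3])     # True: a simple projective tree
nonproj = is_projective([3, 4, 0, 3])  # False: arcs (1,3) and (2,4) cross
```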

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7. From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta 2011 Dynamic programming algorithms for transition-based dependency parsers In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 Association for Computational Linguistics pages 673-682

S Kübler, R McDonald and J Nivre 2009 Dependency parsing Morgan & Claypool US

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A Smith 2015 Transition-based dependency parsing with stack long short-term memory CoRR abs/1505.08075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru_taiga (10k)    58.89  60.55
hu_szeged (20k)   66.21  68.18
tr_imst (50k)     56.78  58.75
ar_padt (120k)    67.83  68.14
en_ewt (205k)     74.87  75.77
cs_cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM overview. The hidden states of the σ-LSTM, β-LSTM, and Action-LSTM and the t-RNN inputs (head word, dependent word, dependency relation) are concatenated and fed to an MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more
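The four-way grouping above can be expressed as a small helper. This is an illustrative sketch: the bucket labels are ours, and the example token counts are taken from the morph-feat tables that follow.

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four analysis groups (thresholds
    follow the slide; the string labels are ours)."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Training-token counts for a few treebanks (from the tables below).
treebanks = {"no_nynorsklia": 3_583, "sv_lines": 48_325,
             "id_gsd": 97_531, "cs_pdt": 1_173_282}
groups = {tb: size_bucket(n) for tb, n in treebanks.items()}
print(groups["no_nynorsklia"])  # <20k
```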

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.
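A toy rollout loop can make the contrast concrete. This is an illustrative sketch only: in the real parser the state is the stack/buffer configuration and `predict` would be the trained model; both names here are ours.

```python
import random

def oracle_rollout(gold_moves, predict, dynamic=False, seed=0):
    """Collect (state, gold_move) training pairs for one sentence.

    Static oracle: the parser always executes the gold move, so the states
    visited are exactly those of the gold derivation.  Dynamic oracle: the
    parser executes its own prediction, so it also visits states reached
    through mistakes.  Either way the gold move is what gets supervised.
    """
    rng = random.Random(seed)
    state = []                      # toy state: the moves executed so far
    pairs = []
    for gold in gold_moves:
        pairs.append((tuple(state), gold))        # supervise the gold move
        move = predict(state, rng) if dynamic else gold
        state.append(move)          # dynamic training may follow a mistake
    return pairs

# A deliberately bad predictor: always chooses SHIFT.
bad = lambda state, rng: "SHIFT"
gold = ["SHIFT", "LEFT", "SHIFT", "RIGHT"]

static_pairs = oracle_rollout(gold, bad, dynamic=False)
dynamic_pairs = oracle_rollout(gold, bad, dynamic=True)
print(static_pairs[2][0])   # ('SHIFT', 'LEFT')  -- gold-prefix state
print(dynamic_pairs[2][0])  # ('SHIFT', 'SHIFT') -- model's own history
```

Static training only ever sees gold-prefix states, while dynamic training also supervises states reached through the model's own mistakes.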

Figure: Tree-stack LSTM overview. The hidden states of the σ-LSTM, β-LSTM, and Action-LSTM and the t-RNN inputs (head word, dependent word, dependency relation) are concatenated and fed to an MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017].
3. Using my own word and context vectors, trained on a different language from the same language family.
4. Applying transfer learning with a pre-trained parser.

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
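Projectivity can be checked by testing whether any two dependency arcs cross. A small sketch (the encoding is ours: `heads[i]` is the head of word `i+1`, with words numbered from 1 and 0 standing for the root):

```python
def crosses(arc1, arc2):
    """Two arcs cross when their spans overlap without nesting."""
    (a1, b1), (a2, b2) = arc1, arc2
    return a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1

def is_projective(heads):
    """A dependency tree (root attached at position 0) is projective
    iff no two of its arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(crosses(arcs[i], arcs[j])
                   for i in range(len(arcs))
                   for j in range(i + 1, len(arcs)))

# Word 2 is the root in both toy trees below.
print(is_projective([2, 0, 4, 2]))  # projective: True
print(is_projective([2, 0, 2, 1]))  # arc 0-2 crosses arc 1-4: False
```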

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. (7)


7. From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

Two shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team, with an MLP Parser using Context Embeddings

CoNLL18: KParse team, with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
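The data flow of the two components can be sketched as follows. This is a toy stand-in only: a vanilla tanh RNN replaces the LSTMs, and all dimensions and weights are made up; only the wiring (characters → word vector, words → bidirectional context vectors) matches the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy dimensionality

def rnn_step(W, h, x):
    # One step of a vanilla tanh RNN cell (the thesis uses LSTMs).
    return np.tanh(W @ np.concatenate([h, x]))

def word_vector(word, W, char_emb):
    """Character-based RNN: fold the characters into one word vector."""
    h = np.zeros(D)
    for ch in word:
        h = rnn_step(W, h, char_emb[ch])
    return h

def context_vectors(word_vecs, Wf, Wb):
    """Word-based bidirectional RNN: concatenated forward and backward
    states give one context vector per position."""
    fwd, h = [], np.zeros(D)
    for v in word_vecs:
        h = rnn_step(Wf, h, v)
        fwd.append(h)
    bwd, h = [], np.zeros(D)
    for v in reversed(word_vecs):
        h = rnn_step(Wb, h, v)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

sent = ["economic", "news", "had", "little", "effect"]
char_emb = {c: rng.normal(size=D) for c in set("".join(sent))}
W, Wf, Wb = (rng.normal(size=(D, 2 * D)) * 0.1 for _ in range(3))
wvecs = [word_vector(w, W, char_emb) for w in sent]
cvecs = context_vectors(wvecs, Wf, Wb)
print(len(cvecs), cvecs[0].shape)  # 5 (16,)
```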

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition
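The decision module can be sketched as a one-hidden-layer network scoring candidate transitions. The transition inventory, layer sizes, and random features here are illustrative, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_decide(state_features, W1, b1, W2, b2, transitions):
    """Feed the extracted state representation through one hidden layer
    and a softmax over candidate transitions; return the argmax."""
    h = np.tanh(W1 @ state_features + b1)
    scores = W2 @ h + b2
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return transitions[int(np.argmax(probs))], probs

transitions = ["SHIFT", "LEFT-nsubj", "RIGHT-obj"]   # toy inventory
F, H = 12, 6                                         # toy sizes
W1, b1 = rng.normal(size=(H, F)), np.zeros(H)
W2, b2 = rng.normal(size=(len(transitions), H)), np.zeros(len(transitions))
feats = rng.normal(size=F)   # stands in for the extracted state features
move, probs = mlp_decide(feats, W1, b1, W2, b2, transitions)
print(move in transitions)   # True
```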

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example for the fragment "Economic news had":

Gold tree arcs: ATT, SBJ
Pred 1 arcs: PRED, OBJ => LAS 0
Pred 2 arcs: ATT, OBJ => LAS (1/2) * 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers. (5)

5. Source: CoNLL17 official results page.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than the Facebook vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct features of the parser state remains critical.

We are unable to represent the whole parsing history with hand-crafted feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history, as well as the word sequences in the buffer and the stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

Two shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team, with an MLP Parser using Context Embeddings

CoNLL18: KParse team, with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview. The hidden states of the σ-LSTM, β-LSTM, and Action-LSTM and the t-RNN inputs (head word, dependent word, dependency relation) are concatenated and fed to an MLP.

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
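A FEATS string like the one above can be mapped to a single vector and concatenated with the word, context, and POS vectors. This sketch makes two assumptions of ours: per-feature vectors are pooled by averaging (the thesis may combine them differently), and the tiny dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4  # toy embedding size per vector type

feat_emb = {}  # one vector per feature=value pair, created on first use

def morph_feat_vector(feat_string):
    """Turn a UD FEATS string like 'Case=Nom|Number=Sing' into one vector
    by averaging the embeddings of its feature=value pairs (assumption)."""
    if feat_string == "_":              # UD uses '_' for "no features"
        return np.zeros(D)
    vecs = [feat_emb.setdefault(fv, rng.normal(size=D))
            for fv in feat_string.split("|")]
    return np.mean(vecs, axis=0)

def word_representation(word_vec, context_vec, pos_vec, feat_string):
    """Input representation: word, context, POS, and morph-feat vectors
    concatenated, with no hand-crafted feature extractor."""
    return np.concatenate([word_vec, context_vec, pos_vec,
                           morph_feat_vector(feat_string)])

w = word_representation(np.ones(D), np.ones(2 * D), np.ones(D),
                        "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(w.shape)  # (20,)
```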

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM overview, with the β-LSTM component highlighted.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM


Figure: Buffer's β-LSTM over the upcoming buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM overview, with the σ-LSTM component highlighted.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM


Figure: Stack's σ-LSTM over the stack words s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM overview, with the Action-LSTM component highlighted.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM


Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)


Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
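Equation (1) maps directly to a few lines of code. The dimensions and random vectors below are toy values for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 6  # toy embedding size

W_rnn = rng.normal(size=(D, 3 * D)) * 0.1
b_rnn = np.zeros(D)

def t_rnn(w_head, d_label, w_dep):
    """Equation (1): update the head word's embedding from its old value,
    the dependency-label embedding, and the dependent's embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_label, w_dep]) + b_rnn)

head, dep = rng.normal(size=D), rng.normal(size=D)
nsubj = rng.normal(size=D)          # embedding of the relation label
new_head = t_rnn(head, nsubj, dep)
print(new_head.shape)  # (6,)
```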

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure: β-LSTM recomputes its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
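The left and right transition rules shown on the preceding slides can be sketched as plain functions over a (stack, buffer, arcs) configuration; an arc (h, d, t) records head h, relation d, and dependent t. This is a minimal sketch of the transition system itself, without any of the LSTM machinery.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a d-dependent of the buffer front b."""
    *rest, s = stack
    b = buffer[0]
    return rest, buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a d-dependent of the word s below it."""
    *rest, s, t = stack
    return rest + [s], buffer, arcs | {(s, d, t)}

def shift(stack, buffer, arcs):
    """Move the front of the buffer onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

# "news had": attach 'news' as the subject of 'had'.
stack, buffer, arcs = [], ["news", "had"], set()
stack, buffer, arcs = shift(stack, buffer, arcs)
stack, buffer, arcs = left_arc(stack, buffer, arcs, "SBJ")
print(arcs)           # {('had', 'SBJ', 'news')}
print(stack, buffer)  # [] ['had']
```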

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
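A standard projectivity check (a generic sketch, not from the thesis): a tree is projective iff for every arc, every token strictly between head and dependent has its own head inside that span.

```python
def is_projective(heads):
    # heads[i] = head of token i+1 (tokens are 1-indexed, 0 denotes the root)
    n = len(heads)
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = min(dep, head), max(dep, head)
        # every token strictly inside the arc span must attach inside it
        for mid in range(lo + 1, hi):
            if not (lo <= heads[mid - 1] <= hi):
                return False
    return True
```

Crossing arcs are detected from whichever arc's span contains exactly one endpoint of the other, so checking all arcs suffices.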

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language | Projectivity % | Best (LAS) | Our (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
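The idea can be sketched with a toy character-level recurrence: read a word character by character and keep the final hidden state as the word vector. An Elman-style cell with fixed toy weights stands in for the trained character LSTM here; `char_word_vector` and the toy character embedding are ours.

```python
import math

def char_word_vector(word, dim=4):
    # h_t = tanh(0.5 * h_{t-1} + emb(c_t)); the final h is the word vector.
    h = [0.0] * dim
    for ch in word:
        # toy deterministic character embedding (a trained table in the real model)
        emb = [((ord(ch) * (i + 1)) % 7 - 3) / 10.0 for i in range(dim)]
        h = [math.tanh(0.5 * h[i] + emb[i]) for i in range(dim)]
    return h
```

Because the state is carried across characters, morphologically related words (shared stems, suffixes) end up with related vectors, which is what makes this useful for out-of-vocabulary words.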

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
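A minimal sketch of such a decision module: a one-hidden-layer MLP scores the candidate transitions and the best *valid* one is chosen. The function name, weight layout and toy values below are illustrative, not the thesis implementation.

```python
import math

def mlp_decide(x, W1, b1, W2, b2, valid):
    # hidden = tanh(W1·x + b1); scores = W2·hidden + b2; argmax over valid transitions
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    scores = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(W2, b2)]
    return max(valid, key=lambda t: scores[t])
```

Restricting the argmax to `valid` matters in parsing: e.g. a reduce transition is illegal when the stack is too small, so the highest-scoring *legal* move is taken instead.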

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS):
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Figure: LAS example on "Economic news had": gold tree arcs SBJ, ATT (LAS 1); Pred 1 arcs PRED, OBJ (LAS 0); Pred 2 arcs OBJ, ATT (LAS = (1/2) · 100 = 50)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
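The metric can be computed directly over (head, label) pairs; a word counts as correct only when *both* match, which is why getting the head right with the wrong label still scores zero. A minimal sketch (the function name and data layout are ours):

```python
def las(gold, pred):
    # gold/pred: one (head, label) pair per word, in sentence order
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```

On the two-word case from the example, one fully correct arc out of two gives LAS = (1/2) · 100 = 50.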

Experiments (MLP)

CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Figure: Morph-feat embeddings for the word "It" (Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
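One way to realize this, sketched below: split the FEATS string on `|`, look up one embedding per feature=value pair, and combine them (summation here; the combination used in the thesis may differ). The lookup table with toy deterministic initialization stands in for trained parameters.

```python
import random

DIM = 8
_table = {}  # hypothetical embedding table: one vector per feature=value pair

def feat_vec(feat, dim=DIM):
    # lazily initialized toy embedding (trained parameters in the real model)
    if feat not in _table:
        rng = random.Random(feat)  # deterministic toy init keyed on the feature
        _table[feat] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return _table[feat]

def morph_feat_embedding(feats_str):
    # "Case=Nom|Gender=Neut|..." -> sum of per-feature embeddings
    if feats_str == "_":          # CoNLL-U uses "_" for "no features"
        return [0.0] * DIM
    vecs = [feat_vec(f) for f in feats_str.split("|")]
    return [sum(col) for col in zip(*vecs)]
```

Sharing one embedding per feature=value pair means rare feature *bundles* still get sensible vectors, since each component has been seen in other bundles.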

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM overview, highlighting the β-LSTM component

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, an LSTM running over the words wi, wi+1, wi+2 remaining in the buffer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM overview, highlighting the σ-LSTM component

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, an LSTM running over the stack entries si, si+1, si+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM overview, highlighting the Action-LSTM component

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, an LSTM running over the embeddings of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN, composing the head word, dependency relation and dependent word embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
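Equation (1) in plain Python, with `W` and `b` standing in for the trained parameters W_rnn and b_rnn (a minimal sketch, not the thesis implementation):

```python
import math

def t_rnn(w_head, d_rel, w_dep, W, b):
    # Eq. (1): w_head_new = tanh(W · [w_head; d_rel; w_dep] + b)
    x = w_head + d_rel + w_dep  # list concatenation = vector concatenation
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```

The output replaces the head's embedding, so repeated attachments fold the whole subtree into the head vector.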

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
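Both transitions can be sketched as list operations on the configuration (σ, β, A), following the two formulas above. `shift`, `left_arc` and `right_arc` are illustrative names, not the thesis code:

```python
def shift(stack, buffer):
    # move the buffer front onto the stack
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top becomes a d-dependent of the buffer front
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top becomes a d-dependent of the element below it
    t = stack.pop()
    arcs.add((stack[-1], d, t))
```

Run on "Economic news had" (tokens 1-3 with root 0), the sequence shift, left_arc(ATT), shift, left_arc(SBJ), shift, right_arc(ROOT) recovers the gold arcs from the earlier LAS example.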

Final overview of Tree-stack LSTM

Figure: Final overview of the Tree-stack LSTM (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3,583
ru taiga | 58.32 | 60.55 | 10,479
sme giella | 52.78 | 53.39 | 16,385
la perseus | 49.93 | 51.6 | 18,184
ug udt | 52.78 | 53.39 | 19,262
sl sst | 46.72 | 48.77 | 19,473
hu szeged | 66.23 | 68.18 | 20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with two components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
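A minimal sketch of what such a decision module could look like: an MLP maps the extracted state features to a score per transition, and the highest-scoring transition is chosen. The transition names, layer sizes, and ReLU nonlinearity are illustrative assumptions; the slide does not specify the exact layout.

```python
import random

random.seed(0)

def mlp_scores(features, W1, b1, W2, b2):
    """One hidden layer: scores = W2 @ relu(W1 @ features + b1) + b2."""
    h = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * x for w, x in zip(row, h)) + b
            for row, b in zip(W2, b2)]

TRANSITIONS = ["shift", "left-arc", "right-arc"]  # hypothetical inventory
in_dim, hid = 8, 5
W1 = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hid)]
b1 = [0.0] * hid
W2 = [[random.uniform(-1, 1) for _ in range(hid)] for _ in range(len(TRANSITIONS))]
b2 = [0.0] * len(TRANSITIONS)

state_features = [random.uniform(-1, 1) for _ in range(in_dim)]
scores = mlp_scores(state_features, W1, b1, W2, b2)
best = TRANSITIONS[scores.index(max(scores))]  # the next transition
```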

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example ("Economic news had ..."):

Gold tree (arcs ATT, SBJ): LAS 1

Pred 1 (arcs OBJ, PRED, both wrong): LAS 0

Pred 2 (arcs ATT, OBJ, one of two correct): LAS = (1/2) · 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
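The metric on this slide is easy to sketch directly: a word counts as correct only if its predicted head and its predicted label both match gold. The word indices below are hypothetical stand-ins for the slide's two-arc example.

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted
    head AND dependency label both match the gold annotation.
    gold/pred: one (head_index, label) pair per word."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# Slide example, "Economic news had ..." (indices are assumptions):
gold  = [(2, "ATT"), (3, "SBJ")]   # Economic -> news, news -> had
pred1 = [(3, "OBJ"), (2, "PRED")]  # both arcs wrong      -> LAS 0
pred2 = [(2, "ATT"), (3, "OBJ")]   # one of two correct   -> LAS 0.5
```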

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers5

5Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize word representations by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
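The FEATS string shown in the figure can be sketched as code: split the UD `Key=Value|Key=Value` annotation into pairs and look each pair up in an embedding table. Summing the pair vectors into one word-level morph vector is an assumption for illustration; the thesis may combine them differently.

```python
import random

def parse_feats(feats):
    """Split a UD FEATS string like 'Case=Nom|Number=Sing' into pairs."""
    if feats in ("_", ""):
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

class MorphFeatEmbedder:
    """Toy morph-feat embedder: one random vector per feature=value pair;
    a word's morph vector is the sum of its pairs' vectors (an assumption)."""
    def __init__(self, dim=4, seed=0):
        self.dim, self.table, self.rng = dim, {}, random.Random(seed)

    def vector(self, pair):
        if pair not in self.table:  # lazily allocate an embedding per pair
            self.table[pair] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[pair]

    def embed(self, feats):
        vecs = [self.vector(p) for p in parse_feats(feats)]
        if not vecs:
            return [0.0] * self.dim
        return [sum(col) for col in zip(*vecs)]
```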

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
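Equation (1) can be sketched directly: concatenate the old head embedding, the relation embedding, and the dependent embedding, then apply an affine map and tanh. The toy dimensions and weight values below are assumptions for illustration.

```python
import math

def t_rnn(w_head, d_rel, w_dep, W, b):
    """Eq. (1): w_head_new = tanh(W_rnn @ [w_head; d_rel; w_dep] + b_rnn),
    where [;] denotes vector concatenation."""
    x = w_head + d_rel + w_dep  # list concatenation == vector concatenation
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Toy sizes: head/dep embeddings of dim 2, relation embedding of dim 1.
w_head, w_dep, d_rel = [0.5, -0.2], [0.1, 0.3], [1.0]
W = [[0.1, 0.2, 0.3, 0.0, -0.1],   # output dim 2, input dim 2 + 1 + 2 = 5
     [0.0, -0.2, 0.1, 0.4, 0.2]]
b = [0.0, 0.1]
new_head = t_rnn(w_head, d_rel, w_dep, W, b)
```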

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
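The two transition rules on the preceding slides can be sketched over a configuration (stack σ, buffer β, arc set A). The word ids and labels in the usage example are hypothetical; only the state updates follow the slides' formulas.

```python
def shift(stack, buffer, arcs):
    """shift: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    stack top s becomes a d-dependent of buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    stack top t becomes a d-dependent of s, the word below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy derivation over word ids 1..3 (0 = root):
stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer, arcs)           # stack [0, 1], buffer [2, 3]
left(stack, buffer, arcs, "ATT")     # arc (2, ATT, 1)
shift(stack, buffer, arcs)           # stack [0, 2], buffer [3]
left(stack, buffer, arcs, "SBJ")     # arc (3, SBJ, 2)
shift(stack, buffer, arcs)           # stack [0, 3], buffer []
right(stack, buffer, arcs, "root")   # arc (0, root, 3)
```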

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change, 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What do Morphological Feature Embeddings provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.70        75.80           75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions use gold moves
Dynamic oracle: transitions use predicted moves

In both cases the log-probability of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
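The difference above fits in a few lines: both regimes accumulate −log p(gold move), but the static oracle advances the state with the gold move while the dynamic oracle advances it with the model's prediction. The toy state, oracle, and score function are stand-ins; the real state is the tree-stack LSTM configuration.

```python
import math

class ToyState:
    """Minimal stand-in for a parser configuration (an assumption)."""
    def __init__(self, n_steps):
        self.t, self.n = 0, n_steps
        self.history = []
    def is_final(self):
        return self.t >= self.n
    def apply(self, move):
        self.history.append(move)
        self.t += 1

def train_sentence(state, gold_oracle, score_fn, dynamic):
    """Static: follow the gold move at every step.
    Dynamic: follow the predicted move instead.
    Either way the loss maximizes log p(gold move)."""
    loss = 0.0
    while not state.is_final():
        scores = gold_oracle(state), score_fn(state)  # gold move, move -> prob
        gold, probs = scores
        loss += -math.log(probs[gold])                # -log p(gold)
        move = max(probs, key=probs.get) if dynamic else gold
        state.apply(move)
    return loss

# Toy setup: gold is always "shift"; the model prefers "left".
oracle = lambda s: "shift"
probs = lambda s: {"shift": 0.25, "left": 0.5, "right": 0.25}

static = ToyState(4)
train_sentence(static, oracle, probs, dynamic=False)
dynamic = ToyState(4)
train_sentence(dynamic, oracle, probs, dynamic=True)
```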

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees6

6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
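The projectivity constraint mentioned above is checkable in a few lines: a tree is projective iff for every arc (h, d), every word strictly between h and d is a descendant of h. This uses the standard head-array encoding, not anything specific to the thesis.

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (words 1..n, 0 = root).
    Projective iff for every arc (h, d), every word strictly between
    h and d is a descendant of h."""
    n = len(heads)

    def ancestors(i):
        seen = set()
        while i != 0 and i not in seen:  # walk up the head chain
            seen.add(i)
            i = heads[i - 1]
        seen.add(0)  # the root dominates everything
        return seen

    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        if any(h not in ancestors(k) for k in range(lo + 1, hi)):
            return False
    return True
```

For example, `[2, 0, 2]` (word 2 is the root, words 1 and 3 attach to it) is projective, while `[3, 0, 2, 2]` has crossing arcs and is not.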

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
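A toy sketch of turning a UD FEATS string like the one above into a fixed-size vector. The summing scheme, lazy embedding table, and dimension are illustrative assumptions, not necessarily the thesis design:

```python
import numpy as np

DIM = 32                      # hypothetical morph-feat embedding size
rng = np.random.default_rng(0)
feat_table = {}               # one embedding per observed "Key=Value" pair

def morph_feat_vector(feats):
    """Embed each Key=Value pair of a FEATS string and sum the
    embeddings into one fixed-size morph-feat vector."""
    vec = np.zeros(DIM)
    for pair in feats.split("|"):
        if pair not in feat_table:               # lazily allocate embeddings
            feat_table[pair] = 0.1 * rng.normal(size=DIM)
        vec += feat_table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```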

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture diagram (as on slide 58)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM running over the upcoming words wi, wi+1, wi+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture diagram (as on slide 58)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM running over the stack items si, si+1, si+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture diagram (as on slide 58)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM running over the sequence of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
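Equation (1) can be written out directly in code. The sizes below are hypothetical, chosen only to make the sketch runnable:

```python
import numpy as np

H, D = 16, 8                                     # hypothetical word and relation dims
rng = np.random.default_rng(0)
W_rnn = 0.1 * rng.normal(size=(H, 2 * H + D))    # composition weights
b_rnn = np.zeros(H)

def compose(w_head_old, d_l, w_dep):
    """Eq. (1): the new head embedding is a tanh of an affine map over
    the concatenated [head; relation; dependent] vectors."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = compose(rng.normal(size=H), rng.normal(size=D), rng.normal(size=H))
```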

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
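The two transitions above, together with shift, can be sketched on a simple parser state following the set definitions on the slides. Words are represented as integer positions and the label "amod" is a hypothetical example; this is a sketch, not the thesis implementation:

```python
def shift(stack, buffer, arcs):
    """shift: move the buffer front b onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    """left_d(sigma|s, b|beta, A) = (sigma, b|beta, A U {(b, d, s)}):
    pop the stack top s and attach it to the buffer front b with label d."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A U {(s, d, t)}):
    pop the stack top t and attach it to the element s below it with label d."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# One possible sequence for a 2-word sentence [1, 2] where word 2 heads word 1:
state = shift([], [1, 2], set())
state = left(*state, d="amod")
```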

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM architecture: the hidden states of the σ-LSTM, β-LSTM, and Action-LSTM are concatenated and fed to an MLP; the t-RNN combines a head word, a dependent word, and their dependency relation into a new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (parser state features fed directly to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture diagram (as on slide 58)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.6            18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens between 50k and 100k:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.7         75.8            75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121064
bg_btb     84.53        84.55           124336
en_ewt     75.77        75.682          204585
ar_padt    68.02        68.14           223881
de_gsd     71.59        71.32           263804
ca_ancora  85.89        85.874          417587
es_ancora  84.99        84.78           444617
cs_cac     83.57        83.63           472608
cs_pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of gold moves is maximized

Figure: Tree-stack LSTM architecture diagram (as on slide 58)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
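The distinction can be sketched as one training pass over a toy state. The interfaces here are illustrative stand-ins, not the thesis code: both modes add -log p(gold) to the loss and differ only in which move is actually executed.

```python
import math

class ToyState:
    """Stand-in parser state that just counts remaining steps."""
    def __init__(self, steps=3):
        self.steps = steps
    def done(self):
        return self.steps == 0
    def apply(self, move):
        return ToyState(self.steps - 1)

def train_sentence(state, gold_fn, model_fn, oracle="static"):
    loss = 0.0
    while not state.done():
        gold = gold_fn(state)            # gold move at this state
        probs = model_fn(state)          # model distribution over moves
        loss -= math.log(probs[gold])    # maximize log p(gold) in both modes
        # static: follow the gold move; dynamic: follow the model's prediction
        move = gold if oracle == "static" else max(probs, key=probs.get)
        state = state.apply(move)
    return loss

uniform = lambda s: {"shift": 0.5, "left": 0.5}
loss = train_sentence(ToyState(3), lambda s: "shift", uniform, oracle="dynamic")
```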

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with fewer than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train a LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
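Projectivity itself can be checked with a short sketch: a dependency tree is projective iff no two arcs cross when drawn above the sentence (heads are 1-based word positions, with 0 marking the root).

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (0 = artificial root).
    Return True iff no two arcs have strictly interleaved endpoints."""
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in spans:
        for l2, r2 in spans:
            if l1 < l2 < r1 < r2:   # strictly interleaved endpoints = crossing
                return False
    return True
```

For example, heads = [2, 3, 0] (a chain like "Economic" -> "news" -> "had") is projective, while [0, 4, 1, 1] contains the crossing arcs (1, 3) and (2, 4).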

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, β-LSTM states, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word vectors

Figure: Character LSTM, from Kırnap et al., 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates context vectors

Figure: Word BiLSTM, from Kırnap et al., 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
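The two components can be sketched with toy recurrences. The update rules and dimensions below are simplified stand-ins for the actual LSTMs, chosen only to illustrate the data flow (characters -> word vector, word vectors -> context vectors):

```python
import numpy as np

DIM = 8   # hypothetical hidden size

def char_word_vec(word):
    """Stand-in for the character LSTM: fold the characters of a word
    left-to-right into one fixed-size word vector."""
    h = np.zeros(DIM)
    for ch in word:
        x = np.full(DIM, (ord(ch) % 13) / 13.0)   # toy character embedding
        h = np.tanh(0.5 * h + x)                  # toy recurrent update
    return h

def context_vecs(word_vecs):
    """Stand-in for the word BiLSTM: each word's context vector is the
    concatenation of a forward and a backward recurrence state."""
    n = len(word_vecs)
    f, fs = np.zeros(DIM), []
    for v in word_vecs:
        f = np.tanh(0.5 * f + v)
        fs.append(f)
    b, bs = np.zeros(DIM), []
    for v in reversed(word_vecs):
        b = np.tanh(0.5 * b + v)
        bs.append(b)
    return [np.concatenate([fs[i], bs[n - 1 - i]]) for i in range(n)]

sentence = [char_word_vec(w) for w in ["economic", "news", "had"]]
ctx = context_vecs(sentence)
```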

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al., 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Figure: Example for "Economic news had": the gold tree (arcs SBJ and ATT) scores LAS 1; a prediction with both arcs wrong (PRED, OBJ) scores LAS 0; a prediction with one of the two arcs correct (OBJ, ATT) scores LAS (1/2)·100 = 50.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
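The metric itself reduces to a few lines. Gold and prediction each give one (head, label) pair per word; the labels below follow the slide's example, with the head/label pairing arranged for illustration:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold tree."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# "Economic news": gold analysis vs a prediction with one of two arcs correct
gold = [(2, "ATT"), (3, "SBJ")]
pred = [(2, "ATT"), (3, "OBJ")]
```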

Experiments (MLP)

CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview, highlighting the t-RNN over head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33            3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.6            18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.7         75.8            75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12            121064
bg_btb     84.53        84.55            124336
en_ewt     75.77        75.682           204585
ar_padt    68.02        68.14            223881
de_gsd     71.59        71.32            263804
ca_ancora  85.89        85.874           417587
es_ancora  84.99        84.78            444617
cs_cac     83.57        83.63            472608
cs_pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
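The distinction between the two oracles can be reduced to one flag in the training loop. In this sketch, `model_predict` is a hypothetical stand-in for the scoring model and the 0/1 loss stands in for -log p(gold); the thesis uses an actual probabilistic model.

```python
def train_step(gold_move, model_predict, follow_predicted):
    """One oracle training step. Both regimes maximize log p(gold move);
    they differ only in which move the parser executes to reach the
    next state."""
    predicted = model_predict()
    loss = 0.0 if predicted == gold_move else 1.0  # stand-in for -log p(gold)
    executed = predicted if follow_predicted else gold_move
    return executed, loss

# Static oracle executes the gold move; dynamic oracle executes the prediction
ex_static, _ = train_step("SHIFT", lambda: "LEFT", follow_predicted=False)
ex_dynamic, _ = train_step("SHIFT", lambda: "LEFT", follow_predicted=True)
```

With a dynamic oracle the parser visits states that its own (possibly wrong) predictions reach, so it learns to recover from errors at test time.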

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af_afribooms   not provided  75.46  77.43  78.12
kk_ktb         20.19         22.31  21.96  23.86
bxr_bdt         7.64          9.76   9.93   8.98
kmr_mg         20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees. 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
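Projectivity is easy to test: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small checker (my own illustrative sketch, using 1-based token indices with head 0 for the artificial root):

```python
def is_projective(heads):
    """heads[i-1] is the head index of token i (0 = artificial root).
    Returns True iff no two arcs cross."""
    arcs = [(min(h, i), max(h, i)) for i, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # arcs (l1,r1) and (l2,r2) interleave
                return False
    return True

# heads [2, 3, 0]: a chain 1->2->3->root, no crossings
# heads [3, 4, 0, 3]: arcs (1,3) and (2,4) cross
```

A transition-based parser as defined on the previous slides cannot produce a tree for which this check fails, which is why low projectivity ratios hurt its scores.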

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity %  Best (LAS)  Our (LAS)
grc_perseus   90.7            79.39       55.03 (20)
eu_bdt        95.13           84.22       74.13 (17)
hu_szeged     97.8            82.66       68.18 (14)
da_ddt        98.26           86.28       76.40 (17)
en_gum        99.6            85.05       76.44 (15)
gl_treegal   100              74.25       70.45 (10)
gl_ctg       100              82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Related Work

Solution: use dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with the MLP Parser using Context Embeddings

CoNLL18: KParse team with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with the MLP Parser using Context Embeddings

CoNLL18: KParse team with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
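The two components above can be sketched as follows. This is a toy stand-in (plain tanh RNNs instead of LSTMs, random untrained weights, illustrative dimensions) meant only to show the data flow: a character-level RNN produces a word vector, and a forward plus backward pass over word vectors produces per-token context vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # shared character/word hidden size, illustrative only

char_emb = {c: rng.normal(size=DIM) for c in "abcdefghijklmnopqrstuvwxyz"}
W = rng.normal(scale=0.1, size=(DIM, DIM))  # hidden-to-hidden weights
U = rng.normal(scale=0.1, size=(DIM, DIM))  # input-to-hidden weights

def word_vector(word):
    """Run a simple RNN over characters; final hidden state = word vector."""
    h = np.zeros(DIM)
    for c in word:
        h = np.tanh(W @ h + U @ char_emb[c])
    return h

def context_vectors(words):
    """Forward and backward RNN over word vectors, concatenated per token."""
    vecs = [word_vector(w) for w in words]
    def run(seq):
        h, out = np.zeros(DIM), []
        for v in seq:
            h = np.tanh(W @ h + U @ v)
            out.append(h)
        return out
    fwd = run(vecs)
    bwd = run(vecs[::-1])[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

ctx = context_vectors(["economic", "news", "had"])
```

In the actual model both RNNs are LSTMs trained with a language-modeling objective; only the shapes and the character-to-word-to-context pipeline carry over from this sketch.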

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
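A minimal sketch of such a decision module, assuming a one-hidden-layer MLP over a feature vector describing the parser state. The transition inventory, dimensions, and weights here are illustrative, not the thesis configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

TRANSITIONS = ["SHIFT", "LEFT-nsubj", "RIGHT-obj"]  # illustrative subset
FEAT_DIM, HIDDEN = 12, 16

W1 = rng.normal(scale=0.1, size=(HIDDEN, FEAT_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(len(TRANSITIONS), HIDDEN))
b2 = np.zeros(len(TRANSITIONS))

def decide(state_features):
    """Score every transition with a one-hidden-layer MLP and return
    the highest-scoring one."""
    h = np.tanh(W1 @ state_features + b1)
    scores = W2 @ h + b2
    return TRANSITIONS[int(np.argmax(scores))]

next_move = decide(rng.normal(size=FEAT_DIM))
```

At parse time this function is called once per step, and the chosen transition updates the stack and buffer before the next state is featurized.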

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example for the phrase "Economic news had":

Gold tree: arcs SBJ and ATT, so LAS = 1

Prediction 1: arcs PRED and OBJ, both wrong, so LAS = 0

Prediction 2: arcs OBJ and ATT, one of two correct, so LAS = (1/2) × 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
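The metric can be made concrete with a few lines of code. This is a simplified scorer over (head, label) pairs, not the official CoNLL evaluator:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of tokens whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news had": one (head, label) pair per dependent word
gold = [("news", "ATT"), ("had", "SBJ")]
pred1 = [("had", "OBJ"), ("news", "PRED")]  # both arcs wrong
pred2 = [("news", "ATT"), ("had", "OBJ")]   # one of two arcs correct
```

Running `las` on the two predictions reproduces the slide: 0 for the first and 50 for the second.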

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers. 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors make significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM, and Action-LSTM outputs, combined with t-RNN head embeddings, are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (for the word "It")

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview, here focusing on the β-LSTM component]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview, here focusing on the σ-LSTM component]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview, here focusing on the Action-LSTM component]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the sequence of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn ∗ [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
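Equation (1) can be written directly in code; the dimensions and random weights below are illustrative, not the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8  # shared embedding size for words and dependency labels

W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_label, w_dep):
    """Equation (1): the new head embedding is computed from the
    concatenation of the old head, dependency-label, and dependent
    embeddings."""
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
```

Because the output has the same size as a word embedding, the updated head can be pushed back onto the stack or buffer in place of the old one.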

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
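The left and right transitions above can be sketched on a plain (stack, buffer, arcs) configuration. This shows only the bookkeeping of the transition system, not the LSTM and t-RNN updates:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    stack top s becomes a d-dependent of buffer front b and is popped."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    stack top t becomes a d-dependent of the item s below it and is popped."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

def shift(stack, buffer, arcs):
    """Move the buffer front onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

# "news had": attach "news" to "had" with a left arc labeled SBJ
stack, buffer, arcs = shift(["ROOT"], ["news", "had"], set())
stack, buffer, arcs = left_arc(stack, buffer, arcs, "SBJ")
```

Arcs are stored as (head, label, dependent) triples, matching the set-union notation in the transition definitions.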

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM: σ-LSTM, β-LSTM, and Action-LSTM outputs, combined with t-RNN head embeddings, are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language      Projectivity (%)  Best (LAS)  Ours (LAS)
grc perseus   90.7              79.39       55.03 (20)
eu bdt        95.13             84.22       74.13 (17)
hu szeged     97.8              82.66       68.18 (14)
da ddt        98.26             86.28       76.40 (17)
en gum        99.6              85.05       76.44 (15)
gl treegal    100               74.25       70.45 (10)
gl ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases7

7From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
  • Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
  • Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
  • Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
  • Conclusion
  • Future Work & Discussions


t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors make significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
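The concatenation above can be sketched as follows (toy dimensions; the helper name and vector sizes are illustrative, not the thesis code):

```python
def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    """One input vector per word: the four embedding sources,
    concatenated in order (real sizes are model hyperparameters)."""
    return word_vec + context_vec + pos_vec + morph_vec  # list concatenation

# Toy vectors standing in for the four embedding sources
x = word_representation([0.1] * 3, [0.2] * 4, [0.3] * 2, [0.4] * 2)
```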

Input Representation

Morph-feat Vectors

Example: the word "It" with FEATS Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
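A minimal sketch of turning a UD FEATS string into a vector, assuming one learned embedding per Key=Value pair combined by summation (the summation and the dimension are assumptions; the exact combination in the thesis may differ):

```python
import random

random.seed(0)
DIM = 8            # toy embedding size (a hyperparameter)
feat_emb = {}      # one vector per "Key=Value" feature, created lazily

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing' by
    summing one vector per Key=Value pair."""
    if feats == "_":                      # UD convention for "no features"
        return [0.0] * DIM
    total = [0.0] * DIM
    for kv in feats.split("|"):
        if kv not in feat_emb:            # lazily initialize an embedding
            feat_emb[kv] = [random.gauss(0.0, 1.0) for _ in range(DIM)]
        total = [t + e for t, e in zip(total, feat_emb[kv])]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```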

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure: Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure: Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
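Equation (1) can be sketched in plain Python with toy dimensions (W_rnn and b_rnn below are random stand-ins for the learned parameters):

```python
import math
import random

random.seed(0)
D = 4            # toy embedding dimension
IN = 3 * D       # [head; dependency-label; dependent] concatenated

# Stand-ins for the learned parameters W_rnn and b_rnn of Eq. (1)
W_rnn = [[random.uniform(-0.1, 0.1) for _ in range(IN)] for _ in range(D)]
b_rnn = [0.0] * D

def trnn(w_head, d_label, w_dep):
    """Eq. (1): new head = tanh(W_rnn . [w_head; d_label; w_dep] + b_rnn)."""
    x = w_head + d_label + w_dep          # concatenate the three inputs
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = trnn([0.5] * D, [0.1] * D, [-0.2] * D)
```

The composed head embedding replaces the old one on the stack, so later transitions see the subtree it already governs.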

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
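The two transition definitions update the parser state (stack, buffer, arc set) as sketched below; the shift helper is an assumption added only to make the toy run complete, since shift is not shown on these slides:

```python
def shift(stack, buffer):
    """Hypothetical helper: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    """left_d(sigma|s, b|beta, A) = (sigma, b|beta, A u {(b, d, s)}):
    pop s and attach it to the buffer front b with relation d."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))     # (head, relation, dependent)

def right_arc(stack, buffer, arcs, d):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A u {(s, d, t)}):
    pop t and attach it to the new stack top s with relation d."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run over word indices 1..3
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer)                    # stack=[1]
left_arc(stack, buffer, arcs, "amod")   # word 2 becomes head of word 1
shift(stack, buffer)
shift(stack, buffer)
right_arc(stack, buffer, arcs, "obj")   # word 2 becomes head of word 3
```

After the run, the stack holds only word 2 and the arc set records both attachments.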

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
  17 universal part-of-speech tags
  37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation
  17 universal part-of-speech tags
  37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.70        75.80           75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.68           204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.87           417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
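The distinction can be sketched as a single training step; `model_predict` and the 0/1 loss below are hypothetical stand-ins for the trained classifier and for -log p(gold):

```python
def train_step(state, gold_move, model_predict, dynamic):
    """One training step: the loss always targets the gold move; the two
    oracles differ only in which move advances the parser state."""
    predicted = model_predict(state)
    loss = 0.0 if predicted == gold_move else 1.0   # stand-in for -log p(gold)
    next_move = predicted if dynamic else gold_move
    return next_move, loss

model_predict = lambda state: "SHIFT"               # toy model, always shifts
static_move, loss = train_step("s0", "LEFT", model_predict, dynamic=False)
dynamic_move, _ = train_step("s0", "LEFT", model_predict, dynamic=True)
```

With the static oracle the parser keeps following gold moves even when the model is wrong; with the dynamic oracle it follows its own prediction and so visits the error states it will face at test time.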

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors and use them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained with a different language but from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
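A common way to check this property (a sketch, not the thesis code) is to test whether any two dependency arcs cross:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-based, 0 is the root).
    A tree is projective iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:     # arcs overlap without nesting
                return False
    return True

# heads=[2, 0, 2]: words 1 and 3 attach to word 2 (the root) -> projective
# heads=[3, 4, 0, 3]: arcs (1,3) and (2,4) cross -> non-projective
```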

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
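The left and right transitions defined above can be sketched directly on a (stack, buffer, arcs) configuration; the toy words and labels below are illustrative only:

```python
def left_arc(stack, buffer, arcs, label):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s, which is popped.
    s = stack.pop()
    arcs.add((buffer[0], label, s))

def right_arc(stack, buffer, arcs, label):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t becomes a dependent of s, the item below it.
    t = stack.pop()
    arcs.add((stack[-1], label, t))

# tiny walk-through on word strings instead of indices
stack, buffer, arcs = ["news"], ["had"], set()
left_arc(stack, buffer, arcs, "nsubj")   # 'had' heads 'news'
assert stack == [] and ("had", "nsubj", "news") in arcs

stack2 = ["had", "effect"]
right_arc(stack2, [], arcs, "obj")       # 'had' heads 'effect'
assert stack2 == ["had"] and ("had", "obj", "effect") in arcs
```

Arcs are stored as (head, label, dependent) triples, matching the (b, d, s) and (s, d, t) notation in the transition definitions.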

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview: the σ-, β-, and action-LSTM states are concatenated and passed to an MLP, while the t-RNN combines the head word, dependent word, and dependency-relation embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
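The Concat + MLP decision step in this overview can be sketched as follows; the ReLU hidden layer, the vector sizes, and the four-transition inventory are assumptions for illustration, not the thesis' exact configuration:

```python
import numpy as np

def predict_transition(hiddens, W1, b1, W2, b2):
    """Concatenate the component summaries (e.g. σ-LSTM, β-LSTM,
    action-LSTM states) and score transitions with a one-hidden-layer MLP."""
    x = np.concatenate(hiddens)
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer (assumed)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()                 # distribution over transitions

rng = np.random.default_rng(1)
hiddens = [rng.normal(size=8) for _ in range(3)]  # three component states
W1, b1 = rng.normal(size=(16, 24)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)    # e.g. shift/left/right/swap
p = predict_transition(hiddens, W1, b1, W2, b2)
assert p.shape == (4,) and abs(p.sum() - 1.0) < 1e-9
```

At parse time, the argmax of this distribution (restricted to transitions that are valid in the current configuration) is applied to the parser state.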

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the tasks: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM overview: the σ-, β-, and action-LSTM states are concatenated and passed to an MLP, while the t-RNN combines the head word, dependent word, and dependency-relation embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
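A minimal sketch of a morph-feat embedding, assuming each "Feature=Value" pair gets its own vector and the pairs of a word are summed (the thesis may combine them differently, e.g. with an RNN over the features):

```python
import numpy as np

DIM = 4                       # made-up embedding size
rng = np.random.default_rng(2)
feat_table = {}               # one vector per "Feature=Value" pair, on demand

def morph_feat_embedding(feats):
    """Turn a UD morphology string such as
    'Case=Nom|Gender=Neut|Number=Sing' into one vector by summing
    per-feature embeddings."""
    total = np.zeros(DIM)
    for fv in feats.split("|"):
        if fv not in feat_table:
            feat_table[fv] = rng.normal(size=DIM)
        total += feat_table[fv]
    return total

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing")
assert v.shape == (DIM,)
```

Summation keeps the representation fixed-size no matter how many features a word carries, so it can be concatenated with the POS, word, and context vectors.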

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.60            18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48325
fr sequoia     84.36        82.17            50543
en gum         76.44        75.34            53686
ko gsd         73.74        72.54            56687
eu bdt         74.55        73.32            72974
nl lassysmall  76.70        75.80            75134
gl ctg         79.02        79.018           79327
lv lvtb        72.33        72.24            80666
id gsd         75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: Tree-stack LSTM overview: the σ-, β-, and action-LSTM states are concatenated and passed to an MLP, while the t-RNN combines the head word, dependent word, and dependency-relation embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
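The static/dynamic distinction can be sketched as follows; the toy move inventory and the fixed model distribution are invented for illustration:

```python
import math

def run_oracle_training(gold_moves, model_probs, dynamic):
    """Schematic per-sentence loss: each step adds -log p(gold move).
    A static oracle then *takes* the gold move (teacher forcing);
    a dynamic oracle takes the model's argmax move instead, so training
    visits configurations the parser will actually reach at test time."""
    loss, taken = 0.0, []
    for step, gold in enumerate(gold_moves):
        probs = model_probs(step, taken)           # distribution over moves
        loss += -math.log(max(probs[gold], 1e-9))  # gold log-prob, both modes
        taken.append(gold if not dynamic else max(probs, key=probs.get))
    return loss, taken

# a toy fixed model that always prefers "shift"
model = lambda step, taken: {"shift": 0.5, "left": 0.3, "right": 0.2}
gold = ["shift", "left", "right"]
static_loss, static_path = run_oracle_training(gold, model, dynamic=False)
dynamic_loss, dynamic_path = run_oracle_training(gold, model, dynamic=True)
assert static_path == gold
assert dynamic_path == ["shift", "shift", "shift"]
```

Both runs accumulate the same gold log-probability terms; they differ only in which configurations the training trajectory visits.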

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
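Projectivity can be checked by testing for crossing arcs; this is a standard check, not code from the thesis:

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (0 denotes the artificial root).
    A dependency tree is projective iff no two arcs cross when drawn
    above the sentence; transition based parsers of the kind described
    here can only produce such trees."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:   # spans (a,b) and (c,d) interleave: crossing
                return False
    return True

assert is_projective([2, 0, 2])        # 1 <- 2 -> 3, no crossings
assert not is_projective([4, 3, 0, 3]) # arc 4->1 crosses the root arc to 3
```

Equivalently, a tree is projective when every head's descendants form a contiguous span of the sentence.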

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.70        79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.80        82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.60        85.05       76.44 (15)
gl treegal   100.00        74.25       70.45 (10)
gl ctg       100.00        82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or over the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not bring useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
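A tree is projective when no two arcs cross if all arcs are drawn above the sentence. A small self-contained check (an illustrative sketch, not from the thesis; the encoding `heads[i-1]` = head of 1-based word `i`, with 0 for the root, is my assumption):

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (words are 1-based, 0 is the root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:  # arc (a, b) crosses arc (c, e)
                return False
    return True

projective_tree = [2, 0, 2, 3]  # no crossing arcs
crossing_tree = [3, 4, 0, 3]    # arcs (1, 3) and (2, 4) cross
```

This is the property that limits a plain transition-based parser: a sentence like `crossing_tree` cannot be produced by shift/left/right transitions alone.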

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language    | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7         | 79.39      | 55.03 (20)
eu bdt      | 95.13        | 84.22      | 74.13 (17)
hu szeged   | 97.8         | 82.66      | 68.18 (14)
da ddt      | 98.26        | 86.28      | 76.40 (17)
en gum      | 99.6         | 85.05      | 76.44 (15)
gl treegal  | 100          | 74.25      | 70.45 (10)
gl ctg      | 100          | 82.12      | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


a) Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
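As a rough sketch of this two-part pipeline (toy dimensions; a plain tanh RNN stands in for the LSTMs, and all weights are random stand-ins rather than trained LM parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (assumption; the thesis uses larger dimensions)

def rnn(inputs, W, U, b):
    """Minimal tanh RNN; returns the list of hidden states."""
    h = np.zeros(D)
    states = []
    for x in inputs:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return states

# Random stand-ins for the trained weights (char, forward, backward).
Wc, Uc, bc = rng.normal(size=(D, D)), rng.normal(size=(D, D)), np.zeros(D)
Wf, Uf, bf = rng.normal(size=(D, D)), rng.normal(size=(D, D)), np.zeros(D)
Wb, Ub, bb = rng.normal(size=(D, D)), rng.normal(size=(D, D)), np.zeros(D)

def char_embed(ch):
    """Deterministic toy character embedding."""
    return np.random.default_rng(ord(ch)).normal(size=D)

def word_vector(word):
    """Character-based RNN: the last hidden state summarizes the word."""
    return rnn([char_embed(c) for c in word], Wc, Uc, bc)[-1]

def context_vectors(sentence):
    """Word-based bidirectional RNN: concatenate forward and backward
    hidden states to get one context vector per word."""
    wvs = [word_vector(w) for w in sentence]
    fwd = rnn(wvs, Wf, Uf, bf)
    bwd = rnn(wvs[::-1], Wb, Ub, bb)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

sent = ["Economic", "news", "had", "little", "effect"]
cvs = context_vectors(sent)
```

The key structural point survives the simplification: word vectors are built bottom-up from characters, while context vectors depend on the whole sentence around each word.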

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b) MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

[Figure: three dependency analyses of "Economic news had ..."]
Gold tree: arcs ATT (Economic) and SBJ (news); LAS = 1
Pred 1: arcs OBJ and PRED; both labels wrong, LAS = 0
Pred 2: arcs ATT and OBJ; one of two correct, LAS = (1/2) × 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
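The metric itself is easy to state in code. A minimal sketch, assuming each word is represented as a (head, label) pair; the function name `las` and the pair encoding are mine, not from the thesis:

```python
def las(gold, pred):
    """Labeled Attachment Score: the percentage of words whose predicted
    head AND dependency label both match the gold tree."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# (head, label) per word for "Economic" and "news" in "Economic news had":
gold = [("news", "ATT"), ("had", "SBJ")]
pred1 = [("news", "OBJ"), ("had", "PRED")]  # both labels wrong -> LAS 0
pred2 = [("news", "ATT"), ("had", "OBJ")]   # one of two correct -> LAS 50
```

This reproduces the three cases on the slide: the gold tree scores 100, Pred 1 scores 0, and Pred 2 scores 50.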

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2

Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct representation of the parser state still remains critical.

We are unable to represent the whole parsing history with hand-crafted feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with the MLP Parser using Context Embeddings

CoNLL18
• KParse team with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c) Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: the σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
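The resulting parser input for one word is then just one long vector. A trivial sketch with made-up toy dimensions (the real sizes come from the thesis setup):

```python
import numpy as np

# Toy sizes for the four embedding sources (assumptions, not thesis values).
WORD_D, CTX_D, POS_D, FEAT_D = 4, 6, 3, 3

def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    """Parser input for one word: the concatenation of the four embedding
    sources, replacing the MLP parser's hand-crafted feature extractor."""
    return np.concatenate([word_vec, context_vec, pos_vec, feat_vec])

x = word_representation(
    np.ones(WORD_D), np.ones(CTX_D), np.ones(POS_D), np.ones(FEAT_D)
)
```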

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
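One simple way to turn such a FEATS string into a vector is to embed each `Key=Value` feature separately and combine them. This is a sketch under my own assumptions: the averaging and the hash-seeded toy embeddings are illustrative, not the thesis recipe:

```python
import zlib

import numpy as np

D = 8  # toy embedding size (assumption)

def feat_vec(pair):
    """Stable toy embedding for one 'Key=Value' morphological feature."""
    return np.random.default_rng(zlib.crc32(pair.encode())).normal(size=D)

def morph_feat_embedding(feats):
    """Embed a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs' by averaging
    its per-feature vectors; '_' (no features) maps to the zero vector.
    Averaging is my assumption, not necessarily the thesis recipe."""
    if feats == "_":
        return np.zeros(D)
    return np.mean([feat_vec(p) for p in feats.split("|")], axis=0)

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Composing from individual features lets unseen feature combinations share statistics with seen ones, which is the usual motivation for embedding FEATS this way.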

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture with the buffer's β-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture with the stack's σ-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack entries s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN composes the head word, dependency relation, and dependent word embeddings]

$w_{head}^{new} = \tanh(W_{rnn} \cdot [w_{head}^{old};\ d_l;\ w_{dep}] + b_{rnn})$   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
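Equation (1) can be sketched directly (toy dimensions and random untrained weights; `d_l` is the dependency-relation embedding):

```python
import numpy as np

rng = np.random.default_rng(1)
D, R = 8, 4  # toy embedding and relation-embedding sizes (assumptions)
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + R))  # maps [head; rel; dep] -> D
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): the new head embedding is a tanh composition of the old
    head embedding, the relation embedding, and the dependent embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_head_new = t_rnn(rng.normal(size=D), rng.normal(size=R), rng.normal(size=D))
```

Because the output has the same size as a head embedding, the composition can be applied recursively as the tree grows, which is what makes it a tree-structured RNN.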

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

$left_d(\sigma|s,\ b|\beta,\ A) = (\sigma,\ b|\beta,\ A \cup \{(b, d, s)\})$

[Figure: left transition]
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

$left_d(\sigma|s,\ b|\beta,\ A) = (\sigma,\ b|\beta,\ A \cup \{(b, d, s)\})$

[Figure: left transition]
Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

$left_d(\sigma|s,\ b|\beta,\ A) = (\sigma,\ b|\beta,\ A \cup \{(b, d, s)\})$

[Figure: left transition]
Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

$left_d(\sigma|s,\ b|\beta,\ A) = (\sigma,\ b|\beta,\ A \cup \{(b, d, s)\})$

[Figure: left transition]
Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

$left_d(\sigma|s,\ b|\beta,\ A) = (\sigma,\ b|\beta,\ A \cup \{(b, d, s)\})$

[Figure: left transition]
Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

$right_d(\sigma|s|t,\ \beta,\ A) = (\sigma|s,\ \beta,\ A \cup \{(s, d, t)\})$

[Figure: right transition]
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

$right_d(\sigma|s|t,\ \beta,\ A) = (\sigma|s,\ \beta,\ A \cup \{(s, d, t)\})$

[Figure: right transition]
Figure: The stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

$right_d(\sigma|s|t,\ \beta,\ A) = (\sigma|s,\ \beta,\ A \cup \{(s, d, t)\})$

[Figure: right transition]
Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

$right_d(\sigma|s|t,\ \beta,\ A) = (\sigma|s,\ \beta,\ A \cup \{(s, d, t)\})$

[Figure: right transition]
Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

$right_d(\sigma|s|t,\ \beta,\ A) = (\sigma|s,\ \beta,\ A \cup \{(s, d, t)\})$

[Figure: right transition]
Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
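The two transitions above (plus shift) can be sketched as pure functions on the parser configuration (stack σ, buffer β, arc set A). This is an illustrative sketch with integer word ids, not the thesis implementation:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): the stack top s
    becomes a d-dependent of the first buffer word b."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): the stack top t
    becomes a d-dependent of the word s below it."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

def shift(stack, buffer, arcs):
    """shift: move the first buffer word onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

# Toy run over word ids [1, 2]: attach 1 under 2 with label SBJ.
state = ([], [1, 2], set())
state = shift(*state)            # σ = [1], β = [2]
state = left_arc(*state, "SBJ")  # adds the arc (2, 'SBJ', 1)
stack, buffer, arcs = state
```

In the full model, each such transition also triggers the corresponding LSTM/t-RNN updates shown on the preceding slides.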

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: the σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code       | MLP   | Tree-stack
ru taiga (10k)  | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k)   | 56.78 | 58.75
ar padt (120k)  | 67.83 | 68.14
en ewt (205k)   | 74.87 | 75.77
cs cac (473k)   | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP   | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87       | 66.94  | 67.03
sv lines  | 71.12 | 72.05       | 72.17  | 72.45
tr imst   | 57.12 | 56.87       | 57.02  | 57.12
ar padt   | 67.83 | 66.67       | 66.89  | 66.92
cs cac    | 83.89 | 82.23       | 83.13  | 83.17
en ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code          | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78         | 53.33
ru taiga (11k)     | 59.13         | 60.55
gl treegal (15k)   | 69.76         | 70.45
hu szeged (20k)    | 66.12         | 68.18
sv lines (49k)     | 74.04         | 75.46
tr imst (50k)      | 58.12         | 58.75
ar padt (120k)     | 68.04         | 68.14
en ewt (204k)      | 74.87         | 75.77
cs cac (473k)      | 82.89         | 83.57
cs pdt (1M)        | 81.17         | 81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123


Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct representation of the parser state remains critical

We are unable to represent the whole parsing history with hand-crafted feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor; we initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
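One way to realize morph-feat embeddings, shown as a hedged sketch: each Key=Value pair from the UD FEATS column gets its own vector, and the pairs are combined into one fixed-size vector (here by summing; the thesis may combine them differently). The vocabulary, dimension, and random initialization are illustrative assumptions:

```python
import random

random.seed(0)
DIM = 4
feat_vocab = {}  # "Key=Value" -> embedding vector (toy, untrained)

def feat_vector(pair):
    # Lazily assign a small random vector to each unseen Key=Value pair.
    if pair not in feat_vocab:
        feat_vocab[pair] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return feat_vocab[pair]

def morph_feat_embedding(feats):
    """feats: UD FEATS string like 'Case=Nom|Number=Sing' ('_' if empty)."""
    total = [0.0] * DIM
    if feats == "_":
        return total
    for pair in feats.split("|"):
        total = [t + v for t, v in zip(total, feat_vector(pair))]
    return total

emb = morph_feat_embedding(
    "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```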

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
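As a rough illustration of how the buffer's word vectors w_i, w_i+1, ... are summarized by a recurrence, here is a toy tanh RNN standing in for the LSTM cell; the scalar weights and the pooling of each word vector are made-up stand-ins for learned parameters, not the thesis configuration:

```python
import math

# Hedged sketch: fold the buffer's word vectors into one hidden summary.
def rnn_summary(word_vectors, w_in=0.5, w_rec=0.5):
    h = 0.0
    for vec in word_vectors:
        x = sum(vec) / len(vec)            # crude pooling of one word vector
        h = math.tanh(w_in * x + w_rec * h)  # simple tanh recurrence
    return h

h_buffer = rnn_summary([[0.2, 0.4], [0.1, -0.3], [0.5, 0.5]])
```

The same idea applies to the stack's σ-LSTM and the Action-LSTM, each reading its own sequence.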

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
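Eq. (1) can be transcribed directly with toy dimensions; W_rnn and b_rnn below are random stand-ins for the learned parameters, and the sizes are illustrative, not the thesis settings:

```python
import math
import random

random.seed(1)

# Sketch of the t-RNN composition: the new head embedding is
# tanh(W_rnn . [w_head_old; d_l; w_dep] + b_rnn).
def t_rnn(w_head, d_label, w_dep, W, b):
    x = w_head + d_label + w_dep             # vector concatenation
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]

DIM = 3                                       # head/dependent size (toy)
LDIM = 2                                      # dependency-label size (toy)
IN = 2 * DIM + LDIM
W = [[random.uniform(-0.5, 0.5) for _ in range(IN)] for _ in range(DIM)]
b = [0.0] * DIM
new_head = t_rnn([0.1, 0.2, 0.3], [1.0, 0.0], [0.3, 0.2, 0.1], W, b)
```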

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
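Ignoring the embedding updates, the effect of the two transitions above on the parser state (stack σ, buffer β, arc set A) can be sketched directly; the word indices and labels are illustrative, not from the thesis:

```python
# left_d:  pop dependent s from the stack, head is the buffer front b.
def left_arc(stack, buffer, arcs, d):
    s = stack.pop()                  # dependent: top of stack
    b = buffer[0]                    # head: front of buffer
    arcs.add((b, d, s))              # arc stored as (head, label, dependent)

# right_d: pop dependent t from the stack, head is the new stack top s.
def right_arc(stack, buffer, arcs, d):
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

stack, buffer, arcs = [0, 1], [2, 3], set()
left_arc(stack, buffer, arcs, "SBJ")     # dependent 1 attaches to head 2
stack += [buffer.pop(0), buffer.pop(0)]  # two shift transitions
right_arc(stack, buffer, arcs, "OBJ")    # dependent 3 attaches to head 2
```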

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code (tokens)   MLP     Tree-stack
ru_taiga (10k)       58.89   60.55
hu_szeged (20k)      66.21   68.18
tr_imst (50k)        56.78   58.75
ar_padt (120k)       67.83   68.14
en_ewt (205k)        74.87   75.77
cs_cac (473k)        83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code    MLP    Only Action  Only-β  Only-σ
hu_szeged    66.21  66.87        66.94   67.03
sv_lines     71.12  72.05        72.17   72.45
tr_imst      57.12  56.87        57.02   57.12
ar_padt      67.83  66.67        66.89   66.92
cs_cac       83.89  82.23        83.13   83.17
en_ewt       75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code (tokens)   without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.16

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3583
ru_taiga        58.32        60.55           10479
sme_giella      52.78        53.39           16385
la_perseus      49.93        51.60           18184
ug_udt          52.78        53.39           19262
sl_sst          46.72        48.77           19473
hu_szeged       66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.70        75.80           75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121064
bg_btb      84.53        84.55           124336
en_ewt      75.77        75.682          204585
ar_padt     68.02        68.14           223881
de_gsd      71.59        71.32           263804
ca_ancora   85.89        85.874          417587
es_ancora   84.99        84.78           444617
cs_cac      83.57        83.63           472608
cs_pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the model's predicted moves

In both cases, log p of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
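The contrast between the two regimes can be sketched as follows; `predict` and the unit loss are toy stand-ins for the parser model and -log p(gold), so this is an assumption-laden illustration, not the thesis training code:

```python
# One training step: score the gold move, then advance the state either
# along the gold move (static oracle) or the predicted move (dynamic).
def train_step(state, gold_move, predict, follow_predicted):
    move = predict(state)
    loss = 0.0 if move == gold_move else 1.0   # stand-in for -log p(gold)
    next_move = move if follow_predicted else gold_move
    return loss, state + [next_move]

def predict(state):            # toy model that always predicts "shift"
    return "shift"

# Static oracle: the next state follows the gold move regardless.
loss_s, state_s = train_step([], "left", predict, follow_predicted=False)
# Dynamic oracle: the next state follows the model's own (wrong) move.
loss_d, state_d = train_step([], "left", predict, follow_predicted=True)
```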

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with fewer than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af_afribooms   not provided  75.46  77.43  78.12
kk_ktb         20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg         20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
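A tree is projective exactly when no two of its arcs cross; a small checker makes this concrete. This is a generic sketch, not code from the thesis: `heads[i]` is assumed to give the head of 1-indexed word i, with 0 for the root.

```python
# Hedged sketch of a projectivity check via pairwise arc crossings.
def is_projective(heads):
    # Each arc is stored as an interval (min(head, dep), max(head, dep)).
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a, b) in arcs:
        for (c, e) in arcs:
            if a < c < b < e:      # intervals interleave -> arcs cross
                return False
    return True

chain_ok = is_projective([2, 0, 2])       # word2 is root: projective
crossing = is_projective([3, 4, 0, 3])    # arcs (1,3) and (2,4) cross
```

The quadratic pairwise check is fine at sentence length; linear-time checks exist but are not needed for illustration.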

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language      Projectivity  Best (LAS)  Our (LAS)
grc_perseus   90.7          79.39       55.03 (20)
eu_bdt        95.13         84.22       74.13 (17)
hu_szeged     97.8          82.66       68.18 (14)
da_ddt        98.26         86.28       76.40 (17)
en_gum        99.6          85.05       76.44 (15)
gl_treegal    100           74.25       70.45 (10)
gl_ctg        100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion, we introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change. 2. Annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru_taiga (10k) | 58.89 | 60.55
hu_szeged (20k) | 66.21 | 68.18
tr_imst (50k) | 56.78 | 58.75
ar_padt (120k) | 67.83 | 68.14
en_ewt (205k) | 74.87 | 75.77
cs_cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu_szeged | 66.21 | 66.87 | 66.94 | 67.03
sv_lines | 71.12 | 72.05 | 72.17 | 72.45
tr_imst | 57.12 | 56.87 | 57.02 | 57.12
ar_padt | 67.83 | 66.67 | 66.89 | 66.92
cs_cac | 83.89 | 82.23 | 83.13 | 83.17
en_ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture with the t-RNN component highlighted; σ-, β-, and action-LSTM outputs concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code | without t-RNN | with t-RNN
no_nynorsklia (3k) | 51.78 | 53.33
ru_taiga (11k) | 59.13 | 60.55
gl_treegal (15k) | 69.76 | 70.45
hu_szeged (20k) | 66.12 | 68.18
sv_lines (49k) | 74.04 | 75.46
tr_imst (50k) | 58.12 | 58.75
ar_padt (120k) | 68.04 | 68.14
en_ewt (204k) | 74.87 | 75.77
cs_cac (473k) | 82.89 | 83.57
cs_pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu_szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv_lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr_imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar_padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs_cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en_ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD (v2.2) dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens.

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no_nynorsklia | 51.13 | 53.33 | 3583
ru_taiga | 58.32 | 60.55 | 10479
sme_giella | 52.78 | 53.39 | 16385
la_perseus | 49.93 | 51.6 | 18184
ug_udt | 52.78 | 53.39 | 19262
sl_sst | 46.72 | 48.77 | 19473
hu_szeged | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens.

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv_lines | 72.18 | 74.81 | 48325
fr_sequoia | 84.36 | 82.17 | 50543
en_gum | 76.44 | 75.34 | 53686
ko_gsd | 73.74 | 72.54 | 56687
eu_bdt | 74.55 | 73.32 | 72974
nl_lassysmall | 76.7 | 75.8 | 75134
gl_ctg | 79.02 | 79.018 | 79327
lv_lvtb | 72.33 | 72.24 | 80666
id_gsd | 75.76 | 73.97 | 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens.

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa_seraji | 81.18 | 81.12 | 121064
bg_btb | 84.53 | 84.55 | 124336
en_ewt | 75.77 | 75.682 | 204585
ar_padt | 68.02 | 68.14 | 223881
de_gsd | 71.59 | 71.32 | 263804
ca_ancora | 85.89 | 85.874 | 417587
es_ancora | 84.99 | 84.78 | 444617
cs_cac | 83.57 | 83.63 | 472608
cs_pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, log p of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture used in both training regimes; σ-, β-, and action-LSTM outputs and the t-RNN embeddings are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
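The difference between the two regimes is only in which transition advances the parser state; the loss always targets the gold move. A toy sketch where the "parser" is a dummy and `gold_move`/`predicted_move` are placeholder strings, not the thesis training code:

```python
import random

def run_training(dynamic, steps=5, p_explore=1.0, seed=0):
    """Return the sequence of transitions actually applied to the state.
    Both regimes maximize log p(gold); only the applied move differs."""
    random.seed(seed)
    followed = []
    for _ in range(steps):
        gold = "gold_move"              # what the oracle says
        predicted = "predicted_move"    # what the model would do
        # loss = -log p(gold) would be computed here in both regimes
        if dynamic and random.random() < p_explore:
            followed.append(predicted)  # dynamic: follow own prediction
        else:
            followed.append(gold)       # static: always follow gold
    return followed

print(run_training(dynamic=False))  # five 'gold_move' entries
print(run_training(dynamic=True))   # five 'predicted_move' entries
```

Following the model's own predictions exposes training to states the static oracle never reaches, which is the motivation for dynamic oracles.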

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.

2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017].

3. Using my own word and context vectors, trained on a different language from the same language family.

4. Applying transfer learning with a pre-trained parser.

Language | (1) | (2) | (3) | (4)
af_afribooms | not provided | 75.46 | 77.43 | 78.12
kk_ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr_bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr_mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not bring useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
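A tree is projective iff no two dependency arcs cross when drawn above the sentence (treating the root as position 0). `is_projective` below is an illustrative check, not the thesis code:

```python
def is_projective(heads):
    """heads[i-1] is the head position of token i (0 denotes the root).
    Returns True iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # arc (l2, r2) crosses arc (l1, r1)
                return False
    return True

print(is_projective([2, 0, 2]))     # True: all arcs nest
print(is_projective([0, 1, 1, 2]))  # False: arcs (1,3) and (2,4) cross
```

Sentences with crossing arcs therefore bound the accuracy a purely projective transition system can reach on non-projective treebanks.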

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language | Projectivity | Best (LAS) | Our (LAS)
grc_perseus | 90.7 | 79.39 | 55.03 (20)
eu_bdt | 95.13 | 84.22 | 74.13 (17)
hu_szeged | 97.8 | 82.66 | 68.18 (14)
da_ddt | 98.26 | 86.28 | 76.40 (17)
en_gum | 99.6 | 85.05 | 76.44 (15)
gl_treegal | 100 | 74.25 | 70.45 (10)
gl_ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

[7] From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the states of the σ-LSTM, β-LSTM, or Action-LSTM may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Figure: "Economic news had" — gold tree with arcs SBJ and ATT (LAS = 1); Pred 1 with arcs PRED and OBJ (LAS = 0); Pred 2 with arcs OBJ and ATT (LAS = ½ · 100 = 50%)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
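The LAS computation above can be sketched in a few lines. This is an illustrative Python sketch, not the shared task's evaluation script; the word dictionaries mirror the slide's "Economic news had" example.

```python
def las(gold, pred):
    # Labeled Attachment Score: fraction of words whose predicted head
    # AND dependency label both match the gold annotation.
    # gold / pred map word -> (head, label).
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return correct / len(gold)

# The slide's example over the two dependents of "had":
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both labels wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one arc fully correct
```

Here `las(gold, gold)` is 1, `las(gold, pred1)` is 0, and `las(gold, pred2)` is 0.5, matching the slide.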

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM — the σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; a t-RNN composes head word, dependent word, and dependency relation embeddings

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
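As a sketch, the concatenation above might look like the following. All dimensions are invented for illustration; the slide does not give the model's actual sizes.

```python
import numpy as np

def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    # The parser's input for one word: a single vector concatenating the
    # four sources listed above (char-LSTM word vector, BiLSTM context
    # vector, POS vector, morph-feat vector).
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])

# Toy dimensions, chosen only for this sketch:
x = word_representation(np.zeros(350), np.zeros(300), np.zeros(128), np.zeros(128))
```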

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
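One simple way to realize morph-feat embeddings is to give each `key=value` feature its own vector and sum them per word. This is an illustrative sketch under that assumption, not necessarily the exact scheme used in the thesis.

```python
import numpy as np

DIM = 16                      # toy embedding size
rng = np.random.default_rng(0)
feat_table = {}               # "key=value" feature -> vector, grown lazily

def morph_feat_vector(feats):
    # Sum one embedding per key=value feature of the word; "_" marks a
    # word with no morphological annotation (CoNLL-U convention).
    vec = np.zeros(DIM)
    if feats == "_":
        return vec
    for f in feats.split("|"):
        if f not in feat_table:
            feat_table[f] = rng.normal(size=DIM)
        vec += feat_table[f]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```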

Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the buffer words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack entries s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings

w_head^new = tanh(W_rnn · [w_head^old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
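Equation (1) can be sketched directly. The weights and dimensions below are random placeholders for illustration, not the trained parameters.

```python
import numpy as np

dim = 8                                  # toy embedding size
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(dim, 3 * dim))  # placeholder weights
b_rnn = rng.normal(size=dim)

def t_rnn(w_head_old, d_l, w_dep):
    # Eq. (1): squash a linear map of [old head; relation; dependent]
    # into the new head embedding.
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = t_rnn(np.ones(dim), np.zeros(dim), np.ones(dim))
```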

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
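The two transition formulas can be sketched on plain Python lists. This is a toy run with illustrative labels: `left_arc` attaches the stack top to the buffer front, and `right_arc` attaches the stack top to the element below it on the stack, matching the formulas.

```python
def shift(sigma, beta):
    sigma.append(beta.pop(0))

def left_arc(sigma, beta, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # stack top s becomes a d-dependent of buffer front b.
    s = sigma.pop()
    arcs.add((beta[0], d, s))

def right_arc(sigma, beta, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # stack top t becomes a d-dependent of s below it.
    t = sigma.pop()
    arcs.add((sigma[-1], d, t))

# Toy run over "news had" (labels chosen for illustration):
sigma, beta, arcs = ["ROOT"], ["news", "had"], set()
shift(sigma, beta)                    # σ = [ROOT, news], β = [had]
left_arc(sigma, beta, arcs, "SBJ")    # had --SBJ--> news
shift(sigma, beta)                    # σ = [ROOT, had], β = []
right_arc(sigma, beta, arcs, "ROOT")  # ROOT --ROOT--> had
```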

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial MLP model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens in between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, log p of the gold moves is maximized

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
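The difference between the two oracles can be sketched as a single training-step skeleton. The `gold_move_fn`/`model_move_fn` callables are hypothetical stand-ins; the real parser's loss and state objects are omitted.

```python
def train_step(state, gold_move_fn, model_move_fn, dynamic=False):
    # In both regimes the loss maximizes log p(gold move); they differ
    # only in which move is FOLLOWED to reach the next training state:
    # the gold move (static) or the model's own prediction (dynamic),
    # which exposes training to states the parser reaches at test time.
    gold = gold_move_fn(state)
    pred = model_move_fn(state)
    next_move = pred if dynamic else gold
    return gold, next_move

gold_fn = lambda state: "shift"    # hypothetical oracle
model_fn = lambda state: "left"    # hypothetical (wrong) model prediction
static_step = train_step(None, gold_fn, model_fn, dynamic=False)
dynamic_step = train_step(None, gold_fn, model_fn, dynamic=True)
```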

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
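Projectivity can be checked by testing whether any two arcs cross when drawn above the sentence. A small sketch, where `heads` is a 1-based head array with 0 as the artificial root:

```python
def is_projective(heads):
    # heads[i-1] is the head of word i (1-based); 0 is the artificial root.
    # A tree is projective iff no two arcs cross, i.e. there is no pair
    # of arcs (l1, r1), (l2, r2) with l1 < l2 < r1 < r2.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:
                return False
    return True

projective = is_projective([2, 0, 2])    # simple chain, no crossing arcs
crossing = is_projective([3, 4, 0, 1])   # arcs (1,3) and (2,4) cross
```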

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
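A UD FEATS string like the one above can be split into feature=value pairs before embedding lookup. A sketch; summing the pair embeddings is one plausible composition, as the slide does not specify the exact one:

```python
def parse_morph_feats(feats):
    # Split a UD FEATS string like 'Case=Nom|Number=Sing' into (name, value) pairs.
    if feats in ("", "_"):
        return []
    return [tuple(f.split("=", 1)) for f in feats.split("|")]

def morph_feat_vector(feats, table, dim):
    # Sum the embeddings of the individual pairs (illustrative choice).
    vec = [0.0] * dim
    for pair in parse_morph_feats(feats):
        vec = [a + b for a, b in zip(vec, table.get(pair, [0.0] * dim))]
    return vec

pairs = parse_morph_feats("Case=Nom|Gender=Neut|Number=Sing")
assert pairs[0] == ("Case", "Nom") and len(pairs) == 3
```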


Tree-stack LSTM

Model Components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

Figure: Tree-stack LSTM overview


β-LSTM

Figure: Buffer's β-LSTM over the buffer words wi, wi+1, wi+2


σ-LSTM

Figure: Tree-stack LSTM overview


σ-LSTM

Figure: Stack's σ-LSTM over the stack words si, si+1, si+2


Action-LSTM

Figure: Tree-stack LSTM overview


Action-LSTM

Figure: Action-LSTM over the sequence of past transitions


How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
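Equation (1) in runnable form; the toy dimensions and the zero-initialized `W`, `b` below are stand-in parameters, not trained values:

```python
import math

def t_rnn(w_head, d_label, w_dep, W, b):
    # Eq. (1): new head embedding = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)
    x = w_head + d_label + w_dep  # concatenation [head; label; dependent]
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# With zero weights and bias, every output unit is tanh(0) = 0.
assert t_rnn([1.0], [2.0], [3.0], [[0.0, 0.0, 0.0]], [0.0]) == [0.0]
```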


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Left transition. Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.
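The left transition can be sketched on a plain Python parser state; the t-RNN and LSTM hidden-state updates shown on the following slides are omitted here:

```python
# left_d: the stack top s becomes a dependent of the buffer front b,
# adding the arc (b, d, s); s is popped, b stays on the buffer.
def left(stack, buffer, arcs, label):
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, label, s))

stack, buffer, arcs = ["news"], ["had"], set()
left(stack, buffer, arcs, "nsubj")
assert arcs == {("had", "nsubj", "news")} and stack == []
```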


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.


Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input


Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Right transition. Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.
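The right transition, in the same plain-state sketch as the left transition (t-RNN and LSTM updates again omitted):

```python
# right_d: the stack top t becomes a dependent of the element s below it,
# adding the arc (s, d, t); t is popped and s remains the stack top.
def right(stack, buffer, arcs, label):
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, label, t))

stack, arcs = ["had", "effect"], set()
right(stack, [], arcs, "obj")
assert arcs == {("had", "obj", "effect")} and stack == ["had"]
```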


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.


Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview


Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions


4. Results & Comparisons


Results amp Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only)


Only Action LSTM

Figure: Only action LSTM


Only β-LSTM

Figure: Only β-LSTM


Only σ-LSTM

Figure: Only σ-LSTM


Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

Figure: Tree-stack LSTM overview


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides comparative advantage for low-resourcelanguages


Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).


What does Morphological Feature Embedding provide


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k but less than 50k tokens

Languages having more than 50k but less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM overview
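The contrast can be sketched as two training loops that differ only in which move advances the parser state; `gold_move` and `predict` below are hypothetical stand-ins, and the loss on gold moves is identical in both settings:

```python
# Static oracle: follow gold moves. Dynamic oracle: follow predicted moves.
# Either way, training maximizes log p(gold move | state).
def training_states(start, steps, gold_move, predict, dynamic):
    state, visited = start, []
    for _ in range(steps):
        visited.append(state)
        move = predict(state) if dynamic else gold_move(state)
        state = state + (move,)  # apply the chosen transition
    return visited

static = training_states((), 2, lambda s: "G", lambda s: "P", dynamic=False)
dynamic = training_states((), 2, lambda s: "G", lambda s: "P", dynamic=True)
assert static == [(), ("G",)] and dynamic == [(), ("P",)]
```

The dynamic setting exposes the model to states reachable only through its own mistakes, which is the point of dynamic-oracle training.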


Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens


Static vs Dynamic Oracle Training

Figure: Results are very close for between 20k and 50k training tokens


Static vs Dynamic Oracle Training

Figure: Results are very close for more than 50k training tokens


How about languages with fewer than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].


Projectivity

Transition based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
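Projectivity can be checked by testing whether any two dependency arcs cross; a small sketch over a head-index encoding of a sentence:

```python
# heads[i-1] is the head of token i (0 denotes the artificial root).
# A tree is projective iff no two arc spans strictly interleave.
def is_projective(heads):
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in spans:
        for (l2, r2) in spans:
            if l1 < l2 < r1 < r2:  # crossing arcs
                return False
    return True

assert is_projective([2, 0, 2])          # simple projective tree
assert not is_projective([3, 4, 0, 3])   # contains crossing arcs
```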


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., a CRF loss) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions





Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Figure LAS example on "Economic news had ...":
Gold tree: arcs ATT and SBJ → LAS = 1
Pred 1: arcs PRED and OBJ, neither matching gold → LAS = 0
Pred 2: arcs ATT and OBJ, one of two correct → LAS = (1/2) · 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
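The LAS computation above can be sketched in a few lines. This is an illustration, not the official evaluation script; the (head, label) encoding and function name are assumptions.

```python
# LAS sketch: the fraction of words whose predicted head AND dependency
# label both match the gold annotation.

def las(gold, pred):
    """gold, pred: one (head_index, label) pair per word."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# "Economic news had": gold arcs ATT(news <- Economic), SBJ(had <- news)
gold = [(2, "ATT"), (3, "SBJ")]
pred1 = [(2, "PRED"), (3, "OBJ")]   # both arcs wrong
pred2 = [(2, "ATT"), (3, "OBJ")]    # one of two correct
print(las(gold, pred1), las(gold, pred2))  # 0.0 0.5
```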

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state features still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
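The concatenation listed above can be sketched as follows; the dimensions are illustrative placeholders, not the thesis's actual sizes.

```python
import numpy as np

# Sketch of the word representation: concatenation of a character-LSTM
# word vector, a BiLSTM context vector, a POS embedding and a morph-feat
# embedding. All dimensions here are assumptions for illustration.

def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    return np.concatenate([char_vec, context_vec, pos_vec, morph_vec])

x = word_representation(np.zeros(350), np.zeros(300), np.zeros(50), np.zeros(50))
print(x.shape)  # (750,)
```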

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
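One simple way to realize the morph-feat embedding shown in the figure is to average one learned vector per key=value pair; this is a sketch under that assumption, with a random table standing in for learned embeddings.

```python
import numpy as np

# Sketch: embedding a UD feature string such as
# "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs" by averaging
# one vector per key=value pair. Table and dimension are illustrative.

rng = np.random.default_rng(0)
table = {}

def morph_feat_vector(feats, dim=50):
    pairs = feats.split("|") if feats != "_" else []   # "_" = no features
    if not pairs:
        return np.zeros(dim)
    vecs = [table.setdefault(p, rng.normal(size=dim)) for p in pairs]
    return np.mean(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (50,)
```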

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
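Equation (1) can be sketched numerically as below; the weight shapes and the tiny dimension are illustrative assumptions.

```python
import numpy as np

# Numeric sketch of equation (1): the head word's vector is recomputed from
# the old head vector, the dependency-relation embedding d_l and the
# dependent's vector. Weight shapes are illustrative.

def trnn(w_head, d_rel, w_dep, W_rnn, b_rnn):
    x = np.concatenate([w_head, d_rel, w_dep])   # [w_head; d_l; w_dep]
    return np.tanh(W_rnn @ x + b_rnn)

d = 4
new_head = trnn(np.zeros(d), np.zeros(d), np.zeros(d),
                np.zeros((d, 3 * d)), np.zeros(d))
print(new_head.shape)  # (4,)
```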

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
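The two transitions walked through above can be sketched directly from their definitions; the state layout, sample words and relation label are illustrative.

```python
# Runnable sketch of the two transitions, written from
#   left_d(σ|s, b|β, A)  = (σ, b|β, A ∪ {(b, d, s)})
#   right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
# Arcs are stored as (head, relation, dependent) triples.

def left(stack, buffer, arcs, d):
    s = stack.pop()               # dependent: top of the stack
    arcs.add((buffer[0], d, s))   # head: front word of the buffer

def right(stack, buffer, arcs, d):
    t = stack.pop()               # dependent: top of the stack
    arcs.add((stack[-1], d, t))   # head: next word down the stack

stack, buffer, arcs = ["news"], ["had"], set()
left(stack, buffer, arcs, "SBJ")
print(arcs)  # {('had', 'SBJ', 'news')}
```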

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
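The decision step in the diagram above (the Concat and MLP boxes) can be sketched as follows; all dimensions, including the number of candidate transitions, are illustrative assumptions.

```python
import numpy as np

# Sketch: the β-, σ- and Action-LSTM summaries are concatenated and an MLP
# scores the candidate transitions. Dimensions (128, 64, 73) are assumed.

def score_transitions(h_beta, h_sigma, h_action, W1, b1, W2, b2):
    h = np.concatenate([h_beta, h_sigma, h_action])  # "Concat" box
    hidden = np.tanh(W1 @ h + b1)                    # MLP hidden layer
    return W2 @ hidden + b2                          # one score per transition

rng = np.random.default_rng(0)
d = 128
scores = score_transitions(rng.normal(size=d), rng.normal(size=d),
                           rng.normal(size=d),
                           rng.normal(size=(64, 3 * d)), np.zeros(64),
                           rng.normal(size=(73, 64)), np.zeros(73))
print(scores.shape)  # (73,)
```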

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing
2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models
3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser
4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.60            18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code     Morph-Feats   no Morph-Feats   # of tokens
sv lines      72.18         74.81            48325
fr sequoia    84.36         82.17            50543
en gum        76.44         75.34            53686
ko gsd        73.74         72.54            56687
eu bdt        74.55         73.32            72974
nl lassymal   76.70         75.80            75134
gl ctg        79.02         79.018           79327
lv lvtb       72.33         72.24            80666
id gsd        75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow the model's predicted moves

In both cases, the log-probability of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
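The two training regimes above can be sketched as one training step; the function and state handling are illustrative, and the exploration probability is an assumed hyperparameter.

```python
import random

# Sketch of static vs dynamic oracle training. Both regimes maximize
# log p(gold move); they differ only in which transition is executed
# to reach the next parser state.

def train_step(state, gold_move, log_probs, dynamic=False, explore=0.1):
    """log_probs: the model's log-probability for each candidate transition."""
    loss = -log_probs[gold_move]                 # maximize log p(gold)
    if dynamic and random.random() < explore:
        executed = max(range(len(log_probs)), key=log_probs.__getitem__)
    else:
        executed = gold_move                     # static: always follow gold
    return loss, executed

# Toy call: gold move is 2, the model currently prefers move 0.
loss, move = train_step(state=None, gold_move=2,
                        log_probs=[-0.1, -2.0, -3.0], dynamic=False)
print(loss, move)  # 3.0 2
```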

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
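Projectivity can be checked by testing whether any two arcs cross; a minimal sketch (1-indexed words, `heads[i-1]` giving the head of word i, 0 for the artificial root):

```python
# A tree is projective iff no two arcs cross, i.e. there is no pair of
# arcs (i, j) and (k, l) with i < k < j < l.

def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:     # arc (i, j) crosses arc (k, l)
                return False
    return True

print(is_projective([2, 0, 2]))     # True  (all arcs nested)
print(is_projective([3, 4, 0, 3]))  # False (arcs (1,3) and (2,4) cross)
```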

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Ours (LAS)
grc perseus   90.70          79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.80          82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.60          85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

(c) Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM: σ-, β-, and Action-LSTMs and a t-RNN over (head word, dependency relation, dependent word); their outputs are concatenated and fed to an MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (morphological features of the word "It")

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
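The morph-feat string on this slide can be embedded and combined with the other vectors roughly as follows. This is a pure-Python sketch under toy assumptions: DIM, the sum-of-feature-vectors scheme, and all function names are illustrative, not the thesis implementation.

```python
import random

random.seed(0)
DIM = 8  # toy embedding size (assumption; real sizes differ)

morph_vocab = {}  # one learned vector per feature=value pair

def _new_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

def morph_vec(feats):
    """Embed 'Case=Nom|Gender=Neut|...' by summing one vector per
    feature=value pair (one plausible scheme; an assumption here)."""
    total = [0.0] * DIM
    for f in feats.split("|"):
        vec = morph_vocab.setdefault(f, _new_vec())
        total = [t + x for t, x in zip(total, vec)]
    return total

def word_rep(char_vec, context_vec, pos_vec, feats):
    # no hand-crafted feature templates: just concatenate learned vectors
    return char_vec + context_vec + pos_vec + morph_vec(feats)
```

In the real model these vectors are learned jointly; here random vectors only illustrate the shapes and the concatenation.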

Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM overview (β-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM (inputs w_i, w_i+1, w_i+2)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM overview (σ-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM (inputs s_i, s_i+1, s_i+2)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM overview (Action-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM (an LSTM over the sequence of past transitions)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependency relation, and dependent word embeddings

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
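Equation (1) can be sketched numerically. Toy dimensions throughout; `trnn_update`, the weight matrix, and bias values are illustrative stand-ins for the learned W_rnn and b_rnn:

```python
import math

def trnn_update(w_head, d_rel, w_dep, W, b):
    """New head embedding: tanh(W · [w_head; d_rel; w_dep] + b)  (Eq. 1)."""
    x = w_head + d_rel + w_dep  # list concatenation = vector concatenation
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]

# toy sizes: head/dependent embeddings of dim 2, relation embedding of dim 1
W = [[0.1] * 5, [0.2] * 5]   # shape (2, 5): output dim 2, input dim 2+1+2
b = [0.0, 0.1]
new_head = trnn_update([1.0, 0.5], [0.2], [0.3, 0.4], W, b)
```

The tanh keeps the new head embedding in the same range as the old one, so it can be pushed back into the stack LSTM unchanged.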

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recomputes its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
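The left_d and right_d definitions on these slides can be sketched as list operations (word indices stand in for embeddings; the t-RNN head update and the LSTM recomputation are omitted, so this is only the transition bookkeeping, not the thesis code):

```python
def shift(stack, buffer):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # pop s from the stack; the buffer front b becomes its head via d
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # pop t; the remaining stack top s becomes its head via d
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# toy run on word ids 1, 2 with the root (0) already on the stack
stack, buffer, arcs = [0], [1, 2], set()
shift(stack, buffer)                    # stack [0, 1], buffer [2]
left_arc(stack, buffer, arcs, "SBJ")    # word 2 heads word 1
shift(stack, buffer)                    # stack [0, 2], buffer []
right_arc(stack, buffer, arcs, "ROOT")  # root heads word 2
```

Note the asymmetry the formulas encode: a left arc takes its head from the buffer front, while a right arc takes it from the element below the stack top.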

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recomputes its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure: Final Tree-stack LSTM: σ-, β-, and Action-LSTMs and the t-RNN (head word, dependency relation, dependent word); outputs are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change, 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only the β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only the σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM overview (t-RNN component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training set size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow the model's predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
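The static/dynamic distinction can be sketched with a toy softmax over three transitions; the loss is identical in both modes, only the followed move differs. This is an illustration, not the thesis training code:

```python
import math

def train_step(scores, gold_action, dynamic):
    """One oracle decision: the loss maximizes log p(gold) either way,
    but the parser follows the gold move (static oracle) or its own
    highest-scoring move (dynamic oracle) to reach the next state."""
    z = sum(math.exp(s) for s in scores.values())
    loss = -(scores[gold_action] - math.log(z))  # -log softmax(gold)
    followed = max(scores, key=scores.get) if dynamic else gold_action
    return loss, followed

scores = {"shift": 2.0, "left": 0.5, "right": -1.0}
loss_s, a_s = train_step(scores, "left", dynamic=False)  # follows gold
loss_d, a_d = train_step(scores, "left", dynamic=True)   # follows model
```

The dynamic oracle thus exposes the model to states reached by its own (possibly wrong) moves, while the training signal still points at the gold transition.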

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
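Projectivity can be checked directly: a tree is projective iff no two arcs cross. A sketch, assuming `heads[i]` gives the head of 1-indexed word i, with 0 as the root (a hypothetical helper, not part of the thesis):

```python
def is_projective(heads):
    """heads[i-1] = head of word i (0 = root), words numbered 1..n.
    The tree is projective iff no two arcs cross each other."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # crossing: exactly one endpoint lies strictly inside the other span
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```

For example, heads [2, 3, 0] ("Economic" -> "news" -> "had" -> root) is projective, while heads [3, 4, 0, 3] contains the crossing arcs (1,3) and (2,4) and is not.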

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7. From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

The Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture with the t-RNN highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only-A  Only-β  Only-σ  w/o t-RNN  All
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: to better understand our contributions, we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
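The difference between the two training regimes can be sketched as follows. This is a toy illustration, not the thesis code: the parser state is reduced to the history of moves taken so far, and `predict` stands in for the trained model.

```python
def rollout(gold_moves, predict, dynamic=False):
    """Collect (state, gold) training pairs for one sentence.

    Static oracle: advance the parser with the gold move, so training only
    visits states on the gold path.  Dynamic oracle: advance with the
    model's predicted move, so training also visits states reached after
    mistakes.  Either way the stored target is the gold move, i.e. the
    loss maximizes log p(gold | state).
    """
    history, pairs = [], []
    for gold in gold_moves:
        pairs.append((tuple(history), gold))            # target is always gold
        taken = predict(history) if dynamic else gold   # dynamic follows model
        history.append(taken)
    return pairs
```

With a model that always predicts "shift", both regimes train on the same gold targets, but the dynamic rollout visits off-gold-path states.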

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees. 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
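Projectivity can be checked directly: a dependency tree is projective exactly when no two arcs cross. A small sketch, assuming a 1-based `heads` array where `heads[i-1]` is the head of word `i` and 0 denotes the artificial root:

```python
def is_projective(heads):
    """Return True iff the tree has no crossing arcs when drawn
    above the sentence (the standard projectivity criterion)."""
    arcs = [(min(dep, head), max(dep, head))
            for dep, head in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:     # the two arcs interleave
                return False
    return True
```

For example, heads `[2, 0, 2]` (a simple chain under word 2) is projective, while `[3, 4, 0, 3]` contains the crossing arcs (1,3) and (2,4) and is not.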

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the states of the σ-LSTM, β-LSTM, or Action-LSTM may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers. 5

5 Source: CoNLL17 official results page.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than the FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state features still remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
- Koc-University team with an MLP Parser using Context Embeddings

CoNLL18
- KParse team with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

The hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al., 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: the full Tree-stack LSTM; the outputs of the β-, σ-, and Action-LSTMs are concatenated and fed to an MLP, and the t-RNN combines head word, dependent word, and dependency relation embeddings

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

- Character-based LSTM's word vectors
- Word-based BiLSTM's context vectors
- Part-of-speech (POS) vectors
- Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
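The figure maps a whole UD FEATS string to one vector. A minimal sketch of one plausible way to do this, looking up a vector per feature=value pair and summing; the summation composition and the random initialization are assumptions for illustration, since in the real parser these would be trained parameters:

```python
import random

def morph_feat_vector(feats, table, dim=4, seed=0):
    """Embed a UD FEATS string like 'Case=Nom|Number=Sing' by summing one
    vector per feature=value pair.  `table` caches the (here randomly
    initialized) vectors; UD marks 'no features' with an underscore."""
    rng = random.Random(seed)
    vec = [0.0] * dim
    if feats == "_":
        return vec
    for pair in feats.split("|"):
        if pair not in table:                       # lazily create an embedding
            table[pair] = [rng.uniform(-1, 1) for _ in range(dim)]
        vec = [a + b for a, b in zip(vec, table[pair])]
    return vec
```

Summation makes the representation order-invariant, which matches the fact that FEATS is an unordered set of attribute-value pairs.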

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture with the β-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: The buffer's β-LSTM running over the buffer words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: The stack's σ-LSTM running over the stack entries s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: The Action-LSTM running over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: The t-RNN combines the dependent word, dependency relation, and head word embeddings into a new head embedding:

w_head_new = tanh(W_rnn [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
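Equation (1) written out as a dependency-free sketch; `W` and `b` stand in for the trained parameters W_rnn and b_rnn, and the three input vectors are concatenated before the affine map:

```python
import math

def trnn_compose(w_head, d_rel, w_dep, W, b):
    """t-RNN update: new head = tanh(W · [w_head; d_rel; w_dep] + b),
    where [;] is vector concatenation and W has one row per output dim."""
    x = w_head + d_rel + w_dep            # list concatenation = [;]
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]
```

After a left or right transition, this composed vector replaces the head word's embedding, so the head carries information about its subtree.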

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
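The two transitions above can be sketched directly from their definitions; the usual shift move, which the slides do not spell out, is added here to complete a runnable toy example, and integer word indices stand in for the embeddings the real model manipulates:

```python
def shift(stack, buffer, arcs):
    """shift: move the buffer front onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): pop the stack
    top s and attach it as a d-dependent of the buffer front b."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): pop the stack
    top t and attach it as a d-dependent of the new stack top s."""
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Parse "economic news" (words 1 and 2; 0 is the artificial root):
state = ([0], [1, 2], set())
state = shift(*state)            # stack [0, 1], buffer [2]
state = left(*state, "amod")     # word 1 becomes a dependent of word 2
state = shift(*state)            # stack [0, 2], buffer []
state = right(*state, "root")    # word 2 attaches under the root
```

Each transition returns a new (stack, buffer, arcs) triple, mirroring the functional notation of the definitions.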

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
  • Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
  • Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
  • Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
  • Conclusion
  • Future Work & Discussions

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word Embeddings

Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word Embeddings

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state features still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM

Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview. The σ-LSTM, β-LSTM, and Action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
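The FEATS string shown above is a pipe-separated list of key=value pairs; splitting it into pairs is the step that precedes looking up one embedding per feature. A minimal sketch in plain Python (the function name and the empty-FEATS conventions are illustrative assumptions, not the thesis code):

```python
def parse_morph_feats(feats):
    """Split a UD FEATS string like 'Case=Nom|Number=Sing' into (key, value) pairs."""
    if feats in ("", "_"):  # "_" marks an empty FEATS column in CoNLL-U
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

pairs = parse_morph_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Each resulting pair (e.g. `("Case", "Nom")`) can then index its own embedding table.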

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack items s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependency relation, and dependent word embeddings]

w_head_new = tanh(W_rnn [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
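Equation (1) can be sketched in plain Python; the dimensions, the random initialization, and the function name below are illustrative assumptions rather than the thesis settings:

```python
import math
import random

def trnn_compose(w_head, d_rel, w_dep, W, b):
    """tanh(W [w_head; d_rel; w_dep] + b): the new head embedding."""
    x = w_head + d_rel + w_dep  # concatenation, as in Eq. (1)
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

random.seed(0)
dim, rel_dim = 4, 2
W = [[random.uniform(-0.1, 0.1) for _ in range(2 * dim + rel_dim)]
     for _ in range(dim)]
b = [0.0] * dim
head, dep, rel = [0.5] * dim, [-0.3] * dim, [0.1, 0.2]
new_head = trnn_compose(head, rel, dep, W, b)  # same size as the old head
```

Because the output has the same dimensionality as the old head embedding, the composition can be applied repeatedly as a head accumulates dependents.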

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
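The left and right transitions above can be sketched as list operations on the stack σ, the buffer β, and the arc set A. This is a schematic sketch only: the real parser additionally recomputes LSTM hidden states and runs the t-RNN composition on each new arc.

```python
def left_arc(sigma, beta, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})."""
    s = sigma.pop()       # dependent s leaves the stack
    b = beta[0]           # head b stays at the buffer front
    arcs.add((b, d, s))   # arc stored as (head, relation, dependent)

def right_arc(sigma, beta, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})."""
    t = sigma.pop()       # dependent t leaves the stack
    s = sigma[-1]         # head s is the new stack top
    arcs.add((s, d, t))

sigma, beta, arcs = [0, 1, 2], [3, 4], set()
left_arc(sigma, beta, arcs, "nsubj")   # adds (3, "nsubj", 2)
right_arc(sigma, beta, arcs, "obj")    # adds (0, "obj", 1)
```

Note that in the left transition the head comes from the buffer and in the right transition from the stack, exactly as the two formulas specify.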

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
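The decision layer in the overview, a concatenation of the component hidden states followed by an MLP over candidate transitions, can be sketched as follows (toy dimensions and random weights are illustrative assumptions):

```python
import math
import random

def mlp(x, W1, b1, W2, b2):
    """tanh hidden layer, then linear transition scores."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + bi
            for row, bi in zip(W2, b2)]

random.seed(2)
h_sigma, h_beta, h_action = [0.1] * 4, [0.2] * 4, [0.3] * 4  # component states
x = h_sigma + h_beta + h_action                              # concatenation
hidden, n_moves = 8, 3
W1 = [[random.uniform(-0.5, 0.5) for _ in range(len(x))] for _ in range(hidden)]
b1 = [0.0] * hidden
W2 = [[random.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_moves)]
b2 = [0.0] * n_moves
scores = mlp(x, W1, b1, W2, b2)  # one score per candidate transition
```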

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:

• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change; 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.60           18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.70        75.80           75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121064
bg_btb     84.53        84.55           124336
en_ewt     75.77        75.682          204585
ar_padt    68.02        68.14           223881
de_gsd     71.59        71.32           263804
ca_ancora  85.89        85.874          417587
es_ancora  84.99        84.78           444617
cs_cac     83.57        83.63           472608
cs_pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log probability of the gold moves is maximized

[Figure: Tree-stack LSTM overview]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
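The two regimes can be sketched as a single training step that differs only in which move advances the parser: both compute the loss on the gold move, but the dynamic oracle follows the model's own prediction. `score` and `gold_move` below are illustrative stand-ins, not the thesis API:

```python
import math
import random

def train_step(state, score, gold_move, follow_model):
    """One oracle step: the loss is -log softmax(gold move); the next
    move follows the model's argmax (dynamic) or the gold move (static)."""
    scores = score(state)
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    gold = gold_move(state)
    loss = log_z - scores[gold]  # negative log-probability of the gold move
    move = max(scores, key=scores.get) if follow_model else gold
    return loss, move

random.seed(1)
score = lambda state: {"shift": random.random(), "left": random.random(), "right": random.random()}
gold_move = lambda state: "shift"

static_loss, static_next = train_step(0, score, gold_move, follow_model=False)
dynamic_loss, dynamic_next = train_step(0, score, gold_move, follow_model=True)
```

With the dynamic oracle the parser visits states its own mistakes produce, which is what makes training on predicted moves attractive for error recovery.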

Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
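Projectivity can be checked directly: a dependency tree is projective iff no two of its arcs cross. A small sketch, assuming `heads[i]` gives the head of word i+1 with 0 standing for the root (an assumption about the input format, not the thesis code):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (0 = root); True iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:  # properly crossing spans
                return False
    return True
```

For example, `is_projective([2, 0, 2])` holds, while `is_projective([3, 0, 2])` does not: the arc from word 3 to word 1 crosses the root arc to word 2.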

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

The Tree-stack LSTM performed better for low-resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code (train tokens)   MLP     Tree-stack
ru_taiga (10k)             58.89   60.55
hu_szeged (20k)            66.21   68.18
tr_imst (50k)              56.78   58.75
ar_padt (120k)             67.83   68.14
en_ewt (205k)              74.87   75.77
cs_cac (473k)              83.39   83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only).

Only Action LSTM

Figure: Only the Action-LSTM.

Only β-LSTM

Figure: Only the β-LSTM.

Only σ-LSTM

Figure: Only the σ-LSTM.

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu_szeged   66.21   66.87         66.94    67.03
sv_lines    71.12   72.05         72.17    72.45
tr_imst     57.12   56.87         57.02    57.12
ar_padt     67.83   66.67         66.89    66.92
cs_cac      83.89   82.23         83.13    83.17
en_ewt      75.54   75.43         75.56    75.67

Table: Comparison between the MLP and the "Only" models.

Ablation of t-RNN

Figure: Full Tree-stack LSTM architecture; this ablation removes the t-RNN component.

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code (train tokens)   without t-RNN   with t-RNN
no_nynorsklia (3k)         51.78           53.33
ru_taiga (11k)             59.13           60.55
gl_treegal (15k)           69.76           70.45
hu_szeged (20k)            66.12           68.18
sv_lines (49k)             74.04           75.46
tr_imst (50k)              58.12           58.75
ar_padt (120k)             68.04           68.14
en_ewt (204k)              74.87           75.77
cs_cac (473k)              82.89           83.57
cs_pdt (1M)                81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv_lines    71.12   72.05    72.17    74.04    72.17       75.46
tr_imst     57.12   56.87    57.02    57.12    58.12       58.75
ar_padt     67.83   66.67    66.89    66.92    68.04       68.14
cs_cac      83.89   82.23    83.13    83.17    82.89       83.57
en_ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does the Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3,583
ru_taiga        58.32         60.55            10,479
sme_giella      52.78         53.39            16,385
la_perseus      49.93         51.6             18,184
ug_udt          52.78         53.39            19,262
sl_sst          46.72         48.77            19,473
hu_szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48,325
fr_sequoia      84.36         82.17            50,543
en_gum          76.44         75.34            53,686
ko_gsd          73.74         72.54            56,687
eu_bdt          74.55         73.32            72,974
nl_lassysmall   76.7          75.8             75,134
gl_ctg          79.02         79.018           79,327
lv_lvtb         72.33         72.24            80,666
id_gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121,064
bg_btb      84.53         84.55            124,336
en_ewt      75.77         75.682           204,585
ar_padt     68.02         68.14            223,881
de_gsd      71.59         71.32            263,804
ca_ancora   85.89         85.874           417,587
es_ancora   84.99         84.78            444,617
cs_cac      83.57         83.63            472,608
cs_pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.
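The contrast can be sketched as a single training step; this is an illustrative stand-in (the functions `gold_moves_fn` and `scores_fn`, and the exploration scheme, are my own assumptions, not the thesis implementation):

```python
# Sketch: one training step under a static vs. dynamic oracle. The loss
# always targets a gold move; the two oracles differ only in which move the
# parser *follows* to reach the next state.
import random

def train_step(state, gold_moves_fn, scores_fn, dynamic=False, explore=1.0):
    gold = gold_moves_fn(state)                  # gold moves from this state
    scores = scores_fn(state)                    # model log-scores per move
    loss = -max(scores[m] for m in gold)         # maximize log p of a gold move
    if dynamic and random.random() < explore:
        follow = max(scores, key=scores.get)     # follow the model's prediction
    else:
        follow = max(gold, key=lambda m: scores[m])  # follow a gold move
    return loss, follow
```

With a dynamic oracle the parser is thus trained on states its own (possibly wrong) predictions reach, while the loss still pushes toward gold moves.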

Figure: Full Tree-stack LSTM architecture.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition-based parsers can only build projective trees.6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
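Projectivity can be checked directly from a head array. The helper below is my own sketch (assuming 1-indexed words with head 0 for the root): an arc (h, d) is projective iff every word strictly between h and d is a descendant of h.

```python
# Check whether a dependency tree, given as heads[i] = head of word i+1
# (0 = root), is projective.
def is_projective(heads):
    n = len(heads)
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):       # every word strictly between h and d
            a = k
            while a != 0 and a != h:      # climb from k toward the root
                a = heads[a - 1]
            if a != h:                    # k is not dominated by h: crossing arc
                return False
    return True
```

For example, heads = [3, 4, 0, 3] encodes the crossing arcs 3→1 and 4→2, so it is non-projective.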


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)
grc_perseus   90.7           79.39        55.03 (20)
eu_bdt        95.13          84.22        74.13 (17)
hu_szeged     97.8           82.66        68.18 (14)
da_ddt        98.26          86.28        76.40 (17)
en_gum        99.6           85.05        76.44 (15)
gl_treegal    100            74.25        70.45 (10)
gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7 From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, Tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention across the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions



Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags.
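The "context vector" of a word (the hidden states of forward and backward language models around it, concatenated) can be sketched with a toy recurrent cell; a plain tanh RNN stands in for the thesis's BiLSTM here, and all weights are random placeholders:

```python
# Toy sketch: a word's context vector = forward RNN state over the words
# before it, concatenated with backward RNN state over the words after it.
import numpy as np

rng = np.random.default_rng(3)
D = 4
emb = {w: rng.normal(size=D) for w in ["economic", "news", "had", "effect"]}
Wf = rng.normal(size=(D, 2 * D))   # forward cell weights (placeholder)
Wb = rng.normal(size=(D, 2 * D))   # backward cell weights (placeholder)

def run(words, W):
    """Run a simple tanh RNN over words; return the list of hidden states."""
    h, out = np.zeros(D), []
    for w in words:
        h = np.tanh(W @ np.concatenate([h, emb[w]]))
        out.append(h)
    return out

def context_vector(words, i):
    """Forward state over words[:i] plus backward state over words[i+1:]."""
    fwd = run(words[:i], Wf)[-1] if i > 0 else np.zeros(D)
    bwd = run(words[i + 1:][::-1], Wb)[-1] if i < len(words) - 1 else np.zeros(D)
    return np.concatenate([fwd, bwd])
```

The key design point illustrated: the context vector deliberately excludes the word itself, so it is complementary to the word vector, which is what the p-v-c rows above measure.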

Context and Word embeddings

(Table repeated from the previous slide.)

Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors.

Context and Word embeddings

(Table repeated from the previous slide.)

Both POS tags and context vectors make significant contributions on top of word vectors.

Issues with MLP

However:

Choosing the correct state of the parser still remains critical.

We are unable to represent the whole parsing history with feature extraction.

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koç University team, with the MLP Parser using Context Embeddings

CoNLL18

• KParse team, with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings

(c) Tree-stack LSTM Parser (CoNLL18)

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview.

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

• The character-based LSTM's word vectors

• The word-based BiLSTM's context vectors

• Part-of-speech (POS) vectors

• Morph-feat vectors
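The concatenation above can be sketched in a few lines of numpy; all lookup tables and dimensions below are illustrative placeholders, not the thesis's actual sizes:

```python
# Minimal sketch of the concatenated input representation: word vector,
# context vector, POS vector, and morph-feat vector joined into one input.
import numpy as np

rng = np.random.default_rng(0)
char_lstm_vec = {"news": rng.normal(size=100)}        # character-LSTM word vector
context_vec   = {"news": rng.normal(size=200)}        # BiLSTM LM context vector
pos_vec       = {"NOUN": rng.normal(size=32)}         # part-of-speech embedding
feat_vec      = {"Number=Sing": rng.normal(size=32)}  # morph-feat embedding

def word_input(word, pos, feats):
    """Concatenate the four embeddings into one parser input vector."""
    return np.concatenate([char_lstm_vec[word], context_vec[word],
                           pos_vec[pos], feat_vec[feats]])
```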


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs  IT  It

Figure: Morph-feat Embeddings.
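One simple way to turn such a UD FEATS string into a single vector is to embed each key=value pair and sum; the summing choice is my assumption for illustration, since the slide only shows that the FEATS string maps to an embedding:

```python
# Sketch: embed each key=value pair of a UD FEATS string and sum them into
# one morph-feat vector.
import numpy as np

rng = np.random.default_rng(1)
DIM = 16
feat_emb = {}   # lazily grown table: one vector per key=value pair

def morph_feat_vector(feats):
    vec = np.zeros(DIM)
    if feats == "_":                  # UD writes "_" when a word has no features
        return vec
    for pair in feats.split("|"):     # e.g. "Case=Nom", "Number=Sing"
        if pair not in feat_emb:
            feat_emb[pair] = rng.normal(size=DIM)
        vec += feat_emb[pair]
    return vec
```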


Tree-stack LSTM

Model Components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM

Figure: The β-LSTM highlighted within the full architecture.

β-LSTM

Figure: Buffer's β-LSTM, running over the buffer words w_i, w_i+1, w_i+2.

σ-LSTM

Figure: The σ-LSTM highlighted within the full architecture.

σ-LSTM

Figure: Stack's σ-LSTM, running over the stack words s_i, s_i+1, s_i+2.

Action-LSTM

Figure: The Action-LSTM highlighted within the full architecture.

Action-LSTM

Figure: Action-LSTM, running over the sequence of past transitions.

How are the components of the Tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

Figure: The t-RNN composes the head word, dependency relation, and dependent word vectors.

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
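Equation (1) renders directly in numpy; the dimension D and the random weights below are illustrative placeholders:

```python
# The t-RNN composition of Eq. (1): old head vector, dependency-label
# vector, and dependent vector are concatenated and passed through tanh.
import numpy as np

rng = np.random.default_rng(2)
D = 8
W_rnn = rng.normal(size=(D, 3 * D))
b_rnn = rng.normal(size=D)

def t_rnn(w_head_old, d_label, w_dep):
    """w_head_new = tanh(W_rnn · [w_head_old; d_label; w_dep] + b_rnn)"""
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```

Because the output has the same dimension as the head input, the new head vector can be composed again when the head later receives another dependent.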


Tree-RNN with

1. Left Transition
2. Right Transition

Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN calculates the new head embedding.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recalculates its hidden state based on the new input.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding.

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

(p = POS tags, v = word vectors, c = context vectors, fb = Facebook vectors)

Our BiLSTM language model word vectors perform better than the Facebook (FB) vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

Two Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
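As a concrete illustration, such a lookup table might be built as below; the dimension (8) and the action/relation inventories shown are placeholder choices for this sketch, not the thesis settings:

```python
import random

random.seed(0)
ACTIONS = ["shift", "left-arc", "right-arc"]
DEPRELS = ["nsubj", "obj", "amod"]  # illustrative subset of the 37 UD relations

def make_table(names, dim=8):
    # one trainable dense vector per symbol, randomly initialized
    return {n: [random.uniform(-0.1, 0.1) for _ in range(dim)] for n in names}

action_emb = make_table(ACTIONS)
deprel_emb = make_table(DEPRELS)
```

In a full implementation these vectors would be parameters updated by backpropagation rather than fixed random values.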

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
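The concatenation above can be sketched as follows (the four sub-vectors and their sizes are toy placeholders; in the real model each comes from its own trained network):

```python
def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    # a word's input vector is the concatenation of the four sources
    return char_vec + context_vec + pos_vec + feat_vec

x = word_representation([0.1, 0.2], [0.3, 0.4], [0.5], [0.6])
# a 6-dimensional input vector in this toy setting
```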

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
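One plausible way to turn such a feature string into a single vector is to embed each key=value pair and sum; whether the model sums or concatenates the per-feature vectors is not shown on this slide, so summation and the lookup table here are assumptions of this sketch:

```python
def morph_feat_vector(feats, table, dim=4):
    # sum the embedding of every key=value morphological feature
    vec = [0.0] * dim
    for feat in feats.split("|"):
        emb = table.get(feat, [0.0] * dim)  # unseen features contribute zeros
        vec = [v + e for v, e in zip(vec, emb)]
    return vec

table = {"Case=Nom": [1, 0, 0, 0], "Number=Sing": [0, 1, 0, 0]}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
# v == [1.0, 1.0, 0.0, 0.0]: only the two known features contribute
```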

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
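Equation (1) can be sketched in plain Python as follows, with toy dimensions and weight values; W_rnn is stored as a list of rows:

```python
import math

def trnn_compose(w_head, d_label, w_dep, W_rnn, b_rnn):
    # new head embedding = tanh(W_rnn * [head; label; dep] + b_rnn)
    x = w_head + d_label + w_dep  # concatenation of the three vectors
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# head/label/dependent embeddings of size 2, output of size 2
W_rnn = [[0.1] * 6, [0.2] * 6]
b_rnn = [0.0, 0.0]
new_head = trnn_compose([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], W_rnn, b_rnn)
```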

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head
Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
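The left-transition sequence above can be sketched with list-based stack and buffer; the t-RNN composition of the new head embedding is elided here:

```python
def left_arc(stack, buffer, arcs, label):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s is popped and attached as a dependent of the buffer front b
    s = stack.pop()
    b = buffer[0]
    arcs.append((b, label, s))

stack, buffer, arcs = ["news"], ["had", "effect"], []
left_arc(stack, buffer, arcs, "nsubj")
# stack == [], buffer unchanged, arcs == [("had", "nsubj", "news")]
```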

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
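The right transition mirrors the left one; a sketch in the same list-based style (t-RNN composition again elided):

```python
def right_arc(stack, buffer, arcs, label):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t is popped and attached as a dependent of s, the item below it
    t = stack.pop()
    s = stack[-1]
    arcs.append((s, label, t))

stack, buffer, arcs = ["had", "effect"], [], []
right_arc(stack, buffer, arcs, "obj")
# stack == ["had"], arcs == [("had", "obj", "effect")]
```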

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
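Putting it together, the component hidden states are concatenated and fed to an MLP that scores transitions. A minimal stand-in uses a single linear layer with toy weights and a hypothetical transition inventory:

```python
TRANSITIONS = ["shift", "left-arc", "right-arc"]

def predict_transition(sigma_h, beta_h, action_h, W, b):
    # concatenate the LSTM hidden states and score each transition
    x = sigma_h + beta_h + action_h
    scores = [sum(w * xi for w, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    return TRANSITIONS[scores.index(max(scores))]

# identity-like toy weights: each transition "reads" one hidden dimension
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [0.0, 0.0, 0.0]
move = predict_transition([0.2], [0.9], [0.1], W, b)
# move == "left-arc" in this toy setting
```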

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only-A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
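The bucketing itself is straightforward; a small sketch, using token counts that appear in the experiment tables of this section:

```python
def bucket(n_tokens):
    # assign a language to one of the four training-size buckets above
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

assert bucket(10479) == "<20k"        # ru taiga
assert bucket(97531) == "50k-100k"    # id gsd
assert bucket(1173282) == ">=100k"    # cs pdt
```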

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
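The distinction can be sketched at the level of a single parser decision: the loss term is identical in both regimes; only the move actually executed differs (the probabilities below are hypothetical, for illustration):

```python
import math

def train_step(probs, gold, dynamic):
    # both regimes maximize log p(gold); they differ only in which move
    # is executed to reach the next parser state
    loss = -math.log(probs[gold])
    executed = max(probs, key=probs.get) if dynamic else gold
    return loss, executed

probs = {"shift": 0.6, "left-arc": 0.3, "right-arc": 0.1}
loss_s, move_s = train_step(probs, "left-arc", dynamic=False)  # follows gold
loss_d, move_d = train_step(probs, "left-arc", dynamic=True)   # follows prediction
# loss_s == loss_d, but move_s == "left-arc" while move_d == "shift"
```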

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
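Projectivity can be checked by testing whether any two arcs cross; an O(n²) sketch, where heads[i] is the head of token i, 0 denotes the root, and index 0 is unused:

```python
def is_projective(heads):
    # collect arcs as (left endpoint, right endpoint) pairs
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    # two arcs cross iff their endpoints interleave: a < c < b < d
    return not any(a < c < b < d
                   for (a, b) in arcs for (c, d) in arcs)

assert is_projective([0, 2, 0, 2])          # 1<-2, 2<-root, 3<-2
assert not is_projective([0, 3, 4, 0, 3])   # arcs (1,3) and (2,4) cross
```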

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language     Projectivity %  Best (LAS)  Our (LAS)
grc perseus  90.7            79.39       55.03 (20)
eu bdt       95.13           84.22       74.13 (17)
hu szeged    97.8            82.66       68.18 (14)
da ddt       98.26           86.28       76.40 (17)
en gum       99.6            85.05       76.44 (15)
gl treegal   100             74.25       70.45 (10)
gl ctg       100             82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Issues with MLP

However

Choosing the correct parser state still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
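The stack LSTM's key property can be sketched as follows. This is a toy illustration, not Dyer et al.'s actual implementation: a simple tanh recurrence stands in for a full LSTM cell, and the point is that popping rewinds the summary to its previous state.

```python
import numpy as np

class StackRNN:
    """Toy stack RNN: push runs one recurrent update, pop rewinds it.
    A plain tanh update stands in for a full LSTM cell."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(dim, 2 * dim))
        self.states = [np.zeros(dim)]          # h_0 at the bottom

    def push(self, x):
        prev = self.states[-1]
        self.states.append(np.tanh(self.W @ np.concatenate([prev, x])))

    def pop(self):
        self.states.pop()                      # restore the previous summary

    def summary(self):
        return self.states[-1]

s = StackRNN(8)
h0 = s.summary().copy()
s.push(np.ones(8))
s.pop()            # summary is back to h0
```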

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only use word2vec embeddings [Mikolov et al., 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
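The concatenation above can be sketched in a few lines. The dimension names and sizes here are illustrative assumptions, not the thesis's actual hyperparameters:

```python
import numpy as np

# Assumed (illustrative) embedding sizes for the four sources.
CHAR_DIM, CTX_DIM, POS_DIM, FEAT_DIM = 350, 300, 50, 50

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four vector sources into one input vector."""
    assert char_vec.shape == (CHAR_DIM,)
    assert context_vec.shape == (CTX_DIM,)
    assert pos_vec.shape == (POS_DIM,)
    assert feat_vec.shape == (FEAT_DIM,)
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

vec = word_representation(np.zeros(CHAR_DIM), np.zeros(CTX_DIM),
                          np.zeros(POS_DIM), np.zeros(FEAT_DIM))
```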

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
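A UD feature string such as the one in the figure can be embedded by splitting it on `|` and combining per-feature vectors. Whether the thesis sums, averages, or concatenates these vectors is not stated here, so averaging is an assumption of this sketch, as is the embedding size:

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 50          # assumed embedding size
feat_table = {}        # one vector per feature=value pair, created on first use

def morph_feat_vector(feats):
    """Embed a UD FEATS string like 'Case=Nom|Gender=Neut|Number=Sing'
    by averaging per-feature vectors (averaging is an assumption)."""
    if feats == "_":                      # word with no morphological features
        return np.zeros(FEAT_DIM)
    vecs = []
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=FEAT_DIM)
        vecs.append(feat_table[pair])
    return np.mean(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```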

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i+2 w_i+1 w_i

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
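Equation (1) can be written out directly. The vector sizes below are illustrative assumptions; only the functional form comes from the slide:

```python
import numpy as np

D = 100    # assumed word-vector size
DL = 20    # assumed dependency-relation embedding size

rng = np.random.default_rng(0)
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + DL))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Equation (1): compose head, relation and dependent into a new head vector."""
    x = np.concatenate([w_head_old, d_l, w_dep])   # [w_head_old; d_l; w_dep]
    return np.tanh(W_rnn @ x + b_rnn)

w_new = t_rnn(rng.normal(size=D), rng.normal(size=DL), rng.normal(size=D))
```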

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head Dependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
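The left transition above can be sketched on plain Python lists (a minimal sketch of the transition rule only; the real model backs the stack and buffer with LSTMs and runs the t-RNN composition shown in the figures):

```python
def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it to the buffer front b with label d."""
    assert stack and buffer, "left needs a non-empty stack and buffer"
    s = stack.pop()          # dependent
    b = buffer[0]            # head stays at the front of the buffer
    arcs.add((b, d, s))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1], [2, 3], set()
stack, buffer, arcs = left(stack, buffer, arcs, "nsubj")
```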

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
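The right transition has the same list-based sketch: it pops the top of the stack and attaches it to the item below it (again, only the transition rule; the LSTM and t-RNN updates are omitted):

```python
def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t and attach it to s, the next item on the stack, with label d."""
    assert len(stack) >= 2, "right needs two items on the stack"
    t = stack.pop()          # dependent
    s = stack[-1]            # head
    arcs.add((s, d, t))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1, 2], [3], set()
stack, buffer, arcs = right(stack, buffer, arcs, "obj")
```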

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
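Putting the pieces together, the decision rule in the overview can be sketched as: the summaries of the three LSTMs are concatenated and fed to an MLP that scores the next transition. Hidden sizes, the unlabeled 3-way action set, and the use of final hidden states as summaries are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 64               # assumed hidden size of each component LSTM
N_ACTIONS = 3        # shift, left, right (unlabeled sketch)

W1 = rng.normal(scale=0.1, size=(128, 3 * H)); b1 = np.zeros(128)
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, 128)); b2 = np.zeros(N_ACTIONS)

def score_transitions(h_sigma, h_beta, h_action):
    """Concat the σ-, β- and action-LSTM summaries, score with an MLP."""
    x = np.concatenate([h_sigma, h_beta, h_action])
    h = np.tanh(W1 @ x + b1)
    logits = W2 @ h + b2
    probs = np.exp(logits - logits.max())       # stable softmax
    return probs / probs.sum()

p = score_transitions(rng.normal(size=H), rng.normal(size=H), rng.normal(size=H))
```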

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
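The difference between the two regimes can be sketched in one training step: the loss term is -log p(gold move) either way; the regimes differ only in which move is executed to produce the next parser state. The `ToyModel`, its `predict` interface, and the exploration rate are assumptions of this sketch:

```python
import math
import random

class ToyModel:
    """Stand-in for the transition classifier (assumed interface)."""
    def predict(self, state):
        return {"shift": 0.5, "left": 0.3, "right": 0.2}

def train_step(state, model, gold_move, dynamic, rng, explore=0.5):
    """One oracle-training step: the loss is -log p(gold move) in both
    regimes; only the executed move differs."""
    probs = model.predict(state)
    loss = -math.log(probs[gold_move])
    if dynamic and rng.random() < explore:
        executed = max(probs, key=probs.get)   # follow the model's prediction
    else:
        executed = gold_move                   # follow the gold oracle
    return loss, executed

loss, move = train_step(None, ToyModel(), "left", dynamic=False,
                        rng=random.Random(0))
```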

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3 Using my own word and context vectors, trained on a different language but from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
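Projectivity is easy to test: a tree is projective iff no two dependency arcs cross. A compact O(n²) sketch, assuming 1-based token indices with 0 as the artificial root:

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1..n, 0 = root).
    A tree is projective iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2        # strict interleaving = crossing
                   for l1, r1 in arcs
                   for l2, r2 in arcs)

ok = is_projective([2, 0, 2])       # tokens 1 and 3 both attach to token 2
bad = is_projective([3, 4, 0, 3])   # arcs (1,3) and (2,4) cross
```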

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work & Discussions

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the buffer words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack words s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
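As the overview figure indicates, the component states are concatenated and scored by an MLP. A hand-rolled sketch with nested lists as matrices; the layer sizes and the tanh nonlinearity are assumptions:

```python
import math

def mlp(x, W1, b1, W2, b2):
    # scores = W2 @ tanh(W1 @ x + b1) + b2
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# Concatenated component states: sigma, beta, action (2 toy dims each).
features = [0.1, 0.2] + [0.3, 0.4] + [0.5, 0.6]
```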

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependency relation and dependent word

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
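Equation (1) translates directly into code; a sketch with plain lists, where the concatenation order [w_head; d_l; w_dep] follows the equation and the dimensions are illustrative:

```python
import math

def t_rnn(w_head, d_rel, w_dep, W, b):
    # w_head_new = tanh(W_rnn @ [w_head; d_rel; w_dep] + b_rnn) -- eq. (1)
    x = w_head + d_rel + w_dep  # concatenation
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```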

Tree-RNN with:

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
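The two transitions above can be sketched as destructive operations on a (stack, buffer, arcs) state, with arcs stored as (head, label, dependent) triples and integer indices standing in for words; this is an illustrative sketch, not the thesis code:

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A): pop stack top s, attach it under buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A): pop stack top t, attach it under new stack top s.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

stack, buffer, arcs = [0, 1, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "nsubj")   # word 2 attaches to buffer front 3
right_arc(stack, buffer, arcs, "obj")    # word 1 attaches to word 0
```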

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
Overview of Dependency Parsing
Transition Based Dependency Parsing

2 Related Work
Linear Models and their Drawbacks
Neural Network Models

3 Model
Language Model
MLP Parser
Tree-stack LSTM Parser

4 Results
MLP vs Tree-stack LSTM
Morphological Feature Embeddings
Static vs Dynamic Oracle Training
Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens in between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of the gold moves is maximized

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
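The contrast can be sketched in a few lines (illustrative only; in the real parser `scores` is the MLP output and `apply_move` is a parser-state transition). The loss is the negative log-probability of the gold move in both regimes; only the move used to advance the state differs:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def train_step(state, scores, gold_move, apply_move, dynamic):
    loss = -math.log(softmax(scores)[gold_move])  # maximize log p(gold)
    # Static oracle advances with the gold move; dynamic with the prediction.
    move = max(range(len(scores)), key=scores.__getitem__) if dynamic else gold_move
    return apply_move(state, move), loss

advance = lambda st, m: st + [m]  # toy "state": the list of moves taken
static_state, l1 = train_step([], [2.0, 0.5], 1, advance, dynamic=False)
dynamic_state, l2 = train_step([], [2.0, 0.5], 1, advance, dynamic=True)
```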

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]

3. Using my own word and context vectors trained with a different language, but from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
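Projectivity can be tested directly: a tree is projective iff no two arcs cross when drawn above the sentence. A small sketch (not from the thesis), where heads[i] is the head of word i+1 and 0 denotes the root:

```python
def is_projective(heads):
    # Arc for dependent d (1-indexed) with head h, viewed as an interval.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False  # the two arcs cross
    return True
```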

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees. 6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
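The projectivity constraint can be checked directly: a tree is projective iff no two dependency arcs cross. A small sketch (the head-array encoding, with 0 standing for the artificial root, is an assumption for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (1-based words, 0 = root).
    Projective iff no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # arc 2 starts strictly inside arc 1 and ends outside it
                return False
    return True
```

A transition-based parser as described here can only produce head arrays for which this check returns True.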

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language      Projectivity %   Best (LAS)   Ours (LAS)
grc perseus   90.7             79.39        55.03 (20)
eu bdt        95.13            84.22        74.13 (17)
hu szeged     97.8             82.66        68.18 (14)
da ddt        98.26            86.28        76.40 (17)
en gum        99.6             85.05        76.44 (15)
gl treegal    100              74.25        70.45 (10)
gl ctg        100              82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7. From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
  • Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
  • Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
  • Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
  • Conclusion
  • Future Work & Discussions

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al., 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview — the t-RNN combines the head word, dependent word, and dependency relation; the β-, σ-, and Action-LSTM outputs are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
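As a sketch, the concatenation step looks like this; the four input vectors and their tiny dimensions are made up for illustration — the actual embedding sizes are not given on this slide.

```python
def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four sources into a single input vector.
    Plain Python lists stand in for the real embedding tensors."""
    return char_vec + context_vec + pos_vec + feat_vec

# Toy vectors: char-LSTM output, BiLSTM context, POS embedding, morph-feat embedding.
v = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 2)
```

The resulting vector is what the β- and σ-LSTMs consume for each word.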

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
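A UD FEATS string like the one above is a `|`-separated list of `Feature=Value` pairs. One plausible way to turn it into a vector is sketched below; the summing scheme and the toy embedding table are assumptions, not necessarily what the thesis does.

```python
def parse_feats(feats):
    """Split a UD FEATS string into a {feature: value} dict ('_' = none)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def feat_vector(feats, emb, dim=3):
    """Sum the embeddings of each feature=value pair; unseen pairs add zeros."""
    vec = [0.0] * dim
    for pair in parse_feats(feats).items():
        for i, x in enumerate(emb.get(pair, [0.0] * dim)):
            vec[i] += x
    return vec
```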

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview (β-LSTM component)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM running over the words w_i, w_i+1, w_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview (σ-LSTM component)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview (Action-LSTM component)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM running over previous transitions]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependent word, and dependency relation]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
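Equation (1) can be sketched directly; the dimensions and the weight layout (one row of W_rnn per output unit, plain lists for vectors) are toy assumptions.

```python
import math

def t_rnn(w_head, d_label, w_dep, W, b):
    """New head embedding = tanh(W_rnn · [head; label; dep] + b_rnn), as in Eq. (1).
    W is a list of rows; all vectors are plain Python lists."""
    x = w_head + d_label + w_dep                         # concatenation [w_head_old; d_l; w_dep]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```

After a transition, the head's slot in the stack or buffer is overwritten with this output, so the LSTM above it sees the subtree rather than just the head word.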

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: parser state before the left transition]

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
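In list form, the two transitions (plus the usual shift) can be sketched as below. The state encoding — a stack and buffer of word indices with 0 as the artificial root, and arcs as (head, label, dependent) triples — is an assumed simplification of the slides' σ, β, A.

```python
def shift(stack, buffer, arcs):
    """shift: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    """left_d(σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a dependent of the buffer front b."""
    s = stack.pop()
    arcs.append((buffer[0], label, s))

def right_arc(stack, buffer, arcs, label):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a dependent of s, the word below it."""
    t = stack.pop()
    arcs.append((stack[-1], label, t))
```

For a two-word sentence headed by root 0, the sequence shift, left, shift, right yields the arcs (2, label, 1) and (0, label, 2).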

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: parser state before the right transition]

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview — the t-RNN combines the head word, dependent word, and dependency relation; the β-, σ-, and Action-LSTM outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

                       CoNLL17                           CoNLL18
Coverage               81 treebanks in 49 languages      82 treebanks in 57 languages
Annotation             standardized across treebanks     standardized across treebanks
POS tags               17 universal part-of-speech tags  17 universal part-of-speech tags
Relations              37 universal dependency relations 37 universal dependency relations
Koç University rank    7th of 33 participants (1st       16th of 30 participants (2nd
                       among transition-based parsers)   among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split, 2. annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
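The numbers throughout these tables are labeled attachment scores (LAS). A minimal sketch of the metric, with an assumed input format of one (head, label) pair per word:

```python
def las(gold, pred):
    """Labeled attachment score: percentage of words whose predicted
    head AND dependency label both match the gold annotation."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

The unlabeled variant (UAS) would compare heads only and ignore the labels.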

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between the MLP and the "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview (t-RNN component)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings:
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having fewer than 20k tokens

Languages having more than 20k but fewer than 50k tokens

Languages having more than 50k but fewer than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having fewer than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.6             18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.7          75.8             75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture, with the t-RNN composing the head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
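The four-way partition above can be sketched as a simple lookup. A minimal sketch with boundaries taken from the list; the function name and bucket labels are ours, not the thesis code:

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four experimental groups,
    by number of training tokens (boundaries from the slide)."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    else:
        return ">=100k"
```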

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.6            18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.7         75.8            75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121064
bg_btb      84.53        84.55           124336
en_ewt      75.77        75.682          204585
ar_padt     68.02        68.14           223881
de_gsd      71.59        71.32           263804
ca_ancora   85.89        85.874          417587
es_ancora   84.99        84.78           444617
cs_cac      83.57        83.63           472608
cs_pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
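The two training regimes above differ only in which move advances the parser; the loss is the same. A minimal sketch (the function and action names are ours, not the thesis code):

```python
import math

def train_step(gold_action, probs, dynamic=False):
    """One oracle-training step (illustrative sketch).

    probs: dict mapping each transition to the model's probability
    in the current state. The loss always maximizes log p of the
    gold move; only the action used to advance the parser differs.
    """
    loss = -math.log(probs[gold_action])      # -log p(gold move)
    if dynamic:
        # dynamic oracle: follow the model's own (possibly wrong) prediction
        action_taken = max(probs, key=probs.get)
    else:
        # static oracle: always follow the gold move
        action_taken = gold_action
    return loss, action_taken
```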

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens above 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees.6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
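Projectivity, as used above, means no two dependency arcs cross. A small check over a CoNLL-style head array can make this concrete (a sketch; the function is ours, not the thesis code):

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of token i+1 (tokens are 1-based, 0 = root),
    the usual CoNLL convention. Two arcs cross iff exactly one endpoint
    of one lies strictly inside the other, i.e. l1 < l2 < r1 < r2 for
    some ordered pair of arcs.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # crossing arcs -> non-projective
                return False
    return True
```

For example, a left-to-right chain is projective, while a tree with arcs (1,3) and (2,4) is not.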

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7%         79.39       55.03 (20)
eu_bdt       95.13%        84.22       74.13 (17)
hu_szeged    97.8%         82.66       68.18 (14)
da_ddt       98.26%        86.28       76.40 (17)
en_gum       99.6%         85.05       76.44 (15)
gl_treegal   100%          74.25       70.45 (10)
gl_ctg       100%          82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7. From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview: σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM with the β-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM with the σ-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, an LSTM running over the stack entries s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM with the Action-LSTM highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, an LSTM running over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN, combining the head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
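Eq. (1) is a single tanh layer over the concatenation of the old head embedding, the relation embedding d_l, and the dependent embedding. A pure-Python sketch (the function name and toy dimensions are ours):

```python
import math

def trnn_compose(W, b, head, dep_rel, dep):
    """t-RNN update from Eq. (1): the new head embedding is
    tanh(W_rnn · [head ; rel ; dep] + b_rnn).

    W is a list of rows; head, dep_rel, dep are plain lists whose
    concatenation matches the row length of W."""
    x = head + dep_rel + dep                      # [w_head_old ; d_l ; w_dep]
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```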

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
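The left and right transitions walked through above operate directly on the stack σ, buffer β, and arc set A. A sketch on plain Python lists; `shift` is the standard third transition, not shown on these slides, and the function names are ours:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes head of the stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second-topmost s becomes head of the top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer):
    # standard shift: move the buffer front onto the stack
    stack.append(buffer.pop(0))
```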

Final overview of Tree-stack LSTM

[Figure: Final overview of the Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected


Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings into a new head embedding

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
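Eq. (1) can be written out directly; a sketch with numpy, where the dimension d and the random initialization are placeholders:

```python
import numpy as np

d = 64  # embedding size (hypothetical)
rng = np.random.default_rng(1)
W_rnn = rng.normal(scale=0.1, size=(d, 3 * d))  # learned projection
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): new head embedding from the old head embedding,
    the dependency-label embedding d_l, and the dependent embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

new_head = t_rnn(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
print(new_head.shape)  # (64,)
```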


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings
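The left transition above, and the analogous right transition shown later, can be sketched as pure functions on a configuration (stack, buffer, arc set); word indices stand in for the word vectors, and the LSTM/t-RNN updates are omitted:

```python
def left(config, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a d-dependent of the buffer front b and is popped."""
    stack, buffer, arcs = config
    s, b = stack[-1], buffer[0]
    return (stack[:-1], buffer, arcs | {(b, d, s)})

def right(config, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a d-dependent of the word s below it and is popped."""
    stack, buffer, arcs = config
    s, t = stack[-2], stack[-1]
    return (stack[:-1], buffer, arcs | {(s, d, t)})

# Toy configuration: stack [1, 2], buffer [3, 4], no arcs yet.
config = ([1, 2], [3, 4], frozenset())
stack, buffer, arcs = left(config, "nsubj")   # adds arc 3 -nsubj-> 2
print(stack, buffer, sorted(arcs))            # [1] [3, 4] [(3, 'nsubj', 2)]
```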


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated with the t-RNN head/dependent/relation embeddings and fed to an MLP]

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

4. Results & Comparisons


Results amp Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change; 2. Annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru_taiga (10k) | 58.89 | 60.55
hu_szeged (20k) | 66.21 | 68.18
tr_imst (50k) | 56.78 | 58.75
ar_padt (120k) | 67.83 | 68.14
en_ewt (205k) | 74.87 | 75.77
cs_cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only)

Only Action LSTM

Figure: Only Action-LSTM

Only β-LSTM

Figure: Only β-LSTM

Only σ-LSTM

Figure: Only σ-LSTM

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu_szeged | 66.21 | 66.87 | 66.94 | 67.03
sv_lines | 71.12 | 72.05 | 72.17 | 72.45
tr_imst | 57.12 | 56.87 | 57.02 | 57.12
ar_padt | 67.83 | 66.67 | 66.89 | 66.92
cs_cac | 83.89 | 82.23 | 83.13 | 83.17
en_ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture with the t-RNN component highlighted]

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code | without t-RNN | with t-RNN
no_nynorsklia (3k) | 51.78 | 53.33
ru_taiga (11k) | 59.13 | 60.55
gl_treegal (15k) | 69.76 | 70.45
hu_szeged (20k) | 66.12 | 68.18
sv_lines (49k) | 74.04 | 75.46
tr_imst (50k) | 58.12 | 58.75
ar_padt (120k) | 68.04 | 68.14
en_ewt (204k) | 74.87 | 75.77
cs_cac (473k) | 82.89 | 83.57
cs_pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu_szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv_lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr_imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar_padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs_cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en_ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no_nynorsklia | 51.13 | 53.33 | 3583
ru_taiga | 58.32 | 60.55 | 10479
sme_giella | 52.78 | 53.39 | 16385
la_perseus | 49.93 | 51.60 | 18184
ug_udt | 52.78 | 53.39 | 19262
sl_sst | 46.72 | 48.77 | 19473
hu_szeged | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv_lines | 72.18 | 74.81 | 48325
fr_sequoia | 84.36 | 82.17 | 50543
en_gum | 76.44 | 75.34 | 53686
ko_gsd | 73.74 | 72.54 | 56687
eu_bdt | 74.55 | 73.32 | 72974
nl_lassysmall | 76.7 | 75.8 | 75134
gl_ctg | 79.02 | 79.018 | 79327
lv_lvtb | 72.33 | 72.24 | 80666
id_gsd | 75.76 | 73.97 | 97531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa_seraji | 81.18 | 81.12 | 121064
bg_btb | 84.53 | 84.55 | 124336
en_ewt | 75.77 | 75.682 | 204585
ar_padt | 68.02 | 68.14 | 223881
de_gsd | 71.59 | 71.32 | 263804
ca_ancora | 85.89 | 85.874 | 417587
es_ancora | 84.99 | 84.78 | 444617
cs_cac | 83.57 | 83.63 | 472608
cs_pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log probability of gold moves is maximized.
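The two regimes differ only in which move is followed after the loss is computed; a schematic sketch with a stub scorer, where the `explore` rate and the loss form are illustrative assumptions:

```python
import random

def train_epoch(configs, score, oracle_best, dynamic, explore=0.9):
    """Schematic oracle training loop. score(c) -> {action: score} comes from
    the parser (a stub below); oracle_best(c) -> gold-optimal action.
    Static oracle: always follow the gold move. Dynamic oracle: sometimes
    follow the model's own prediction, but still maximize the gold move's score."""
    loss = 0.0
    for c in configs:                    # stand-in for per-sentence transition loops
        gold = oracle_best(c)
        scores = score(c)
        loss += -scores[gold]            # maximize (log-)score of the gold move
        if dynamic and random.random() > explore:
            followed = max(scores, key=scores.get)   # model's predicted move
        else:
            followed = gold                          # gold move
        # ... `followed` would now be applied to reach the next configuration ...
    return loss

score = lambda c: {"shift": 0.2, "left": 0.5, "right": 0.3}   # stub parser scores
oracle = lambda c: "left"                                     # stub oracle
print(train_epoch(range(5), score, oracle, dynamic=False))    # -2.5
```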

[Figure: Tree-stack LSTM architecture]

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af_afribooms | not provided | 75.46 | 77.43 | 78.12
kk_ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr_bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr_mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

Transition based parsers can only build projective trees.

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
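Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch, using 1-based word indices and 0 for the root:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (1-based words; 0 = root).
    The tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a, b) in arcs:
        for (c, e) in arcs:
            if a < c < b < e:      # arcs (a, b) and (c, e) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True:  no crossing arcs
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```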


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language | Projectivity | Best (LAS) | Our (LAS)
grc_perseus | 90.7 | 79.39 | 55.03 (20)
eu_bdt | 95.13 | 84.22 | 74.13 (17)
hu_szeged | 97.8 | 82.66 | 68.18 (14)
da_ddt | 98.26 | 86.28 | 76.40 (17)
en_gum | 99.6 | 85.05 | 76.44 (15)
gl_treegal | 100 | 74.25 | 70.45 (10)
gl_ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7. From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Thank you for your attention


Questions

            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not use an explicit feature extractor. We initialize each word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
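The concatenation above can be sketched as follows. This is a minimal illustration only: the function name and all dimensions are hypothetical, not the thesis's actual sizes.

```python
import numpy as np

def word_representation(char_lstm_vec, context_vec, pos_vec, morph_feat_vec):
    """Concatenate the four component vectors into one word representation.
    The four inputs stand in for the character-LSTM word vector, the
    word-level BiLSTM context vector, the POS embedding, and the
    morph-feat embedding."""
    return np.concatenate([char_lstm_vec, context_vec, pos_vec, morph_feat_vec])

# Example with illustrative dimensions:
w = word_representation(np.zeros(350), np.zeros(300), np.zeros(128), np.zeros(128))
print(w.shape)  # (906,)
```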

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
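One way to read the figure above: each Feature=Value pair in the FEATS string gets its own embedding, and the per-feature vectors are combined into a single morph-feat vector. The sketch below is a hypothetical implementation (randomly initialized table, concatenation with zero padding); the thesis's actual combination scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16          # illustrative per-feature embedding size
embeddings = {}   # hypothetical table, one vector per "Feature=Value" pair

def morph_feat_vector(feats: str, n_slots: int = 6) -> np.ndarray:
    """Embed each Feature=Value pair and concatenate, padding unused slots."""
    pairs = feats.split("|") if feats != "_" else []
    vecs = []
    for pair in pairs[:n_slots]:
        if pair not in embeddings:
            embeddings[pair] = rng.normal(size=DIM)
        vecs.append(embeddings[pair])
    while len(vecs) < n_slots:
        vecs.append(np.zeros(DIM))  # pad missing features with zeros
    return np.concatenate(vecs)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (96,)
```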

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i+2, w_i+1, w_i

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

s_i, s_i+1, s_i+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn ∗ [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
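The head update of Eq. (1) can be sketched in NumPy; the weight initialization and the dimensions below are illustrative assumptions, not the trained parameters.

```python
import numpy as np

D, R = 64, 32  # illustrative sizes: word vector dim and relation vector dim
rng = np.random.default_rng(1)
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + R))
b_rnn = np.zeros(D)

def trnn_update(w_head_old, d_l, w_dep):
    """Eq. (1): fold a dependent and its relation label into the head vector."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = trnn_update(rng.normal(size=D), rng.normal(size=R), rng.normal(size=D))
print(w_head_new.shape)  # (64,)
```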

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
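The left_d and right_d transitions above can be sketched as plain list operations on a configuration (σ, β, A). The toy sentence and relation labels below are hypothetical; this shows only the bookkeeping, not the neural scoring.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    buffer front b becomes head of stack top s; s is popped."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    second stack item s becomes head of stack top t; t is popped."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer):
    stack.append(buffer.pop(0))

# Toy run on word indices 1..3 (0 = ROOT):
stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer)                    # σ=[0,1], β=[2,3]
left_arc(stack, buffer, arcs, "nsubj")  # word 2 heads word 1
shift(stack, buffer)
shift(stack, buffer)
right_arc(stack, buffer, arcs, "obj")   # word 2 heads word 3
right_arc(stack, buffer, arcs, "root")  # ROOT heads word 2
print(sorted(arcs))  # [(0, 'root', 2), (2, 'nsubj', 1), (2, 'obj', 3)]
```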

Final overview of Tree-stack LSTM

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3583
ru_taiga        58.32        60.55           10479
sme_giella      52.78        53.39           16385
la_perseus      49.93        51.6            18184
ug_udt          52.78        53.39           19262
sl_sst          46.72        48.77           19473
hu_szeged       66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
sv_lines        72.18        74.81           48325
fr_sequoia      84.36        82.17           50543
en_gum          76.44        75.34           53686
ko_gsd          73.74        72.54           56687
eu_bdt          74.55        73.32           72974
nl_lassysmall   76.7         75.8            75134
gl_ctg          79.02        79.018          79327
lv_lvtb         72.33        72.24           80666
id_gsd          75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121064
bg_btb      84.53        84.55           124336
en_ewt      75.77        75.682          204585
ar_padt     68.02        68.14           223881
de_gsd      71.59        71.32           263804
ca_ancora   85.89        85.874          417587
es_ancora   84.99        84.78           444617
cs_cac      83.57        83.63           472608
cs_pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of the gold moves is maximized

(Diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated with the t-RNN head and dependent representations and fed to an MLP that predicts the next transition.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
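The distinction can be sketched as a single training loop in which the loss always targets the gold move, while the followed move differs. `predict` and `apply_move` are hypothetical stand-ins for the parser's scoring and transition functions, and the exploration schedule is an assumption.

```python
import math
import random

def train_sentence(predict, apply_move, config, gold_moves,
                   dynamic=False, p_explore=0.1):
    """One sentence of oracle training.  `predict(config)` returns a dict
    move -> probability; `apply_move(config, move)` returns the next
    configuration.  In BOTH modes the loss maximizes log p of the gold
    move; the modes differ only in which move is actually followed."""
    loss = 0.0
    for gold in gold_moves:
        probs = predict(config)
        loss += -math.log(probs[gold])            # -log p(gold move)
        if dynamic and random.random() < p_explore:
            followed = max(probs, key=probs.get)  # follow model prediction
        else:
            followed = gold                       # follow gold (static path)
        config = apply_move(config, followed)
    return loss

# Toy stand-ins: a 'config' is just a step counter.
uniform = lambda cfg: {"shift": 0.5, "left": 0.25, "right": 0.25}
step = lambda cfg, move: cfg + 1
print(round(train_sentence(uniform, step, 0, ["shift", "left"]), 3))  # 2.079
```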

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible strategies for transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af_afribooms   not provided  75.46  77.43  78.12
kk_ktb         20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg         20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
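Projectivity can be checked directly: a tree is projective iff no two arcs cross when drawn above the sentence. A simple O(n²) sketch (the head-array encoding is an assumption of this example, with 0 denoting ROOT):

```python
def is_projective(heads):
    """heads[i-1] = head of word i (words 1-based; 0 = ROOT).
    Returns True iff no two arcs strictly cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, e in arcs[i + 1:]:
            if a < c < b < e or c < a < e < b:  # strictly crossing spans
                return False
    return True

print(is_projective([2, 0, 2]))     # True: all arcs nested
print(is_projective([0, 4, 1, 1]))  # False: arc (2,4) crosses arc (1,3)
```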

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language      Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus   90.7              79.39       55.03 (20)
eu_bdt        95.13             84.22       74.13 (17)
hu_szeged     97.8              82.66       68.18 (14)
da_ddt        98.26             86.28       76.40 (17)
en_gum        99.6              85.05       76.44 (15)
gl_treegal    100               74.25       70.45 (10)
gl_ctg        100               82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character based LSTM's word vectors

Word based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
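The four-way concatenation above can be sketched directly; the function name and all dimensions below are illustrative assumptions, not the sizes used in the thesis:

```python
import numpy as np

def make_input_vector(char_word_vec, context_vec, pos_vec, morph_feat_vec):
    """Concatenate the four sources into one token representation:
    char-LSTM word vector, BiLSTM context vector, POS and morph-feat embeddings."""
    return np.concatenate([char_word_vec, context_vec, pos_vec, morph_feat_vec])

# Illustrative dimensions only
token = make_input_vector(np.zeros(350), np.zeros(300), np.zeros(128), np.zeros(128))
print(token.shape)  # (906,)
```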

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
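A morph-feat string like the one above is a `|`-separated list of `Key=Value` features; one way to embed it is to look up and sum a vector per feature. This is a hedged sketch: whether features are summed or concatenated, and the embedding dimension, are assumptions rather than the thesis's exact design.

```python
import numpy as np

def morph_feat_vector(feats, table, dim=64, rng=np.random.default_rng(0)):
    """Sum one embedding per Key=Value morphological feature.
    `table` maps 'Key=Value' strings to vectors and grows on demand."""
    vec = np.zeros(dim)
    if feats == "_":                 # UD convention for "no features"
        return vec
    for feat in feats.split("|"):
        if feat not in table:
            table[feat] = rng.normal(size=dim)
        vec += table[feat]
    return vec

table = {}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
print(len(table))  # 5
```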

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Tree-stack LSTM architecture with the β-LSTM highlighted (component outputs are concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i   w_{i+1}   w_{i+2}

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Tree-stack LSTM architecture with the σ-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

s_i   s_{i+1}   s_{i+2}

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Tree-stack LSTM architecture with the Action-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN: its inputs are the head word, dependent word, and dependency relation embeddings

w_head^new = tanh(W_rnn · [w_head^old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
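Equation (1) is a one-line composition; a minimal NumPy sketch (the weight scale and dimensions are illustrative assumptions):

```python
import numpy as np

def t_rnn(w_head, w_dep, d_label, W, b):
    """Equation (1): new head vector from the old head, dependency-label
    and dependent vectors, via one tanh layer."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W @ x + b)

dim, lab = 100, 20
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(dim, 2 * dim + lab))
b = np.zeros(dim)
new_head = t_rnn(rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=lab), W, b)
print(new_head.shape)  # (100,)
```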

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Left transition: each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Right transition: each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
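The left and right transitions defined above, plus the usual shift, can be sketched as pure functions on a (stack, buffer, arcs) state. The toy token indices and labels in the run below are illustrative only:

```python
def shift(stack, buffer, arcs):
    """shift: move the buffer front onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes head of the stack top s, which is popped."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes head of the top t, which is popped."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Toy run: 0 = ROOT, tokens 1..3
stack, buf, arcs = [0], [1, 2, 3], set()
stack, buf, arcs = shift(stack, buf, arcs)              # stack [0, 1]
stack, buf, arcs = left_arc(stack, buf, arcs, "amod")   # arc 2 -> 1
stack, buf, arcs = shift(stack, buf, arcs)              # stack [0, 2]
stack, buf, arcs = shift(stack, buf, arcs)              # stack [0, 2, 3]
stack, buf, arcs = right_arc(stack, buf, arcs, "obj")   # arc 2 -> 3
print(arcs)  # {(2, 'amod', 1), (2, 'obj', 3)}
```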

Final overview of Tree-stack LSTM

Figure Complete Tree-stack LSTM: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN head/dependent representations and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table Comparison between MLP and the "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM architecture with the t-RNN component highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code             without t-RNN   with t-RNN
no nynorsklia (3k)    51.78           53.33
ru taiga (11k)        59.13           60.55
gl treegal (15k)      69.76           70.45
hu szeged (20k)       66.12           68.18
sv lines (49k)        74.04           75.46
tr imst (50k)         58.12           58.75
ar padt (120k)        68.04           68.14
en ewt (204k)         74.87           75.77
cs cac (473k)         82.89           83.57
cs pdt (1M)           81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training set size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset v2.2 into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.6             18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.7          75.8             75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of gold moves is maximized

Figure Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
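The two regimes differ only in which move advances the parser state; the loss always maximizes the log-probability of the gold move. A sketch with illustrative stand-ins for the model, oracle, and state (none of these names come from the thesis code):

```python
import math

def train_sentence(state, gold_oracle, model, dynamic):
    """Accumulate -log p(gold move); advance with the gold move (static)
    or with the model's argmax move (dynamic)."""
    loss = 0.0
    while not state.is_final():
        probs = model(state)                    # dict: move -> probability
        gold = gold_oracle(state)
        loss += -math.log(probs[gold])          # same objective in both regimes
        move = max(probs, key=probs.get) if dynamic else gold
        state = state.apply(move)
    return loss

class ToyState:                                  # illustrative 3-step "sentence"
    def __init__(self, n): self.n = n
    def is_final(self): return self.n == 0
    def apply(self, move): return ToyState(self.n - 1)

model = lambda s: {"shift": 0.6, "left": 0.4}
oracle = lambda s: "shift"
print(round(train_sentence(ToyState(3), oracle, model, dynamic=True), 3))  # 1.532
```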

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
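A tree is projective iff no two arcs cross when drawn above the sentence. A small check with 1-based token indices and 0 for the root (a sketch, not the thesis's code):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1..n, head 0 = root).
    The tree is projective iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:    # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

print(is_projective([2, 0, 2]))      # True
print(is_projective([3, 4, 0, 3]))   # False: arcs (1,3) and (2,4) cross
```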

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity %   Best (LAS)   Our (LAS)
grc perseus   90.7             79.39        55.03 (20)
eu bdt        95.13            84.22        74.13 (17)
hu szeged     97.8             82.66        68.18 (14)
da ddt        98.26            86.28        76.40 (17)
en gum        99.6             85.05        76.44 (15)
gl treegal    100              74.25        70.45 (10)
gl ctg        100              82.12        79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7. From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
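The LAS figures throughout these tables count a token as correct only when both its predicted head and its dependency label match gold; a minimal sketch with illustrative data:

```python
def las(gold, pred):
    """Labeled attachment score over (head, label) pairs, in percent."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

gold = [(2, "amod"), (0, "root"), (2, "obj"), (3, "case")]
pred = [(2, "amod"), (0, "root"), (2, "nmod"), (3, "case")]
print(las(gold, pred))  # 75.0
```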

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of the gold moves is maximized

[Figure: full tree-stack LSTM architecture (σ-LSTM, β-LSTM, action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
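The distinction can be sketched as follows. `model_probs` is a hypothetical scorer, and this is a simplified view: both regimes minimize the negative log-probability of the gold move, but differ in which move is executed to reach the next configuration. (A true dynamic oracle also recomputes the optimal move for the current, possibly erroneous, configuration.)

```python
import math

def train_step(state, gold_moves, model_probs, dynamic):
    # Both regimes maximize log p(gold move); they differ only in which
    # move is *executed* to reach the next configuration.
    loss = 0.0
    for step, gold in enumerate(gold_moves):
        probs = model_probs(state, step)            # hypothetical scorer
        loss += -math.log(probs[gold])              # NLL of the gold move
        move = max(probs, key=probs.get) if dynamic else gold
        state = state + [move]                      # execute chosen move
    return state, loss
```

With a scorer that always prefers "shift", static training follows the gold sequence while dynamic training drifts to predicted configurations, even though both accumulate the same gold-move loss at each step.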

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language, but from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6: Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
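Projectivity can be checked directly from the head indices: a tree is projective iff no two arcs cross. A small illustrative checker, assuming `heads[i-1]` gives the head of 1-based token i (0 for the root):

```python
def is_projective(heads):
    # heads[i-1] = head index of token i (0 = artificial root)
    n = len(heads)
    for d1 in range(1, n + 1):
        h1 = heads[d1 - 1]
        lo1, hi1 = min(d1, h1), max(d1, h1)
        for d2 in range(1, n + 1):
            h2 = heads[d2 - 1]
            lo2, hi2 = min(d2, h2), max(d2, h2)
            # two arcs cross if exactly one endpoint of the second
            # span falls strictly inside the first span
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True
```

For example, `[2, 0, 2]` (both outer tokens attached to the middle one) is projective, while `[3, 4, 0, 3]` contains crossing arcs.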

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7: From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing

Our Tree-stack LSTM outperformed MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: full tree-stack LSTM architecture with the β-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, encoding the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
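As a toy illustration of how the buffer words are folded into a single state, here is a one-dimensional recurrent stand-in for the β-LSTM (an actual LSTM adds gates and a cell state; the weights are illustrative):

```python
import math

def rnn_encode(xs, w_x=0.5, w_h=0.5, b=0.0):
    # h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), folding the buffer
    # words w_i, w_{i+1}, ... into one hidden state.
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h
```

The σ-LSTM and Action-LSTM summarize the stack and the transition history in the same recurrent fashion.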

σ-LSTM

[Figure: full tree-stack LSTM architecture with the σ-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, encoding the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: full tree-stack LSTM architecture with the Action-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN, combining the head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
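Equation (1) can be sketched in plain Python with toy dimensions; `W` and `b` stand in for W_rnn and b_rnn, and lists stand in for vectors:

```python
import math

def t_rnn(w_head, w_dep, d_rel, W, b):
    # w_head_new = tanh(W_rnn * [w_head; d_rel; w_dep] + b_rnn)
    x = w_head + d_rel + w_dep          # list concatenation = vector concat
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]
```

The output replaces the head word's embedding, so repeated reductions accumulate the subtree's structure into the head representation.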

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
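The initialization described in the caption can be sketched as follows (hypothetical lookup tables and toy dimensions; the thesis embeddings are larger and learned):

```python
def word_embedding(pos, lang, feats, tables):
    # Concatenate POS, language, and morphological-feature embeddings
    # from illustrative lookup tables of small vectors (lists).
    vec = tables['pos'][pos] + tables['lang'][lang]
    for f in feats:
        vec = vec + tables['feat'][f]
    return vec
```

A word's vector therefore grows with each annotated morphological feature, keeping the sources of information in separate, fixed slots.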

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
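The left and right transitions can be sketched on a (stack, buffer, arcs) configuration; illustrative code following the formulas on the slides, not the thesis implementation (arcs are stored as (head, label, dependent) triples):

```python
def left_arc(stack, buffer, arcs, label):
    # left_d(sigma|s, b|beta, A) = (sigma, b|beta, A ∪ {(b, d, s)})
    s = stack.pop()          # dependent: top of the stack
    b = buffer[0]            # head: front of the buffer
    arcs.add((b, label, s))
    return stack, buffer, arcs

def right_arc(stack, buffer, arcs, label):
    # right_d(sigma|s|t, beta, A) = (sigma|s, beta, A ∪ {(s, d, t)})
    t = stack.pop()          # dependent: top of the stack
    s = stack[-1]            # head: second item on the stack
    arcs.add((s, label, t))
    return stack, buffer, arcs
```

After either transition the affected component (β-LSTM for left, σ-LSTM for right) recomputes its hidden state from the new head embedding produced by the t-RNN.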

Final overview of Tree-stack LSTM

[Figure: final overview of the tree-stack LSTM: σ-LSTM, β-LSTM, and Action-LSTM states and the t-RNN output (head word, dependent word, dependency relation) are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial MLP model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only the β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only the σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code    MLP     Only Action   Only-β   Only-σ
hu_szeged    66.21   66.87         66.94    67.03
sv_lines     71.12   72.05         72.17    72.45
tr_imst      57.12   56.87         57.02    57.12
ar_padt      67.83   66.67         66.89    66.92
cs_cac       83.89   82.23         83.13    83.17
en_ewt       75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: The full tree-stack LSTM architecture, highlighting the t-RNN component (head word, dependent word, and dependency relation composed into a new head embedding)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang         MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged    66.21   66.87    66.94    67.03    66.12       68.18
sv_lines     71.12   72.05    72.17    74.04    72.17       75.46
tr_imst      57.12   56.87    57.02    57.12    58.12       58.75
ar_padt      67.83   66.67    66.89    66.92    68.04       68.14
cs_cac       83.89   82.23    83.13    83.17    82.89       83.57
en_ewt       75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: To better understand our contributions, we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
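The grouping above can be sketched as a small helper. The exact handling of the 20k/50k/100k boundaries is my assumption (e.g., hu_szeged with ~20k tokens sits right on a boundary):

```python
def size_bucket(n_tokens):
    # Four training-size buckets used to group languages in the experiments.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```

For example, ru_taiga (10,479 tokens) lands in the smallest bucket and cs_pdt (over 1M tokens) in the largest.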

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3,583
ru_taiga        58.32         60.55            10,479
sme_giella      52.78         53.39            16,385
la_perseus      49.93         51.60            18,184
ug_udt          52.78         53.39            19,262
sl_sst          46.72         48.77            19,473
hu_szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48,325
fr_sequoia      84.36         82.17            50,543
en_gum          76.44         75.34            53,686
ko_gsd          73.74         72.54            56,687
eu_bdt          74.55         73.32            72,974
nl_lassysmall   76.70         75.80            75,134
gl_ctg          79.02         79.018           79,327
lv_lvtb         72.33         72.24            80,666
id_gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121,064
bg_btb      84.53         84.55            124,336
en_ewt      75.77         75.682           204,585
ar_padt     68.02         68.14            223,881
de_gsd      71.59         71.32            263,804
ca_ancora   85.89         85.874           417,587
es_ancora   84.99         84.78            444,617
cs_cac      83.57         83.63            472,608
cs_pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions are generated from the gold moves.
Dynamic oracle: transitions are generated from the model's predicted moves.

In both cases the log probability of the gold moves is maximized.

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, and action-LSTM outputs concatenated and fed to an MLP; t-RNN composes head word, dependent word, and dependency relation)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
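The two regimes differ only in which move is applied to advance the parser state; both accumulate the negative log probability of the gold move. A toy sketch with a hypothetical stand-in parser (`ToyParser`, `explore`, and all method names are assumptions, not the thesis code):

```python
import math, random

class ToyParser:
    """Hypothetical stand-in: uniform scores over two moves, gold = shift."""
    def score(self, state):
        return {"shift": 0.5, "reduce": 0.5}
    def gold_move(self, state):
        return "shift"
    def apply(self, state, move):
        return state + 1

def train_step(parser, state, n_steps=3, dynamic=False, explore=0.5, seed=0):
    rng = random.Random(seed)
    loss = 0.0
    for _ in range(n_steps):
        probs = parser.score(state)
        gold = parser.gold_move(state)
        loss -= math.log(probs[gold])          # log p of the gold move is
                                               # maximized in both regimes
        if dynamic and rng.random() < explore:
            move = max(probs, key=probs.get)   # dynamic: follow the prediction
        else:
            move = gold                        # static: follow the gold move
        state = parser.apply(state, move)
    return loss

static_loss = train_step(ToyParser(), 0, dynamic=False)
dynamic_loss = train_step(ToyParser(), 0, dynamic=True)
```

With real scores, the dynamic regime exposes the model to states it will actually reach at test time, which is the motivation for dynamic-oracle training.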

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
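Projectivity can be tested mechanically: a tree is non-projective iff two of its arcs cross. A small sketch assuming 1-based token indices with 0 marking the artificial root (illustrative only):

```python
def is_projective(heads):
    # heads[i-1] is the head of token i; 0 means the artificial root.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1, r1) and (l2, r2) cross
                return False
    return True

ok = is_projective([2, 0, 2])        # all arcs nested: projective
bad = is_projective([0, 4, 1, 1])    # arc 1-3 crosses arc 2-4: non-projective
```

The quadratic pairwise check is enough for sentence-length inputs; it is how one would measure the projectivity ratios reported on the next slide.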

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)
grc_perseus   90.7%          79.39        55.03 (20)
eu_bdt        95.13%         84.22        74.13 (17)
hu_szeged     97.8%          82.66        68.18 (14)
da_ddt        98.26%         86.28        76.40 (17)
en_gum        99.6%          85.05        76.44 (15)
gl_treegal    100%           74.25        70.45 (10)
gl_ctg        100%           82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions


Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
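The left_d and right_d transitions shown on these slides can be sketched as operations on a parser state. This is a minimal illustration of the definitions above; the State layout, the shift transition (introduced earlier in the talk), and all names are my own, not the thesis implementation.

```python
class State:
    """Parser configuration (σ, β, A): stack, buffer, arc set."""
    def __init__(self, words):
        self.stack = []                        # σ: indices of partially processed words
        self.buffer = list(range(len(words)))  # β: indices of unread words
        self.arcs = set()                      # A: (head, label, dependent) triples

def shift(state):
    # Move the buffer front onto the stack.
    state.stack.append(state.buffer.pop(0))

def left(state, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s.
    s = state.stack.pop()
    b = state.buffer[0]
    state.arcs.add((b, d, s))

def right(state, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second stack item s becomes the head of the stack top t.
    t = state.stack.pop()
    s = state.stack[-1]
    state.arcs.add((s, d, t))
```

Parsing a sentence is then a sequence of such transitions until the buffer is empty and a single word remains on the stack.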

Final overview of Tree-stack LSTM

[Diagram: t-RNN outputs over head word, dependent word and dependency relation, plus the LSTM component states, are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
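In the overview above, the component representations are concatenated and an MLP scores candidate transitions. A toy sketch of that final scoring step, with made-up dimensions and weights (not the trained model):

```python
import math

def mlp_scores(sigma_h, beta_h, action_h, W1, b1, W2, b2):
    # Concatenate the component states, apply one tanh hidden layer,
    # then a linear output layer: one score per candidate transition.
    x = sigma_h + beta_h + action_h
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]
```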

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
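The scores in these tables are LAS (labeled attachment score): the percentage of words whose predicted head and dependency label both match the gold tree. A minimal computation, with my own encoding of the trees as dicts:

```python
def las(gold, pred):
    # gold, pred: dicts mapping each dependent index to (head index, label).
    correct = sum(1 for i in gold if pred.get(i) == gold[i])
    return 100.0 * correct / len(gold)
```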

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN


Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
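The four groups above can be expressed as a simple bucketing function over training-set sizes; the function itself is just an illustration of the boundaries listed on this slide.

```python
def size_bucket(num_tokens):
    # Four experimental groups by number of training tokens.
    if num_tokens < 20_000:
        return "<20k"
    if num_tokens < 50_000:
        return "20k-50k"
    if num_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```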

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.


Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
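The two regimes above differ only in which move is executed after the loss is computed; a minimal sketch of one training step (function and variable names are mine, not the thesis code):

```python
import math

def oracle_step(gold_move, probs, dynamic):
    # Loss is -log p(gold move) under both regimes; the regimes differ
    # only in which move the parser actually executes next.
    loss = -math.log(probs[gold_move])
    if dynamic:
        executed = max(probs, key=probs.get)  # follow the model's prediction
    else:
        executed = gold_move                  # follow the gold transition
    return loss, executed
```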

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors, trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
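A tree is projective when no two dependency arcs cross. A small check (the token indexing and the heads encoding are my own convention):

```python
def is_projective(heads):
    # heads: dict mapping each token index (1..n) to its head (0 = root).
    arcs = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for a, b in arcs:
        for c, e in arcs:
            # Two arcs cross iff one starts strictly inside the other
            # and ends strictly outside it.
            if a < c < b < e:
                return False
    return True
```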

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


σ-LSTM


Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM


Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
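The Action-LSTM summarizes the history of transitions taken so far. As a sketch of that idea, here is a plain tanh RNN standing in for the LSTM, with scalar states for brevity; all names and values are illustrative, not the thesis model.

```python
import math

def encode_actions(actions, emb, w_h, w_x, b):
    # h_t = tanh(w_h * h_{t-1} + w_x * emb[a_t] + b), folded over the
    # sequence of past transitions.
    h = 0.0
    for a in actions:
        h = math.tanh(w_h * h + w_x * emb[a] + b)
    return h
```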

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)


Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
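Equation (1) composes the old head vector, the dependency-label vector and the dependent vector into a new head vector. A direct transcription in plain Python (toy dimensions and illustrative weights, not the trained parameters):

```python
import math

def trnn_compose(w_head, d_label, w_dep, W_rnn, b_rnn):
    # Eq. (1): w_head_new = tanh(W_rnn · [w_head ; d_label ; w_dep] + b_rnn)
    x = w_head + d_label + w_dep  # vector concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]
```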

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))


Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
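The caption above describes how each stack and buffer element is initialized. A sketch of that lookup-and-concatenate step; summing the vectors of multiple morphological features before concatenation is my assumption, and all table contents are toy values.

```python
def init_embedding(pos, lang, morph_feats, pos_E, lang_E, morph_E):
    # Sum the vectors of all morphological features (assumption),
    # then concatenate [POS ; language ; morph-feat].
    dim = len(next(iter(morph_E.values())))
    morph = [0.0] * dim
    for f in morph_feats:
        morph = [m + v for m, v in zip(morph, morph_E[f])]
    return pos_E[pos] + lang_E[lang] + morph
```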

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))


Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
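Symmetrically, the right transition can be sketched on the same toy state representation (an illustrative assumption, not the thesis code):

```python
def right_arc(stack, buffer, arcs, label):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t is reduced and becomes a dependent of s, the element below it."""
    t = stack.pop()          # t leaves the stack
    s = stack[-1]            # s stays on the stack as the head
    arcs.add((s, label, t))  # new arc: head s, relation label, dependent t
    return stack, buffer, arcs

stack, buffer, arcs = [0, 2, 5], [7], set()
stack, buffer, arcs = right_arc(stack, buffer, arcs, "obj")
print(stack, arcs)  # [0, 2] {(2, 'obj', 5)}
```

Note the asymmetry with the left transition: here the dependent comes from the stack top and the head stays on the stack, so the buffer is untouched.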

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
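The Concat + MLP step in the final overview can be sketched as follows; the hidden sizes, the number of transitions, and the two-layer MLP shape are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
h = 8              # hidden size of each component LSTM (illustrative)
n_transitions = 5  # number of candidate transitions (illustrative)

# Hidden states of the σ-, β- and action-LSTMs at the current step
# (placeholders; in the parser these come from the running LSTMs).
h_sigma, h_beta, h_action = (rng.normal(size=h) for _ in range(3))

# Two-layer MLP over the concatenated features, as in the overview figure.
W1 = rng.normal(scale=0.1, size=(16, 3 * h)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(n_transitions, 16)); b2 = np.zeros(n_transitions)

x = np.concatenate([h_sigma, h_beta, h_action])
scores = W2 @ np.tanh(W1 @ x + b1) + b2
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over transitions
print(probs.shape)  # (5,)
```

The transition with the highest probability is applied to the parser state, and the component LSTMs are then updated as shown in the preceding slides.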

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
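The four-way split above can be expressed as a small helper; the bucket labels are illustrative, and the boundary handling at exactly 20k/50k/100k is an assumption since the slides only give approximate ranges:

```python
def size_bucket(n_tokens):
    """Bucket a treebank by training-token count, mirroring the four
    experimental groups (boundaries from the slides)."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(3_583))      # <20k      (e.g. no nynorsklia)
print(size_bucket(97_531))     # 50k-100k  (e.g. id gsd)
print(size_bucket(1_173_282))  # >=100k    (e.g. cs pdt)
```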

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases the log-probability of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
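The static/dynamic contrast above can be sketched as a toy training step; the `ToyState`/`ToyModel` API is invented for illustration, and the point is only that the two regimes differ in how the state advances while the loss always targets the gold move:

```python
import math
import random

class ToyState:
    """Minimal parser-state stand-in: only tracks how far we have advanced."""
    def __init__(self, step=0):
        self.step = step
    def apply(self, transition):
        return ToyState(self.step + 1)

class ToyModel:
    """Stub scorer: constant probability and a fixed argmax prediction."""
    def logp(self, state, transition):
        return math.log(0.5)
    def predict(self, state):
        return "shift"

def oracle(state):
    return "shift"  # stub gold oracle

def training_step(state, model, dynamic, explore=0.5, rng=random.Random(0)):
    gold = oracle(state)
    loss = -model.logp(state, gold)  # both regimes maximize log p(gold)
    if dynamic and rng.random() < explore:
        nxt = model.predict(state)   # dynamic oracle: may follow the model
    else:
        nxt = gold                   # static oracle: always follow gold
    return state.apply(nxt), loss

state, loss = training_step(ToyState(), ToyModel(), dynamic=True)
print(round(loss, 3))  # 0.693
```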

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
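Projectivity can be checked by testing for crossing arcs; a minimal sketch, assuming heads are given as a list indexed by token position with 0 for the artificial root:

```python
def is_projective(heads):
    """heads[i-1] = head index of token i (0 = artificial root).
    A dependency tree is projective iff no two arcs cross, i.e. no two
    arcs have strictly interleaved endpoints."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # interleaved endpoints: the arcs cross
                return False
    return True

print(is_projective([2, 0, 2]))     # simple chain, no crossing: True
print(is_projective([0, 4, 1, 3]))  # arcs (1,3) and (2,4) cross: False
```

An O(n^2) pairwise check like this is enough for diagnostics; computing the projectivity ratio of a treebank is then just the fraction of sentences for which it returns True.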

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Piotr Bojanowski, Edouard Grave, Armand Joulin and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123


How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
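The two transition formulas above can be sketched on a minimal parser state. The function names and the toy word indices are illustrative; the arc set stores (head, label, dependent) triples exactly as in the formulas.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a dependent of the buffer front b."""
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a dependent of the next stack item s."""
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

# Toy run with word indices on the stack/buffer and string labels.
stack, buffer, arcs = [0, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "nsubj")   # word 2 attaches to buffer front 3
right_arc([0, 1], buffer, arcs, "obj")   # word 1 attaches to word 0
print(sorted(arcs))
```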

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
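The final architecture concatenates the summaries of the component LSTMs and scores transitions with an MLP. A minimal numpy sketch with illustrative dimensions (the real model's layer sizes and component set differ):

```python
import numpy as np

def score_transitions(h_sigma, h_beta, h_action, W1, b1, W2, b2):
    """Concatenate the σ-LSTM, β-LSTM, and action-LSTM summaries,
    then score transitions with a one-hidden-layer MLP (softmax output)."""
    x = np.concatenate([h_sigma, h_beta, h_action])
    h = np.tanh(W1 @ x + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
d, hidden, n_trans = 4, 8, 3            # illustrative sizes
W1 = rng.normal(size=(hidden, 3 * d)); b1 = np.zeros(hidden)
W2 = rng.normal(size=(n_trans, hidden)); b2 = np.zeros(n_trans)

p = score_transitions(rng.normal(size=d), rng.normal(size=d),
                      rng.normal(size=d), W1, b1, W2, b2)
print(p.sum())  # probabilities over the transition types sum to 1
```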

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru_taiga (10k)    58.89  60.55
hu_szeged (20k)   66.21  68.18
tr_imst (50k)     56.78  58.75
ar_padt (120k)    67.83  68.14
en_ewt (205k)     74.87  75.77
cs_cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3,583
ru_taiga        58.32        60.55           10,479
sme_giella      52.78        53.39           16,385
la_perseus      49.93        51.60           18,184
ug_udt          52.78        53.39           19,262
sl_sst          46.72        48.77           19,473
hu_szeged       66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.682          204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.874          417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, log p of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
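The difference between the two training regimes can be sketched schematically. The ToyModel, its fixed probabilities, and the exploration rate below are made-up stand-ins for the real parser; the point is only where the next state comes from, since both regimes maximize log p of the gold move.

```python
import math
import random

class ToyModel:
    """Stand-in scorer; the thesis uses the tree-stack LSTM here."""
    def log_prob(self, state, move):
        return math.log(0.5)   # pretend every move has probability 0.5
    def predict(self, state):
        return "shift"         # pretend the model always predicts shift

def oracle(state):
    return "left"              # pretend the gold move is always left

def train_step(state, model, dynamic, rng, explore_p=1.0):
    gold = oracle(state)
    loss = -model.log_prob(state, gold)   # both regimes maximize log p(gold)
    if dynamic and rng.random() < explore_p:
        nxt = model.predict(state)        # dynamic: continue from the predicted move
    else:
        nxt = gold                        # static: continue from the gold move
    return loss, nxt

model, rng = ToyModel(), random.Random(0)
print(train_step(None, model, dynamic=False, rng=rng))  # follows "left"
print(train_step(None, model, dynamic=True, rng=rng))   # follows "shift"
```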

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af_afribooms   not provided  75.46  77.43  78.12
kk_ktb         20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg         20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
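Projectivity means no two arcs cross when drawn above the sentence. A small checker (illustrative, not from the thesis) makes the constraint concrete:

```python
def is_projective(heads):
    """heads[i] is the head index of word i (the root's head is -1).
    Two arcs cross if exactly one endpoint of one arc lies strictly
    between the endpoints of the other."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h >= 0]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([1, -1, 1]))      # no crossing arcs -> projective
print(is_projective([2, 3, -1, 0]))   # crossing arcs -> non-projective
```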

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity  Best (LAS)  Our (LAS)
grc_perseus   90.7          79.39       55.03 (20)
eu_bdt        95.13         84.22       74.13 (17)
hu_szeged     97.8          82.66       68.18 (14)
da_ddt        98.26         86.28       76.40 (17)
en_gum        99.6          85.05       76.44 (15)
gl_treegal    100           74.25       70.45 (10)
gl_ctg        100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context", "Word" and "Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
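The right transition right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}) can be sketched the same way. Again an illustrative helper with hypothetical names; arcs are (head, label, dependent) triples.

```python
def right_arc(stack, buffer, arcs, label):
    """right_label(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, label, t)}):
    pop the stack top t and attach it as a dependent of the new top s."""
    t = stack.pop()           # t is finished and leaves the stack
    s = stack[-1]             # s remains the stack top
    arcs.add((s, label, t))   # new arc: s --label--> t
    return stack, buffer, arcs
```

Note the asymmetry with the left transition: here both items come from the stack, and the buffer is untouched.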

Final overview of Tree-stack LSTM

(Figure: full architecture. The σ-, β-, and action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings.)
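The final decision layer in the overview, concatenating the three LSTM summaries and scoring transitions with an MLP, might look like the sketch below. All dimensions and the ReLU hidden layer are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def score_transitions(h_sigma, h_beta, h_action, W1, b1, W2, b2):
    """Concatenate the σ-, β-, and action-LSTM summaries and score
    candidate transitions with a one-hidden-layer MLP."""
    x = np.concatenate([h_sigma, h_beta, h_action])
    hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer (assumed)
    return W2 @ hidden + b2                # one logit per candidate transition
```

At parse time the transition with the highest logit (among the legal ones) would be taken.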

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

4. Results & Comparisons

Results & Comparisons

Dataset

CoNLL17:

• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two: (1) train/test split change, (2) annotation.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        | MLP   | Tree-stack
ru taiga (10k)   | 58.89 | 60.55
hu szeged (20k)  | 66.21 | 68.18
tr imst (50k)    | 56.78 | 58.75
ar padt (120k)   | 67.83 | 68.14
en ewt (205k)    | 74.87 | 75.77
cs cac (473k)    | 83.39 | 83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP).

Only Action LSTM

Figure: Only action LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang Code  | MLP   | Only Action | Only-β | Only-σ
hu szeged  | 66.21 | 66.87       | 66.94  | 67.03
sv lines   | 71.12 | 72.05       | 72.17  | 72.45
tr imst    | 57.12 | 56.87       | 57.02  | 57.12
ar padt    | 67.83 | 66.67       | 66.89  | 66.92
cs cac     | 83.89 | 82.23       | 83.13  | 83.17
en ewt     | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between MLP and "Only" models.

Ablation of t-RNN

(Figure: full Tree-stack LSTM architecture.)

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           | without t-RNN | with t-RNN
no nynorsklia (3k)  | 51.78 | 53.33
ru taiga (11k)      | 59.13 | 60.55
gl treegal (15k)    | 69.76 | 70.45
hu szeged (20k)     | 66.12 | 68.18
sv lines (49k)      | 74.04 | 75.46
tr imst (50k)       | 58.12 | 58.75
ar padt (120k)      | 68.04 | 68.14
en ewt (204k)       | 74.87 | 75.77
cs cac (473k)       | 82.89 | 83.57
cs pdt (1M)         | 81.17 | 81.16

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of the ablation analysis:

Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of the Ablation Experiments:

• t-RNN's performance contribution increases as the training size decreases.

• σ-LSTM provides more useful information, independent of dataset size.

• Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does the Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

• Languages having less than 20k tokens

• Languages having more than 20k, less than 50k tokens

• Languages having more than 50k, less than 100k tokens

• Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga      | 58.32 | 60.55 | 10479
sme giella    | 52.78 | 53.39 | 16385
la perseus    | 49.93 | 51.60 | 18184
ug udt        | 52.78 | 53.39 | 19262
sl sst        | 46.72 | 48.77 | 19473
hu szeged     | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18 | 74.81  | 48325
fr sequoia    | 84.36 | 82.17  | 50543
en gum        | 76.44 | 75.34  | 53686
ko gsd        | 73.74 | 72.54  | 56687
eu bdt        | 74.55 | 73.32  | 72974
nl lassysmall | 76.7  | 75.8   | 75134
gl ctg        | 79.02 | 79.018 | 79327
lv lvtb       | 72.33 | 72.24  | 80666
id gsd        | 75.76 | 73.97  | 97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12  | 121064
bg btb    | 84.53 | 84.55  | 124336
en ewt    | 75.77 | 75.682 | 204585
ar padt   | 68.02 | 68.14  | 223881
de gsd    | 71.59 | 71.32  | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78  | 444617
cs cac    | 83.57 | 83.63  | 472608
cs pdt    | 81.43 | 82.12  | 1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

(Figure: Tree-stack LSTM architecture.)
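The distinction between the two training regimes can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the thesis training loop: the loss is -log p(gold) in both regimes, and only the move used to advance the parser state differs.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def oracle_step(scores, gold_move, dynamic):
    """One training step: the loss always maximizes log p(gold move);
    the state advances by the gold move (static oracle) or by the
    model's own argmax prediction (dynamic oracle)."""
    probs = softmax(scores)
    loss = -math.log(probs[gold_move])
    next_move = probs.index(max(probs)) if dynamic else gold_move
    return loss, next_move
```

With a dynamic oracle the parser visits states produced by its own (possibly wrong) predictions during training, so it learns to recover from errors it will actually make at test time.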

Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for between 20k and 50k training tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for more than 50k training tokens.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1)-(4).

Transfer Learning

Conclusions of Transfer Learning Experiments

• Applying transfer learning with a pre-trained parser is the most beneficial.

• From-scratch LM training does not bring useful word and context vectors.

• Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

Transition based parsers can only build projective trees.⁶

6. Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

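Projectivity can be checked directly from the head assignments: a tree is projective iff no two arcs cross. A small sketch (hypothetical helper, not from the thesis) that tests the no-crossing condition pairwise:

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (tokens numbered 1..n, head 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # arc2 starts inside arc1 but ends outside
                return False
    return True
```

Non-projective gold trees cannot be produced by the transition system above, which is why the comparison below splits treebanks by projectivity ratio.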

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language    | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7         | 79.39      | 55.03 (20)
eu bdt      | 95.13        | 84.22      | 74.13 (17)
hu szeged   | 97.8         | 82.66      | 68.18 (14)
da ddt      | 98.26        | 86.28      | 76.40 (17)
en gum      | 99.6         | 85.05      | 76.44 (15)
gl treegal  | 100          | 74.25      | 70.45 (10)
gl ctg      | 100          | 82.12      | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.⁷

7. From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

• We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

• Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

• Tree-stack LSTM performed better on low-resource languages.

• When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
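The left and right transitions above can be sketched as operations on a parser state (stack σ, buffer β, arc set A). This is an illustrative sketch of the transition system, not the thesis implementation; the function names and the shift helper are mine:

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(sigma|s, b|beta, A) => (sigma, b|beta, A ∪ {(b, d, s)}):
    # the stack top s becomes a dependent (label d) of the buffer front b.
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(sigma|s|t, beta, A) => (sigma|s, beta, A ∪ {(s, d, t)}):
    # the stack top t becomes a dependent (label d) of the element s below it.
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

def shift(stack, buffer):
    # Move the buffer front onto the stack.
    stack.append(buffer.pop(0))
```

A short run for a two-word sentence (0 is the artificial root): shift, left_arc, shift, right_arc leaves the stack holding only the root and both arcs in A.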

Final overview of Tree-stack LSTM

Figure: σ-, β- and Action-LSTM outputs are concatenated with the t-RNN composition (head word, dependent word, dependency relation) and passed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
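The flow in the final overview can be sketched numerically with toy dimensions and randomly initialized weights. All names, sizes and weight shapes here are illustrative assumptions, not the thesis code: the t-RNN composes a new head embedding from head, dependent and relation vectors, and a softmax MLP scores transitions over the concatenated LSTM summaries.

```python
import math
import random

random.seed(0)
D = 8  # toy embedding size

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_t = rand_matrix(D, 3 * D)  # t-RNN composition weights
W_o = rand_matrix(4, 3 * D)  # MLP output layer: 4 transition types

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def t_rnn(head, dep, rel):
    # Compose a new head embedding from head, dependent and relation vectors.
    return [math.tanh(v) for v in matvec(W_t, head + dep + rel)]

def score_transitions(sigma_h, beta_h, action_h):
    # Concatenate the sigma-, beta- and Action-LSTM summaries and apply
    # a softmax MLP to obtain transition probabilities.
    logits = matvec(W_o, sigma_h + beta_h + action_h)
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    z = sum(e)
    return [v / z for v in e]
```

In the real model the three summaries come from the σ-, β- and Action-LSTMs; here they are just random vectors of the same toy size.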

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change, 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code (tokens)   MLP    Tree-stack
ru taiga (10k)       58.89  60.55
hu szeged (20k)      66.21  68.18
tr imst (50k)        56.78  58.75
ar padt (120k)       67.83  68.14
en ewt (205k)        74.87  75.77
cs cac (473k)        83.39  83.57

Tree-stack LSTM outperforms MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: σ-, β- and Action-LSTM outputs are concatenated with the t-RNN composition (head word, dependent word, dependency relation) and passed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
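The four buckets above can be expressed as a simple helper (boundaries taken from the slide; the function name is mine):

```python
def size_bucket(n_tokens):
    # Bucket a language by its number of training tokens.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```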

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: σ-, β- and Action-LSTM outputs are concatenated with the t-RNN composition (head word, dependent word, dependency relation) and passed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
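A hedged sketch of the difference described above: both regimes compute the loss against the gold move; they differ only in which move is followed to reach the next parser state. Here `model` and `oracle` are hypothetical callables standing in for the parser's scorer and the oracle, and `explore` is an illustrative exploration rate, not a thesis hyperparameter.

```python
import math
import random

def training_step(state, model, oracle, dynamic, explore=0.9):
    gold = oracle(state)           # gold transition for this state
    probs = model(state)           # dict: transition -> probability
    loss = -math.log(probs[gold])  # -log p(gold) is maximized in both regimes
    if dynamic and random.random() < explore:
        follow = max(probs, key=probs.get)  # follow the model's own prediction
    else:
        follow = gold                       # static oracle: always follow gold
    return loss, follow
```

With a static oracle the parser only ever sees gold-derived states; with a dynamic oracle it is also trained on states reached by its own (possibly wrong) predictions.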

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees.6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
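Projectivity can be checked directly from head indices: a tree is projective iff no two arcs cross. A minimal sketch (0 denotes the artificial root; the function name is mine):

```python
def is_projective(heads):
    # heads[i] is the head index of token i+1 (tokens are numbered from 1,
    # 0 is the artificial root). Two arcs cross iff their spans strictly
    # interleave.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # strictly interleaved spans cross
                return False
    return True
```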

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:

Dependency parsing of 81 treebanks in 49 languages.

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.

Koç University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:

Dependency parsing of 82 treebanks in 57 languages.

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.

Koç University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between CoNLL17 and CoNLL18: 1. train/test split change, 2. annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

[Diagram: features fed to an MLP]

Figure: Initial model.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

[Diagram: action LSTM]

Figure: Only action LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

[Diagram: β-LSTM feeding an MLP]

Figure: Only β-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

[Diagram: σ-LSTM feeding an MLP]

Figure: Only σ-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code    MLP     Only Action  Only-β  Only-σ
hu_szeged    66.21   66.87        66.94   67.03
sv_lines     71.12   72.05        72.17   72.45
tr_imst      57.12   56.87        57.02   57.12
ar_padt      67.83   66.67        66.89   66.92
cs_cac       83.89   82.23        83.13   83.17
en_ewt       75.54   75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Diagram: tree-stack LSTM; the σ-, β-, and action-LSTM states plus the t-RNN outputs (head word, dependent word, dependency relation) are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv_lines    71.12   72.05   72.17   74.04   72.17      75.46
tr_imst     57.12   56.87   57.02   57.12   58.12      58.75
ar_padt     67.83   66.67   66.89   66.92   68.04      68.14
cs_cac      83.89   82.23   83.13   83.17   82.89      83.57
en_ewt      75.54   75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD v2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
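The four-way split above can be expressed as a small helper. The function name `size_bucket` is ours, and the token counts are taken from the tables in this section:

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size groups."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts taken from the morph-feat tables in this section.
treebanks = {"no_nynorsklia": 3_583, "sv_lines": 48_325,
             "id_gsd": 97_531, "cs_pdt": 1_173_282}
for name, n in treebanks.items():
    print(name, size_bucket(n))
```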

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3,583
ru_taiga        58.32        60.55           10,479
sme_giella      52.78        53.39           16,385
la_perseus      49.93        51.60           18,184
ug_udt          52.78        53.39           19,262
sl_sst          46.72        48.77           19,473
hu_szeged       66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.682          204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.874          417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log probability of gold moves is maximized.

[Diagram: tree-stack LSTM; the σ-, β-, and action-LSTM states plus the t-RNN outputs (head word, dependent word, dependency relation) are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
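The difference between the two training regimes can be sketched with a toy scorer. Everything here (the scorer, the state update) is a stand-in; only the control flow, following gold moves versus predicted moves while always maximizing the log probability of the gold move, reflects the slide:

```python
import math
import random

random.seed(0)

ACTIONS = ["shift", "left", "right"]
# Toy scorer standing in for the parser network.
weights = {a: random.gauss(0, 1) for a in ACTIONS}

def log_probs(state):
    """Log-softmax over actions given a toy scalar state."""
    logits = {a: weights[a] + 0.1 * state for a in ACTIONS}
    z = math.log(sum(math.exp(v) for v in logits.values()))
    return {a: v - z for a, v in logits.items()}

def run(gold, dynamic):
    """Accumulate -log p(gold move); advance the state with the gold move
    (static oracle) or the model's argmax move (dynamic oracle)."""
    state, loss = 0, 0.0
    for g in gold:
        lp = log_probs(state)
        loss -= lp[g]                        # always maximize log p of gold
        move = max(lp, key=lp.get) if dynamic else g
        state += ACTIONS.index(move) + 1     # toy state update
    return loss

gold = ["shift", "shift", "right", "left"]
print(run(gold, dynamic=False), run(gold, dynamic=True))
```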

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.

2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017].

3. Using my own word and context vectors, trained on a different language from the same language family.

4. Applying transfer learning with a pre-trained parser.

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
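Projectivity itself is easy to check: a tree is projective iff no two arcs cross. A minimal sketch follows (token indices start at 1, head 0 marks the root; the function name `is_projective` is ours):

```python
def is_projective(heads):
    """heads[i-1] = head of token i (0 = root).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:   # arcs (i, j) and (k, l) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # no crossing arcs -> True
print(is_projective([3, 4, 0, 3]))  # arcs (1,3) and (2,4) cross -> False
```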

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus   90.7              79.39       55.03 (20)
eu_bdt        95.13             84.22       74.13 (17)
hu_szeged     97.8              82.66       68.18 (14)
da_ddt        98.26             86.28       76.40 (17)
en_gum        99.6              85.05       76.44 (15)
gl_treegal    100               74.25       70.45 (10)
gl_ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7 From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or in the β-LSTM or Action-LSTM, may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Diagram: left transition; the t-RNN combines Head, Dependent, and Dependency Relation into a New Head]

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Diagram: the β-LSTM consumes the New Head produced by the t-RNN]

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Diagram: updated tree-stack LSTM after the left transition]

Figure: Tree-stack LSTM is ready to predict the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123


Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: After the left transition, the t-RNN has produced the new head and the tree-stack LSTM is ready to give a new transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to give a new transition.

Final overview of Tree-stack LSTM

Figure: Full model. The σ-, β-, and action-LSTM outputs are concatenated and passed to an MLP, while the t-RNN composes the head word, dependent word, and dependency relation embeddings into a new head embedding.
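In code, the two transitions above amount to a pop plus an arc insertion. A minimal Python sketch (the `left`/`right` helpers and the toy tokens are illustrative, not the thesis implementation; an arc is a (head, label, dependent) triple):

```python
# Illustrative sketch of the left/right transitions from the slides.
# An arc is stored as a (head, label, dependent) triple.

def left(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the popped stack top s.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the element s below the stack top becomes the head of the popped top t.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run on placeholder tokens:
stack, buffer, arcs = ["a", "b"], ["c"], set()
left(stack, buffer, arcs, "L")        # adds arc (c, L, b), pops b
right(["x", "y"], buffer, arcs, "R")  # adds arc (x, R, y), pops y
```

In both cases only the dependent leaves the stack, which is why the tree-stack LSTM pops one stack LSTM state per transition.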

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing
2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models
3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser
4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning
5 Conclusion
6 Future Work & Discussions

4. Results & Comparisons

Results amp Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
- Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
- Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. train/test split changes; 2. annotation changes.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the train/test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train/test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: The initial MLP model.

Only Action LSTM

Figure: Only the action LSTM.

Only β-LSTM

Figure: Only the β-LSTM.

Only σ-LSTM

Figure: Only the σ-LSTM.

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models.

Ablation of t-RNN

Figure: Tree-stack LSTM with the t-RNN, which composes the head word, dependent word, and dependency relation embeddings; the LSTM outputs are concatenated and passed to an MLP.
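The t-RNN composition in the figure can be sketched as one recurrent step over the concatenated head, dependent, and relation embeddings. A hypothetical pure-Python sketch (in the real model, `W` and `b` are learned recurrent parameters, not hand-picked values):

```python
import math

def t_rnn_step(head, dep, rel, W, b):
    # Illustrative t-RNN step: the new head embedding is a tanh
    # nonlinearity over the concatenation of the head, dependent,
    # and dependency-relation embeddings.
    x = head + dep + rel  # list concatenation = vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# 2-dim embeddings, so W is 2x6; toy weights that pick out x[0] and x[3]:
new_head = t_rnn_step([1.0, 0.0], [0.0, 1.0], [0.5, 0.5],
                      W=[[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]],
                      b=[0.0, 0.0])
```

The output replaces the head's old embedding on the stack, which is how information from reduced dependents survives later transitions.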

Ablation of t-RNN

Comparison of the tree-stack LSTM with and without the t-RNN:

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

The t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments

- The t-RNN's performance contribution increases as the training size decreases.

- The σ-LSTM provides more useful information independent of dataset size.

- Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

- Languages having less than 20k tokens

- Languages having more than 20k and less than 50k tokens

- Languages having more than 50k and less than 100k tokens

- Languages having 100k tokens or more
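The four groups can be expressed as a trivial bucketing function (boundaries taken from the list above; the function name is illustrative):

```python
def size_bucket(n_train_tokens):
    # Four experimental groups by training-token count,
    # with boundaries as listed on the slide.
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```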

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log probability of the gold moves is maximized.

Figure: The tree-stack LSTM model used in the oracle-training experiments.
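The two regimes can be sketched as a toy training loop: both accumulate the negative log probability of the gold move, but the dynamic oracle executes the model's predicted move instead of the gold one. Everything here (the callbacks and the counter state) is illustrative, not the thesis code:

```python
import random

def oracle_train_step(score_moves, gold_move, apply_move, is_final, state,
                      dynamic=False, explore=1.0, rng=None):
    # Sketch of static vs dynamic oracle training. Both oracles maximize
    # log p(gold move); they differ only in which move is *executed*
    # to reach the next parser state.
    rng = rng or random.Random(0)
    loss, path = 0.0, []
    while not is_final(state):
        scores = score_moves(state)          # move -> log-probability
        g = gold_move(state)
        loss -= scores[g]                    # NLL of the gold move
        if dynamic and rng.random() < explore:
            m = max(scores, key=scores.get)  # follow the model's prediction
        else:
            m = g                            # follow the gold move (static)
        path.append(m)
        state = apply_move(state, m)
    return loss, path

# Toy setting: the state is a step counter, with two moves "L"/"R".
scores = lambda s: {"L": -1.0, "R": -0.5}
gold   = lambda s: "L" if s % 2 == 0 else "R"
step   = lambda s, m: s + 1
done   = lambda s: s >= 3

static_loss, static_path = oracle_train_step(scores, gold, step, done, 0)
dynamic_loss, dynamic_path = oracle_train_step(scores, gold, step, done, 0,
                                               dynamic=True, explore=1.0)
```

With full exploration the dynamic path follows the model's argmax moves, while the loss is still computed against the gold moves at every visited state.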

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of fewer than 20k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of between 20k and 50k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of more than 50k tokens.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
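Strategy (4) can be sketched as warm-starting shared parameters from the pre-trained parser before fine-tuning. A hypothetical dict-of-parameters sketch (real models copy LSTM/MLP weight matrices, not scalars; the names are illustrative):

```python
def warm_start(target_params, source_params, freeze=()):
    # Copy every parameter the two parsers share from the pre-trained
    # source into the target, then fine-tune everything not frozen.
    params = dict(target_params)
    for name, value in source_params.items():
        if name in params:
            params[name] = value        # warm-start shared weights
    trainable = [n for n in params if n not in freeze]
    return params, trainable

params, trainable = warm_start(
    {"embed": 0, "lstm": 0, "mlp": 0},   # randomly-initialized target
    {"lstm": 1, "mlp": 2, "tagger": 9},  # pre-trained, related-language source
    freeze=("lstm",))
```

Parameters the target does not have (here `"tagger"`) are simply ignored, and frozen parameters keep their transferred values during fine-tuning.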

Transfer Learning

Conclusions of Transfer Learning Experiments

- Applying transfer learning with a pre-trained parser is the most beneficial.

- Training an LM from scratch on very limited data does not yield useful word and context vectors.

- Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

Transition based parsers can only build projective trees.⁶

⁶ Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
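Projectivity is easy to test: a tree is projective iff no two arcs cross. A small sketch, assuming `heads[i]` gives the head of 1-based token i (index 0 is a ROOT placeholder):

```python
def is_projective(heads):
    # An arc spans (min(h, d), max(h, d)); two arcs cross iff one
    # starts strictly inside the other span and ends strictly outside it.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:
                return False
    return True
```

For example, `[0, 0, 1, 1, 2]` encodes crossing arcs (1,3) and (2,4), so it is non-projective; such trees are exactly the ones a plain transition system cannot produce.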

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7%         79.39       55.03 (20)
eu bdt       95.13%        84.22       74.13 (17)
hu szeged    97.8%         82.66       68.18 (14)
da ddt       98.26%        86.28       76.40 (17)
en gum       99.6%         85.05       76.44 (15)
gl treegal   100%          74.25       70.45 (10)
gl ctg       100%          82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.⁷

⁷ From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

- We introduced "context", "word", and "morph-feat" embeddings and showed their contribution to transition based dependency parsing.

- Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

- The tree-stack LSTM performed better on low-resource languages.

- As the training dataset size increases, the tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions



Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
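The caption above describes the input layer: each item's initial vector is the concatenation of its POS, language, and morphological-feature embeddings. A minimal sketch with made-up tables and dimensions (the full model also adds word and context vectors from the LM):

```python
# Toy embedding tables; entries and dimensions are invented for the example.
pos_emb   = {"NOUN": [0.1] * 8, "VERB": [0.2] * 8}
lang_emb  = {"en": [0.3] * 4}
morph_emb = {"Number=Sing": [0.4] * 6}

def init_embedding(pos, lang, morph):
    """Initial item embedding: concatenation [POS ; language ; morph-feat]."""
    return pos_emb[pos] + lang_emb[lang] + morph_emb[morph]
```

With the dimensions above, the resulting vector has length 8 + 4 + 6 = 18.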

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
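The right_d transition above can be sketched on a plain configuration of (stack, buffer, arcs). This is an illustrative reduction step under that formula, not the thesis implementation:

```python
def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop the stack's top item t, keep s below it as the head,
    and record the labeled arc (s, d, t)."""
    t = stack.pop()        # dependent leaves the stack
    s = stack[-1]          # head stays on top of the stack
    arcs.add((s, d, t))
    return stack, buffer, arcs
```

In the neural model, this is the moment the t-RNN composes the head and dependent embeddings into a new head embedding before the σ-LSTM re-reads the stack.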

Final overview of Tree-stack LSTM

(Figure: the full Tree-stack LSTM: σ-, β-, and action LSTM outputs are concatenated and passed to an MLP; the t-RNN composes head word, dependent word, and dependency relation into a new head embedding.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split, 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
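All scores in these tables are LAS (labeled attachment score): the percentage of tokens whose predicted head and dependency label both match the gold annotation. A minimal sketch of the metric:

```python
def las(gold, pred):
    """Labeled attachment score.
    gold, pred: lists of (head, label) pairs, one per token."""
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```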

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (a bare MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

(Figure: the Tree-stack LSTM architecture, repeated to show the component being ablated: head word, dependent word, and dependency relation feed the t-RNN; σ-, β-, and action LSTM outputs are concatenated into an MLP.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
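The four size groups above can be expressed as a small helper; this is an illustrative sketch whose thresholds simply follow the grouping stated on this slide:

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size groups
    used in the morph-feat experiments (CoNLL18 UD 2.2 token counts)."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```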

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.6             18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.7          75.8             75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.

(Figure: the Tree-stack LSTM architecture used in both training regimes: t-RNN over head word, dependent word, and dependency relation; σ-, β-, and action LSTM outputs concatenated into an MLP.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
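The difference between the two regimes can be sketched as a single training step. The `model.score` and `gold_moves` interfaces here are hypothetical stand-ins, not the thesis code:

```python
import random

def oracle_step(model, state, gold_moves, dynamic=False, explore=0.9):
    """One oracle training step (illustrative sketch).
    Static oracle: always follow the gold move.
    Dynamic oracle: with probability `explore`, follow the model's own
    (possibly wrong) prediction instead, but the loss still maximizes
    log p(gold move) for the state actually visited."""
    scores = model.score(state)           # assumed: dict move -> log-prob
    gold = gold_moves(state)              # assumed: gold move for this state
    loss = -scores[gold]                  # negative log-likelihood of gold
    if dynamic and random.random() < explore:
        move = max(scores, key=scores.get)   # parser explores its prediction
    else:
        move = gold                          # parser follows the gold move
    return loss, move
```

Either way the loss term is the same; only the sequence of states the parser visits during training changes.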

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
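Strategy (4) can be sketched as a warm start: copy shape-compatible parameters from a parser pre-trained on a related language, then fine-tune on the low-resource treebank. The `warm_start` helper and the flat parameter layout below are invented for the example:

```python
def warm_start(target_params, source_params):
    """Initialize a low-resource parser from a pre-trained one: copy
    weights wherever names and shapes match, and keep the target's own
    (e.g. vocabulary-specific) parameters otherwise."""
    merged = dict(target_params)
    for name, weights in source_params.items():
        if name in merged and len(merged[name]) == len(weights):
            merged[name] = list(weights)   # copy pre-trained weights
    return merged
```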

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
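Projectivity is easy to check mechanically: an arc (head, dep) is projective iff every token strictly between them is a descendant of that head. A minimal sketch, where `heads[i]` is the head index of token i+1, 0 denotes the root, and the input is assumed to be a well-formed tree:

```python
def is_projective(heads):
    """Return True iff the dependency tree encoded by `heads` is projective."""
    n = len(heads)
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = min(head, dep), max(head, dep)
        for k in range(lo + 1, hi):
            # walk up from k; it must reach `head` before (or at) the root
            j = k
            while j != 0 and j != head:
                j = heads[j - 1]
            if j != head:
                return False   # k is inside the arc span but not a descendant
    return True
```

Trees that fail this check are exactly the ones a purely projective transition system cannot reproduce.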

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding (inputs: head, dependent, and dependency relation).
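The right_d transition above can be sketched directly on a (stack, buffer, arcs) configuration. This is illustrative code, not the thesis implementation:

```python
# right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
# pop the top of the stack t, attach it to the item below it (s)
# with dependency label d, and record the new arc.
def right_arc(config, d):
    stack, buffer, arcs = config
    *rest, s, t = stack                      # stack is σ|s|t
    return rest + [s], buffer, arcs | {(s, d, t)}
```

For example, `right_arc(([0, 1, 2], [3], set()), "obj")` yields `([0, 1], [3], {(1, "obj", 2)})`.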

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: the Tree-stack LSTM is ready to produce a new transition.

Final overview of Tree-stack LSTM

[Figure: overall Tree-stack LSTM architecture — the t-RNN combines the head word, dependent word, and dependency relation; the LSTM outputs are concatenated and fed to an MLP.]

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

4. Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:

- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:

- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split change; 2. annotation.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank was improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP
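All scores in these tables are LAS (labelled attachment score). As a quick reference, it can be computed as follows (a minimal sketch, not the official evaluation script):

```python
# LAS: percentage of words whose predicted head AND dependency label
# are both correct.
def las(gold, pred):
    # gold, pred: lists of (head, label) pairs, one per word
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```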


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: initial model (MLP).

Only Action LSTM

Figure: only the action LSTM.

Only β-LSTM

Figure: only the β-LSTM.

Only σ-LSTM

Figure: only the σ-LSTM.

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between the MLP and "Only" models.

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture — the t-RNN combines the head word, dependent word, and dependency relation; the LSTM outputs are concatenated and fed to an MLP.]

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.16

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes the Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).


What does the morphological feature embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

- Languages having fewer than 20k tokens
- Languages having more than 20k and fewer than 50k tokens
- Languages having more than 50k and fewer than 100k tokens
- Languages having 100k tokens or more
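The split above amounts to a simple bucketing by training-token count (thresholds from this slide; the helper name is illustrative):

```python
# Assign a language to one of the four experimental groups
# based on its number of training tokens.
def size_bucket(n_tokens):
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```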


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having fewer than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.60            18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.
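The difference can be sketched as a single training step (illustrative only; the names are placeholders, and in both regimes the loss term would be -log p(gold move | state)):

```python
import random

# A static oracle always executes the gold move to reach the next
# configuration; a dynamic oracle sometimes executes the model's own
# (possibly wrong) prediction, exposing training to parser errors.
def oracle_training_step(state, gold_move, model_move, dynamic, p_explore=0.9):
    if dynamic and random.random() < p_explore:
        executed = model_move   # follow the model's prediction
    else:
        executed = gold_move    # follow the gold transition
    return executed
```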

[Figure: Tree-stack LSTM architecture — the t-RNN combines the head word, dependent word, and dependency relation; the LSTM outputs are concatenated and fed to an MLP.]

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with between 20k and 50k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.


How about languages with fewer than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4).
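Strategy (4), initializing from a pre-trained parser, can be sketched as a parameter copy (hypothetical names; target-language-specific parameters such as the word-embedding table keep their fresh initialization):

```python
# Copy every parameter the two parsers share (e.g. MLP, LSTM, t-RNN
# weights); any name missing from the fresh parser is ignored.
def init_from_pretrained(fresh_params, pretrained_params):
    params = dict(fresh_params)
    for name, value in pretrained_params.items():
        if name in params:
            params[name] = value
    return params
```

The resulting parser is then fine-tuned on the low-resource language's own (small) treebank.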


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Architecture with only the β-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Architecture with only the σ-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM with t-RNN: head word, dependent word, and dependency relation vectors pass through LSTMs, are concatenated, and feed an MLP.

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?
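One common way to embed a morphological feature set is to sum learned vectors, one per feature=value pair. The sketch below is illustrative only (random initialization stands in for learned parameters; names are invented), not the thesis implementation:

```python
import random

random.seed(0)
DIM = 8
_table = {}  # feature=value pair -> vector (randomly initialized here)

def feat_vector(pair):
    """Look up (or lazily create) the vector for one feature=value pair."""
    if pair not in _table:
        _table[pair] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return _table[pair]

def morph_embedding(feats):
    """Embed a UD-style FEATS string (pairs joined by '|') as the sum
    of the per-pair vectors."""
    vec = [0.0] * DIM
    for pair in feats.split("|"):
        for i, x in enumerate(feat_vector(pair)):
            vec[i] += x
    return vec

v = morph_embedding("Case=Nom|Number=Sing|Person=3")
print(len(v))  # → 8
```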

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
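The four-way split above can be expressed as a simple lookup; the function name is illustrative, and the handling of the exact 20k/50k/100k boundaries is an assumption:

```python
def size_bucket(n_tokens):
    """Map a language's training-token count to one of the four buckets."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts quoted later in this section (ru_taiga, sv_lines, cs_pdt).
print(size_bucket(10_479))     # → <20k
print(size_bucket(48_325))     # → 20k-50k
print(size_bucket(1_173_282))  # → >=100k
```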

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, log p of the gold moves is maximized.

Figure: Tree-stack LSTM with t-RNN (head word, dependent word, and dependency relation LSTMs concatenated and fed to an MLP).
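The difference between the two regimes can be sketched as one training step; every class here is a toy stand-in to illustrate the control flow, not the thesis parser:

```python
import math

class ToyState:
    """Stand-in for a parser configuration; just counts remaining steps."""
    def __init__(self, n):
        self.n = n
    def is_final(self):
        return self.n == 0
    def apply(self, move):
        return ToyState(self.n - 1)

class ToyParser:
    """Toy scorer: a fixed distribution over two moves."""
    def initial_state(self, sentence):
        return ToyState(len(sentence))
    def predict(self, state):
        return {"shift": 0.8, "reduce": 0.2}

class ToyOracle:
    """Oracle that always names 'shift' as the gold (least-cost) move."""
    def best_move(self, state):
        return "shift"

def train_step(parser, sentence, oracle, dynamic=False):
    """One sentence of oracle training: the static oracle advances the
    state with the gold move, the dynamic oracle with the parser's own
    prediction; either way the loss is -sum log p(gold move)."""
    state = parser.initial_state(sentence)
    loss = 0.0
    while not state.is_final():
        probs = parser.predict(state)   # distribution over moves
        gold = oracle.best_move(state)  # gold move for this state
        loss -= math.log(probs[gold])   # maximize log p of gold moves
        move = max(probs, key=probs.get) if dynamic else gold
        state = state.apply(move)       # follow predicted vs gold move
    return loss

loss = train_step(ToyParser(), ["Economic", "news", "had"], ToyOracle())
print(round(loss, 4))  # → 0.6694  (3 steps of -log 0.8)
```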

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
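A tree is projective when no two dependency arcs cross. A minimal check, using the "Economic news had little effect" example from the introduction; the function name and 1-based head encoding are illustrative:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (1-based word indices, 0 = root).
    A tree is projective iff no two arcs cross when drawn above the words."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # one endpoint inside the span, one outside
                return False
    return True

# Heads for "Economic news had little effect": all arcs nest -> projective.
print(is_projective([2, 3, 0, 5, 3]))  # → True
# Arcs (1,3) and (2,4) cross -> non-projective.
print(is_projective([3, 4, 0, 3]))     # → False
```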

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123



  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity (%)   Best (LAS)   Ours (LAS)
grc_perseus   90.7               79.39        55.03 (20)
eu_bdt        95.13              84.22        74.13 (17)
hu_szeged     97.8               82.66        68.18 (14)
da_ddt        98.26              86.28        76.40 (17)
en_gum        99.6               85.05        76.44 (15)
gl_treegal    100                74.25        70.45 (10)
gl_ctg        100                82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7: From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM.

MLP Parser

Figure: Initial model (MLP).

Only Action LSTM

Figure: Only action LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models.

Ablation of t-RNN

Figure: t-RNN architecture (head word, dependent word, and dependency relation encoded by LSTMs, concatenated, and fed into the MLP).
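The t-RNN composes a head word, a dependent word, and their dependency relation into a single vector each time an arc is added, in the spirit of the composition function of Dyer et al. (2015). A minimal numpy sketch; the toy dimension, the tanh affine composition, and the parameter names are assumptions for illustration, not the thesis model's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                   # toy embedding size (assumption)
W = rng.normal(size=(D, 3 * D)) * 0.1   # hypothetical composition weights
b = np.zeros(D)

def compose(head_vec, dep_vec, rel_vec):
    """Merge head, dependent, and relation vectors into a new
    representation for the head after the arc is added."""
    x = np.concatenate([head_vec, dep_vec, rel_vec])
    return np.tanh(W @ x + b)

head, dep, rel = (rng.normal(size=D) for _ in range(3))
new_head = compose(head, dep, rel)      # replaces the head's old vector
assert new_head.shape == (D,)
```

The composed vector stands in for the head on the stack, so subtree structure accumulates recursively as more arcs are added.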

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of the ablation experiments:

t-RNN's performance contribution increases as the training set size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having fewer than 20k tokens

Languages having more than 20k but fewer than 50k tokens

Languages having more than 50k but fewer than 100k tokens

Languages having 100k tokens or more
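The four-way split above can be expressed as a simple bucketing rule. A sketch (the bucket labels are ours, chosen to mirror the groups listed):

```python
def size_bucket(n_tokens: int) -> str:
    """Assign a treebank to one of the four training-size groups."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# e.g. no_nynorsklia has 3,583 training tokens, cs_pdt has 1,173,282
assert size_bucket(3_583) == "<20k"
assert size_bucket(1_173_282) == ">=100k"
```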

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having fewer than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having fewer than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.68           204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.87           417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: t-RNN architecture (head word, dependent word, and dependency relation encoded by LSTMs, concatenated, and fed into the MLP).
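A static oracle can be read deterministically off a projective gold tree. The sketch below uses the classic arc-standard transition system for illustration (the thesis parser's transition system may differ); `heads[i]` gives the gold head of token `i+1`, with `0` for the artificial root:

```python
def static_oracle(heads):
    """Gold arc-standard transition sequence for a projective tree.
    heads[i] = head of token i+1; 0 denotes the artificial root."""
    n = len(heads)
    remaining = [0] * (n + 1)       # dependents of each word not yet attached
    for h in heads:
        remaining[h] += 1
    stack, buf = [0], list(range(1, n + 1))
    moves, arcs = [], set()
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            # LEFT-ARC: s0 is the head of s1 and s1 has collected its dependents
            if s1 != 0 and heads[s1 - 1] == s0 and remaining[s1] == 0:
                moves.append("LEFT"); arcs.add((s0, s1))
                remaining[s0] -= 1; stack.pop(-2); continue
            # RIGHT-ARC: s1 is the head of s0 and s0 has collected its dependents
            if heads[s0 - 1] == s1 and remaining[s0] == 0:
                moves.append("RIGHT"); arcs.add((s1, s0))
                remaining[s1] -= 1; stack.pop(); continue
        if not buf:
            raise ValueError("tree is non-projective")
        moves.append("SHIFT"); stack.append(buf.pop(0))
    return moves, arcs

# "Economic news had little effect": news<-Economic, had<-news, root<-had, ...
moves, arcs = static_oracle([2, 3, 0, 5, 3])
assert arcs == {(2, 1), (3, 2), (0, 3), (5, 4), (3, 5)}
assert len(moves) == 10             # n shifts + n arc moves
```

During training, a static oracle always executes this gold move, while a dynamic oracle executes the model's predicted move and recomputes which moves remain optimal from the resulting, possibly off-gold, configuration.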

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with between 20k and 50k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

How about languages with fewer than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors, trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Transfer Learning

Conclusions of the transfer learning experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition-based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
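Projectivity is easy to test directly: an arc (h, d) is projective iff every token strictly between h and d is a descendant of h. A small sketch, assuming `heads` encodes a valid tree:

```python
def is_projective(heads):
    """heads[i] = head of token i+1 (0 = root).
    Returns True iff no arc is crossed by another."""
    n = len(heads)
    for d in range(1, n + 1):
        h = heads[d - 1]
        for k in range(min(h, d) + 1, max(h, d)):
            a = k
            while a != 0 and a != h:    # climb from k toward the root
                a = heads[a - 1]
            if a != h:                  # k does not descend from h
                return False
    return True

assert is_projective([2, 3, 0, 5, 3])   # "Economic news had little effect"
assert not is_projective([3, 0, 2])     # token 2 intervenes in the arc 3 -> 1
```

Trees failing this test cannot be produced by the transition systems above, which bounds the attainable LAS on treebanks with low projectivity ratios.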

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Ours (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.

Conclusions

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g., CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention!

Questions?


Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

MLP Parser

Figure: Initial model (MLP parser)

Only Action LSTM

Figure: Only action LSTM

Only β-LSTM

Figure: Only β-LSTM

Only σ-LSTM

Figure: Only σ-LSTM

Ablation Analysis Results

Lang Code    MLP    Only Action  Only-β  Only-σ
hu szeged    66.21  66.87        66.94   67.03
sv lines     71.12  72.05        72.17   72.45
tr imst      57.12  56.87        57.02   57.12
ar padt      67.83  66.67        66.89   66.92
cs cac       83.89  82.23        83.13   83.17
en ewt       75.54  75.43        75.56   75.67

Table: Comparison between the MLP and "Only" models

Ablation of t-RNN

Figure: t-RNN — the head word, dependent word, and dependency relation embeddings pass through LSTM cells; the outputs are concatenated and fed to an MLP
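The composition in the figure can be sketched in miniature. This is an illustrative toy, not the thesis code: a single vanilla RNN step stands in for the LSTM cells, and `W`, `U`, `b` and the embedding vectors are hypothetical toy parameters.

```python
import math

def rnn_step(x, h, W, U, b):
    # One vanilla RNN step, h' = tanh(W·x + U·h + b); a stand-in for an LSTM cell.
    return [math.tanh(sum(w * xi for w, xi in zip(Wr, x)) +
                      sum(u * hi for u, hi in zip(Ur, h)) + bi)
            for Wr, Ur, bi in zip(W, U, b)]

def t_rnn(head_vec, dep_vec, rel_vec, h0, W, U, b):
    # Feed the head word, dependent word, and dependency relation embeddings
    # through the recurrent cell in sequence; the final state summarizes the arc.
    h = h0
    for vec in (head_vec, dep_vec, rel_vec):
        h = rnn_step(vec, h, W, U, b)
    return h

# Toy 2-dimensional parameters, purely for illustration.
W = [[0.1, 0.0], [0.0, 0.1]]
U = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]
h = t_rnn([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0], W, U, b)
```

In the full model, such an arc summary would be concatenated with the σ-, β-, and action-LSTM outputs before the final MLP.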

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments:

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
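The ablation variants above amount to feeding the final MLP only a subset of the component outputs. A minimal sketch of that idea (component names and vector sizes are illustrative, not taken from the thesis implementation):

```python
def mlp_input(encodings, use=("sigma", "beta", "action", "tree")):
    """Concatenate only the selected component encodings, mimicking the
    Only-σ / Only-β / Only-Action / without-t-RNN ablation variants."""
    feats = []
    for name in ("sigma", "beta", "action", "tree"):
        if name in use:
            feats.extend(encodings[name])
    return feats

# Toy per-component encodings.
enc = {"sigma": [0.1, 0.2], "beta": [0.3], "action": [0.4], "tree": [0.5, 0.6]}
only_sigma = mlp_input(enc, use=("sigma",))                      # Only-σ
without_trnn = mlp_input(enc, use=("sigma", "beta", "action"))   # w/o t-RNN
full = mlp_input(enc)                                            # full model
```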

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more
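The split can be expressed as a simple bucketing function over training-token counts (thresholds as above; the example counts are the ones reported in the result tables):

```python
def size_bucket(n_tokens):
    # The four groups used in the morph-feat experiments.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts taken from the result tables.
buckets = {code: size_bucket(n) for code, n in
           [("ru_taiga", 10479), ("sv_lines", 48325),
            ("gl_ctg", 79327), ("en_ewt", 204585)]}
```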

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48325
fr sequoia     84.36        82.17            50543
en gum         76.44        75.34            53686
ko gsd         73.74        72.54            56687
eu bdt         74.55        73.32            72974
nl lassysmall  76.7         75.8             75134
gl ctg         79.02        79.018           79327
lv lvtb        72.33        72.24            80666
id gsd         75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases the log-probability of gold moves is maximized.
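A schematic of the difference: both regimes compute the loss against the gold move, but only the dynamic oracle advances the parser along the model's own predictions. This is a sketch, not the thesis training loop; `scores` is a hypothetical stand-in for the model's unnormalized move scores.

```python
import math
import random

def train_transition(gold_move, scores, dynamic, explore=1.0, rng=random):
    """Return (loss, move_to_apply) for one parser transition.
    scores: dict mapping each legal move to its unnormalized model score."""
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    loss = log_z - scores[gold_move]      # -log p(gold move), in both regimes
    predicted = max(scores, key=scores.get)
    if dynamic and rng.random() < explore:
        return loss, predicted            # dynamic oracle: follow the prediction
    return loss, gold_move                # static oracle: follow the gold move
```

With `explore=1.0` the dynamic branch is deterministic; in practice exploration is usually stochastic.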


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens fewer than 20k.

Figure: Results are very close for training tokens between 20k and 50k.

Figure: Results are very close for training tokens of more than 50k.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
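Strategy (4) can be sketched as copying every compatible parameter from a parser pre-trained on a related language. Parameters here are plain lists and the parameter names are hypothetical, purely for illustration:

```python
def warm_start(target, source):
    """Copy parameters whose name and shape match; leave the rest
    (e.g. language-specific embedding tables) at their fresh initialization."""
    copied = []
    for name, value in source.items():
        if name in target and len(target[name]) == len(value):
            target[name] = list(value)
            copied.append(name)
    return copied

fresh = {"mlp.w": [0.0, 0.0, 0.0], "embed.words": [0.0] * 5}
pretrained = {"mlp.w": [0.3, -0.1, 0.2], "embed.words": [0.1] * 7}  # vocab differs
copied = warm_start(fresh, pretrained)
```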

Transfer Learning

Conclusions of Transfer Learning Experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition-based parsers can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
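Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross. A standard check (not code from the thesis), with `heads[i]` giving the head of token `i+1` and 0 denoting the root:

```python
def is_projective(heads):
    """True iff no two arcs (drawn above the sentence) cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:   # arcs (i, j) and (k, l) cross
                return False
    return True
```

For example, `heads = [2, 0, 2]` is projective, while `heads = [4, 0, 2, 2]` contains a crossing arc and is not.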

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.
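LAS, the metric in these tables, is the percentage of tokens whose predicted head and dependency label both match the gold annotation:

```python
def las(gold, pred):
    """gold, pred: per-token lists of (head, label) pairs."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "amod")]
score = las(gold, pred)   # 2 of 4 tokens fully correct
```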

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions



use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Only σ-LSTM

[Figure: the Only σ-LSTM model. The stack (σ) LSTM alone feeds the MLP classifier; the other components are ablated.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang code | MLP | Only-Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: the t-RNN unit. Head word, dependent word, and dependency relation vectors are concatenated and fed to an MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
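The composition step in the t-RNN figure can be sketched in a few lines. This is an illustrative NumPy sketch, not the thesis implementation: the dimensions and the single tanh dense layer standing in for the MLP are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
D_WORD, D_REL, D_OUT = 8, 4, 16

# One dense layer composing (head, dependent, relation) into a single
# vector, mimicking the Concat -> MLP step of the t-RNN figure.
W = rng.normal(size=(D_OUT, 2 * D_WORD + D_REL))
b = np.zeros(D_OUT)

def t_rnn_compose(head_vec, dep_vec, rel_vec):
    x = np.concatenate([head_vec, dep_vec, rel_vec])  # Concat
    return np.tanh(W @ x + b)                          # dense layer

h = t_rnn_compose(rng.normal(size=D_WORD),
                  rng.normal(size=D_WORD),
                  rng.normal(size=D_REL))
print(h.shape)  # (16,)
```

The resulting vector can then replace the head word's representation in the stack, which is what lets reduced subtrees keep contributing features.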

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code (train tokens) | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang | MLP | Only-A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training set size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of training tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga | 58.32 | 60.55 | 10479
sme giella | 52.78 | 53.39 | 16385
la perseus | 49.93 | 51.6 | 18184
ug udt | 52.78 | 53.39 | 19262
sl sst | 46.72 | 48.77 | 19473
hu szeged | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of training tokens
sv lines | 72.18 | 74.81 | 48325
fr sequoia | 84.36 | 82.17 | 50543
en gum | 76.44 | 75.34 | 53686
ko gsd | 73.74 | 72.54 | 56687
eu bdt | 74.55 | 73.32 | 72974
nl lassysmall | 76.7 | 75.8 | 75134
gl ctg | 79.02 | 79.018 | 79327
lv lvtb | 72.33 | 72.24 | 80666
id gsd | 75.76 | 73.97 | 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of training tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.
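The contrast between the two regimes can be shown with a toy move scorer. This is a minimal sketch, not the thesis parser: the scores and move set are made up, and only the control flow (which move the parser state follows after the update) distinguishes static from dynamic training.

```python
import math

MOVES = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]

def log_probs(scores):
    # Numerically stable log-softmax over move scores.
    m = max(scores)
    z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - z for s in scores]

def oracle_step(scores, gold, dynamic):
    """One training step: the loss is -log p(gold) in BOTH regimes;
    they differ only in which move the parser state then follows."""
    lp = log_probs(scores)
    loss = -lp[MOVES.index(gold)]
    if dynamic:
        # Dynamic oracle: follow the model's own (possibly wrong)
        # prediction, so training also visits error states.
        followed = MOVES[max(range(len(lp)), key=lp.__getitem__)]
    else:
        # Static oracle: always follow the gold move.
        followed = gold
    return loss, followed

scores = [0.1, 2.0, 0.3]  # the model prefers LEFT-ARC here
print(oracle_step(scores, "SHIFT", dynamic=False))  # follows SHIFT
print(oracle_step(scores, "SHIFT", dynamic=True))   # follows LEFT-ARC
```

Because the dynamic oracle visits states the model actually reaches at test time, it can reduce error propagation, at the cost of a noisier training trajectory.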

[Figure: the t-RNN unit. Head word, dependent word, and dependency relation vectors are concatenated and fed to an MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with less than 20k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with between 20k and 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
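A tree is projective exactly when no two dependency arcs cross. A small self-contained check (head indices follow the CoNLL convention, 0 = artificial root; the example trees are hypothetical):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens numbered 1..n,
    0 = artificial root). A tree is projective iff no two arcs
    cross when drawn above the sentence."""
    arcs = [tuple(sorted((h, d))) for d, h in enumerate(heads, start=1)]
    for lo1, hi1 in arcs:
        for lo2, hi2 in arcs:
            if lo1 < lo2 < hi1 < hi2:  # strictly interleaved endpoints: crossing
                return False
    return True

print(is_projective([2, 0, 2]))      # True: a simple projective tree
print(is_projective([3, 4, 0, 3]))   # False: arcs (1,3) and (2,4) cross
```

Non-projective trees in a treebank therefore put a hard ceiling on the attachment score a purely projective transition system can reach, which motivates the comparison by projectivity ratio below.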

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language | Projectivity % | Best (LAS) | Our (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7 From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code            without t-RNN   with t-RNN
no nynorsklia (3k)       51.78          53.33
ru taiga (11k)           59.13          60.55
gl treegal (15k)         69.76          70.45
hu szeged (20k)          66.12          68.18
sv lines (49k)           74.04          75.46
tr imst (50k)            58.12          58.75
ar padt (120k)           68.04          68.14
en ewt (204k)            74.87          75.77
cs cac (473k)            82.89          83.57
cs pdt (1M)              81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
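The numbers in these tables are labeled attachment scores (LAS). As a reminder of what is being measured, here is a minimal sketch; the (head, label) pairs are a toy example, not thesis data:

```python
def las(gold, pred):
    """Labeled attachment score: percentage of tokens whose predicted
    head AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Toy example: 3 of 4 tokens get both head and label right.
gold = [(2, "amod"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "amod"), (3, "nsubj"), (0, "root"), (3, "obl")]
print(round(las(gold, pred), 2))  # → 75.0
```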

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only-A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv lines    71.12   72.05   72.17   74.04   72.17      75.46
tr imst     57.12   56.87   57.02   57.12   58.12      58.75
ar padt     67.83   66.67   66.89   66.92   68.04      68.14
cs cac      83.89   82.23   83.13   83.17   82.89      83.57
en ewt      75.54   75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

What does Morphological Feature Embedding provide?
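As a rough sketch of the idea (not the exact architecture from the thesis): each Feat=Value pair from the UD FEATS column gets its own learned vector, and a token's morph-feat embedding combines them, here by summing. The table initialization and dimension below are illustrative assumptions:

```python
import random

random.seed(0)
DIM = 8
feat_vectors = {}   # hypothetical learned table: one vector per Feat=Value pair

def morph_feat_embedding(feats):
    """Embed a UD FEATS string like 'Case=Nom|Number=Sing' by summing
    the vectors of its individual Feat=Value pairs ('_' = no features)."""
    vec = [0.0] * DIM
    if feats == "_":
        return vec
    for pair in feats.split("|"):
        if pair not in feat_vectors:           # lazily create unseen features
            feat_vectors[pair] = [random.gauss(0, 1) for _ in range(DIM)]
        vec = [a + b for a, b in zip(vec, feat_vectors[pair])]
    return vec

v = morph_feat_embedding("Case=Nom|Number=Sing")
print(len(v))  # → 8
```

Summing lets feature combinations unseen in training still receive a sensible vector, since each Feat=Value pair is embedded independently.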

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia      51.13          53.33            3583
ru taiga           58.32          60.55           10479
sme giella         52.78          53.39           16385
la perseus         49.93          51.6            18184
ug udt             52.78          53.39           19262
sl sst             46.72          48.77           19473
hu szeged          66.23          68.18           20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines           72.18          74.81           48325
fr sequoia         84.36          82.17           50543
en gum             76.44          75.34           53686
ko gsd             73.74          72.54           56687
eu bdt             74.55          73.32           72974
nl lassysmall      76.7           75.8            75134
gl ctg             79.02          79.018          79327
lv lvtb            72.33          72.24           80666
id gsd             75.76          73.97           97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
fa seraji          81.18          81.12          121064
bg btb             84.53          84.55          124336
en ewt             75.77          75.682         204585
ar padt            68.02          68.14          223881
de gsd             71.59          71.32          263804
ca ancora          85.89          85.874         417587
es ancora          84.99          84.78          444617
cs cac             83.57          83.63          472608
cs pdt             81.43          82.12         1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the parser's predicted moves.

In both cases, the log-probability of the gold moves is maximized.
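The only difference between the two regimes is which move is executed during training; the loss maximizes the log-probability of the oracle move either way. A toy sketch, with hypothetical state and oracle interfaces standing in for a real transition system:

```python
import math

class ToyState:
    """Minimal stand-in for a parser state: parsing finishes after n moves."""
    def __init__(self, n): self.n = n
    def is_final(self): return self.n == 0
    def apply(self, move): return ToyState(self.n - 1)

def move_probs(state):
    # Hypothetical model output: a fixed distribution over two moves.
    return {"SHIFT": 0.7, "REDUCE": 0.3}

def best_move(state):
    # Hypothetical oracle: the gold move is always SHIFT in this toy.
    return "SHIFT"

def train_sentence(n_moves, dynamic=False):
    """Static oracle: execute the gold move at every step.
    Dynamic oracle: execute the model's predicted move, while the loss
    still maximizes log p of the oracle move for the visited state."""
    loss, state = 0.0, ToyState(n_moves)
    while not state.is_final():
        probs = move_probs(state)
        gold = best_move(state)
        loss -= math.log(probs[gold])                       # -log p(gold move)
        move = max(probs, key=probs.get) if dynamic else gold
        state = state.apply(move)
    return loss

print(round(train_sentence(3), 2))  # → 1.07
```

The dynamic variant lets training visit states the parser actually reaches at test time, which is where it is expected to help.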

[Figure: the t-RNN combines the head word, the dependent word, and the dependency relation through LSTMs whose outputs are concatenated and fed to an MLP.]


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens in between 20k and 50k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language        (1)            (2)     (3)     (4)
af afribooms    not provided   75.46   77.43   78.12
kk ktb          20.19          22.31   21.96   23.86
bxr bdt         7.64           9.76    9.93    8.98
kmr mg          20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)
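Strategy (4) can be pictured as warm-starting the low-resource parser from a related-language parser's parameters. The sketch below uses illustrative parameter names and a plain dict of weight lists, not the thesis implementation:

```python
def warm_start(target_params, source_params):
    """Copy every parameter whose name and size match the pre-trained
    parser; leave the rest (e.g. language-specific vocab rows) untouched."""
    copied = []
    for name, tensor in source_params.items():
        if name in target_params and len(target_params[name]) == len(tensor):
            target_params[name] = list(tensor)   # copy, don't alias
            copied.append(name)
    return copied

# Hypothetical parameters: shared LSTM weights transfer, embeddings don't.
src = {"lstm.w": [1.0, 2.0], "embed.kk": [0.5]}
tgt = {"lstm.w": [0.0, 0.0], "embed.tr": [0.0]}
print(warm_start(tgt, src))  # → ['lstm.w']
```

Training then continues on the low-resource treebank from these initial weights instead of from a random initialization.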

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition-based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
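A tree is projective when no two dependency arcs cross. A small check, with heads given 1-based and 0 for the root (the example trees below are illustrative, not from a treebank):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (1-based; 0 = root).
    The tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

def projectivity_ratio(treebank):
    """Percent of projective trees, as in the Projectivity column."""
    return 100.0 * sum(map(is_projective, treebank)) / len(treebank)

print(is_projective([2, 0, 2]))      # → True  (simple chain)
print(is_projective([3, 4, 0, 3]))   # → False (arcs 1→3 and 2→4 cross)
print(projectivity_ratio([[2, 0, 2], [3, 4, 0, 3]]))  # → 50.0
```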


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language       Projectivity   Best (LAS)   Our (LAS)
grc perseus       90.7           79.39        55.03 (20)
eu bdt            95.13          84.22        74.13 (17)
hu szeged         97.8           82.66        68.18 (14)
da ddt            98.26          86.28        76.40 (17)
en gum            99.6           85.05        76.44 (15)
gl treegal       100             74.25        70.45 (10)
gl ctg           100             82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions?

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81             48,325
fr_sequoia      84.36         82.17             50,543
en_gum          76.44         75.34             53,686
ko_gsd          73.74         72.54             56,687
eu_bdt          74.55         73.32             72,974
nl_lassysmall   76.7          75.8              75,134
gl_ctg          79.02         79.018            79,327
lv_lvtb         72.33         72.24             80,666
id_gsd          75.76         73.97             97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12              121,064
bg_btb      84.53         84.55              124,336
en_ewt      75.77         75.682             204,585
ar_padt     68.02         68.14              223,881
de_gsd      71.59         71.32              263,804
ca_ancora   85.89         85.874             417,587
es_ancora   84.99         84.78              444,617
cs_cac      83.57         83.63              472,608
cs_pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: t-RNN — the head word, dependent word, and dependency relation are encoded by LSTMs, concatenated, and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
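The two training regimes can be sketched as one loop (a toy stand-in, not the thesis code: `ToyParser` and its entire interface are invented for illustration, with integer states and a fixed gold move):

```python
import math
import random

class ToyParser:
    """Hypothetical stand-in for a transition system: states are
    integers 0..n, the gold move is always +1, and the model assigns
    fixed probabilities to moves."""
    def initial(self): return 0
    def is_final(self, s, n): return s >= n
    def gold_move(self, s): return +1
    def predict(self, s): return random.choice([+1, -1])  # model's own (noisy) choice
    def logp(self, s, move): return math.log(0.8 if move == +1 else 0.2)
    def apply(self, s, move): return max(0, s + move)

def train_sentence(parser, n, oracle="static", explore=0.5):
    """Accumulate -log p(gold move) over one 'sentence' of length n.
    Static oracle: always follow the gold move. Dynamic oracle:
    sometimes follow the model's own prediction, so training also
    visits states a static oracle never reaches. In both cases the
    loss maximizes log p of the gold moves."""
    loss, s = 0.0, parser.initial()
    while not parser.is_final(s, n):
        gold = parser.gold_move(s)
        loss -= parser.logp(s, gold)
        if oracle == "static":
            move = gold
        else:
            move = parser.predict(s) if random.random() < explore else gold
        s = parser.apply(s, move)
    return loss

random.seed(0)
static_loss = train_sentence(ToyParser(), 5, oracle="static")   # exactly 5 * -log(0.8)
dynamic_loss = train_sentence(ToyParser(), 5, oracle="dynamic")
```

Under the dynamic oracle the parser is also trained from states produced by its own mistakes; in this toy every state has the same gold move, so only the trajectory length differs.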

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt         7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
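Strategy (4) can be sketched as shape-matched parameter copying (hypothetical parameter dictionaries and names, not the thesis code): parameters shared with the pre-trained parser are transferred, while language-specific ones, such as a differently sized vocabulary embedding, stay freshly initialized:

```python
import numpy as np

def init_from_pretrained(target, pretrained):
    """Copy every pre-trained parser parameter whose shape matches
    into the new model's parameter dict; leave the rest (e.g. the new
    language's vocabulary-sized embeddings) as initialized."""
    copied = []
    for name, w in target.items():
        if name in pretrained and pretrained[name].shape == w.shape:
            target[name] = pretrained[name].copy()
            copied.append(name)
    return copied

rng = np.random.default_rng(0)
pretrained = {"lstm.W": rng.normal(size=(4, 4)), "embed": rng.normal(size=(100, 8))}
target = {"lstm.W": np.zeros((4, 4)), "embed": np.zeros((50, 8))}  # smaller vocab
copied = init_from_pretrained(target, pretrained)
# only the shape-compatible LSTM weights are transferred
```

Fine-tuning then continues on the low-resource treebank from this warm start instead of from scratch.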

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees. 6

6: Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
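Projectivity can be checked directly from a sentence's head array: a tree is projective iff no two dependency arcs cross. A minimal sketch (the 1-indexed head-array input format is an assumption):

```python
def is_projective(heads):
    """True iff the dependency tree has no crossing arcs.
    `heads` is indexed by position: heads[i-1] is the head of
    word i, and 0 denotes the artificial root."""
    arcs = [tuple(sorted((h, d))) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            # two arcs cross iff exactly one endpoint of one arc
            # lies strictly inside the span of the other
            if a < c < b < e:
                return False
    return True

# 1 -> 2, 2 -> root, 3 -> 4, 4 -> 2: nested arcs, projective
assert is_projective([2, 0, 4, 2])
# arc 2 -> 4 crosses arc 3 -> 1: non-projective
assert not is_projective([0, 4, 1, 1])
```

A transition-based parser of the kind used here can only derive trees for which this check returns True, which is why the gap to the best systems narrows as the projectivity ratio approaches 100%.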

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)
grc_perseus   90.7           79.39        55.03 (20)
eu_bdt        95.13          84.22        74.13 (17)
hu_szeged     97.8           82.66        68.18 (14)
da_ddt        98.26          86.28        76.40 (17)
en_gum        99.6           85.05        76.44 (15)
gl_treegal    100            74.25        70.45 (10)
gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7: From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

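The projectivity constraint can be checked by testing whether any two arcs cross when drawn above the sentence. A minimal sketch, assuming heads are given in the usual CoNLL convention (token indices 1-based, 0 = ROOT):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1-based, 0 is ROOT).

    A dependency tree is projective iff no two arcs cross when all
    arcs are drawn above the sentence.
    """
    # Each arc is represented by its span (left endpoint, right endpoint).
    spans = [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1)]
    for i, (a1, b1) in enumerate(spans):
        for a2, b2 in spans[i + 1:]:
            # Two arcs cross iff exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False
    return True
```

Arcs that share an endpoint never "cross" under the strict inequalities, so attachment to a common head is handled correctly.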

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language      Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus    90.7             79.39       55.03 (20)
eu_bdt         95.13            84.22       74.13 (17)
hu_szeged      97.8             82.66       68.18 (14)
da_ddt         98.26            86.28       76.40 (17)
en_gum         99.6             85.05       76.44 (15)
gl_treegal    100               74.25       70.45 (10)
gl_ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion: We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the Tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF-based) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on limited data does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 114 / 123

Projectivity

Transition-based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 115 / 123
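Projectivity can be checked directly from the head indices: an arc (h, d) is projective iff every word strictly between h and d is a descendant of h, which rules out crossing arcs. A short sketch (function name hypothetical; `heads[i]` is the head of word i+1, with 0 for the root, and is assumed to encode a valid tree):

```python
def is_projective(heads):
    """Return True iff the dependency tree given by `heads` is projective.
    For each arc (h, d), every word strictly between h and d must reach
    h on its path to the root."""
    n = len(heads)
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):
            a = k
            # climb from k toward the root; we must pass through h
            while a != 0 and a != h:
                a = heads[a - 1]
            if a != h:
                return False
    return True

# heads = [2, 0, 4, 2]: nested arcs, projective.
# heads = [3, 4, 0, 3]: arcs (3,1) and (4,2) cross, non-projective.
```

A transition-based parser with only shift/left-arc/right-arc actions can produce exactly the trees for which this check returns True; non-projective treebanks need pseudo-projective transformations or extra transitions such as swap.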

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language       Projectivity  Best (LAS)  Our (LAS)
grc perseus    90.7          79.39       55.03 (20)
eu bdt         95.13         84.22       74.13 (17)
hu szeged      97.8          82.66       68.18 (14)
da ddt         98.26         86.28       76.40 (17)
en gum         99.6          85.05       76.44 (15)
gl treegal     100           74.25       70.45 (10)
gl ctg         100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 116 / 123

Conclusions

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 117 / 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

The Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the Tree-stack LSTM loses its advantage.

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 118 / 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
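The core operation such an extension would add is standard dot-product attention: score each LSTM hidden state against a query, normalize with softmax, and take the weighted sum. A minimal NumPy sketch, not the thesis architecture (the query source and state matrix are hypothetical stand-ins):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attend(query, states):
    """Dot-product attention over a (T, d) matrix of hidden states.
    Returns the context vector and the attention weights."""
    scores = states @ query        # (T,) similarity of each state to the query
    weights = softmax(scores)      # (T,) non-negative, sums to 1
    context = weights @ states     # (d,) weighted sum of states
    return context, weights

# Toy stand-ins: three sigma-LSTM states, buffer-side query.
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
context, weights = attend(query, states)
```

Here the context vector would be concatenated to the parser's feature representation before the action classifier; states aligned with the query receive higher weight.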

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. a CRF loss) may solve this problem.

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 119 / 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 120 / 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

Sandra Kübler, Ryan McDonald and Joakim Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 121 / 123

Thank you for your attention!

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 122 / 123

Questions?

Omer Kırnap (Koç University) MSc Thesis September 27, 2018 123 / 123
