Dependency Parser for Swedish

Project for EDA171

by Jonas Pålsson

Marcus Stamborg

Dependency Grammar

Describes relations between words in a sentence.
A relation is between a head and its dependent(s).
All words have a head except the root of a sentence.

[Figure: dependency tree for the phrase "The big brown beaver", with one node per word.]

Dependency Parsing

Find the links that connect the words of a sentence, using a computer.
Different algorithms exist.
Nivre's parser has reported the best results for Swedish.

Nivre's Parser

Extension to Shift-Reduce.
Adds arcs between input and stack.
Produces a dependency graph using the following actions:
  Shift - moves the input to the stack.
  Reduce - pops the stack.
  Left arc - creates an arc from input to stack.
  Right arc - creates an arc from stack to input.

More about actions: Nivre, J. (2004)
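The four actions can be sketched in Python as operations on a small parser state. This is a minimal illustration, not the project's code: the `ParserState` class and the function names are hypothetical, and the stack bookkeeping after an arc (the dependent is popped on a left arc and pushed on a right arc) is assumed from the usual formulation of Nivre's algorithm, since the slides only say that an arc is created.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """Parser configuration: stack, remaining input, and arcs found so far."""
    stack: list = field(default_factory=list)
    buffer: list = field(default_factory=list)   # remaining input, next word first
    heads: dict = field(default_factory=dict)    # dependent index -> head index


def shift(state):
    """Move the next input word onto the stack."""
    state.stack.append(state.buffer.pop(0))


def reduce(state):
    """Pop the top of the stack."""
    state.stack.pop()


def left_arc(state):
    """Arc from input to stack: the next input word heads the stack top."""
    dependent = state.stack.pop()        # assumption: the dependent leaves the stack
    state.heads[dependent] = state.buffer[0]


def right_arc(state):
    """Arc from stack to input: the stack top heads the next input word."""
    dependent = state.buffer.pop(0)
    state.heads[dependent] = state.stack[-1]
    state.stack.append(dependent)        # assumption: the dependent moves to the stack


# Words 1..3 stand for "Jag tycker det"; the action sequence is chosen by hand.
state = ParserState(buffer=[1, 2, 3])
for action in (shift, left_arc, shift, right_arc):
    action(state)
print(state.heads)
```

Word 2 ("tycker") never receives a head in this run, which corresponds to it being the root of the sentence.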

Corpus

Talbanken05 – modernized and computerized version of Talbanken76.
Modified for use in the CoNLL-X Shared Task.
The training set contains about 11,500 sentences.
We used a test set containing about 300 sentences.

Example from the corpus:

1 Jag _ PO PO _ 2 SS _ _
2 tycker _ VV VV _ 0 ROOT _ _
3 det _ PO PO _ 2 OO _ _
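A corpus in this format could be read with a few lines of Python. The sketch below assumes the standard ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL); the `Token` type and `read_sentence` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int    # ID column
    form: str     # FORM column
    pos: str      # POSTAG column
    head: int     # HEAD column (0 means root)
    deprel: str   # DEPREL column


def read_sentence(lines):
    """Parse the lines of one sentence in the 10-column CoNLL-X format."""
    tokens = []
    for line in lines:
        cols = line.split()   # works for tab- or space-separated columns
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(Token(int(cols[0]), cols[1], cols[4], int(cols[6]), cols[7]))
    return tokens


sample = [
    "1 Jag _ PO PO _ 2 SS _ _",
    "2 tycker _ VV VV _ 0 ROOT _ _",
    "3 det _ PO PO _ 2 OO _ _",
]
tokens = read_sentence(sample)
for token in tokens:
    print(token.index, token.form, token.pos, "head:", token.head, token.deprel)
```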

How we did it

Collect data
Build model
Parse

[Figure: pipeline diagram showing the components ARFFBuilder, Trainer and Parser, and the data items Train Corpus, Data, Trained Classifier, Test Corpus and Test Corpus with relations.]

Collect data – Gold Standard Parsing

Build a Weka-compatible data file (ARFF).
The action sequence can be determined from an annotated corpus using the following rules (Gold Standard Parsing):

  If input has stack as head -> Right Arc
  else if stack has input as head -> Left Arc
  else if arc exists between input and any word in stack -> Reduce
  else Shift
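The rules above could be applied as in the following Python sketch, which simulates the parser on one annotated sentence and records the action chosen at each step. The function name `gold_actions` is hypothetical, and two details are assumptions rather than something the slides state: an empty stack always leads to Shift, and the stack is updated after each action in the usual way for Nivre's algorithm (Left Arc and Reduce pop, Right Arc and Shift push).

```python
def gold_actions(heads):
    """Derive the action sequence for one sentence from its gold heads.

    heads maps each word index (1..n) to its head index (0 means root).
    """
    stack, buffer, actions = [], sorted(heads), []
    while buffer:
        nxt = buffer[0]
        if not stack:
            # Assumption: with an empty stack only Shift is possible.
            actions.append("Shift")
            stack.append(buffer.pop(0))
            continue
        top = stack[-1]
        if heads[nxt] == top:                 # input has stack as head
            actions.append("Right Arc")
            stack.append(buffer.pop(0))
        elif heads[top] == nxt:               # stack has input as head
            actions.append("Left Arc")
            stack.pop()
        elif any(heads[nxt] == w or heads[w] == nxt for w in stack):
            actions.append("Reduce")          # arc to a word deeper in the stack
            stack.pop()
        else:
            actions.append("Shift")
            stack.append(buffer.pop(0))
    return actions


# Gold heads for "Jag tycker det" from the corpus example.
print(gold_actions({1: 2, 2: 0, 3: 2}))
# → ['Shift', 'Left Arc', 'Shift', 'Right Arc']
```

Each recorded action, paired with the features of the state it was taken in, would become one training instance in the data file.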

Train classifier

Weka 3 – Data mining software.
C4.5 (J48) – Extension to the ID3 algorithm.
  Generates decision trees.
  Uses features derived from the current state of the parser.
  Outputs a trained classifier used by the parser to decide the next action.
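For illustration, the training data handed to Weka might be written out as below. The ARFF keywords (`@relation`, `@attribute`, `@data`) are part of the file format, but everything else is an assumption: the attribute names, the tiny tag set, the `nil` padding value and the `to_arff` helper are hypothetical and not taken from the project.

```python
def to_arff(relation, attributes, rows):
    """Render nominal attributes and data rows as ARFF text.

    attributes: list of (name, allowed_values) pairs, class attribute last.
    rows: list of value tuples, one value per attribute.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical attribute names and a tiny tag set, for illustration only.
attributes = [
    ("stack_1_pos", ["nil", "PO", "VV"]),
    ("input_1_pos", ["nil", "PO", "VV"]),
    ("action", ["shift", "reduce", "left_arc", "right_arc"]),
]
# One row per parser state visited while parsing "Jag tycker det".
rows = [
    ("nil", "PO", "shift"),
    ("PO", "VV", "left_arc"),
    ("nil", "VV", "shift"),
    ("VV", "PO", "right_arc"),
]
arff_text = to_arff("parser_actions", attributes, rows)
print(arff_text)
```

Putting the action last follows the common Weka convention of treating the final attribute as the class to predict.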

Parse using trained classifier

Uses the trained classifier to determine the head for each word in a sentence.
Uses Nivre's algorithm with the action decided by the classifier.
Calculates the score as

  score = (number of words assigned the correct head) / (total number of words)
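The score is a plain ratio and could be computed as in this short sketch. The function name `head_accuracy` and the list-of-lists input layout are assumptions made for the example, and the predicted heads are made up.

```python
def head_accuracy(predicted, gold):
    """Fraction of words whose predicted head matches the gold head.

    Both arguments are lists of sentences, each a list of head indices.
    """
    correct = total = 0
    for pred_heads, gold_heads in zip(predicted, gold):
        if len(pred_heads) != len(gold_heads):
            raise ValueError("sentence lengths differ")
        correct += sum(p == g for p, g in zip(pred_heads, gold_heads))
        total += len(gold_heads)
    return correct / total if total else 0.0


gold = [[2, 0, 2]]        # gold heads for "Jag tycker det"
predicted = [[2, 0, 1]]   # made-up parser output with one wrong head
print(head_accuracy(predicted, gold))   # 2 of 3 heads are correct
```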

Features

All features describe the current state of the parser.
  1st set – Input and stack.
  2nd set – Input, stack and children.
  3rd set – Input, stack and previous input.
  4th set – Input, stack, children and previous input.
We only used POS in the feature sets.
Using lexical values actually decreased performance.
For every set we used constraints to model valid actions in the current state of the parser.
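A POS-only feature vector of this kind might be extracted as sketched below. This covers only the stack, input and previous-input features; the children features and the action constraints are left out because the slides do not describe them in detail. The function name, the feature names, the `nil` padding value, and the reading of "previous input" as the word just before the next input word in the original sentence are all assumptions.

```python
def extract_features(stack, buffer, pos, n_stack=2, n_input=2, use_previous=True):
    """Build a POS-only feature dict from the current parser state.

    stack and buffer hold word indices (1-based); pos maps index -> POS tag.
    """
    features = {}
    for i in range(n_stack):
        # Stack positions are counted from the top of the stack.
        word = stack[-1 - i] if i < len(stack) else None
        features[f"stack_{i + 1}_pos"] = pos.get(word, "nil")
    for i in range(n_input):
        word = buffer[i] if i < len(buffer) else None
        features[f"input_{i + 1}_pos"] = pos.get(word, "nil")
    if use_previous:
        # Assumption: "previous input" is the word just before the next
        # input word in the original sentence.
        previous = buffer[0] - 1 if buffer else None
        features["previous_input_pos"] = pos.get(previous, "nil")
    return features


pos = {1: "PO", 2: "VV", 3: "PO"}   # tags for "Jag tycker det"
features = extract_features(stack=[2], buffer=[3], pos=pos)
print(features)
```

The `n_stack` and `n_input` parameters correspond to the number of stack and input words used, which is what the rows and columns of the result tables vary.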

Results

          Input 1  Input 2  Input 3  Input 4  Input 5  Input 6
Stack 1   0.7161   0.8007   0.7972   0.7967   0.8036   0.8064
Stack 2   0.7268   0.8078   0.8055   0.8094   0.8136   0.8129
Stack 3   0.7275   0.8066   0.8076   0.8098   0.8129   0.8131
Stack 4   0.7300   0.8057   0.8076   0.8094   0.8096   0.8091
Stack 5   0.7309   0.8073   0.8071   0.8096   0.8101   0.8097
Stack 6   0.7307   0.8064   0.8071   0.8089   0.8092   0.8094

Scores using features: Stack_n_POS, Input_n_POS, Children

          Input 1  Input 2  Input 3  Input 4  Input 5  Input 6
Stack 1   0.6936   0.7765   0.7804   0.7801   0.7779   0.7806
Stack 2   0.7297   0.7937   0.7970   0.7961   0.7958   0.7946
Stack 3   0.7300   0.7933   0.7963   0.7958   0.7940   0.7944
Stack 4   0.7309   0.7940   0.7967   0.7972   0.7960   0.7953
Stack 5   0.7327   0.7944   0.7974   0.7984   0.7969   0.7960
Stack 6   0.7313   0.7940   0.7972   0.7986   0.7965   0.7960

Scores using features: Stack_n_POS, Input_n_POS

Results cont.

          Input 1  Input 2  Input 3  Input 4  Input 5  Input 6
Stack 1   0.7242   0.8022   0.8055   0.8052   0.8046   0.8050
Stack 2   0.7558   0.8156   0.8168   0.8179   0.8174   0.8182
Stack 3   0.7580   0.8152   0.8186   0.8184   0.8174   0.8184
Stack 4   0.7581   0.8158   0.8177   0.8184   0.8172   0.8175
Stack 5   0.7594   0.8167   0.8182   0.8186   0.8174   0.8177
Stack 6   0.7574   0.8161   0.8181   0.8177   0.8165   0.8172

Scores using features: Stack_n_POS, Input_n_POS, Children, Previous_Input_POS

          Input 1  Input 2  Input 3  Input 4  Input 5  Input 6
Stack 1   0.7210   0.7999   0.8004   0.8002   0.8062   0.8076
Stack 2   0.7279   0.8064   0.8068   0.8108   0.8110   0.8142
Stack 3   0.7283   0.8068   0.8068   0.8101   0.8136   0.8138
Stack 4   0.7307   0.8068   0.8089   0.8106   0.8108   0.8105
Stack 5   0.7316   0.8068   0.8075   0.8103   0.8114   0.8114
Stack 6   0.7344   0.8064   0.8076   0.8101   0.8106   0.8108

Scores using features: Stack_n_POS, Input_n_POS, Previous_Input_POS

Conclusions

Lexical values didn't do much; the score even became worse.
  Might be better with a different classification algorithm or a different test corpus.
The previous input word was a very effective feature, probably the single best addition to a feature set of only stack and input.
It is difficult to find the optimal feature set.

Future improvements

Try other features:
  Siblings
  Use LEX on specific words
  More words from the original input string
Simulations to find the optimum feature set.
Use SVM instead of C4.5.

Thank you for listening

More to come in the report
