lecture 8 syntactic parsing - courses.cs.ut.ee
Lecture 8 Syntactic parsing
LTAT.01.001 – Natural Language Processing
Kairit Sirts ([email protected])
03.04.2020
Plan for today
● Why syntax?
● Syntactic parsing
● Shallow parsing
● Constituency parsing
● Dependency parsing
● Transition-based dependency parsing
● Neural dependency parsers
● Evaluation of dependency parsing
Why syntax?
Syntax
● The internal structure of sentences – how words combine into phrases and clauses
Syntactic ambiguity
The chicken is ready to eat.
● Two readings: the chicken is about to eat something, or the chicken is ready to be eaten
Exercise
How many different meanings does the sentence have:
Time flies like an arrow.
The role of syntax in NLP
● Text generation/summarization/machine translation
● Useful features for various information extraction tasks
● Syntactic structure also reflects the semantic relations between the words
Syntactic analysis/parsing
● Shallow parsing
● Phrase structure / constituency parsing
● Dependency parsing
Shallow parsing
● Also called chunking or light parsing
● Split the sentence into non-overlapping syntactic phrases
[NP John] [VP hit] [NP the ball]
NP – Noun phrase
VP – Verb phrase
Shallow parsing
[NP The morning flight] [PP from] [NP Denver] [VP has arrived]
NP – Noun phrase
PP – Prepositional phrase
VP – Verb phrase
BIO tagging
A labelling scheme often used in information extraction problems treated as sequence tagging tasks
The/B_NP morning/I_NP flight/I_NP from/B_PP Denver/B_NP has/B_VP arrived/I_VP
B_NP – beginning of a noun phrase
I_NP – inside a noun phrase
B_VP – beginning of a verb phrase, etc.
Sequence classifier
● Need annotated data for training: POS-tagged, phrase-annotated
● Use a sequence classifier of your choice:
● CRF with engineered features
● Neural sequence tagger
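As a small illustration, chunk annotations can be converted to BIO tags mechanically. The helper below is a hypothetical sketch (the function name and chunk representation are ours, not from the lecture):

```python
def to_bio(chunks):
    """Convert a list of (phrase_type, [words]) chunks into BIO tags.
    The first word of each chunk gets a B_ tag, the rest get I_ tags."""
    tags = []
    for label, words in chunks:
        tags.append("B_" + label)
        tags.extend("I_" + label for _ in words[1:])
    return tags

tags = to_bio([("NP", ["The", "morning", "flight"]),
               ("PP", ["from"]),
               ("NP", ["Denver"]),
               ("VP", ["has", "arrived"])])
# → ['B_NP', 'I_NP', 'I_NP', 'B_PP', 'B_NP', 'B_VP', 'I_VP']
```

This reproduces the tag row from the Denver example above, which a sequence classifier would be trained to predict.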
Constituency Parsing
Constituency parsing
● Full constituency parsing helps to resolve structural ambiguities
Bracketed style
● The trees can be represented linearly with brackets

(S (Pr I)
   (Aux will)
   (VP (V do)
       (NP (Det my)
           (N homework))))
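The bracketed format can be read back into a tree structure in a few lines. This recursive reader is an illustrative sketch (the function name and nested-list representation are ours):

```python
def parse_brackets(s):
    """Parse a well-formed bracketed tree string into nested
    [label, children] lists; leaf words stay plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        assert tokens[pos] == "("            # every subtree opens with "("
        label = tokens[pos + 1]
        children, pos = [], pos + 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":           # nested subtree
                child, pos = read(pos)
            else:                            # terminal word
                child, pos = tokens[pos], pos + 1
            children.append(child)
        return [label, children], pos + 1    # skip the closing ")"

    tree, _ = read(0)
    return tree

tree = parse_brackets(
    "(S (Pr I) (Aux will) (VP (V do) (NP (Det my) (N homework))))")
```

The root label is then `tree[0]` ("S") and its three children are the Pr, Aux, and VP subtrees.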
Context-free grammars
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → e
PP → P NP

N → people | fish | tanks | rods
V → people | fish | tanks
P → with
G = (T, N, S, R)
● T – set of terminal symbols
● N – set of non-terminal symbols
● S – start symbol (S ∈ N)
● R – set of rules/productions of the form X → γ
● X ∈ N
● γ ∈ (N ∪ T)*

A grammar G generates a language L
Probabilistic CFG (PCFG)
G = (T, N, S, R, P)
● T – set of terminal symbols
● N – set of non-terminal symbols
● S – start symbol (S ∈ N)
● R – set of rules/productions of the form X → γ
● P – probability function
● P: R → [0, 1]
● ∀X ∈ N: ∑_{X→γ ∈ R} P(X → γ) = 1

A grammar G generates a language model L
A PCFG
S → NP VP     1.0
VP → V NP     0.6
VP → V NP PP  0.4
NP → NP NP    0.1
NP → NP PP    0.2
NP → N        0.7
PP → P NP     1.0

N → people 0.5   N → fish 0.2   N → tanks 0.2   N → rods 0.1
V → people 0.1   V → fish 0.6   V → tanks 0.3   P → with 1.0
The probability of strings and trees
● P(t) – the probability of a tree t is the product of the probabilities of the rules used to generate it:

P(t) = ∏_{X→γ ∈ t} P(X → γ)

● P(s) – the probability of the string s is the sum of the probabilities of the trees which have that string as their yield:

P(s) = ∑_t P(s, t) = ∑_t P(t)

● where t ranges over the parses of s
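As a worked example (ours, using the rule probabilities from the PCFG slide above), P(t) for the tree (S (NP (N people)) (VP (V fish) (NP (N tanks)))) is just the product of the probabilities of its rules:

```python
# Rules used by the tree (S (NP (N people)) (VP (V fish) (NP (N tanks)))),
# with the probabilities from the PCFG slide.
rules = [
    ("S → NP VP", 1.0),
    ("NP → N", 0.7), ("N → people", 0.5),
    ("VP → V NP", 0.6), ("V → fish", 0.6),
    ("NP → N", 0.7), ("N → tanks", 0.2),
]

p_tree = 1.0
for _, p in rules:
    p_tree *= p
# 1.0 · 0.7 · 0.5 · 0.6 · 0.6 · 0.7 · 0.2 ≈ 0.0176
```

P(s) for the string "people fish tanks" would additionally sum the probabilities of its other parses (e.g. the one where "people" is the verb).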
PCFG for efficient parsing
● For efficient parsing the rules should be unary or binary
● Chomsky normal form – all rules have the form:
● X → Y Z
● X → w
● X, Y, Z – non-terminal symbols
● w – terminal symbol
● No epsilon rules
Before and after binarization
Finding the most likely tree: CKY parsing
● Dynamic programming algorithm
● Proceeds bottom-up and performs Viterbi on trees
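A minimal CKY sketch, assuming a grammar already in Chomsky normal form. The toy rules and probabilities below are simplified from the slide's PCFG (which is not fully binarized), so the numbers are illustrative only:

```python
from collections import defaultdict

# Toy CNF PCFG (simplified from the slides: lexical rules go directly
# to NP/V, and the ternary VP rule is dropped).
lexical = {                       # (A, word) -> P(A -> word)
    ("NP", "people"): 0.5, ("NP", "fish"): 0.3, ("NP", "tanks"): 0.2,
    ("V", "people"): 0.1, ("V", "fish"): 0.6, ("V", "tanks"): 0.3,
}
binary = {                        # (A, B, C) -> P(A -> B C)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}

def cky(words):
    """Viterbi CKY: chart[(i, j)][A] = (best prob, backpointer)
    for non-terminal A spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):                     # fill diagonal
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][A] = (p, w)
    for span in range(2, n + 1):                      # bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                 # split point
                for (A, B, C), p in binary.items():
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        prob = p * chart[(i, k)][B][0] * chart[(k, j)][C][0]
                        if prob > chart[(i, j)].get(A, (0, None))[0]:
                            chart[(i, j)][A] = (prob, (B, C, k))
    return chart[(0, n)].get("S")                     # best S over the sentence

best = cky("people fish tanks".split())
```

Here `best` holds the Viterbi probability of the most likely tree and a backpointer from which the tree itself can be reconstructed.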
Dependency Parsing
Dependency parsing
● A dependency parse is a directed graph G = (V, A)
● V – the set of vertices corresponding to words
● A – the set of arcs corresponding to dependency relations
Dependency relations
● The arrows connect heads and their dependents
● The main verb is the head or the root of the whole sentence
● The arrows are labelled with grammatical functions/dependency relations
[Figure: an example arc from a head to a dependent, labelled with the dependency relation; the main verb is the root of the sentence]
Properties of a dependency graph
A dependency tree is a directed graph that satisfies the following constraints:
1. There is a single designated root node that has no incoming arcs
● Typically the main verb of the sentence
2. Except for the root node, each node has exactly one incoming arc
● Each dependent has a single head
3. There is a unique path from the root node to each vertex in V
● The graph is acyclic and connected
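These constraints can be checked directly on a head-index representation. The function below is an illustrative sketch (our own naming); constraint 2 holds by construction, since each word stores exactly one head:

```python
def is_valid_tree(heads):
    """heads[i - 1] is the head index of word i (words numbered 1..n,
    as in CoNLL-U); head 0 denotes the ROOT node."""
    n = len(heads)
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:                   # constraint 1: exactly one root
        return False
    for i in range(1, n + 1):             # constraint 3: each word reaches ROOT
        seen, node = set(), i
        while node != 0:
            if node in seen:              # cycle -> no path to ROOT
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# "The cat sat on the mat": det, nsubj, root, case, det, nmod
valid = is_valid_tree([2, 3, 0, 6, 6, 3])   # → True
```

A graph with two roots or a head cycle fails the check, matching constraints 1 and 3 above.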
Projectivity
● Projective trees – there are no arc crossings in the dependency graph
● Non-projective trees – crossings, e.g. due to free word order
Dependency relations

Clausal argument relations:
NSUBJ – Nominal subject – United canceled the flight.
DOBJ – Direct object – United canceled the flight.
IOBJ – Indirect object – We booked her the flight to Paris.

Nominal modifier relations:
NMOD – Nominal modifier – We took the morning flight.
AMOD – Adjectival modifier – Book the cheapest flight.
NUMMOD – Numeric modifier – Before the storm JetBlue canceled 1000 flights.
APPOS – Appositional modifier – United, a unit of UAL, matched the fares.
DET – Determiner – Which flight was delayed?
CASE – Prepositions, postpositions – Book the flight through Houston.

Other notable relations:
CONJ – Conjunct – We flew to Denver and drove to Steamboat.
CC – Coordinating conjunction – We flew to Denver and drove to Steamboat.
Universal dependencies
● http://universaldependencies.org/
● Annotated treebanks in many languages
● Uniform annotation scheme across all UD languages:
● Universal POS tags
● Universal morphological features
● Universal dependency relations
● All data in the CoNLL-U tabulated format
CoNLL-U format
● A standard tabular format for certain types of annotated data
● Each word is on a separate line
● 10 tab-separated columns on each line:
1. Word index
2. The word itself
3. Lemma
4. Universal POS
5. Language-specific POS
6. Morphological features
7. Head of the current word
8. Dependency relation to the head
9. Enhanced dependencies
10. Any other annotation
CoNLL-U format: example

# text = They buy and sell books.
1 They they PRON PRP Case=Nom|Number=Plur 2 nsubj
2 buy buy VERB VBP Number=Plur|Person=3|Tense=Pres 0 root
3 and and CONJ CC _ 4 cc
4 sell sell VERB VBP Number=Plur|Person=3|Tense=Pres 2 conj
5 books book NOUN NNS Number=Plur 2 obj
6 . . PUNCT . _ 2 punct
# text = I had no clue.
1 I I PRON PRP Case=Nom|Number=Sing|Person=1 2 nsubj
2 had have VERB VBD Number=Sing|Person=1|Tense=Past 0 root
3 no no DET DT PronType=Neg 4 det
4 clue clue NOUN NN Number=Sing 2 obj
5 . . PUNCT . _ 2 punct
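A sketch of reading such a sentence block in Python. Only the eight columns shown in the example above are assumed; a full CoNLL-U reader would also handle multiword tokens and the remaining columns:

```python
def parse_conllu(block):
    """Parse one sentence block in (simplified) CoNLL-U format:
    comment lines start with '#', token lines are tab-separated."""
    words = []
    for line in block.strip().splitlines():
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        words.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "head": int(cols[6]), "deprel": cols[7],
        })
    return words

sent = """# text = I had no clue.
1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing|Person=1\t2\tnsubj
2\thad\thave\tVERB\tVBD\tNumber=Sing|Person=1|Tense=Past\t0\troot
3\tno\tno\tDET\tDT\tPronType=Neg\t4\tdet
4\tclue\tclue\tNOUN\tNN\tNumber=Sing\t2\tobj
5\t.\t.\tPUNCT\t.\t_\t2\tpunct"""
tokens = parse_conllu(sent)
```

The resulting list of dicts gives each word its head index and relation, which is exactly the input needed for dependency evaluation.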
Dependency parsing methods
● Transition-based parsing
● stack-based algorithms / shift-reduce parsing
● only generates projective trees
● Graph-based algorithms
● can also generate non-projective trees
Transition-based dependency parsing
Transition-based parsing
● Three main components:
● Stack
● Buffer
● Set of dependency relations
● A configuration is the current state of the stack, the buffer and the relation set
Arc-standard parsing system
● Initial configuration:
● Stack σ contains the ROOT symbol
● Buffer β contains all words of the sentence
● Dependency relation set A is empty
Arc-standard parsing system
At each step perform one of:
● Shift – move the next word from the buffer to the top of the stack
● LeftArc – add a left arc between the top two words on the stack, pop the second word
● RightArc – add a right arc between the top two words on the stack, pop the top word
Oracle
● The annotated data is in the form of a treebank
● Each sentence is annotated with its dependency tree
● The task of the transition-based parser is to predict the correct parsing operation at each step:
● Input is a configuration
● Output is a parsing action: Shift, RightArc or LeftArc
● The role of the oracle is to return the correct parsing operation for each configuration in the training set
● It creates a training set for the parsing model
Oracle
● Choose LeftArc if it produces a correct head-dependent relation given the reference parse and the current configuration
● Choose RightArc if:
● It produces a correct head-dependent relation given the reference parse and the current configuration
● All of the dependents of the word at the top of the stack have already been assigned
● Otherwise choose Shift
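The three rules above can be sketched as a static oracle. The code below (our own naming, not from the slides) replays the gold tree for "The cat sat on the mat" and recovers the action sequence used in the following example slides:

```python
def oracle(stack, heads, assigned):
    """Static arc-standard oracle (a sketch). heads[i] is the gold head
    of word i (0 = ROOT); `assigned` holds the words that have already
    been attached to their head."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        # LeftArc: the second word's gold head is the top word
        if second != 0 and heads[second] == top:
            return "left_arc"
        # RightArc: the top word's gold head is the second word, and all
        # of the top word's own dependents have already been attached
        if heads[top] == second and all(
                d in assigned for d, h in heads.items() if h == top):
            return "right_arc"
    return "shift"

# Gold heads for "The cat sat on the mat" (det, nsubj, root, case, det, nmod)
heads = {1: 2, 2: 3, 3: 0, 4: 6, 5: 6, 6: 3}
stack, buffer, assigned, actions = [0], [1, 2, 3, 4, 5, 6], set(), []
while buffer or len(stack) > 1:
    action = oracle(stack, heads, assigned)
    actions.append(action)
    if action == "shift":
        stack.append(buffer.pop(0))
    elif action == "left_arc":
        assigned.add(stack.pop(-2))   # popped second word is the dependent
    else:
        assigned.add(stack.pop())     # popped top word is the dependent
```

Running the loop on this sentence yields the same Shift/LeftArc/RightArc sequence as the worked example.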
Example
Shift: (σ, w_i|β, A) ⇒ (σ|w_i, β, A)
LeftArc: (σ|w_i|w_j, β, A) ⇒ (σ|w_j, β, A ∪ {(w_j, r, w_i)})
RightArc: (σ|w_i|w_j, β, A) ⇒ (σ|w_i, β, A ∪ {(w_i, r, w_j)})
Starting configuration

Stack Buffer Action Arc
[ROOT] [The, cat, sat, on, the, mat]
Stack Buffer Action Arc
[ROOT] [The, cat, sat, on, the, mat] Shift
[ROOT, The] [cat, sat, on, the, mat] Shift
[ROOT, The, cat] [sat, on, the, mat] Left-Arc det(The ← cat)
[ROOT, cat] [sat, on, the, mat] Shift
[ROOT, cat, sat] [on, the, mat] Left-Arc nsubj(cat ← sat)
[ROOT, sat] [on, the, mat] Shift
[ROOT, sat, on] [the, mat] Shift
[ROOT, sat, on, the] [mat] Shift
[ROOT, sat, on, the, mat] [] Left-Arc det(the ← mat)
[ROOT, sat, on, mat] [] Left-Arc case(on ← mat)
[ROOT, sat, mat] [] Right-Arc nmod(sat → mat)
[ROOT, sat] [] Right-Arc root(ROOT → sat)
[ROOT] [] Done
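The trace above can be replayed programmatically. This minimal arc-standard sketch (our own representation: word indices, 0 = ROOT) applies the same action sequence and recovers the same arcs:

```python
def arc_standard(words, actions):
    """Apply a sequence of (action, label) pairs to a sentence.
    The stack holds word indices (0 = ROOT); returns the arc set A
    as (head, label, dependent) triples."""
    stack, buffer = [0], list(range(1, len(words) + 1))
    arcs = []
    for action, label in actions:
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left_arc":        # top -> second, pop the second word
            dep = stack.pop(-2)
            arcs.append((stack[-1], label, dep))
        elif action == "right_arc":       # second -> top, pop the top word
            dep = stack.pop()
            arcs.append((stack[-1], label, dep))
    return arcs

words = "The cat sat on the mat".split()
actions = [("shift", None), ("shift", None), ("left_arc", "det"),
           ("shift", None), ("left_arc", "nsubj"), ("shift", None),
           ("shift", None), ("shift", None), ("left_arc", "det"),
           ("left_arc", "case"), ("right_arc", "nmod"),
           ("right_arc", "root")]
arcs = arc_standard(words, actions)
```

The returned arcs match the Arc column of the trace: det(The ← cat), nsubj(cat ← sat), det(the ← mat), case(on ← mat), nmod(sat → mat), root(ROOT → sat).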
Neural dependency parsers
Kiperwasser and Goldberg, 2016. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
● Features: concatenation of the following LSTM vectors:
● three top words from the stack
● first word in the buffer
● Prediction:
● Score the features with a feed-forward NN (MLP)
● Train using hinge loss: maximise the margin between the MLP score of the highest scoring correct action and that of the highest scoring incorrect action

max(0, 1 − max_{g ∈ G} MLP(g) + max_{a ∈ A∖G} MLP(a))

● G – the set of correct transitions from the current configuration
● A – the set of all transitions from the current configuration
Dozat et al., 2017. Stanford’s Graph-Based Neural Dependency Parser at the CoNLL 2017 Shared Task.
● From each LSTM vector create four vectors for classification:
● head word
● head relation
● dependent word
● dependent relation
● Use a biaffine classifier to score the dep vector of each word with the head vector of every other word
● Then use a biaffine classifier to score the relation between the head and the dependent using the respective relation vectors
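A sketch of the biaffine head-scoring step with random toy weights. The dimensions and names are ours; the real parser learns these matrices from data and adds a second biaffine classifier for the relation labels:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                      # sentence length, vector dimension (toy)
H = rng.normal(size=(n, d))      # per-word "head" vectors
D = rng.normal(size=(n, d))      # per-word "dep" vectors
U = rng.normal(size=(d, d))      # biaffine weight matrix
w = rng.normal(size=(d,))        # linear bias weights on the head side

# scores[i, j] = plausibility that word j is the head of word i:
# a bilinear term d_i^T U h_j plus a linear term w^T h_j
scores = D @ U @ H.T + (H @ w)[None, :]
pred_heads = scores.argmax(axis=1)   # greedy head choice per word
```

A greedy argmax does not guarantee a tree; graph-based parsers therefore decode with a maximum spanning tree algorithm over these scores.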
Evaluating dependency parsing
Evaluation
Unlabelled attachment score (UAS):
● The proportion of words attached to the correct head
Labelled attachment score (LAS):
● The proportion of words attached to the correct head with the correct relation label
Label accuracy (LA):
● The proportion of correct incoming relation labels, ignoring the head
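The three scores can be computed in a few lines. The helper below is a sketch (our own naming), checked on a toy sentence that reproduces UAS = 5/6, LAS = 4/6, LA = 4/6:

```python
def attachment_scores(gold, pred):
    """gold/pred: one (head, label) pair per word, in sentence order."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n        # head + label
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n   # label only
    return uas, las, la

# Toy example: one wrong head (word 4) and two wrong labels (words 4, 5)
gold = [(2, "det"), (3, "nsubj"), (0, "root"),
        (6, "case"), (6, "det"), (3, "nmod")]
pred = [(2, "det"), (3, "nsubj"), (0, "root"),
        (3, "nmod"), (6, "amod"), (3, "nmod")]
uas, las, la = attachment_scores(gold, pred)
# → uas = 5/6, las = 4/6, la = 4/6
```

Note that LAS ≤ min(UAS, LA): a word counts for LAS only if both its head and its label are correct.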
Evaluation
UAS = 5/6, LAS = 4/6, LA = 4/6