Incremental Structured Prediction Using a Global Learning and Beam-Search Framework
Yue Zhang¹, Meishan Zhang², Ting Liu²
Singapore University of Technology and Design¹
Harbin Institute of Technology, China²
{mszhang, tliu}@ir.hit.edu.cn
Outline
Introduction Applications
Analysis ZPar
Introduction
Structured prediction problems
An overview of the transition system
Algorithms in detail
Beam-search decoding
Online learning using averaged perceptron
Structured prediction problems
Two important tasks in NLP
Classification
Output is a single label
Examples
Document classification
Sentiment analysis
Spam filtering
Structured prediction
Output is a set of inter-related labels or a structure
Examples
POS tagging
Dependency parsing
Constituent parsing
Machine translation
Structured prediction problems
Traditional solution
Score each candidate, select the highest-scored output
Search-space typically exponential
Over 100 possible trees for this seven-word sentence. Over one million trees for a 20-word sentence.
Structured prediction problems
One solution: dynamic programming methods
Independence assumption on features
Local features with global optimization
Solve the exponential problems in polynomial time
Examples
POS tagging: Markov assumption, $p(t_i \mid t_{i-1}, \dots, t_1) = p(t_i \mid t_{i-1})$
Viterbi decoding
Dependency parsing: arc-factorization
1st-order MST decoding
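The Markov assumption above is what lets decoding run in polynomial time. The following is a minimal Viterbi sketch in Python, not the tutorial's code; the `emit` and `trans` score tables and the default score of -10.0 for unseen entries are hypothetical toy values chosen for illustration.

```python
def viterbi(words, tags, emit, trans):
    """Return the highest-scoring tag sequence under a first-order model.

    emit[(word, tag)] and trans[(prev_tag, tag)] are additive scores
    (for example log-probabilities). Unseen entries score -10.0, which
    is an arbitrary choice for this sketch.
    """
    def e(w, t):
        return emit.get((w, t), -10.0)

    def tr(p, t):
        return trans.get((p, t), -10.0)

    # best[t] = (score of the best sequence ending in tag t, that sequence)
    best = {t: (e(words[0], t), [t]) for t in tags}
    for w in words[1:]:
        new_best = {}
        for t in tags:
            # Only the previous tag matters, so one max per tag suffices.
            score, path = max((best[p][0] + tr(p, t), best[p][1]) for p in tags)
            new_best[t] = (score + e(w, t), path + [t])
        best = new_best
    return max(best.values())[1]


emit = {("I", "PRON"): -0.1, ("like", "VERB"): -0.5,
        ("books", "NOUN"): -0.3, ("books", "VERB"): -0.4}
trans = {("PRON", "VERB"): -0.1, ("VERB", "NOUN"): -0.2, ("VERB", "VERB"): -2.0}
best_tags = viterbi(["I", "like", "books"], ["PRON", "VERB", "NOUN"], emit, trans)
```

Each word costs one pass over all tag pairs, so the running time is linear in sentence length and quadratic in the number of tags, even though the number of tag sequences is exponential.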
Structured prediction problems
The learning problem
How to score candidate items so that a higher score reflects a more correct candidate.
Examples
POS-tagging: HMM, CRF
Dependency parsing: MIRA
Structured prediction problems
Transition-based methods with beam search decoding
A framework for structured prediction
Incremental state transitions
Use transition actions to build the output
Typically left to right
Typically linear time
The search problem
To find the highest-scoring action sequence out of an exponential number of sequences, rather than scoring structures directly
Beam-search (non-exhaustive decoding)
Non-local features
Arbitrary features enabled by beam-search
The learning problem
To score candidates such that a higher-scored action sequence corresponds to a more correct output
Global discriminative learning
The framework of this tutorial
(Zhang and Clark, CL 2011)
Very high accuracies and efficiencies using this framework
Word segmentation (Zhang and Clark, ACL 2007)
POS-tagging
Dependency parsing (Zhang and Clark, EMNLP 2008; Huang and Sagae, ACL 2010; Zhang and Nivre, ACL 2011; Zhang and Nivre, COLING 2012; Goldberg et al., ACL 2013)
Constituent parsing (Collins and Roark, ACL 2004; Zhang and Clark, IWPT 2009; Zhu et al. ACL 2013)
CCG parsing (Zhang and Clark, ACL 2011)
Machine translation (Liu, ACL 2013)
Joint word segmentation and POS-tagging (Zhang and Clark, ACL 2008; Zhang and Clark, EMNLP 2010)
Joint POS-tagging and dependency parsing (Hatori et al., IJCNLP 2011; Bohnet and Nivre, EMNLP 2012)
Joint word segmentation, POS-tagging and parsing (Hatori et al., ACL 2012; Zhang et al., ACL 2013; Zhang et al., ACL 2014)
Joint morphological analysis and syntactic parsing (Bohnet et al., TACL 2013)
General
Can be applied to any structured prediction task that can be transformed into an incremental process
A transition system
Automata
State
Start state: an empty structure
End state: the output structure
Intermediate states: partially constructed structures
Actions
Change one state to another
A transition system
State
Corresponds to partial results during decoding
start state, end state, Si
Actions
The operations that can be applied for state transition
Construct output incrementally
ai
[Diagram: start → S1 → … → Si → … → Sn → end, with the transitions labeled a0, a1, …, ai-1, ai, …, an-1, an]
A transition-based POS-tagging example
POS tagging
I like reading books → I/PRON like/VERB reading/VERB books/NOUN
Transition system
State
Partially labeled word-POS pairs
Unprocessed words
Actions
TAG(t): $w_1/t_1 \cdots w_i/t_i \;\rightarrow\; w_1/t_1 \cdots w_i/t_i \; w_{i+1}/t$
Start State: I like reading books
TAG(PRON): I/PRON like reading books
TAG(VERB): I/PRON like/VERB reading books
TAG(VERB): I/PRON like/VERB reading/VERB books
TAG(NOUN): I/PRON like/VERB reading/VERB books/NOUN
End State: I/PRON like/VERB reading/VERB books/NOUN
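The POS-tagging transition system can be sketched in a few lines of Python. This is an illustrative sketch only: the `State` type and the helper names `start_state`, `tag` and `is_end` are hypothetical and do not come from the tutorial or from ZPar.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    tagged: Tuple[Tuple[str, str], ...]   # word/POS pairs built so far
    unprocessed: Tuple[str, ...]          # words still to be tagged


def start_state(sentence):
    """The start state has an empty output and all words unprocessed."""
    return State((), tuple(sentence.split()))


def tag(state, t):
    """TAG(t): move the next unprocessed word to the output with tag t."""
    word, rest = state.unprocessed[0], state.unprocessed[1:]
    return State(state.tagged + ((word, t),), rest)


def is_end(state):
    """The end state is reached when no unprocessed words remain."""
    return not state.unprocessed


state = start_state("I like reading books")
for action in ["PRON", "VERB", "VERB", "NOUN"]:
    state = tag(state, action)
result = " ".join(f"{w}/{t}" for w, t in state.tagged)
```

States are immutable here so that several hypotheses can share a common prefix without copying, which becomes useful once a beam holds many states at the same time.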
Search
Find the best sequence of actions
[Diagram: search tree rooted at S0, branching at every step into alternative states (S1, S'1, S''1; S2, S'2, S''2; …; Sn, S'n, S''n) reached by alternative actions such as a'0]
Dynamic programming
Optimum sub-problems are recorded according to dynamic programming signature
Infeasible if features are non-local (which are typically useful)
One solution
Greedy classification
Input: $S_i$
Output: $a_i = \arg\max_{a'} \, w \cdot f(S_i, a')$
For better accuracies: beam-search decoding
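Greedy classification picks the single best action at each state and never revisits it. Below is a minimal Python sketch for POS tagging; the feature function `toy_features` and the hand-set `weights` are hypothetical and exist only to make the example runnable.

```python
def score(weights, features):
    """Dot product w . f for a sparse list of active features."""
    return sum(weights.get(f, 0.0) for f in features)


def toy_features(words, i, history, t):
    """Features of applying TAG(t) to word i given the tags chosen so far."""
    prev = history[-1] if history else "<s>"
    return [f"w={words[i]}|t={t}", f"prev={prev}|t={t}"]


def greedy_decode(words, tags, weights, feature_fn):
    """At each state, commit to the action with the highest local score."""
    history = []                      # tags chosen so far (the state S_i)
    for i in range(len(words)):
        best = max(tags, key=lambda t: score(weights, feature_fn(words, i, history, t)))
        history.append(best)
    return history


weights = {"w=I|t=PRON": 1.0, "w=like|t=VERB": 1.0,
           "prev=PRON|t=VERB": 0.5, "w=books|t=NOUN": 1.0}
greedy_tags = greedy_decode(["I", "like", "books"], ["NOUN", "PRON", "VERB"],
                            weights, toy_features)
```

Because the feature function sees the whole history, non-local features are allowed, but an early mistake can never be repaired, which motivates keeping more than one hypothesis.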
Beam-search decoding
[Diagram: beam-search lattice. From the start state, actions a00 … a0k lead to states S11 … S1k; at step i, actions a(i-1)0 … a(i-1)k lead to states Si1 … Sik; finally, actions an0 … ank lead to the end states End1 … Endk.]
Zhang and Clark, CL 2011
An example: POS-tagging
I like reading books
Beam-search decoding
Beam after step 1: I/PRON, I/NOUN, I/ADV, I/ADP (other candidates shown: I/VERB, I/ADV, I/PREP, I/NR)
Beam after step 2: I/PRON like/VERB, I/NOUN like/VERB, I/PRON like/CONJ, I/NOUN like/CONJ
…
Beam after step 4: I/PRON like/VERB reading/VERB books/NOUN, I/PRON like/VERB reading/ADJ books/NOUN, I/PRON like/CONJ reading/ADJ books/NOUN, I/PRON like/VERB reading/NOUN books/NOUN
Zhang and Clark, CL 2011
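The beam-search decoder for the tagging example can be sketched as follows. This is a simplified illustration in Python, not the algorithm as printed in Zhang and Clark (CL 2011); `toy_features` and the weights are hypothetical, chosen so that a beam of size 1 and a beam of size 2 give different answers.

```python
import heapq


def score(weights, features):
    """Dot product w . f for a sparse list of active features."""
    return sum(weights.get(f, 0.0) for f in features)


def toy_features(words, i, history, t):
    prev = history[-1] if history else "<s>"
    return [f"w={words[i]}|t={t}", f"prev={prev}|t={t}"]


def beam_search(words, tags, weights, feature_fn, beam_size=4):
    """Keep the beam_size best partial tag sequences after every word."""
    beam = [(0.0, [])]                       # (total score, tags so far)
    for i in range(len(words)):
        candidates = []
        for total, history in beam:
            for t in tags:                   # expand every state by every action
                s = total + score(weights, feature_fn(words, i, history, t))
                candidates.append((s, history + [t]))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beam                              # best hypothesis first


weights = {"w=I|t=NOUN": 1.0, "w=I|t=PRON": 0.9,
           "prev=PRON|t=VERB": 1.0, "w=like|t=VERB": 0.5}
tags = ["NOUN", "PRON", "VERB"]
greedy_best = beam_search(["I", "like"], tags, weights, toy_features, beam_size=1)[0][1]
beam_best = beam_search(["I", "like"], tags, weights, toy_features, beam_size=2)[0][1]
```

With these toy weights, a beam of size 1 commits to I/NOUN, while a beam of size 2 keeps I/PRON alive long enough for the transition feature to make it the overall winner.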
Online learning
[Diagram: the beam-search lattice during training, with the gold-standard state marked at each step (S1g, …, Sig) among the beam states Si1 … Sik. At step i+1 the gold-standard state S(i+1)g is shown apart from the beam states S(i+1)1 … S(i+1)k, with the annotation "perceptron update here!"]
Zhang and Clark, CL 2011
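One common way to realise this kind of training is an averaged perceptron with early update: decoding stops as soon as the gold-standard prefix is no longer in the beam, and the weights are updated on the partial sequences. The Python sketch below is written under that assumption and is not the tutorial's implementation; `toy_features`, the beam size, the epoch count and the naive averaging scheme are all illustrative choices.

```python
from collections import defaultdict


def toy_features(words, i, history, t):
    prev = history[-1] if history else "<s>"
    return [f"w={words[i]}|t={t}", f"prev={prev}|t={t}"]


def sequence_features(words, tags_so_far):
    """All features fired by a (partial) tag sequence."""
    feats = []
    for i, t in enumerate(tags_so_far):
        feats.extend(toy_features(words, i, tags_so_far[:i], t))
    return feats


def train(data, tags, beam_size=2, epochs=5):
    weights = defaultdict(float)
    totals = defaultdict(float)      # running sum of weights, for averaging
    steps = 0
    for _ in range(epochs):
        for words, gold in data:
            beam = [(0.0, [])]
            for i in range(len(words)):
                cands = []
                for s, h in beam:
                    for t in tags:
                        local = sum(weights.get(f, 0.0)
                                    for f in toy_features(words, i, h, t))
                        cands.append((s + local, h + [t]))
                cands.sort(key=lambda c: -c[0])
                beam = cands[:beam_size]
                gold_prefix = gold[:i + 1]
                fell_out = all(h != gold_prefix for _, h in beam)
                wrong_at_end = i == len(words) - 1 and beam[0][1] != gold_prefix
                if fell_out or wrong_at_end:
                    # Reward the gold prefix, penalise the best wrong prefix.
                    for f in sequence_features(words, gold_prefix):
                        weights[f] += 1.0
                    for f in sequence_features(words, beam[0][1]):
                        weights[f] -= 1.0
                    break            # early update: stop decoding this sentence
            steps += 1
            for f, v in weights.items():
                totals[f] += v
    return {f: v / steps for f, v in totals.items()}


averaged = train([(["I", "like", "books"], ["PRON", "VERB", "NOUN"])],
                 tags=["NOUN", "PRON", "VERB"])
```

Averaging the weight vector over all training steps, rather than returning the final one, is the usual way to reduce overfitting in perceptron training; real implementations compute the average lazily instead of summing every weight after every example.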
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
Word segmentation
Introduction
Chinese word segmentation
我喜欢读书 → 我 喜欢 读 书
Ilikereadingbooks → I like reading books
Ambiguity
Out-of-vocabulary words (OOV words)
进步 (make progress; OOV)
进 (advance; known) 步 (step; known)
Known words
这里面:
这里 (here) 面 (flour) 很 (very) 贵 (expensive)
这 (this) 里面 (inside) 很 (very) 冷 (cold)
洽谈会很成功:
洽谈会 (discussion meeting) 很 (very) 成功 (successful)
洽谈 (discussion) 会 (will) 很 (very) 成功 (succeed)
Introduction
No fixed standard
only about 75% agreement among native speakers
task dependency
北京银行:
北京银行 (Bank of Beijing)
北京 (Beijing) 银行 (bank)
Therefore, supervised learning with specific training corpora seems more appropriate.
the dominant approach
Introduction
The character-tagging approach
Map word segmentation into character tagging
我 喜欢 读 书 → 我/S 喜/B 欢/E 读/S 书/S
Context information: a window of five neighboring characters
Traditionally CRF is used
This method can be implemented using our framework also!
(cf. the sequence labeling example in the intro)
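The mapping between a segmentation and character tags can be sketched in Python as below. The slide's example only shows the tags S, B and E; the M tag for word-internal characters of longer words is an assumption of this sketch, as are the function names.

```python
def words_to_tags(words):
    """Map a segmented sentence to (character, tag) pairs.

    S = single-character word, B = first character, E = last character.
    M (middle character of a word with three or more characters) is an
    assumption; the slide's example only shows S, B and E.
    """
    pairs = []
    for w in words:
        if len(w) == 1:
            labels = ["S"]
        else:
            labels = ["B"] + ["M"] * (len(w) - 2) + ["E"]
        pairs.extend(zip(w, labels))
    return pairs


def tags_to_words(pairs):
    """Invert the mapping: a word ends after every S or E tag."""
    words, current = [], ""
    for ch, label in pairs:
        current += ch
        if label in ("S", "E"):
            words.append(current)
            current = ""
    if current:                      # tolerate a sequence that ends mid-word
        words.append(current)
    return words


pairs = words_to_tags(["我", "喜欢", "读", "书"])
tagged = "".join(f"{c}/{t}" for c, t in pairs)
```

Since the mapping is invertible, any sequence labeler that predicts one tag per character also yields a segmentation.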
Introduction
Limitation of the character tagging method
中国外企业
其中 (among which) 国外 (foreign) 企业 (companies)
中国 (in China) 外企 (foreign companies) 业务 (business)
Motivation of a word-based method
Compare candidates by word information directly
Potential for more linguistically motivated features
Zhang and Clark, ACL 2007
The transition system
State
Partially segmented results
Unprocessed characters
Two candidate actions
Separate: the next character starts a new word (## ## → ## ## #)
Append: the next character is appended to the last word (## ## → ## ###)
Zhang and Clark, ACL 2007
Initial State: segmented: (empty); unprocessed: 我喜欢读书 (I like reading books)
Separate: segmented: 我; unprocessed: 喜欢读书
Separate: segmented: 我 喜; unprocessed: 欢读书
Append: segmented: 我 喜欢; unprocessed: 读书
Separate: segmented: 我 喜欢 读; unprocessed: 书
Separate: segmented: 我 喜欢 读 书; unprocessed: (empty)
End State: 我 喜欢 读 书
Zhang and Clark, ACL 2007
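The two segmentation actions can be replayed with a short Python sketch. The function names and the `run` helper are hypothetical; only the action names Separate and Append come from the slides.

```python
def separate(words, chars):
    """Separate: the next character starts a new word."""
    return words + [chars[0]], chars[1:]


def append(words, chars):
    """Append: the next character is attached to the last word."""
    return words[:-1] + [words[-1] + chars[0]], chars[1:]


ACTIONS = {"Separate": separate, "Append": append}


def run(sentence, actions):
    """Apply a sequence of actions, starting from the initial state."""
    words, chars = [], sentence      # (partially segmented result, unprocessed)
    for name in actions:
        words, chars = ACTIONS[name](words, chars)
    return words, chars


words, rest = run("我喜欢读书",
                  ["Separate", "Separate", "Append", "Separate", "Separate"])
```

Every sentence of n characters is segmented by exactly n actions, and the first action must be Separate because there is no word yet to append to.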
Beam search
Input: ABCDE
[Animation with two columns, Candidates and Agenda. The agenda starts with the empty segmentation "" and the unprocessed characters ABCDE. Processing A gives A (unprocessed: BCDE); processing B gives AB and A B (unprocessed: CDE); processing C gives ABC, AB C, A BC and A B C (unprocessed: DE).]
Zhang and Clark, ACL 2007
The beam search decoder
For a sentence of length $l$, there are $2^{l-1}$ possible segmentations.
The agenda size is limited, keeping only the B best candidates
Zhang and Clark, ACL 2007
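The count follows because each of the l-1 gaps between adjacent characters is either a word boundary or not. A small Python sketch that enumerates all segmentations exhaustively (feasible only for short inputs, which is why the agenda is pruned in practice); the function name is illustrative.

```python
def segmentations(chars):
    """Enumerate every segmentation of a non-empty string.

    The first character always starts a word; every later character is
    either separated (starts a new word) or appended to the last word.
    """
    results = [[chars[0]]]
    for c in chars[1:]:
        separated = [r + [c] for r in results]
        appended = [r[:-1] + [r[-1] + c] for r in results]
        results = separated + appended
    return results


all_segs = segmentations("ABCDE")
count = len(all_segs)                # equals 2 ** (5 - 1)
```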
Feature templates
1. word w
2. word bigram w1w2
3. single character word w
4. a word starting with character c and having length l
5. a word ending with character c and having length l
6. space separated characters c1 and c2
7. character bigram c1c2 in any word
8. the first and last characters c1 and c2 of any word
9. word w immediately before character c
10. character c immediately before word w
11. the starting characters c1 and c2 of two consecutive words
12. the ending characters c1 and c2 of two consecutive words
13. a word with length l and the previous word w
14. a word with length l and the next word w
Zhang and Clark, ACL 2007
Experimental results
[Figure: tradeoff between speed and accuracy on CTB5, with data points for beam = 1, 2, 4, 8, 16, 32 and 64]
Zhang and Clark, ACL 2007
Experimental results
Compare with other systems (SIGHAN 2005).
AS CU PU SAV OAV
S01 93.8 90.1 95.1 93.0 95.5
S04 93.9 93.9 94.8
S05 94.2 89.4 91.8 95.9
S06 94.5 92.4 92.4 93.1 95.5
S08 90.4 93.6 92.9 94.8
S09 96.1 94.6 95.4 95.9
S10 94.7 94.7 94.8
S12 95.9 91.6 93.8 95.9
Peng 95.6 92.8 94.1 94.2 95.5
Z&C 07 97.0 94.6 94.6 95.4 95.5
Zhang and Clark, ACL 2007
Dependency parsing
Dependency syntax
Dependency structures represent syntactic relations (dependencies) by drawing links between word pairs in a sentence.
For the link: a telescope
a: • Modifier • Dependent • Child
telescope: • Head • Governor • Parent
Dependency graphs
A dependency structure is a directed graph G with the following constraints:
Connected
Acyclic
Single-head
Together, these constraints make G a tree.
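The three constraints can be checked mechanically. The sketch below is a minimal illustration, assuming a sentence is encoded as a list `heads` in which `heads[i]` is the index of the head of word i and -1 marks the root. This encoding, the function name `is_dependency_tree`, and the example analysis of "I saw her duck with a telescope" are assumptions made for illustration and are not taken from the tutorial or from ZPar.

```python
def is_dependency_tree(heads):
    """Return True if heads encodes a connected, acyclic, single-head graph.

    heads[i] is the head of word i, or -1 for the root (assumed encoding).
    Single-head holds by construction, since each word stores one head.
    """
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        return False  # zero or several roots: not a single connected tree
    for start in range(len(heads)):
        seen = set()
        node = start
        while node != -1:
            if node in seen:
                return False  # walking up the heads revisits a node: cycle
            seen.add(node)
            node = heads[node]
    # Every word reaches the unique root without a cycle, so the graph is connected.
    return True


# One possible reading: I <- saw, her <- duck, duck <- saw, with <- saw,
# a <- telescope, telescope <- with (assumed analysis).
example = [1, -1, 3, 1, 1, 6, 4]
print(is_dependency_tree(example))
```

Because each word stores exactly one head, the single-head constraint is built into the representation, and only the root count and the absence of cycles need an explicit check.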
![Page 92: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/92.jpg)
A dependency tree structure represents syntactic relations between word pairs in a sentence
I saw her duck with a telescope
gen
obj
mod
I saw her duck with a telescope
mod
obj
Dependency trees
subj
obj det
gen
subj
obj det
![Page 93: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/93.jpg)
Categorization (Kübler et al. 2009)
Projective
Non-projective
Dependency trees
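The projective / non-projective distinction can be tested by looking for crossing arcs. The sketch below assumes the same hypothetical `heads` list encoding (head index per word, -1 for the root) and places an artificial root to the left of the first word. The head assignments for the example sentence are an assumed analysis, not one given on the slides.

```python
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the head of word i, or -1 for the root (assumed encoding).
    The root arc is drawn from an artificial root at position -1.
    """
    arcs = [(min(dep, head), max(dep, head)) for dep, head in enumerate(heads)]
    for a, b in arcs:
        for c, d in arcs:
            # Arc (c, d) starts strictly inside (a, b) and ends strictly outside it.
            if a < c < b < d:
                return False
    return True


# "A meeting was scheduled for this today" with "for" attached to "meeting"
# and "today" attached to "scheduled" (assumed analysis).
non_projective = [1, 2, -1, 2, 1, 4, 3]
print(is_projective(non_projective))
```

The quadratic loop is chosen for clarity; it is adequate for sentence-length inputs.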
![Page 94: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/94.jpg)
Score each possible output
Often use dynamic programming to explore search space
The graph-based solution
McDonald et al., ACL 2005; Carreras, EMNLP-CoNLL 2007; Koo and Collins, ACL 2010
![Page 95: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/95.jpg)
Projective
Arc-eager
Arc-standard (Nivre, CL 2008)
Non-projective
Arc-standard + swap (Nivre, ACL 2009)
Transition systems
![Page 96: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/96.jpg)
The arc-eager transition system
State
A stack to hold partial structures
A queue of next incoming words
Actions
SHIFT, REDUCE, ARC-LEFT, ARC-RIGHT
![Page 97: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/97.jpg)
State
ST STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
The arc-eager transition system
![Page 98: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/98.jpg)
Actions
Shift
The arc-eager transition system
ST STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
![Page 99: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/99.jpg)
Actions
Shift
Pushes stack
N0LC
The arc-eager transition system
ST STP ...
STLC STRC
The stack
The input
N1 N2 N3 ... N0
![Page 100: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/100.jpg)
Actions
Reduce
ST STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
The arc-eager transition system
![Page 101: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/101.jpg)
Actions
Reduce
Pops stack
ST
STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
The arc-eager transition system
![Page 102: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/102.jpg)
Actions
Arc-Left
ST STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
The arc-eager transition system
![Page 103: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/103.jpg)
Actions
Arc-Left
Pops stack
Adds link
STP ...
The stack
The input
N0 N1 N2 N3 ...
N0LC ST
STLC STRC
The arc-eager transition system
![Page 104: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/104.jpg)
Actions
Arc-right
ST STP ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
N0LC
The arc-eager transition system
![Page 105: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/105.jpg)
Actions
Arc-right
Pushes stack
Adds link
The arc-eager transition system
ST STP ...
STLC STRC
The stack
The input
N1 N2 N3 ...
N0LC
N0
![Page 106: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/106.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here
The arc-eager transition system
![Page 107: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/107.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S
The arc-eager transition system
![Page 108: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/108.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
The arc-eager transition system
![Page 109: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/109.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
it here S
He
does
The arc-eager transition system
![Page 110: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/110.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
it here S
He
does
He
does it
AR
here
The arc-eager transition system
![Page 111: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/111.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
it here S
He
does
He
does it
AR
here R
He
does here
it
The arc-eager transition system
![Page 112: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/112.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
it here S
He
does
He
does it
AR
here R
He
does here
it
AR
He
does here
it
The arc-eager transition system
![Page 113: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/113.jpg)
An example S – Shift R – Reduce AL – ArcLeft AR – ArcRight
He does it here does it here He S does it here AL
He
it here S
He
does
He
does it
AR
here R
He
does here
it
AR
He
does here
it
R
He
does
it here
The arc-eager transition system
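The example above can be replayed in code. The following is a minimal sketch of the four arc-eager actions, assuming words are referred to by index, the stack top is the last list element, and the queue front is the first. The function name `run_arc_eager` and the explicit preconditions on Reduce and ArcLeft (the usual arc-eager conditions, not spelled out on the slides) are assumptions for illustration.

```python
def run_arc_eager(n_words, actions):
    """Apply a sequence of arc-eager actions and return (stack, queue, heads).

    heads maps each dependent index to its head index.
    """
    stack, queue, heads = [], list(range(n_words)), {}
    for action in actions:
        if action == "S":            # Shift: push N0 onto the stack
            stack.append(queue.pop(0))
        elif action == "R":          # Reduce: pop ST, which must already have a head
            if stack[-1] not in heads:
                raise ValueError("Reduce needs a stack top that has a head")
            stack.pop()
        elif action == "AL":         # ArcLeft: ST becomes a modifier of N0, pop ST
            if stack[-1] in heads:
                raise ValueError("ArcLeft needs a stack top without a head")
            heads[stack.pop()] = queue[0]
        elif action == "AR":         # ArcRight: N0 becomes a modifier of ST, push N0
            n0 = queue.pop(0)
            heads[n0] = stack[-1]
            stack.append(n0)
        else:
            raise ValueError(f"unknown action {action!r}")
    return stack, queue, heads


words = ["He", "does", "it", "here"]
stack, queue, heads = run_arc_eager(len(words), ["S", "AL", "S", "AR", "R", "AR", "R"])
print({words[dep]: words[head] for dep, head in heads.items()})
```

For this four-word sentence the sequence has 7 = 2 x 4 - 1 actions: every word is pushed once (by S or AR) and every word except the root "does" is popped once (by AL or R).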
![Page 114: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/114.jpg)
Arc-eager
Time complexity: linear
Every word is pushed once onto the stack
Every word except the root is popped once
Links are added between ST and N0
As soon as they are in place
'eager'
The arc-eager transition system
![Page 115: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/115.jpg)
Arc-eager
Labeled parsing? Expand the link-adding actions, one per dependency label:
ArcLeft → ArcLeft-subject, ArcLeft-noun modifier, ...
ArcRight → ArcRight-object, ArcRight-prep modifier, ...
The arc-eager transition system
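Expanding the link-adding actions amounts to enlarging the action inventory. The sketch below shows one way to build it, assuming every label is paired with both arc directions. That pairing, the label names and the function name `labeled_actions` are illustrative assumptions; a real parser may restrict which labels go with which direction.

```python
def labeled_actions(labels):
    """Build an arc-eager action inventory with one arc action per label."""
    actions = ["SHIFT", "REDUCE"]
    for direction in ("ARC-LEFT", "ARC-RIGHT"):
        actions.extend(f"{direction}-{label}" for label in labels)
    return actions


inventory = labeled_actions(["subj", "obj", "nmod", "pmod"])  # hypothetical label set
print(inventory)
```

With L labels the inventory grows from 4 actions to 2 + 2L, which increases the branching factor of the search but leaves the transition system otherwise unchanged.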
![Page 116: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/116.jpg)
State
A stack to hold partial candidates
A queue of next incoming words
Actions
SHIFT, LEFT-REDUCE, RIGHT-REDUCE
Builds arcs between ST0 and ST1
Associated with shift-reduce CFG parsing process
The arc-standard transition system
![Page 117: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/117.jpg)
Actions
Shift
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The arc-standard transition system
![Page 118: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/118.jpg)
Actions
Shift
Pushes stack
ST ST1 ...
STLC STRC
The stack
The input
N1 N2 N3 ... N0
The arc-standard transition system
![Page 119: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/119.jpg)
Actions
Left-reduce
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The arc-standard transition system
![Page 120: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/120.jpg)
Actions
Left-reduce
Pops stack
Adds link
ST
ST1
...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The arc-standard transition system
![Page 121: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/121.jpg)
Actions
Right-reduce
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The arc-standard transition system
![Page 122: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/122.jpg)
Actions
Right-reduce
Pops stack
Adds link
ST
ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The arc-standard transition system
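The three arc-standard actions can be sketched in the same style. This is a minimal illustration that assumes the common convention in which LEFT-REDUCE makes ST1 a modifier of ST and removes ST1, while RIGHT-REDUCE makes ST a modifier of ST1 and removes ST. The slides do not state the direction explicitly, so this convention and the name `run_arc_standard` are assumptions.

```python
def run_arc_standard(n_words, actions):
    """Apply arc-standard actions; arcs are built between the two top stack items."""
    stack, queue, heads = [], list(range(n_words)), {}
    for action in actions:
        if action == "SHIFT":
            stack.append(queue.pop(0))
        elif action == "LEFT-REDUCE":
            dep = stack.pop(-2)          # ST1 is removed ...
            heads[dep] = stack[-1]       # ... and attached to ST
        elif action == "RIGHT-REDUCE":
            dep = stack.pop()            # ST is removed ...
            heads[dep] = stack[-1]       # ... and attached to ST1, the new top
        else:
            raise ValueError(f"unknown action {action!r}")
    return stack, queue, heads


# "He does it here" again, now with arc-standard actions.
std_stack, std_queue, std_heads = run_arc_standard(
    4,
    ["SHIFT", "SHIFT", "LEFT-REDUCE", "SHIFT", "RIGHT-REDUCE", "SHIFT", "RIGHT-REDUCE"],
)
print(std_heads)
```

Unlike arc-eager, a right modifier can only be attached once both words are on the stack, so a word must have collected all of its own modifiers before RIGHT-REDUCE removes it.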
![Page 123: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/123.jpg)
Characteristic
Time complexity: linear
Empirically comparable with arc-eager overall, although the relative accuracies vary across languages
The arc-standard transition system
![Page 124: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/124.jpg)
Non-projectivity
Online reordering (Nivre 2009)
Based on an extra action to the parser: swap
No longer linear in the worst case
Worst case is quadratic because of swap
Expected running time is still linear
Before swap
The stack: ST ST1 ... (children of ST: STLC, STRC)
The input: N0 N1 N2 N3 ...
After swap
The stack: ST ... (children of ST: STLC, STRC)
The input: ST1 N0 N1 N2 N3 ...
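The swap action itself is small: it moves ST1 from the stack back to the front of the input. The sketch below assumes words are referred to by their original sentence positions and includes the precondition from Nivre (2009) that ST1 precedes ST in the original order. The function name `swap` and the list-based state are illustrative assumptions.

```python
def swap(stack, queue):
    """Move ST1 back to the front of the queue, keeping ST on the stack."""
    if len(stack) < 2:
        raise ValueError("swap needs at least two stack items")
    if stack[-2] > stack[-1]:
        # Only words still in their original order may be swapped;
        # this keeps the parser from swapping the same pair back and forth.
        raise ValueError("swap needs ST1 to precede ST in the sentence")
    queue.insert(0, stack.pop(-2))


# Indices for "A meeting was scheduled for this today", with "A" already attached:
# stack = meeting, was, scheduled, for; queue = this, today.
sw_stack, sw_queue = [1, 2, 3, 4], [5, 6]
swap(sw_stack, sw_queue)
first = (list(sw_stack), list(sw_queue))
swap(sw_stack, sw_queue)
second = (list(sw_stack), list(sw_queue))
print(first, second)
```

After the two swaps "for" sits directly on top of "meeting", which lets an ordinary arc-standard action link the two words even though the arc is non-projective in the original order.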
![Page 125: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/125.jpg)
Non-projectivity
Initial
A meeting was scheduled for this today
![Page 126: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/126.jpg)
Non-projectivity
SHIFT
A meeting was scheduled for this today
![Page 127: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/127.jpg)
Non-projectivity
SHIFT
A meeting was scheduled for this today
![Page 128: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/128.jpg)
A transition-based parsing process
ARC-LEFT
meeting was scheduled for this today
A
![Page 129: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/129.jpg)
A transition-based parsing process
SHIFT
meeting was scheduled for this today
A
![Page 130: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/130.jpg)
A transition-based parsing process
SHIFT
meeting was scheduled for this today
A
![Page 131: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/131.jpg)
A transition-based parsing process
SHIFT
meeting was scheduled for this today
A
![Page 132: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/132.jpg)
A transition-based parsing process
SWAP
meeting was for scheduled this today
A
![Page 133: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/133.jpg)
A transition-based parsing process
SWAP
meeting for was scheduled this today
A
![Page 134: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/134.jpg)
A transition-based parsing process
SHIFT
meeting for was scheduled this today
A
![Page 135: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/135.jpg)
A transition-based parsing process
SHIFT
meeting for was scheduled this today
A
![Page 136: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/136.jpg)
A transition-based parsing process
SHIFT
meeting for was scheduled this today
A
![Page 137: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/137.jpg)
A transition-based parsing process
SWAP
meeting for was this scheduled today
A
![Page 138: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/138.jpg)
A transition-based parsing process
SWAP
meeting for this was scheduled today
A
![Page 139: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/139.jpg)
A transition-based parsing process
ARC-RIGHT
meeting for was scheduled today
A this
![Page 140: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/140.jpg)
A transition-based parsing process
ARC-RIGHT
meeting was scheduled today
A for
this
![Page 141: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/141.jpg)
A transition-based parsing process
SHIFT
meeting was scheduled today
A for
this
![Page 142: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/142.jpg)
A transition-based parsing process
ARC-LEFT
was scheduled today
A for
this
meeting
![Page 143: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/143.jpg)
A transition-based parsing process
SHIFT
was scheduled today
A for
this
meeting
![Page 144: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/144.jpg)
A transition-based parsing process
SHIFT
was scheduled today
A for
this
meeting
![Page 145: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/145.jpg)
A transition-based parsing process
ARC-RIGHT
was scheduled
A for
this
meeting today
![Page 146: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/146.jpg)
A transition-based parsing process
ARC-RIGHT
was
A for
this
meeting scheduled
today
![Page 147: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/147.jpg)
The arc-eager parser using our framework
The arc-eager transition process
Beam-search decoding
Keeps N different partial state items in the agenda
Uses the total score of all actions taken so far to rank state items
Avoids error propagation from early decisions
Global discriminative training
Zhang and Clark, EMNLP 2008
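The decoding loop can be sketched as follows. This is a minimal illustration of beam search over arc-eager state items, not the ZPar implementation: the `Item` layout, the helper names, and especially `toy_score` (which simply rewards a fixed action sequence) are assumptions. In the actual framework the score of an action would come from a feature-based linear model trained with the averaged perceptron.

```python
from typing import NamedTuple


class Item(NamedTuple):
    score: float      # total score of all actions taken so far
    stack: tuple
    queue: tuple
    heads: tuple      # (dependent, head) pairs
    actions: tuple


def legal_actions(item):
    has_head = {dep for dep, _ in item.heads}
    acts = []
    if item.queue:
        acts.append("S")
        if item.stack:
            acts.append("AR")
            if item.stack[-1] not in has_head:
                acts.append("AL")
    if item.stack and item.stack[-1] in has_head:
        acts.append("R")
    return acts


def apply_action(item, action, delta):
    stack, queue, heads = item.stack, item.queue, item.heads
    if action == "S":
        stack, queue = stack + (queue[0],), queue[1:]
    elif action == "R":
        stack = stack[:-1]
    elif action == "AL":
        heads, stack = heads + ((stack[-1], queue[0]),), stack[:-1]
    elif action == "AR":
        heads = heads + ((queue[0], stack[-1]),)
        stack, queue = stack + (queue[0],), queue[1:]
    return Item(item.score + delta, stack, queue, heads, item.actions + (action,))


def is_final(item):
    return not item.queue and len(item.stack) == 1


def beam_search(n_words, score_fn, beam_size):
    agenda = [Item(0.0, (), tuple(range(n_words)), (), ())]
    while not all(is_final(item) for item in agenda):
        candidates = []
        for item in agenda:
            if is_final(item):
                candidates.append(item)
                continue
            for action in legal_actions(item):
                candidates.append(apply_action(item, action, score_fn(item, action)))
        if not candidates:
            break  # every item is a dead end; return the best unfinished item
        # Rank by total score and keep only the beam_size best items.
        agenda = sorted(candidates, key=lambda it: it.score, reverse=True)[:beam_size]
    return agenda[0]


GOLD = ("S", "AL", "S", "AR", "R", "AR", "R")  # toy target sequence


def toy_score(item, action):
    """Stand-in for a learned model: +1 when the action matches GOLD at this step."""
    step = len(item.actions)
    return 1.0 if step < len(GOLD) and GOLD[step] == action else 0.0


best = beam_search(4, toy_score, beam_size=4)
print(best.actions, dict(best.heads))
```

Items are ranked by the accumulated score of the whole action sequence rather than by the score of the last action alone, so a locally attractive early decision can still be overtaken later.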
![Page 148: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/148.jpg)
A tale of two parsers
Zhang and Clark, EMNLP 2008
Higher order, more features
Graph-based
MST parser
Carreras, 2007
Koo and Collins, 2010
Higher order, more features
This tutorial framework
Transition-based
Malt parser
Zhang and Clark, 2008
Zhang and Clark, 2011
More features
comparable
comparable
![Page 149: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/149.jpg)
Beam-search decoding
Our parser
Decoding
He does it here
Zhang and Clark, EMNLP 2008
![Page 150: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/150.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S
Zhang and Clark, EMNLP 2008
![Page 151: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/151.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
Zhang and Clark, EMNLP 2008
![Page 152: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/152.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
it here S
He
does
S here He does it
here He does it
Zhang and Clark, EMNLP 2008
![Page 153: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/153.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
it here S
He
does
S here He does it
here He does it
He
does it here
He does it here
here He does
it
Zhang and Clark, EMNLP 2008
![Page 154: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/154.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
it here S
He
does
S here He does it
here He does it
He
does it here
He does it here
here He does
it
here He does
He
He
does it here
He
does here
it
He
does it here
He
Zhang and Clark, EMNLP 2008
![Page 155: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/155.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
it here S
He
does
S here He does it
here He does it
He
does it here
He does it here
here He does
it
here He does
He
does here
it
He
does it here
He
does here
it He
He
does it here
He
does here
it
He
does it here
He
Zhang and Clark, EMNLP 2008
![Page 156: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/156.jpg)
Beam-search decoding
Our parser
Decoding
He does it here does it here He S does it here AL
He
it here He does
it here He does
it here S
He
does
S here He does it
here He does it
He
does it here
He does it here
here He does
it
here He does
He
does here
it
He
does it here
He
does here
it He
He
does it here
He
does here
it
He
does it here
He He
does
it
He here
does it
here
Zhang and Clark, EMNLP 2008
![Page 157: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/157.jpg)
The feature templates
The context
S0 – top of stack
S0h – head of S0
S0l – left modifier of S0
S0r – right modifier of S0
The stack: S0 S0h ... (S0l and S0r are attached to S0)
The input: N0 N1 N2 N3 ... (N0l is attached to N0)
N0 – head of queue
N0l – left modifier of N0
N1 – next in queue
N2 – next of N1
Zhang and Clark, EMNLP 2008
![Page 158: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/158.jpg)
The feature templates
The base features
from single words
S0wp; S0w; S0p; N0wp; N0w; N0p;
N1wp; N1w; N1p; N2wp; N2w; N2p;
from word pairs
S0wpN0wp; S0wpN0w; S0wN0wp; S0wpN0p;
S0pN0wp; S0wN0w; S0pN0p
N0pN1p
from three words
N0pN1pN2p; S0pN0pN1p; S0hpS0pN0p;
S0pS0lpN0p; S0pS0rpN0p; S0pN0pN0lp
Zhang and Clark, EMNLP 2008
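Each template name concatenates context positions (S0, N0, N1, N2) with the kind of information used (w for word, p for POS tag). The sketch below instantiates a subset of the base templates for one state. The helper names, the string format of the features, the POS tags of the example, and the pairing of each feature with the candidate action are assumptions for illustration. Templates that need S0h, S0l, S0r or N0l are left out to keep the example short.

```python
def context_atoms(words, tags, stack, queue):
    """Collect word (w) and POS (p) atoms for S0, N0, N1 and N2."""
    positions = {"S0": stack[-1] if stack else None}
    for k in range(3):
        positions[f"N{k}"] = queue[k] if k < len(queue) else None
    atoms = {}
    for name, idx in positions.items():
        if idx is not None:
            atoms[name + "w"] = words[idx]
            atoms[name + "p"] = tags[idx]
    return atoms


# A subset of the base templates; each tuple lists the atoms that are conjoined.
TEMPLATES = [
    ("S0w", "S0p"), ("S0w",), ("S0p",),   # S0wp; S0w; S0p
    ("N0w", "N0p"), ("N0w",), ("N0p",),   # N0wp; N0w; N0p
    ("S0w", "N0w"), ("S0p", "N0p"),       # S0wN0w; S0pN0p
    ("N0p", "N1p"),                       # N0pN1p
    ("N0p", "N1p", "N2p"),                # N0pN1pN2p
    ("S0p", "N0p", "N1p"),                # S0pN0pN1p
]


def extract_features(atoms, action):
    """Instantiate every template whose atoms are all available in this state."""
    feats = []
    for template in TEMPLATES:
        if all(atom in atoms for atom in template):
            name = "+".join(template)
            value = "|".join(atoms[atom] for atom in template)
            feats.append(f"{name}={value}&action={action}")
    return feats


words = ["He", "does", "it", "here"]
tags = ["PRP", "VBZ", "PRP", "RB"]      # assumed POS tags
# State after the first Shift: "He" on the stack, the rest in the queue.
atoms = context_atoms(words, tags, stack=[0], queue=[1, 2, 3])
features = extract_features(atoms, "AL")
print(features)
```

The score of an action in a given state is then the sum of the weights of the features instantiated for that state and action.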
![Page 159: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/159.jpg)
The feature templates
The extended features
Distance
Standard in MSTParser (McDonald et al., 2005)
Used in easy-first (Goldberg and Elhadad, 2010)
When used in transition-based parsing, combined with action (this paper)
distance
S0wd; S0pd; N0wd; N0pd;
S0wN0wd; S0pN0pd;
Zhang and Clark, ACL 2011
![Page 160: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/160.jpg)
The feature templates
The extended features
Valency
Number of modifiers
Graph-based submodel of Zhang and Clark (2008)
The models of Martins et al. (2009)
The models of Sagae and Tsujii (2007)
valency
S0wvr; S0pvr; S0wvl; S0pvl; N0wvl; N0pvl;
Zhang and Clark, ACL 2011
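Distance and valency are simple functions of the current state. The sketch below computes them for one state of the "He does it here" example, assuming distance is the raw difference in word positions between S0 and N0 (real systems may bucket it) and that arcs are stored as a dependent-to-head dictionary. The function names and the feature string format are illustrative assumptions.

```python
def distance(s0, n0):
    """Linear distance between the stack top S0 and the queue front N0."""
    return n0 - s0


def valency(heads, idx):
    """Return (left, right): the number of modifiers of idx on each side."""
    left = sum(1 for dep, head in heads.items() if head == idx and dep < idx)
    right = sum(1 for dep, head in heads.items() if head == idx and dep > idx)
    return left, right


words = ["He", "does", "it", "here"]
tags = ["PRP", "VBZ", "PRP", "RB"]   # assumed POS tags
heads = {0: 1, 2: 1}                  # arcs built so far: He <- does -> it
s0, n0 = 1, 3                         # S0 = "does", N0 = "here"

d = distance(s0, n0)
s0_vl, s0_vr = valency(heads, s0)
n0_vl, _ = valency(heads, n0)

feats = [
    f"S0wd={words[s0]}|{d}", f"S0pd={tags[s0]}|{d}",
    f"N0wd={words[n0]}|{d}", f"N0pd={tags[n0]}|{d}",
    f"S0wvr={words[s0]}|{s0_vr}", f"S0pvr={tags[s0]}|{s0_vr}",
    f"S0wvl={words[s0]}|{s0_vl}", f"S0pvl={tags[s0]}|{s0_vl}",
    f"N0wvl={words[n0]}|{n0_vl}", f"N0pvl={tags[n0]}|{n0_vl}",
]
print(feats)
```

Only the left valency of N0 is used because N0 has not yet been pushed and so cannot have right modifiers.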
![Page 161: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/161.jpg)
The feature templates
The extended features
Extended unigrams
S0h, S0l, S0r and N0l have been applied to transition-based parsers via POS-combination
We add their unigram word, POS and label information (this paper)
unigrams
S0hw; S0hp; S0l; S0lw; S0lp; S0ll;
S0rw; S0rp; S0rl; N0lw; N0lp; N0ll;
Zhang and Clark, ACL 2011
![Page 162: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/162.jpg)
The feature templates
The extended features
Third order
Graph-based dependency parsers (Carreras, 2007; Koo and Collins, 2010)
third-order
S0h2w; S0h2p; S0hl; S0l2w; S0l2p; S0l2l;
S0r2w; S0r2p; S0r2l; N0l2w; N0l2p; N0l2l;
S0pS0lpS0l2p; S0pS0rpS0r2p;
S0pS0hpS0h2p; N0pN0lpN0l2p;
Zhang and Clark, ACL 2011
![Page 163: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/163.jpg)
The feature templates
The extended features
Set of labels
A more global feature
Had not previously been applied to transition-based parsing
label set 1
S0wsr; S0psr; S0wsl; S0psl; N0wsl; N0psl;
Zhang and Clark, ACL 2011
![Page 164: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/164.jpg)
Experiments
Chinese Data (CTB5)
English Data (Penn Treebank)
Zhang and Clark, ACL 2011
![Page 165: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/165.jpg)
Results
Chinese
Model
Li et al. (2012)
Jun et al. (2011)
H&S10
This Method
English
Model
Li et al. (2012)
MSTParser
K08 standard
K&C10 model
H&S10
This Method
Zhang and Clark, ACL 2011
![Page 166: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/166.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 167: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/167.jpg)
We use Wang et al. (2006)'s shift-reduce transition-based process
A state item = a pair <stack, queue>
Stack: holds the partial parse trees already built
Queue: holds the incoming words with POS
Actions
SHIFT, REDUCE-BINARY-L/R, REDUCE-UNARY
Corresponds to arc-standard
The shift-reduce parsing process
Wang et al., ACL 2006
![Page 168: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/168.jpg)
Actions
SHIFT
The shift-reduce parsing process
stack queue
NR布朗 VV访问 NR上海
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 169: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/169.jpg)
Actions
SHIFT
The shift-reduce parsing process
stack queue
NR布朗 VV访问 NR上海
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 170: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/170.jpg)
Actions
REDUCE-UNARY-X
The shift-reduce parsing process
stack queue
NR布朗 VV访问 NR上海
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 171: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/171.jpg)
Actions
REDUCE-UNARY-X
The shift-reduce parsing process
stack queue
VV访问 NR上海
NR布朗
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 172: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/172.jpg)
Actions
REDUCE-UNARY-X
The shift-reduce parsing process
stack queue
X VV访问 NR上海
NR布朗
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 173: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/173.jpg)
Actions
REDUCE-BINARY-{L/R}-X
The shift-reduce parsing process
stack queue
NP VV访问
NR布朗
NP
NR上海
Zhang and Clark, IWPT 2009
布朗(Brown) 访问(visits) 上海(Shanghai)
![Page 174: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/174.jpg)
Actions
REDUCE-BINARY-{L/R}-X
The shift-reduce parsing process
stack queue
NP
VV访问 NR布朗 NP
NR上海
Zhang and Clark, IWPT 2009
![Page 175: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/175.jpg)
Actions
REDUCE-BINARY-{L/R}-X
The shift-reduce parsing process
stack queue
NP
VV访问 NR布朗 NP
NR上海
VP
Zhang and Clark, IWPT 2009
![Page 176: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/176.jpg)
Actions
TERMINATE
The shift-reduce parsing process
stack queue
S
Zhang and Clark, IWPT 2009
![Page 177: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/177.jpg)
Actions
TERMINATE
The shift-reduce parsing process
stack queue
S
ans
Zhang and Clark, IWPT 2009
![Page 178: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/178.jpg)
Example
SHIFT
The shift-reduce parsing process
stack queue
NR布朗 VV访问 NR上海
Zhang and Clark, IWPT 2009
![Page 179: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/179.jpg)
Example
REDUCE-UNARY-NP
The shift-reduce parsing process
stack queue
NR布朗 VV访问 NR上海
Zhang and Clark, IWPT 2009
![Page 180: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/180.jpg)
Example
SHIFT
The shift-reduce parsing process
NP VV访问 NR上海
NR布朗
stack queue
Zhang and Clark, IWPT 2009
![Page 181: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/181.jpg)
Example
SHIFT
The shift-reduce parsing process
NP VV访问 NR上海
NR布朗
stack queue
Zhang and Clark, IWPT 2009
![Page 182: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/182.jpg)
Example
REDUCE-UNARY-NP
The shift-reduce parsing process
NP VV访问 NR上海
NR布朗
stack queue
Zhang and Clark, IWPT 2009
![Page 183: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/183.jpg)
Example
REDUCE-BINARY-L-VP
The shift-reduce parsing process
stack queue
NP VV访问
NR布朗
NP
NR上海
Zhang and Clark, IWPT 2009
![Page 184: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/184.jpg)
Example
REDUCE-BINARY-R-IP
The shift-reduce parsing process
stack queue
NP
VV访问 NR布朗 NP
NR上海
VP
Zhang and Clark, IWPT 2009
![Page 185: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/185.jpg)
Example
TERMINATE
The shift-reduce parsing process
stack queue
NP
VV访问 NR布朗 NP
NR上海
VP
IP
Zhang and Clark, IWPT 2009
![Page 186: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/186.jpg)
Example
The shift-reduce parsing process
stack queue
NP
VV访问 NR布朗 NP
NR上海
VP
IP
Zhang and Clark, IWPT 2009
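The example can be replayed with a small interpreter for the four action types. This is a minimal sketch: the bracketed-string tree format, the tracking of a head word for each stack item, and the function name `shift_reduce` are assumptions for illustration. L and R are taken to mark whether the left or the right child is the head of the new node.

```python
def shift_reduce(tagged_words, actions):
    """Run shift-reduce constituent parsing actions; return (tree, head_word)."""
    stack, queue = [], list(tagged_words)
    for action in actions:
        if action == "SHIFT":
            tag, word = queue.pop(0)
            stack.append((f"({tag} {word})", word))
        elif action.startswith("REDUCE-UNARY-"):
            label = action[len("REDUCE-UNARY-"):]
            tree, head = stack.pop()
            stack.append((f"({label} {tree})", head))
        elif action.startswith("REDUCE-BINARY-"):
            side, label = action[len("REDUCE-BINARY-"):].split("-", 1)
            right_tree, right_head = stack.pop()
            left_tree, left_head = stack.pop()
            head = left_head if side == "L" else right_head
            stack.append((f"({label} {left_tree} {right_tree})", head))
        elif action == "TERMINATE":
            if queue or len(stack) != 1:
                raise ValueError("TERMINATE needs an empty queue and one stack item")
            return stack[0]
        else:
            raise ValueError(f"unknown action {action!r}")
    raise ValueError("action sequence did not end with TERMINATE")


sentence = [("NR", "布朗"), ("VV", "访问"), ("NR", "上海")]
tree, head_word = shift_reduce(sentence, [
    "SHIFT", "REDUCE-UNARY-NP", "SHIFT", "SHIFT", "REDUCE-UNARY-NP",
    "REDUCE-BINARY-L-VP", "REDUCE-BINARY-R-IP", "TERMINATE",
])
print(tree, head_word)
```

The head direction does not change the bracketing, but it determines which lexical head is propagated upwards and therefore which words the features of later actions can see.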
![Page 187: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/187.jpg)
Grammar binarization
The shift-reduce parser requires binarized trees
Treebank trees are not binarized
Penn Treebank/CTB ↔ Parser
Binarize CTB data to make training data
Unbinarize parser output back to Treebank format
Reversible
![Page 188: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/188.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A B C D E F
![Page 189: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/189.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A B C (D) E F
![Page 190: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/190.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A
B C (D) E F
Y*
![Page 191: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/191.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A
B
C (D) E F
Y*
Y*
![Page 192: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/192.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A
B
C
(D) E F
Y*
Y*
Y*
![Page 193: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/193.jpg)
Grammar binarization
The binarization process
Find head
Binarize left nodes
Binarize right nodes
Y
A
B
C
(D) E
F
Y*
Y*
Y*
Y*
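The steps above (find the head, peel off left siblings, then peel off right siblings) can be sketched in Python. This is a minimal illustration under assumptions: leaves are plain strings, trees are `(label, child, ...)` tuples, temporary nodes are labelled with a trailing `*` as on the slides, and the function names are hypothetical.

```python
def binarize(label, children, head):
    """Binarize one n-ary node; `head` is the index of the head child."""
    if len(children) <= 2:
        return (label, *children)
    temp = label + "*"

    def build(lo, hi, node_label):
        # Covers children[lo..hi], a span that always contains the head.
        if lo == hi:
            return children[lo]
        if lo < head:
            # Left siblings remain: split off the leftmost one.
            return (node_label, children[lo], build(lo + 1, hi, temp))
        # Only right siblings remain: split off the rightmost one.
        return (node_label, build(lo, hi - 1, temp), children[hi])

    return build(0, len(children) - 1, label)

def unbinarize(tree):
    """Inverse step: splice the children of every temporary (*) node into its parent."""
    if isinstance(tree, str):
        return tree
    label, *kids = tree
    flat = []
    for kid in kids:
        kid = unbinarize(kid)
        if not isinstance(kid, str) and kid[0].endswith("*"):
            flat.extend(kid[1:])
        else:
            flat.append(kid)
    return (label, *flat)

binary = binarize("Y", ["A", "B", "C", "D", "E", "F"], head=3)
print(binary)
print(unbinarize(binary))
```

For the slide example with head D, this yields four temporary `Y*` nodes, and `unbinarize` restores the flat node, which is the sense in which the conversion is reversible. Head positions are not recovered by `unbinarize`, since the treebank format does not store them.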
![Page 194: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/194.jpg)
Beam-search decoding
Deterministic parsing: B=1
The statistical parser
Initial item stack: empty queue: input
Zhang and Clark, IWPT 2009
![Page 195: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/195.jpg)
Beam-search decoding
Deterministic parsing: B=1
The statistical parser
Initial item stack: empty queue: input
SHIFT state item 1
Zhang and Clark, IWPT 2009
![Page 196: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/196.jpg)
Beam-search decoding
Deterministic parsing: B=1
The statistical parser
Initial item stack: empty queue: input
SHIFT state item 1
SHIFT
REDUCE-UNARY-X
state item 2
state item 3
state item 4
...
state item N
(one state item per different label X)
Zhang and Clark, IWPT 2009
![Page 198: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/198.jpg)
Beam-search decoding
Deterministic parsing: B=1
The statistical parser
Initial item stack: empty queue: input
SHIFT state item 1
SHIFT
REDUCE-UNARY-X
state item 2
state item 3
state item 4
...
state item N
(one state item per different label X)
SHIFT
REDUCE-UNARY-X
REDUCE-BINARY-{L/R}-X
Zhang and Clark, IWPT 2009
![Page 200: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/200.jpg)
Beam-search decoding
Deterministic parsing: B=1
Beam-search: B>1
The statistical parser
Initial state item
Zhang and Clark, IWPT 2009
![Page 201: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/201.jpg)
Beam-search decoding
Deterministic parsing: B=1
Beam-search: B>1
The statistical parser
Initial state item
state item 1
SHIFT
Zhang and Clark, IWPT 2009
![Page 202: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/202.jpg)
Beam-search decoding
Deterministic parsing: B=1
Beam-search: B>1
The statistical parser
Initial state item
state item 1
SHIFT state item 1
state item 2
state item 3 ...
state item N
Zhang and Clark, IWPT 2009
![Page 203: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/203.jpg)
Beam-search decoding
Deterministic parsing: B=1
Beam-search: B>1
The statistical parser
Initial state item
state item 1
SHIFT state item 1
state item 2
state item 3 ...
state item N
state item 121
state item 234
state item 165 ...
state item 230
discarded
Zhang and Clark, IWPT 2009
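The decoding loop sketched on these slides can be written generically. The following Python sketch is an assumption-laden illustration, not the parser's code: the function signature is hypothetical and the toy score table is invented purely to show how B=1 and B>1 can differ.

```python
def beam_search(initial, actions_fn, apply_fn, score_fn, is_final, beam_size):
    """Keep the beam_size highest-scoring state items after every step."""
    agenda = [(0, initial)]
    while not all(is_final(state) for _, state in agenda):
        candidates = []
        for score, state in agenda:
            if is_final(state):
                candidates.append((score, state))
                continue
            # Expand the state item with every applicable action.
            for action in actions_fn(state):
                candidates.append((score + score_fn(state, action),
                                   apply_fn(state, action)))
        candidates.sort(key=lambda item: item[0], reverse=True)
        agenda = candidates[:beam_size]  # everything else is discarded
    return max(agenda, key=lambda item: item[0])

# Invented toy scores: the locally best first action leads to a worse total.
TOY_SCORES = {(None, "a"): 10, (None, "b"): 9,
              ("a", "a"): 1, ("a", "b"): 1,
              ("b", "a"): 10, ("b", "b"): 0}

def toy_score(state, action):
    prev = state[-1] if state else None
    return TOY_SCORES[(prev, action)]

def run_toy(beam_size):
    return beam_search(initial=(),
                       actions_fn=lambda state: ("a", "b"),
                       apply_fn=lambda state, action: state + (action,),
                       score_fn=toy_score,
                       is_final=lambda state: len(state) == 2,
                       beam_size=beam_size)

print(run_toy(1))  # deterministic parsing, B=1
print(run_toy(2))  # beam search, B=2
```

With B=1 the sketch commits to the first action `a` and cannot recover; with B=2 the lower-scoring item survives the first step and wins overall.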
![Page 205: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/205.jpg)
Features
Extracted from top nodes on the stack S0, S1, S2, S3, the left and right or single child of S0 and S1, and the first words on the queue N0, N1, N2, N3.
The statistical parser
stack: … S1 S0 (child nodes shown: S0l, S0r, S1u)
queue: N0 …
Zhang and Clark, IWPT 2009
![Page 206: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/206.jpg)
Features
Manually combine word and constituent information
Unigrams
The statistical parser
Zhang and Clark, IWPT 2009
![Page 207: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/207.jpg)
Features
Manually combine word and constituent information
Bigrams
The statistical parser
Zhang and Clark, IWPT 2009
![Page 208: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/208.jpg)
Features
Manually combine word and constituent information
Trigrams
The statistical parser
Zhang and Clark, IWPT 2009
![Page 209: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/209.jpg)
An improvement
Unlike dependency parsing, different parse trees of the same input can use different numbers of actions
The IDLE action
Align the unequal number of actions for different output trees
The statistical parser
Zhu et al., ACL 2013
![Page 210: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/210.jpg)
LEFT: REDUCE-BINARY-R(NP), IDLE
RIGHT: REDUCE-UNARY(NP), REDUCE-BINARY-L(VP)
The statistical parser
Zhu et al., ACL 2013
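The alignment idea can be illustrated with a few lines of Python. This is only a sketch of the padding effect, assuming action sequences are plain lists of strings; `pad_with_idle` is a hypothetical helper, and in the actual parser IDLE is an action applied to finished items during decoding rather than a post-processing step.

```python
def pad_with_idle(action_sequences):
    """Extend shorter sequences with IDLE so that all have the same length."""
    longest = max(len(seq) for seq in action_sequences)
    return [seq + ["IDLE"] * (longest - len(seq)) for seq in action_sequences]

left = ["REDUCE-BINARY-R(NP)"]
right = ["REDUCE-UNARY(NP)", "REDUCE-BINARY-L(VP)"]
padded_left, padded_right = pad_with_idle([left, right])
print(padded_left)
print(padded_right)
```

Once both candidates have taken the same number of actions, their scores are sums over the same number of terms and can be compared more fairly in the beam.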
![Page 211: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/211.jpg)
English PTB
Chinese CTB51
Standard evaluation of bracketed P, R and F
Experiments
Zhu et al., ACL 2013
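Bracketed precision, recall and F can be computed as in the sketch below. It assumes each tree has already been converted to a list of `(label, start, end)` brackets and ignores the extra conventions of standard evaluation tools (for example punctuation handling); the example brackets are invented.

```python
from collections import Counter

def bracket_prf(gold, pred):
    """Labelled bracket precision, recall and F1 over (label, start, end) triples."""
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("VP", 2, 3)]
print(bracket_prf(gold, pred))
```

Here three of the four predicted brackets match the gold brackets, so precision, recall and F1 are all 0.75.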
![Page 212: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/212.jpg)
English results on PTB
Experiments
LR LP F1 #Sent/Second
Ratnaparkhi (1997) 86.3 87.5 86.9 Unk
Collins (1999) 88.1 88.3 88.2 3.5
Charniak (2000) 89.5 89.9 89.5 5.7
Sagae & Lavie (2005) 86.1 86.0 86.0 3.7
Sagae & Lavie (2006) 87.8 88.1 87.9 2.2
Petrov & Klein (2007) 90.1 90.2 90.1 6.2
Carreras et al. (2008) 90.7 91.4 91.1 Unk
This implementation 90.2 90.7 90.4 89.5
Zhu et al., ACL 2013
![Page 213: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/213.jpg)
Chinese results on CTB51
Experiments
LR LP F1
Charniak (2000) 79.6 82.1 80.8
Bikel (2004) 79.3 82.0 80.6
Petrov & Klein (2007) 81.9 84.8 83.3
This implementation 82.1 84.3 83.2
Zhu et al., ACL 2013
![Page 214: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/214.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 215: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/215.jpg)
Introduction to CCG parsing
Lexical categories
basic categories: N (nouns), NP (noun phrases), PP (prepositional phrases), ...
complex categories: S\NP (intransitive verbs), (S\NP)/NP (transitive verbs), ...
Adjacent phrases are combined to form larger phrases using category combination e.g.:
function application: NP S\NP ⇒ S
function composition: (S\NP)/(S\NP) (S\NP)/NP ⇒ (S\NP)/NP
Unary rules change the type of a phrase
Type raising: NP ⇒ S/(S\NP)
Type changing: S[pss]\NP ⇒ NP\NP
Zhang and Clark, ACL 2011
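The combination rules above can be made concrete with a small Python sketch. The encoding is an assumption: an atomic category is a string, a complex category is a `(result, slash, argument)` tuple, and categories must match exactly (no feature handling such as `S[dcl]` versus `S`).

```python
def fwd_apply(left, right):
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def bwd_apply(left, right):
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def fwd_compose(left, right):
    """Forward composition: X/Y  Y/Z  =>  X/Z."""
    if (isinstance(left, tuple) and left[1] == "/"
            and isinstance(right, tuple) and right[1] == "/"
            and left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

def type_raise(cat, target="S"):
    """Forward type raising: X  =>  T/(T\\X)."""
    return (target, "/", (target, "\\", cat))

def show(cat):
    """Render a category, parenthesising complex sub-categories."""
    if isinstance(cat, str):
        return cat
    wrap = lambda c: c if isinstance(c, str) else "(" + show(c) + ")"
    return wrap(cat[0]) + cat[1] + wrap(cat[2])

VP = ("S", "\\", "NP")                                    # S\NP
print(bwd_apply("NP", VP))                                # function application
print(show(fwd_compose((VP, "/", VP), (VP, "/", "NP"))))  # function composition
print(show(type_raise("NP")))                             # type raising
```

Each function returns `None` when the rule does not apply, so a parser could try the rules in turn on two adjacent categories.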
![Page 216: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/216.jpg)
Introduction to CCG parsing
An example derivation
IBM bought Lotus
Zhang and Clark, ACL 2011
![Page 217: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/217.jpg)
Introduction to CCG parsing
An example derivation
IBM := NP, bought := (S[dcl]\NP)/NP, Lotus := NP
Zhang and Clark, ACL 2011
![Page 218: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/218.jpg)
Introduction to CCG parsing
An example derivation
IBM := NP, bought := (S[dcl]\NP)/NP, Lotus := NP
bought Lotus ⇒ S[dcl]\NP
Zhang and Clark, ACL 2011
![Page 219: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/219.jpg)
Introduction to CCG parsing
An example derivation
IBM := NP, bought := (S[dcl]\NP)/NP, Lotus := NP
bought Lotus ⇒ S[dcl]\NP
IBM bought Lotus ⇒ S[dcl]
Zhang and Clark, ACL 2011
![Page 220: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/220.jpg)
Introduction to CCG parsing
Rule extraction
Manually define the lexicon and combinatory rule schemas (Steedman, 2000; Clark and Curran, 2007)
Extracting rule instances from corpus (Hockenmaier, 2003; Fowler and Penn, 2010)
Zhang and Clark, ACL 2011
![Page 221: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/221.jpg)
The shift-reduce parser
State
A stack of partial derivations
A queue of input words
A set of shift-reduce actions
SHIFT
COMBINE
UNARY
FINISH
The stack: ... S2(w2) S1(w1)
The queue: Q1 Q2 ...
Zhang and Clark, ACL 2011
![Page 222: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/222.jpg)
The shift-reduce parser
Shift-reduce actions
SHIFT-X
Pushes the head of the queue onto the stack
Assigns label X (a lexical category)
SHIFT action performs lexical category disambiguation
Before SHIFT
The stack: ... S2(w2) S1(w1)
The queue: Q1 Q2 ...
After SHIFT
The stack: ... S2(w2) S1(w1) X(Q1)
The queue: Q2 ...
Zhang and Clark, ACL 2011
![Page 223: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/223.jpg)
The shift-reduce parser
Shift-reduce actions
COMBINE-X
Pops the top two nodes off the stack
Combines them into a new node X and pushes it onto the stack
Corresponds to the use of a combinatory rule in CCG
Before COMBINE
The stack: ... S2(w2) S1(w1)
The queue: Q1 Q2 ...
After COMBINE
The stack: ... X(w2), with children S2(w2) and S1(w1)
The queue: Q1 Q2 ...
Zhang and Clark, ACL 2011
![Page 224: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/224.jpg)
The shift-reduce parser
Shift-reduce actions
UNARY-X
Pops the top of the stack
Creates a new node with category X and pushes it onto the stack
Corresponds to the use of a unary rule in CCG
Before UNARY
The stack: ... S2(w2) S1(w1)
The queue: Q1 Q2 ...
After UNARY
The stack: ... S2(w2) X(w1), with child S1(w1)
The queue: Q1 Q2 ...
Zhang and Clark, ACL 2011
![Page 225: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/225.jpg)
The shift-reduce parser
Shift-reduce actions
FINISH
Terminates the parsing process
Can be applied when all input words have been pushed onto the stack
Allows fragmentary analysis:
when the stack holds multiple items that cannot be combined
such cases can arise from incorrect lexical category assignment
Zhang and Clark, ACL 2011
![Page 226: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/226.jpg)
The shift-reduce parser
An example parsing process
IBM bought Lotus yesterday
initial
Zhang and Clark, ACL 2011
![Page 227: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/227.jpg)
The shift-reduce parser
An example parsing process
stack: NP(IBM)
queue: bought Lotus yesterday
SHIFT
Zhang and Clark, ACL 2011
![Page 228: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/228.jpg)
The shift-reduce parser
An example parsing process
stack: NP(IBM) ((S[dcl]\NP)/NP)(bought)
queue: Lotus yesterday
SHIFT
Zhang and Clark, ACL 2011
![Page 229: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/229.jpg)
The shift-reduce parser
An example parsing process
stack: NP(IBM) ((S[dcl]\NP)/NP)(bought) NP(Lotus)
queue: yesterday
SHIFT
Zhang and Clark, ACL 2011
![Page 230: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/230.jpg)
The shift-reduce parser
An example parsing process
stack: NP(IBM) (S[dcl]\NP)(bought)
queue: yesterday
(S[dcl]\NP)(bought) is built from ((S[dcl]\NP)/NP)(bought) and NP(Lotus)
COMBINE
Zhang and Clark, ACL 2011
![Page 231: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/231.jpg)
The shift-reduce parser
An example parsing process
stack: NP(IBM) (S[dcl]\NP)(bought) ((S\NP)\(S\NP))(yesterday)
queue: (empty)
(S[dcl]\NP)(bought) is built from ((S[dcl]\NP)/NP)(bought) and NP(Lotus)
SHIFT
Zhang and Clark, ACL 2011
![Page 232: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/232.jpg)
The shift-reduce parser
An example parsing process
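stack: NP(IBM) (S[dcl]\NP)(bought)
queue: (empty)
The top (S[dcl]\NP)(bought) is built from (S[dcl]\NP)(bought) and ((S\NP)\(S\NP))(yesterday)
The inner (S[dcl]\NP)(bought) is built from ((S[dcl]\NP)/NP)(bought) and NP(Lotus)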
COMBINE
Zhang and Clark, ACL 2011
![Page 233: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/233.jpg)
The shift-reduce parser
An example parsing process
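stack: S[dcl](bought)
queue: (empty)
S[dcl](bought) is built from NP(IBM) and (S[dcl]\NP)(bought)
That (S[dcl]\NP)(bought) is built from (S[dcl]\NP)(bought) and ((S\NP)\(S\NP))(yesterday)
The inner (S[dcl]\NP)(bought) is built from ((S[dcl]\NP)/NP)(bought) and NP(Lotus)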
COMBINE
Zhang and Clark, ACL 2011
![Page 234: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/234.jpg)
The shift-reduce parser
An example parsing process
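stack: S[dcl](bought)
queue: (empty)
S[dcl](bought) is built from NP(IBM) and (S[dcl]\NP)(bought)
That (S[dcl]\NP)(bought) is built from (S[dcl]\NP)(bought) and ((S\NP)\(S\NP))(yesterday)
The inner (S[dcl]\NP)(bought) is built from ((S[dcl]\NP)/NP)(bought) and NP(Lotus)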
FINISH
Zhang and Clark, ACL 2011
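The example parsing process can be replayed with a short Python sketch. It is an illustration under assumptions: derivations are nested tuples, actions are `(name, category)` tuples, and `run` is a hypothetical helper that does not check whether a COMBINE or UNARY step is licensed by a CCG rule.

```python
from collections import deque

def run(words, actions):
    """Apply SHIFT-X / COMBINE-X / UNARY-X / FINISH and return the final stack."""
    stack, queue = [], deque(words)
    for name, *arg in actions:
        if name == "SHIFT":
            # Push the next word with lexical category X.
            stack.append((arg[0], queue.popleft()))
        elif name == "COMBINE":
            right, left = stack.pop(), stack.pop()
            stack.append((arg[0], left, right))
        elif name == "UNARY":
            stack.append((arg[0], stack.pop()))
        elif name == "FINISH":
            if queue:
                raise ValueError("FINISH requires all words to be on the stack")
            return stack  # may hold several fragments
        else:
            raise ValueError(f"unknown action: {name}")
    raise ValueError("action sequence ended without FINISH")

WORDS = ["IBM", "bought", "Lotus", "yesterday"]
ACTIONS = [("SHIFT", "NP"),
           ("SHIFT", r"(S[dcl]\NP)/NP"),
           ("SHIFT", "NP"),
           ("COMBINE", r"S[dcl]\NP"),
           ("SHIFT", r"(S\NP)\(S\NP)"),
           ("COMBINE", r"S[dcl]\NP"),
           ("COMBINE", "S[dcl]"),
           ("FINISH",)]
derivation = run(WORDS, ACTIONS)
print(derivation)
```

Returning the whole stack rather than a single tree mirrors the fragmentary analysis that FINISH allows when the remaining items cannot be combined.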
![Page 235: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/235.jpg)
Features
Beam-search decoding
context
Stack nodes: S0 S1 S2 S3
Queue nodes: Q0 Q1 Q2 Q3
Stack subnodes: S0L S0R S0U S1L/R/U
The stack: ... S3 S2 S1 S0 (subnodes shown: S1U, S0L, S0R)
The queue: Q0 Q1 Q2 Q3 ...
S0wp, S0c, S0pc, S0wc, S1wp, S1c, S1pc, S1wc, S2pc, S2wc, S3pc, S3wc,
Q0wp, Q1wp, Q2wp, Q3wp,
S0Lpc, S0Lwc, S0Rpc, S0Rwc, S0Upc, S0Uwc, S1Lpc, S1Lwc, S1Rpc, S1Rwc, S1Upc, S1Uwc,
S0wcS1wc, S0cS1w, S0wS1c, S0cS1c, S0wcQ0wp, S0cQ0wp, S0wcQ0p, S0cQ0p, S1wcQ0wp, S1cQ0wp, S1wcQ0p, S1cQ0p,
S0wcS1cQ0p, S0cS1wcQ0p, S0cS1cQ0wp, S0cS1cQ0p, S0pS1pQ0p, S0wcQ0pQ1p, S0cQ0wpQ1p, S0cQ0pQ1wp, S0cQ0pQ1p, S0pQ0pQ1p, S0wcS1cS2c, S0cS1wcS2c, S0cS1cS2wc, S0cS1cS2c, S0pS1pS2p,
S0cS0HcS0Lc, S0cS0HcS0Rc, S1cS1HcS1Rc, S0cS0RcQ0p, S0cS0RcQ0w, S0cS0LcS1c, S0cS0LcS1w, S0cS1cS1Rc, S0wS1cS1Rc.
Zhang and Clark, ACL 2011
![Page 236: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/236.jpg)
Experimental data
CCGBank (Hockenmaier and Steedman, 2007)
Split into three subsets:
Training (section 02 – 21)
Development (section 00)
Testing (section 23)
Extract CCG rules
Binary instances: 3070
Unary instances: 191
Evaluation F-score over CCG dependencies
Use C&C tools for transformation
Zhang and Clark, ACL 2011
![Page 237: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/237.jpg)
Test results
F&P = Fowler and Penn (2010)
LP LR LF lsent. cats. evaluated
shift-reduce 87.43 83.61 85.48 35.19 93.12 all sentences
C&C (normal-form) 85.58 82.85 84.20 32.90 92.84 all sentences
shift-reduce 87.43 83.71 85.53 35.34 93.15 99.58% (C&C coverage)
C&C (hybrid) 86.17 84.74 85.45 32.92 92.98 99.58% (C&C coverage)
C&C (normal-form) 85.48 84.60 85.04 33.08 92.86 99.58% (C&C coverage)
F&P (Petrov I-5)* 86.29 85.73 86.01 -- -- -- (F&P ∩ C&C coverage; 96.65% on dev. test)
C&C hybrid* 86.46 85.11 85.78 -- -- -- (F&P ∩ C&C coverage; 96.65% on dev. test)
Zhang and Clark, ACL 2011
![Page 238: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/238.jpg)
Error Comparisons
As sentence length increases
Both parsers give lower performance
No difference in the rate of accuracy degradation
When dependency length increases
Zhang and Clark, ACL 2011
![Page 239: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/239.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 240: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/240.jpg)
Introduction to Chinese POS-tagging
Word segmentation is a necessary step before POS-tagging
Input: 我喜欢读书 (Ilikereadingbooks)
Segment: 我 喜欢 读 书 (I like reading books)
Tag: 我/PN 喜欢/V 读/V 书/N (I/PN like/V reading/V books/N)
The traditional approach treats word segmentation and POS-tagging as two separate steps
![Page 241: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/241.jpg)
Two observations
Segmentation errors propagate to the step of POS-tagging
Input: 我喜欢读书 (Ilikereadingbooks)
Segment: 我喜 欢 读 书 (Ili ke reading books)
Tag: 我喜/N 欢/V 读/V 书/N (Ili/N ke/V reading/V books/N)
Information about POS helps to improve segmentation
一/CD (1) 个/M (measure word) 人/N (person) or 一/CD (1) 个人/JJ (personal)
二百三十三/CD (233) or 二/CD (2) 百/CD (hundred) 三/CD (3) 十/CD (ten) 三/CD (3)
![Page 242: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/242.jpg)
Joint segmentation and tagging
The observations lead to the solution of joint segmentation and POS-tagging
Input: 我喜欢读书 (Ilikereadingbooks)
Output: 我/PN 喜欢/V 读/V 书/N (I/PN like/V reading/V books/N)
Consider segmentation and POS information simultaneously
The most appropriate output is chosen from all possible segmented and tagged outputs
![Page 243: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/243.jpg)
The transition system
State
Partial segmented results
Unprocessed characters
Two actions
Separate (t) : t is a POS tag
Append
Zhang and Clark, EMNLP 2010
![Page 244: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/244.jpg)
The transition system
Initial state
我喜欢读书
Zhang and Clark, EMNLP 2010
![Page 245: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/245.jpg)
The transition system
Separate(PN)
喜欢读书 我/PN
Zhang and Clark, EMNLP 2010
![Page 246: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/246.jpg)
The transition system
Separate (V)
欢读书 我/PN 喜/V
Zhang and Clark, EMNLP 2010
![Page 247: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/247.jpg)
The transition system
Append
读书 我/PN 喜欢/V
Zhang and Clark, EMNLP 2010
![Page 248: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/248.jpg)
The transition system
Separate (V)
书 我/PN 喜欢/V 读/V
Zhang and Clark, EMNLP 2010
![Page 249: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/249.jpg)
The transition system
Separate (N)
我/PN 喜欢/V 读/V 书/N
Zhang and Clark, EMNLP 2010
![Page 250: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/250.jpg)
The transition system
End state
我/PN 喜欢/V 读/V 书/N
Zhang and Clark, EMNLP 2010
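The transition sequence on these slides can be replayed with a minimal Python sketch. The tuple encoding of actions and the helper name `apply_actions` are assumptions made for illustration; one action is consumed per input character.

```python
def apply_actions(chars, actions):
    """Run Separate(t)/Append over the characters and return word/tag strings."""
    if len(chars) != len(actions):
        raise ValueError("exactly one action per character is expected")
    words = []  # partial segmented result: [word, tag] pairs
    for ch, (name, *arg) in zip(chars, actions):
        if name == "SEPARATE":
            # Start a new word with POS tag t.
            words.append([ch, arg[0]])
        elif name == "APPEND":
            if not words:
                raise ValueError("APPEND needs a partial word to extend")
            words[-1][0] += ch
        else:
            raise ValueError(f"unknown action: {name}")
    return [f"{word}/{tag}" for word, tag in words]

ACTIONS = [("SEPARATE", "PN"), ("SEPARATE", "V"), ("APPEND",),
           ("SEPARATE", "V"), ("SEPARATE", "N")]
output = apply_actions("我喜欢读书", ACTIONS)
print(" ".join(output))
```

Because the tag is fixed when a word starts, every later Append on that word is scored with the tag already known, which is what lets POS information influence segmentation decisions.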
![Page 251: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/251.jpg)
Feature templates
Zhang and Clark, EMNLP 2010
![Page 252: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/252.jpg)
Feature templates
Zhang and Clark, EMNLP 2010
![Page 253: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/253.jpg)
Experiments
Penn Chinese Treebank 5 (CTB-5)
Zhang and Clark, EMNLP 2010
![Page 254: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/254.jpg)
Experiments
SF JF
K09 (error-driven) 97.87 93.67
This work 97.78 93.67
Zhang 2008 97.82 93.62
K09 (baseline) 97.79 93.60
J08a 97.85 93.41
J08b 97.74 93.37
N07 97.83 93.32
SF = segmentation F-score; JF = joint segmentation and POS-tagging F-score
Accuracy comparisons between various joint segmentors and POS-taggers on CTB5
Zhang and Clark, EMNLP 2010
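The two metrics in the table can be computed as sketched below, assuming the usual definitions: a word counts as correct for SF when its boundaries match the gold standard, and for JF when its boundaries and its tag both match. The function names and the example prediction are hypothetical.

```python
def spans(tagged_words, with_tags):
    """Convert (word, tag) pairs into character spans, optionally with tags."""
    out, pos = set(), 0
    for word, tag in tagged_words:
        end = pos + len(word)
        out.add((pos, end, tag) if with_tags else (pos, end))
        pos = end
    return out

def f_score(gold, pred, with_tags):
    g, p = spans(gold, with_tags), spans(pred, with_tags)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)

gold = [("我", "PN"), ("喜欢", "V"), ("读", "V"), ("书", "N")]
pred = [("我", "PN"), ("喜欢", "V"), ("读", "N"), ("书", "N")]  # one tag error
sf = f_score(gold, pred, with_tags=False)
jf = f_score(gold, pred, with_tags=True)
print(sf, jf)
```

In this invented example the segmentation is perfect while one tag is wrong, so SF is 1.0 and JF is 0.75; JF can never exceed SF under these definitions.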
![Page 255: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/255.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 256: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/256.jpg)
Introduction
Traditional dependency parsing
Input: POS-tagged sentence, e.g., He/PN does/V it/PN here/RB
Output:
Accurate dependency parsing heavily relies on POS tagging information
Error propagation
Syntactic information can be helpful for POS disambiguation
He/PN does/V it/PN here/RB
![Page 257: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/257.jpg)
Introduction
Joint POS-tagging and dependency parsing
Input: raw (untagged) sentence, e.g., He does it here
Output:
He/PN does/V it/PN here/RB
![Page 258: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/258.jpg)
The extended arc-standard transition system
Extended arc-standard dependency parsing transition
State
A stack to hold partial candidates
A queue of next incoming words
Three actions
SHIFT(t), LEFT-REDUCE, RIGHT-REDUCE, where t is the POS tag
Hatori et al. IJCNLP 2011
![Page 259: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/259.jpg)
Actions
SHIFT(t)
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 260: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/260.jpg)
Actions
SHIFT(t)
Pushes stack
ST ST1 ...
STLC STRC
The stack
The input
N1 N2 N3 ... N0/t
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 261: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/261.jpg)
Actions
LEFT-REDUCE
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 262: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/262.jpg)
Actions
LEFT-REDUCE
Pops stack
Adds link
ST
ST1
...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 263: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/263.jpg)
Actions
RIGHT-REDUCE
ST ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 264: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/264.jpg)
Actions
RIGHT-REDUCE
Pops stack
Adds link
ST
ST1 ...
STLC STRC
The stack
The input
N0 N1 N2 N3 ...
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 265: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/265.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 266: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/266.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN)
The extended arc-standard transition system
Hatori et al. IJCNLP 2011
![Page 267: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/267.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN) it here S(V)
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
![Page 268: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/268.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN) it here S(V) it here LR
He/PN
does/V
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
![Page 269: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/269.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN) it here S(V) it here LR
He/PN
does/V
He/PN
does/V it/PN
S(PN)
here
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
![Page 270: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/270.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN) it here S(V) it here LR
He/PN
does/V
He/PN
does/V it/PN
S(PN)
here RR
He/PN
does/V here
it/PN
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
![Page 271: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/271.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
He does it here does it here He/PN S(PN) it here S(V) it here LR
He/PN
does/V
He/PN
does/V it/PN
S(PN)
here RR
He/PN
does/V here
it/PN
S(RB)
He/PN
does/V here/RB
it/PN
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
![Page 272: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/272.jpg)
An example S(t) – SHIFT(t)
LR – LEFT-REDUCE
RR – RIGHT-REDUCE
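Input: He does it here

| Step | Action | Stack after action | Queue | Arc added |
|---|---|---|---|---|
| 0 | (initial) | (empty) | He does it here | |
| 1 | S(PN) | He/PN | does it here | |
| 2 | S(V) | He/PN does/V | it here | |
| 3 | LR | does/V | it here | does/V → He/PN |
| 4 | S(PN) | does/V it/PN | here | |
| 5 | RR | does/V | here | does/V → it/PN |
| 6 | S(RB) | does/V here/RB | (empty) | |
| 7 | RR | does/V | (empty) | does/V → here/RB |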
The extended arc-standard transition system
He/PN does/V
Hatori et al. IJCNLP 2011
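The example can be replayed in a few lines of Python. This is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, and the arc directions (LEFT-REDUCE attaches the second stack item to the top item, RIGHT-REDUCE attaches the top item to the one below it) are read off the slide's example rather than taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """Hypothetical state: input words, a stack of word indices, tags and heads."""
    words: list
    next_index: int = 0                          # next word still in the queue
    stack: list = field(default_factory=list)    # indices of subtree roots
    tags: dict = field(default_factory=dict)     # word index -> POS tag
    heads: dict = field(default_factory=dict)    # dependent index -> head index


def shift(state, tag):
    """SHIFT(t): tag the front word of the queue with t and push it."""
    i = state.next_index
    state.tags[i] = tag
    state.stack.append(i)
    state.next_index += 1


def left_reduce(state):
    """LEFT-REDUCE: the second stack item becomes a dependent of the top item."""
    top = state.stack.pop()
    below = state.stack.pop()
    state.heads[below] = top
    state.stack.append(top)


def right_reduce(state):
    """RIGHT-REDUCE: the top stack item becomes a dependent of the item below."""
    top = state.stack.pop()
    state.heads[top] = state.stack[-1]


def replay(words, actions):
    """Apply a sequence of actions such as ("S", "PN"), ("LR",), ("RR",)."""
    state = ParserState(words=list(words))
    for action in actions:
        if action[0] == "S":
            shift(state, action[1])
        elif action[0] == "LR":
            left_reduce(state)
        elif action[0] == "RR":
            right_reduce(state)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return state


ACTIONS = [("S", "PN"), ("S", "V"), ("LR",), ("S", "PN"),
           ("RR",), ("S", "RB"), ("RR",)]
final = replay("He does it here".split(), ACTIONS)
for dep, head in sorted(final.heads.items()):
    print(f"{final.words[dep]}/{final.tags[dep]} <- {final.words[head]}/{final.tags[head]}")
```

After the seven actions the stack holds only "does", which heads the other three words, matching the last step of the example.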
![Page 273: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/273.jpg)
Features
POS tag features
Hatori et al. IJCNLP 2011
![Page 274: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/274.jpg)
Features
Dependency parsing features
Hatori et al. IJCNLP 2011
![Page 275: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/275.jpg)
Features
Syntactic features
Hatori et al. IJCNLP 2011
![Page 276: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/276.jpg)
Experiments
CTB5 dataset
![Page 277: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/277.jpg)
Results
| Model | LAS | UAS | POS |
|---|---|---|---|
| Li et al. (2011) (unlabeled) | --- | 80.74 | 93.08 |
| Li et al. (2012) (unlabeled) | --- | 81.21 | 94.51 |
| Li et al. (2012) (labeled) | 79.01 | 81.67 | 94.60 |
| Hatori et al. (2011) (unlabeled) | --- | 81.33 | 93.94 |
| Bohnet and Nivre (2012) (labeled) | 77.91 | 81.42 | 93.24 |
| Our implementation (unlabeled) | --- | 81.20 | 94.15 |
| Our implementation (labeled) | 78.30 | 81.26 | 94.28 |
![Page 278: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/278.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 279: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/279.jpg)
Traditional: word-based Chinese parsing
CTB-style word-based syntax tree for “中国 (China) 建筑业 (architecture industry) 呈现 (show) 新 (new) 格局 (pattern)”.
Zhang et al. ACL 2013
![Page 280: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/280.jpg)
This: character-based Chinese parsing
Character-level syntax tree with hierarchical word structures for “中 (middle) 国 (nation) 建 (construction) 筑 (building) 业 (industry) 呈 (present) 现 (show) 新 (new) 格 (style) 局 (situation)”.
Zhang et al. ACL 2013
![Page 281: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/281.jpg)
Why character-based?
Chinese words have syntactic structures.
Zhang et al. ACL 2013
![Page 282: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/282.jpg)
Why character-based?
Chinese words have syntactic structures.
Zhang et al. ACL 2013
![Page 283: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/283.jpg)
Why character-based?
Deep character information of word structures.
Zhang et al. ACL 2013
![Page 284: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/284.jpg)
Why character-based?
Deep character information of word structures.
Zhang et al. ACL 2013
![Page 285: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/285.jpg)
Why character-based?
Build syntax tree from character sequences.
Does not require segmentation or POS-tagging as input.
Benefits from the joint framework and avoids error propagation.
Zhang et al. ACL 2013
![Page 286: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/286.jpg)
Word structure annotation
Binarized tree structure for each word.
Zhang et al. ACL 2013
![Page 287: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/287.jpg)
Word structure annotation
Binarized tree structure for each word.
b, i denote whether the character below is at the beginning of a word.
l, r, c denote the head direction of the current node: left, right and coordination, respectively.
Zhang et al. ACL 2013
![Page 288: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/288.jpg)
Word structure annotation
Binarized tree structure for each word.
b, i denote whether the character below is at the beginning of a word.
l, r, c denote the head direction of the current node: left, right and coordination, respectively.
We extend word-based phrase-structures into character-based syntax trees using the word structures demonstrated above.
Zhang et al. ACL 2013
![Page 289: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/289.jpg)
Word structure annotation
Annotation input: a word and its POS.
A word may have different structures depending on its POS.
Zhang et al. ACL 2013
![Page 290: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/290.jpg)
The character-based parsing model
A transition-based parser
Zhang et al. ACL 2013
![Page 291: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/291.jpg)
The character-based parsing model
A transition-based parser
Extended from Zhang and Clark (2009), a word-based transition parser.
Zhang et al. ACL 2013
![Page 292: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/292.jpg)
The character-based parsing model
A transition-based parser
Extended from Zhang and Clark (2009), a word-based transition parser.
Incorporating features of a word-based parser as well as a joint SEG&POS system.
Zhang et al. ACL 2013
![Page 293: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/293.jpg)
The character-based parsing model
A transition-based parser
Extended from Zhang and Clark (2009), a word-based transition parser.
Incorporating features of a word-based parser as well as a joint SEG&POS system.
Adding the deep character information from word structures.
Zhang et al. ACL 2013
![Page 294: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/294.jpg)
The transition system
State:
Actions:
SHIFT-SEPARATE(t), SHIFT-APPEND, REDUCE-SUBWORD(d), REDUCE-WORD, REDUCE-BINARY(d;l), REDUCE-UNARY(l), TERMINATE
Zhang et al. ACL 2013
![Page 295: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/295.jpg)
Actions
SHIFT-SEPARATE(t)
Zhang et al. ACL 2013
![Page 296: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/296.jpg)
Actions
SHIFT-SEPARATE(t)
Zhang et al. ACL 2013
![Page 297: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/297.jpg)
Actions
SHIFT-APPEND
Zhang et al. ACL 2013
![Page 298: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/298.jpg)
Actions
SHIFT-APPEND
Zhang et al. ACL 2013
![Page 299: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/299.jpg)
Actions
REDUCE-SUBWORD(d)
Zhang et al. ACL 2013
![Page 300: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/300.jpg)
Actions
REDUCE-SUBWORD(d)
Zhang et al. ACL 2013
![Page 301: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/301.jpg)
Actions
REDUCE-WORD
Zhang et al. ACL 2013
![Page 302: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/302.jpg)
Actions
REDUCE-WORD
Zhang et al. ACL 2013
![Page 303: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/303.jpg)
Actions
REDUCE-BINARY(d; l)
Zhang et al. ACL 2013
![Page 304: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/304.jpg)
Actions
REDUCE-BINARY(d; l)
Zhang et al. ACL 2013
![Page 305: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/305.jpg)
Actions
REDUCE-UNARY(l)
Zhang et al. ACL 2013
![Page 306: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/306.jpg)
Actions
REDUCE-UNARY(l)
Zhang et al. ACL 2013
![Page 307: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/307.jpg)
Actions
TERMINATE
Zhang et al. ACL 2013
![Page 308: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/308.jpg)
Features
From word-based parser (Zhang and Clark, 2009)
From joint SEG&POS-Tagging (Zhang and Clark, 2010)
Zhang et al. ACL 2013
![Page 309: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/309.jpg)
Features
From word-based parser (Zhang and Clark, 2009)
From joint SEG&POS-Tagging (Zhang and Clark, 2010)
baseline features
Zhang et al. ACL 2013
![Page 310: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/310.jpg)
Features
From word-based parser (Zhang and Clark, 2009)
From joint SEG&POS-Tagging (Zhang and Clark, 2010)
baseline features
Deep character features
Zhang et al. ACL 2013
![Page 311: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/311.jpg)
Features
From word-based parser (Zhang and Clark, 2009)
From joint SEG&POS-Tagging (Zhang and Clark, 2010)
baseline features
Deep character features
new features
Zhang et al. ACL 2013
![Page 312: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/312.jpg)
Features
Zhang et al. ACL 2013
![Page 313: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/313.jpg)
Features
Zhang et al. ACL 2013
![Page 314: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/314.jpg)
Experiments
Penn Chinese Treebank 5 (CTB-5)
Zhang et al. ACL 2013
![Page 315: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/315.jpg)
Experiments
Baseline models
Pipeline model including:
Joint SEG&POS-Tagging model (Zhang and Clark, 2010).
Word-based CFG parsing model (Zhang and Clark, 2009).
Zhang et al. ACL 2013
![Page 316: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/316.jpg)
Experiments
Our proposed models
Joint model with flat word structures
Joint model with annotated word structures
Zhang et al. ACL 2013
![Page 317: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/317.jpg)
Results
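| Model | Task | P | R | F |
|---|---|---|---|---|
| Pipeline | Seg | 97.35 | 98.02 | 97.69 |
| Pipeline | Tag | 93.51 | 94.15 | 93.83 |
| Pipeline | Parse | 81.58 | 82.95 | 82.26 |
| Flat word structures | Seg | 97.32 | 98.13 | 97.73 |
| Flat word structures | Tag | 94.09 | 94.88 | 94.48 |
| Flat word structures | Parse | 83.39 | 83.84 | 83.61 |
| Annotated word structures | Seg | 97.49 | 98.18 | 97.84 |
| Annotated word structures | Tag | 94.46 | 95.14 | 94.80 |
| Annotated word structures | Parse | 84.42 | 84.43 | 84.43 |
| Annotated word structures | WS | 94.02 | 94.69 | 94.35 |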
Zhang et al. ACL 2013
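The P, R and F columns are precision, recall and their harmonic mean. As an illustration for the segmentation row only, the sketch below scores a predicted segmentation against a gold one by comparing character spans. It assumes the usual span-based definition; the slides do not spell out the evaluation script, and the function names and the mis-segmented prediction are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f(gold_words, predicted_words):
    """Span-based precision, recall and F1 for one non-empty sentence."""
    gold = word_spans(gold_words)
    predicted = word_spans(predicted_words)
    correct = len(gold & predicted)
    precision = correct / len(predicted)
    recall = correct / len(gold)
    if correct == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["中国", "建筑业", "呈现", "新", "格局"]
# Hypothetical system output that wrongly splits 建筑业 into two words.
predicted = ["中国", "建筑", "业", "呈现", "新", "格局"]
p, r, f = precision_recall_f(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```

Four of the six predicted words match gold words, so precision is 4/6 and recall is 4/5. Tagging and parsing scores would compare tagged words and labelled constituents in the same way.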
![Page 318: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/318.jpg)
Compare with other systems
| System | Seg | Tag | Parse |
|---|---|---|---|
| Kruengkrai+ ’09 | 97.87 | 93.67 | – |
| Sun ’11 | 98.17 | 94.02 | – |
| Wang+ ’11 | 98.11 | 94.18 | – |
| Li ’11 | 97.3 | 93.5 | 79.7 |
| Li+ ’12 | 97.50 | 93.31 | – |
| Hatori+ ’12 | 98.26 | 94.64 | – |
| Qian+ ’12 | 97.96 | 93.81 | 82.85 |
| Ours pipeline | 97.69 | 93.83 | 82.26 |
| Ours joint flat | 97.73 | 94.48 | 83.61 |
| Ours joint annotated | 97.84 | 94.80 | 84.43 |
Zhang et al. ACL 2013
![Page 319: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/319.jpg)
Applications
Word segmentation
Dependency parsing
Context free grammar parsing
Combinatory categorial grammar parsing
Joint segmentation and POS-tagging
Joint POS-tagging and dependency parsing
Joint segmentation, POS-tagging and constituent parsing
Joint segmentation, POS-tagging and dependency parsing
![Page 320: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/320.jpg)
Traditional word-based dependency parsing
Inter-word dependencies
Zhang et al. ACL 2014
![Page 321: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/321.jpg)
Character-level dependency parsing
Inter- and intra-word dependencies
Zhang et al. ACL 2014
![Page 322: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/322.jpg)
Main method
An overview
Transition-based framework with global learning and beam search (Zhang and Clark, 2011)
Extensions from word-level transition-based dependency parsing models
Arc-standard (Nivre, 2008; Huang et al., 2009)
Arc-eager (Nivre, 2008; Zhang and Clark, 2008)
Zhang et al. ACL 2014
![Page 323: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/323.jpg)
Main method
Word-level transition-based dependency parsing
Arc-standard
Zhang et al. ACL 2014
![Page 324: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/324.jpg)
Main method
Word-level transition-based dependency parsing
Arc-eager
Zhang et al. ACL 2014
![Page 325: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/325.jpg)
Main method
Word-level to character-level
Arc-standard
Zhang et al. ACL 2014
![Page 326: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/326.jpg)
Main method
Word-level to character-level
Arc-standard
Zhang et al. ACL 2014
![Page 327: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/327.jpg)
Main method
Word-level to character-level
Arc-eager
Zhang et al. ACL 2014
![Page 328: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/328.jpg)
Main method
Word-level to character-level
Arc-eager
Zhang et al. ACL 2014
![Page 329: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/329.jpg)
Main method
New features
Zhang et al. ACL 2014
![Page 330: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/330.jpg)
Experiments
Data
CTB5.0, CTB6.0, CTB7.0
Zhang et al. ACL 2014
![Page 331: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/331.jpg)
Experiments
Proposed models
STD (real, pseudo)
Joint segmentation and POS-tagging with inner dependencies
STD (pseudo, real)
Joint segmentation, POS-tagging and dependency parsing
STD (real, real)
Joint segmentation, POS-tagging and dependency parsing with inner dependencies
EAG (real, pseudo)
Joint segmentation and POS-tagging with inner dependencies
EAG (pseudo, real)
Joint segmentation, POS-tagging and dependency parsing
EAG (real, real)
Joint segmentation, POS-tagging and dependency parsing with inner dependencies
Zhang et al. ACL 2014
![Page 332: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/332.jpg)
Experiments
Final results
Zhang et al. ACL 2014
![Page 333: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/333.jpg)
Experiments
Analysis: word structure prediction
Assuming that the segmentation is correct

| Model | OOV words | Overall |
|---|---|---|
| STD(real,real) | 67.98% | 87.64% |
| EAG(real,real) | 69.01% | 89.07% |
Zhang et al. ACL 2014
![Page 334: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/334.jpg)
Experiments
Analysis: word structure prediction
OOV words
Zhang et al. ACL 2014
![Page 335: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/335.jpg)
Outline
Introduction Applications
Analysis ZPar
![Page 336: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/336.jpg)
Analysis
Empirical analysis
Theoretical analysis
![Page 337: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/337.jpg)
Analysis
Empirical analysis
Theoretical analysis
![Page 338: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/338.jpg)
Empirical analysis
Effective on all the tasks: beam-search + global learning + rich features
What are the effects of global learning and beam-search, respectively?
Study empirically using dependency parsing
Zhang and Nivre, COLING 2012
![Page 339: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/339.jpg)
Empirical analysis
Learning, search, features
Arc-eager parser
Learning
Global training
Optimize the entire transition sequence for a sentence
Structured prediction
Local training
Each transition is considered in isolation
No global view of the transition sequence for a sentence
Classifier
Zhang and Nivre, COLING 2012
![Page 340: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/340.jpg)
Empirical analysis
Learning, search, features
Arc-eager parser
Learning
Features
Base features (local features) (Zhang and Clark, EMNLP 2008)
Features refer to combinations of atomic features (words and their POS tags) of the nodes on the stack and in the queue only.
All features (including rich non-local features) (Zhang and Nivre, ACL 2011)
Dependency distance
Valence
Grand and child features
Third-order features
Zhang and Nivre, COLING 2012
![Page 341: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/341.jpg)
Empirical analysis
Learning, search, features
Arc-eager parser
Learning
Features
Search
Beam = 1, greedy
Beam > 1
Zhang and Nivre, COLING 2012
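The difference between the two search settings can be seen on a toy problem. The sketch below is a generic beam search, not ZPar code: all names and the score table are made up for illustration. States are tuples of binary decisions, and the score of a state stands in for the model score of a partial transition sequence.

```python
def beam_search(start, expand, score, is_final, beam_size):
    """Keep the beam_size best states after every step; return the best final state."""
    beam = [start]
    while not all(is_final(state) for state in beam):
        candidates = []
        for state in beam:
            if is_final(state):
                candidates.append(state)
            else:
                candidates.extend(expand(state))
        # Highest-scoring candidates first; everything past beam_size is pruned.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


# Hypothetical scores: the prefix (1,) looks slightly worse than (0,)
# but leads to the best complete sequence (1, 1).
TOY_SCORES = {
    (0,): 1.0, (1,): 0.9,
    (0, 0): 1.1, (0, 1): 1.2, (1, 0): 0.95, (1, 1): 3.0,
}


def toy_expand(state):
    return [state + (0,), state + (1,)]


def toy_score(state):
    return TOY_SCORES.get(state, 0.0)


def toy_is_final(state):
    return len(state) == 2


greedy = beam_search((), toy_expand, toy_score, toy_is_final, beam_size=1)
wider = beam_search((), toy_expand, toy_score, toy_is_final, beam_size=2)
print(greedy, wider)
```

With beam size 1 the search commits to the locally best prefix and ends at (0, 1). With beam size 2 the prefix (1,) survives the first step and the higher-scoring (1, 1) is found.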
![Page 342: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/342.jpg)
Empirical analysis
Contrast
Zhang and Nivre, COLING 2012
![Page 343: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/343.jpg)
Empirical analysis
Observations
Beam = 1, global learning ≈ local learning
Beam > 1, global learning ↑, local learning ↓
Richer features make the ↑ or ↓ faster.
Zhang and Nivre, COLING 2012
![Page 344: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/344.jpg)
Empirical analysis
Why does local learning not benefit from beam-search?
Zhang and Nivre, COLING 2012
![Page 345: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/345.jpg)
Empirical analysis
Does greedy, local learning benefit from rich features?
Beam search (ZPar) and greedy search (Malt) with non-local features
Zhang and Nivre, COLING 2012
![Page 346: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/346.jpg)
Empirical analysis
Conclusions
Global learning and beam-search benefit each other
Global learning and beam-search accommodate richer features without overfitting
Global learning and beam-search should be used simultaneously
Zhang and Nivre, COLING 2012
![Page 347: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/347.jpg)
Analysis
Empirical analysis
Theoretical analysis
![Page 348: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/348.jpg)
Theoretical analysis
The perceptron
Online learning framework
Inputs: training examples $\{(x_i, y_i)\}_{i=1}^{T}$
Initialization: set $w = 0$
Algorithm:
for $r = 1 \ldots C$
  for $i = 1 \ldots T$
    calculate $z_i = \mathrm{decode}(w, x_i)$
    if ($z_i \ne y_i$)
      $w = w + \phi(x_i, y_i) - \phi(x_i, z_i)$
Output: $w$
Michael Collins, EMNLP 2002
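The training loop can be written out on a toy tagging task. This is a minimal sketch under stated assumptions: the feature function `phi`, the two-tag set, the two training sentences and the exhaustive `decode` are all made up for illustration, and weight averaging is omitted. In the framework of this tutorial, decoding would be done by beam search instead of enumeration.

```python
from collections import Counter
from itertools import product

TAGS = ("PN", "V")  # hypothetical tag set


def phi(words, tags):
    """Global feature vector: word/tag and tag-bigram counts (an assumed feature set)."""
    feats = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("word", word, tag)] += 1
        feats[("bigram", prev, tag)] += 1
        prev = tag
    return feats


def score(w, words, tags):
    return sum(w[f] * v for f, v in phi(words, tags).items())


def decode(w, words):
    """Exact decoding by enumerating every tag sequence (only viable for toy inputs)."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: score(w, words, tags))


def train(data, iterations):
    """Perceptron training: w = w + phi(x, y) - phi(x, z) whenever z differs from y."""
    w = Counter()
    updates = 0
    for _ in range(iterations):
        for words, gold in data:
            z = decode(w, words)
            if z != tuple(gold):
                w.update(phi(words, gold))   # add gold features
                w.subtract(phi(words, z))    # subtract predicted features
                updates += 1
    return w, updates


DATA = [
    (("he", "runs"), ("PN", "V")),
    (("it", "works"), ("PN", "V")),
]
weights, n_updates = train(DATA, iterations=5)
print(n_updates, [decode(weights, words) for words, _ in DATA])
```

Because `decode` returns the exact argmax, every update is made on an output that scores at least as high as the gold output under the current weights.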
![Page 349: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/349.jpg)
Theoretical analysis
The perceptron
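If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is separable and $\|\phi(x, y)\| \le R$ for all $(x, y)$, then there exists some $\lambda > 0$ such that the maximum number of errors (number of updates) is less than $R^2/\lambda^2$.
If $u$ can separate the data, then
$w^{k+1} \cdot u = (w^k + (\phi(x_t, y_t) - \phi(x_t, z_t^p))) \cdot u$
$\qquad = w^k \cdot u + (\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot u$
thus, $w^{k+1} \cdot u \ge w^k \cdot u + \lambda$, since $(\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot u \ge \lambda$
assume $w^0 = 0$, then $w^{k+1} \cdot u \ge k\lambda$
another fact: $\|u\| = 1$, then $\|w^{k+1}\| \ge k\lambda$
Michael Collins, EMNLP 2002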
![Page 350: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/350.jpg)
Theoretical analysis
The perceptron
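If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is separable and $\|\phi(x, y)\| \le R$ for all $(x, y)$, then there exists some $\lambda > 0$ such that the maximum number of errors (number of updates) is less than $R^2/\lambda^2$.
If $u$ can separate the data, then
$w^{k+1} \cdot u = (w^k + (\phi(x_t, y_t) - \phi(x_t, z_t^p))) \cdot u$
$\qquad = w^k \cdot u + (\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot u$
thus, $w^{k+1} \cdot u \ge w^k \cdot u + \lambda$, since $(\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot u \ge \lambda$ ($\lambda$ is the margin)
assume $w^0 = 0$, then $w^{k+1} \cdot u \ge k\lambda$
another fact: $\|u\| = 1$, then $\|w^{k+1}\| \ge k\lambda$
Michael Collins, EMNLP 2002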
![Page 351: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/351.jpg)
Theoretical analysis
The perceptron
If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is separable and $\|\phi(x, y)\| \le R$ for all $(x, y)$, then there exists some $\lambda > 0$ such that the maximum number of errors (number of updates) is less than $R^2/\lambda^2$.
$w^{k+1} = w^k + (\phi(x_t, y_t) - \phi(x_t, z_t^p))$
$\|w^{k+1}\|^2 = \|w^k\|^2 + 2(\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot w^k + \|\phi(x_t, y_t) - \phi(x_t, z_t^p)\|^2$
if we have this update, then $\phi(x_t, y_t) \cdot w^k \le \phi(x_t, z_t^p) \cdot w^k$
thus, $\|w^{k+1}\|^2 \le \|w^k\|^2 + \|\phi(x_t, y_t) - \phi(x_t, z_t^p)\|^2 \le \|w^k\|^2 + 4R^2$
assume $w^0 = 0$, then $\|w^{k+1}\|^2 \le 4kR^2$
Michael Collins, EMNLP 2002
![Page 352: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/352.jpg)
Theoretical analysis
The perceptron
If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is separable and $\|\phi(x, y)\| \le R$ for all $(x, y)$, then there exists some $\lambda > 0$ such that the maximum number of errors (number of updates) is less than $R^2/\lambda^2$.
$w^{k+1} = w^k + (\phi(x_t, y_t) - \phi(x_t, z_t^p))$
$\|w^{k+1}\|^2 = \|w^k\|^2 + 2(\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot w^k + \|\phi(x_t, y_t) - \phi(x_t, z_t^p)\|^2$
if we have this update, then $\phi(x_t, y_t) \cdot w^k \le \phi(x_t, z_t^p) \cdot w^k$
thus, $\|w^{k+1}\|^2 \le \|w^k\|^2 + \|\phi(x_t, y_t) - \phi(x_t, z_t^p)\|^2 \le \|w^k\|^2 + 4R^2$
assume $w^0 = 0$, then $\|w^{k+1}\|^2 \le 4kR^2$
The condition $\phi(x_t, y_t) \cdot w^k \le \phi(x_t, z_t^p) \cdot w^k$ is satisfied in dynamic programming; it may not hold in beam-search.
Michael Collins, EMNLP 2002
![Page 353: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/353.jpg)
Theoretical analysis
The perceptron
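If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is separable and $\|\phi(x, y)\| \le R$ for all $(x, y)$, then there exists some $\lambda > 0$ such that the maximum number of errors (number of updates) is less than $R^2/\lambda^2$.
$\|w^{k+1}\| \ge k\lambda$
$\|w^{k+1}\|^2 \le 4kR^2$
Thus, $k^2\lambda^2 \le \|w^{k+1}\|^2 \le 4kR^2$
$k \le 4R^2/\lambda^2$; in other words, $k$ is bounded by a constant multiple of $R^2/\lambda^2$.
Michael Collins, EMNLP 2002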
![Page 354: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/354.jpg)
Theoretical analysis
The perceptron
If the data $\{(x_t, y_t)\}_{t=1}^{T}$ is not separable, we should assume that there is an oracle $u$ such that the number of errors it makes is $o(T)$.
$w^{k+1} \cdot u = (w^k + (\phi(x_t, y_t) - \phi(x_t, z_t^p))) \cdot u$
$\qquad = w^k \cdot u + (\phi(x_t, y_t) - \phi(x_t, z_t^p)) \cdot u$
thus, when $k \sim CT$,
$w^{k+1} \cdot u \ge (k - o(k))\lambda - o(k) \cdot CR + w^0 \cdot u \ge k\lambda - o(k) + w^0 \cdot u$
assume $w^0 = 0$ and another fact $\|u\| = 1$,
then $\|w^{k+1}\| \ge k\lambda - o(k)$
Michael Collins, EMNLP 2002
![Page 355: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/355.jpg)
Theoretical analysis
The perceptron
Huang et al., NAACL 2012
![Page 356: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/356.jpg)
Theoretical analysis
The perceptron
Huang et al., NAACL 2012
![Page 357: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/357.jpg)
Theoretical analysis
The perceptron
The third factor must be less than zero! (violation)
Huang et al., NAACL 2012
![Page 358: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/358.jpg)
Theoretical analysis
Why early-update?
early update -- when correct label first falls off the beam
up to this point the incorrect prefix should score higher
standard update (full update) -- no guarantee!
Huang et al., NAACL 2012
(pruned)
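Early update can be sketched on a toy tagging task. The code below is an illustration under stated assumptions, not ZPar's implementation: the tag set, feature function, data and function names are hypothetical, and weight averaging is omitted. Training stops decoding as soon as the gold prefix falls off the beam and updates on the prefixes built so far.

```python
from collections import Counter

TAGS = ("PN", "V")  # hypothetical tag set


def phi(words, tags):
    """Features of a (possibly partial) tag sequence; zip stops at the shorter input."""
    feats = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("word", word, tag)] += 1
        feats[("bigram", prev, tag)] += 1
        prev = tag
    return feats


def score(w, words, tags):
    return sum(w[f] * v for f, v in phi(words, tags).items())


def extend_beam(w, words, beam, beam_size):
    """One decoding step: extend every prefix by one tag and keep the best ones."""
    candidates = [prefix + (tag,) for prefix in beam for tag in TAGS]
    candidates.sort(key=lambda p: score(w, words, p), reverse=True)
    return candidates[:beam_size]


def beam_decode(w, words, beam_size):
    beam = [()]
    for _ in words:
        beam = extend_beam(w, words, beam, beam_size)
    return beam[0]


def update(w, words, gold_prefix, predicted_prefix):
    w.update(phi(words, gold_prefix))
    w.subtract(phi(words, predicted_prefix))


def train_one(w, words, gold, beam_size):
    """Return True if the weights were updated on this example."""
    beam = [()]
    for step in range(1, len(words) + 1):
        beam = extend_beam(w, words, beam, beam_size)
        gold_prefix = tuple(gold[:step])
        if gold_prefix not in beam:
            # Early update: the gold prefix was pruned, so update here and stop.
            update(w, words, gold_prefix, beam[0])
            return True
    if beam[0] != tuple(gold):
        # Gold survived to the end but is not ranked first: full-sequence update.
        update(w, words, gold, beam[0])
        return True
    return False


DATA = [
    (("he", "runs"), ("PN", "V")),
    (("see", "it"), ("V", "PN")),
]
weights = Counter()
for epoch in range(10):
    changed = [train_one(weights, x, y, beam_size=1) for x, y in DATA]
    if not any(changed):
        break
print([beam_decode(weights, x, beam_size=1) for x, _ in DATA])
```

At the moment of an early update the pruned gold prefix is outscored by every prefix still in the beam, so the update is made on a violation. A full update after the gold sequence has already been pruned carries no such guarantee.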
![Page 359: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/359.jpg)
Outline
Introduction Applications
Analysis ZPar
![Page 360: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/360.jpg)
ZPar
Introduction
Usage
Development
On-going work
Contributions welcome
![Page 361: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/361.jpg)
ZPar
Brief introduction
Usage
Development
On-going work
Contributions welcome
![Page 362: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/362.jpg)
Brief introduction
Initiated in 2009 at Oxford, extended at Cambridge and SUTD, with more developers being involved
![Page 365: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/365.jpg)
Brief introduction
2009–2014, Oxford, Cambridge, SUTD
Functionality extended
Released several versions
Contains implementations of all the systems covered in this tutorial
Segmentation
POS tagging (single or joint)
Dependency parsing (single or joint)
Constituent parsing (single or joint)
CCG parsing (single or joint)
![Page 366: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/366.jpg)
Brief introduction
Code structure
![Page 368: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/368.jpg)
Usage
Download
http://sourceforge.net/projects/zpar/files/0.6/
![Page 372: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/372.jpg)
Usage
For off-the-shelf Chinese language processing:
Compile: make zpar
Usage
Model download
An example
![Page 376: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/376.jpg)
Usage
For off-the-shelf English language processing:
Compile: make zpar.en
Usage
Model download
An example
![Page 377: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/377.jpg)
Usage
A generic ZPar
For many languages the tasks are similar
POS tagging (including morphological analysis) and parsing
![Page 379: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/379.jpg)
Usage
For generic processing:
Compile: make zpar.ge
Usage
An example
![Page 380: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/380.jpg)
Usage
Using the individual components
Chinese word segmentation
Makefile modification:
SEGMENTOR_IMPL = agenda
Make:
make segmentor
Train:
./train input_file model_file iteration
Decode:
./segmentor model_file input_file output_file
![Page 381: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/381.jpg)
Usage
Using the individual components
Chinese/English POS tagger
Makefile modification:
CHINESE_TAGGER_IMPL = agenda
ENGLISH_TAGGER_IMPL = agenda
Make:
make chinese.postagger (for Chinese POS tagging)
make english.postagger (for English POS tagging)
Train:
./train input_file model_file iteration
Decode:
./tagger model_file input_file output_file
![Page 382: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/382.jpg)
Usage
Using the individual components
Chinese/English dependency parsing
Makefile modification:
CHINESE_DEPPARSER_IMPL = arceager
ENGLISH_DEPPARSER_IMPL = arceager
Make:
make chinese.depparser
make english.depparser
Train:
./train input_file model_file iteration
Decode:
./depparser input_file output_file model_file
![Page 383: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/383.jpg)
Usage
Using the individual components
Chinese/English constituent parsing
Makefile modification:
CHINESE_CONPARSER_IMPL = cad
ENGLISH_CONPARSER_IMPL = cad
Make:
make chinese.conparser
make english.conparser
Train:
./train input_file model_file iteration
Decode:
./conparser input_file output_file model_file
For English/Chinese constituent parsing
For Chinese character-level constituent parsing
![Page 384: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/384.jpg)
Usage
A tip for training: how to obtain the best model
For i = 1 to maxN
./train inputfile modelfile 1
evaluate on a development file to get the current model's performance
if (current performance is the best so far)
save a copy of the current model
endif
End for
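The loop above can be sketched as a small driver. This is an illustrative sketch only: `select_best_model` and `zpar_hooks` are hypothetical names, the evaluation function is left to the caller because the slides do not name an evaluation tool, and the sketch assumes (as the loop above implies) that running `./train` for one iteration continues from the existing model file.

```python
import shutil
import subprocess

def select_best_model(train_one_iteration, evaluate, save_best, max_iterations):
    """Train one iteration at a time and keep the best model on development data.

    Returns (best_iteration, best_score).
    """
    best_score, best_iteration = float("-inf"), None
    for i in range(1, max_iterations + 1):
        train_one_iteration()           # e.g. ./train inputfile modelfile 1
        current = evaluate()            # score of the current model on the dev file
        if current > best_score:        # strictly better: earlier iteration wins ties
            best_score, best_iteration = current, i
            save_best()                 # keep a copy of the current model
    return best_iteration, best_score

def zpar_hooks(train_bin, input_file, model_file, best_model_file):
    """Hypothetical hooks wiring the loop to a ZPar-style train binary."""
    def train_one_iteration():
        subprocess.run([train_bin, input_file, model_file, "1"], check=True)

    def save_best():
        shutil.copyfile(model_file, best_model_file)

    return train_one_iteration, save_best
```

A caller would pass the two hooks from `zpar_hooks("./train", "inputfile", "modelfile", "modelfile.best")` together with its own `evaluate` function, which decodes the development file with the current model and returns a single score.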
![Page 385: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/385.jpg)
Usage
More documentation at http://people.sutd.edu.sg/~yue_zhang/doc/index.html
![Page 389: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/389.jpg)
Development
Add new implementation (dependency parsing as an example)
New folder under implementations
Modify necessary files
Modify the Makefile
# currently support eisner, covington, nivre, combined and joint implementations
CHINESE_DEPPARSER_IMPL = newmethod
CHINESE_DEPPARSER_LABELED = false
CHINESE_DEPLABELER_IMPL = naive
# currently support sr implementations
CHINESE_CONPARSER_IMPL = jcad
# currently support only agenda
ENGLISH_TAGGER_IMPL = collins
# currently support eisner, covington, nivre, combined implementations
ENGLISH_DEPPARSER_IMPL = newmethod
ENGLISH_DEPPARSER_LABELED = true
ENGLISH_DEPLABELER_IMPL = naive
# currently support sr implementations
ENGLISH_CONPARSER_IMPL = cad
![Page 390: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/390.jpg)
Development
Flexible: supply your own Makefile for other tasks
![Page 392: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/392.jpg)
On-going work
The release of ZPar 0.7 this year
New implementations
Deep learning POS-tagger (Ma et al., ACL 2014)
Character-based Chinese dependency parsing (Zhang et al., ACL 2014)
Non-projective parser with more optimizations
Double-stack and double-queue models for parsing heterogeneous dependencies (Zhang et al., COLING 2014)
![Page 393: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/393.jpg)
On-going work
The release of ZPar 0.7 this year
New implementations
The generic system will replace the Chinese system as the default version
![Page 395: Incremental Structured Prediction Using a Global Learning](https://reader031.vdocuments.us/reader031/viewer/2022030300/621ec7e6284a0e3dfa5889dd/html5/thumbnails/395.jpg)
Contributions welcome
Open source contributions
User interfaces
Tokenizer, HTML, …
Optimizations
Reduced memory usage
Parallel versions
Microsoft Windows versions