gf2ud and ud2gfschool.grammaticalframework.org › 2017 › slides › prasanth-gf... ·...
Post on 03-Jul-2020
4 Views
Preview:
TRANSCRIPT
GF2UD and UD2GFUD: Universal Dependencies
Prasanth Kolachina GF Summer school, 2017
the black cat sees us today
le chat noir nous voit aujourd’hui
dependency parser
ud2gf
GF
gf2ud
Universal Dependencies
Principles of Design● UD needs to be satisfactory on linguistic analysis grounds for individual
languages.● UD needs to be good for linguistic typology, i.e., providing a suitable basis for
bringing out cross-linguistic parallelism across languages and language families.
● UD must be suitable for rapid, consistent annotation by a human annotator.● UD must be suitable for computer parsing with high accuracy.● UD must be easily comprehended and used by a non-linguist …. (API
grammar)● UD must support well downstream language understanding tasks (relation
extraction, reading comprehension, machine translation, ...).
Mission of Grammatical FrameworkThe mission of GF is to formalize
the grammars of the world
and
make them available for computer applications.
A community-driven effort to annotate multilingual treebanks
Cross-lingual consistency in annotations across languages
17 Part-of-Speech tags ; 40 dependency labels ; morphological features
Annotated corpora released every 6 months;
Ongoing V2
50 Languages, 70 Treebanks
Universal Dependencies
Predication
nsubj csubj
dobj iobj ccomp xcomp
advmod nmodadvcl mark cop
det nummodamod apposneg nmod
acl case
conj cc punct
root dep
list dislocated parataxis remnant reparandum
Other
Clausal Predicates
Noun dependents
Coordination
Unknowns
Adverbials
nsubjpass csubjpass
auxpass
Passive voice
Auxiliary verbs and negation
aux neg
Copulas and special marker
compound mwe name
Compounding
nsubj
dobj iobj
cop
det amod
Clausal Predicates
Noun dependents
Auxiliary verbs and negation
aux neg
Copulas
Structures in GF
the black cat sees us
Rationaledependencies GF
parsing robustness robust brittle
parsing speed fast slow
semantics loose compositional
generation ? accurate
the black cat sees us today
le chat noir nous voit aujourd’hui
dependency parser
ud2gf
GF
gf2ud
the black cat sees us today
le chat noir nous voit aujourd’hui
dependency parser
ud2gf
GF
sem∃!A.(cat(A) & MODIFIER(black,A)& (∃ B.(see(B) & SUBJECT(B)=A & OBJECT(B) = we & MODIFIER(today,B))))
GF2UD grammatical roles to arguments and hide functions
Dependency configurationPredVP nsubj headComplTV head dobjDetCN det headAdjCN amod head
nsubj
dobjdet
amod
Dependency configurationPredVP nsubj headComplTV head dobjDetCN det headAdjCN amod head
nsubj
dobjdet
amod
nsubj
dobjdet
amod
nsubj
dobjdet
amod
nsubj
dobjdet
amod
nsubj
dobjdet
amod
nsubj
dobjdet
amod
the
black cat
sees
us
POS configuration
Det DETAP ADJCN NOUNTV VERBPron PRON
nsubj
dobjdet
amod
the
black cat
sees
us
le chat noir nous voit
Syncategorematic words
- pinpointing a difference in the ways of thinking: - dependency grammar is about words, - GF is about meanings
categorematic: word with its own category and function
fun cat_CN : CNlin cat_CN = “cat”
syncategorematic: word that is “between categories”
fun ComplAP : AP -> VPlin ComplAP ap = “is” ++ AP
No semantics (fun) of its own. Not an argument. No label.
adding default labels
we get UD wants
Other syncategorematic words
- negation words- tense auxiliaries- infinitive marks- (sometimes) prepositions
Extended dependency configurationabstract local
abstract | concrete local | nonlocal
- more complicated, not universal+ less work than rewriting the grammar anyway+ UD is still undergoing changes
Concrete configs
UseComp in English
UseComp head {“is”, “was”, “be”, “are”} cop head
In Swedish
UseComp head {“ar”, “var”, “vara”, “varit”} cop head
Local Concrete configurationsMappings defined on linearization of an abstract function for a specific language
These are necessary because of the ``level of abstraction’’ in GF abstract syntax
The mappings specify re-labelling operations
relabel an existing edge with new label
modify an existing edge by changing the head and adding a new label
These operations match a set of words, or a record field or match anything
Demo ?
> parse “the cat sees us” | visual_dep -output=conll -file=ud.labels
1 the the_Det DET Det _ 2 det _ _2 cat cat_CN NOUN CN _ 3 nsubj _ _3 sees see_TV VERB TV _ 0 dep _ _4 us we_Pron PRON Pron _ 3 dobj _ _
UD2GF
1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod
1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod
tree
root see VERB _ 4 nsubj cat NOUN _ 3
det the DET _ 1
amod black ADJ _ 2
dobj we PRON _ 5
advmod today ADV _ 6
1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod
tree
root see VERB _ 4 nsubj cat NOUN _ 3
det the DET _ 1
amod black ADJ _ 2
dobj we PRON _ 5
advmod today ADV _ 6
lexicon
see_V2 “see”
cat_N “cat”
the_Det “the”
black_A “black”
we_Pron “we”
today_Adv “today”
1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod
tree
root see VERB _ 4 nsubj cat NOUN _ 3
det the DET _ 1
amod black ADJ _ 2
dobj we PRON _ 5
advmod today ADV _ 6
lexically annotated tree
root see_V2 V2 4 nsubj cat_N N 3
det the_Det Det 1
amod black_A A 2
dobj we_Pron Pron 5
advmod today_Adv Adv 6
lexicon
see_V2 “see”
cat_N “cat”
the_Det “the”
black_A “black”
we_Pron “we”
today_Adv “today”
tree
root see_V2 V2 4 nsubj cat_N N 3
det the_Det Det 1
amod black_A A 2
dobj we_Pron Pron 5
advmod today_Adv Adv 6
Postorder traversal: subtrees before their head
Invariant: every node has a valid GF tree
Goal: total GF tree at root
A node is done when no more functions apply
tree
root see_V2 V2 4 nsubj cat_N N 3
det the_Det Det 1
amod black_A A 2
dobj we_Pron Pron 5
advmod today_Adv Adv 6
tree
root see_V2 V2 4 nsubj (UseN 3) [cat_N] CN 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
endo
ModCN 2 3
tree
root see_V2 V2 4 nsubj (UseN 3) [cat_N] CN 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
when an endocentric function applies, use it first
exo
DetCN 1 3
tree
root see_V2 V2 4 nsubj (ModCN 2 3) [(UseN 3),cat_N] CN 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
exo
DetCN 1 3
tree
root see_V2 V2 4 nsubj (ModCN 2 3) [(UseN 3),cat_N] CN 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
tree
root see_V2 V2 4 nsubj (DetCN 1 3) [(ModCN 2 3),(UseN 3),cat_N] NP 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
tree
root (PredVP 3 4) [(AdvVP 4 6),(ComplV2 4 5),see_V2] VP 4 nsubj (DetCN 1 3) [(ModCN 2 3),(UseN 3),cat_N] NP 3
det the_Det Det 1
amod (PositA 2) [black_A] AP 2
dobj (UsePron 5) [we_Pron] NP 5
advmod today_Adv Adv 6
Root node contains a complete GF tree
Problems
Ambiguity
There can be several candidate Functions and Categories.
Incompleteness
The tree may have nodes not referenced from the AST.
Problems and solutions
Ambiguity
There can be several candidate Functions and Categories.
Maintain a list of trees at each node, not just one tree.
Incompleteness
The tree may have nodes not referenced from the AST.
Auxiliary rules for syntcategorematic words.
Backup functions attached as adverbial modifiers to AST nodes.
STRING: Fast and friendly service , they know my order when I walk in the door !
root NOUN service_N : N [] {} (4) 4 amod ADJ fast_A : A [] {} (1) 1 cc CONJ "and" : Conjand_ [and_Conj : Conj] {} (2) 2 conj ADJ friendly_A : A [] {} (3) 3 punct PUNCT "," : Comma_ [] {} (5) 5 parataxis VERB know_VQ : VQ [know_VS : VS, know_V2 : V2, know_V : V] {} (7) 7 nsubj PRON they_Pron : Pron [theyFem_Pron : Pron] {} (6) 6 dobj NOUN order_N : N [] {} (9) 9 nmod:poss PRON i_Pron : Pron [] {} (8) 8 advcl VERB walk_V2 : V2 [walk_V : V] {} (12) 12 mark ADV when_Subj : Subj [when_IAdv : IAdv] {} (10) 10 nsubj PRON i_Pron : Pron [iFem_Pron : Pron] {} (11) 11 nmod NOUN door_N : N [] {} (15) 15 case ADP in_Prep : Prep [] {} (13) 13 det DET DefArt : Quant [] {} (14) 14 punct PUNCT StringPN "!" : PN [StringPunct "!" : Punct] {} (16) 16
Eng: fast and friendly service "!" [ they know my order when I walk in the door ]Fin: nopea ja ystävällinen palvelu "!" [ he tuntevat minun järjestykseni kun minä kävelen ovessa ]Swe: snabb och vänlig tjänst "!" [ de känner min ordning när jag går i dörren ]
PARSED: 16/16 WITHOUT BACKUP: 6/16
PARSER OUTPUT IN CONLL FORMAT:
1 if if SCONJ SCONJ _ 4 mark _ _2 a a DET DET Definite=Ind|PronType=Art 3 det _ _3 man man NOUN NOUN Number=Sing 4 nsubj _ _4 owns own VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 8 advcl _ _5 a a DET DET Definite=Ind|PronType=Art 6 det _ _6 donkey donkey NOUN NOUN Number=Sing 4 dobj _ _7 it it PRON PRON Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 8 nsubj _ _8 beats beat VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _9 he he PRON PRON Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 8 dobj _ _
Experiments
Analysing and converting UD treebanks- English, Finnish, Swedish
Connecting GF generation to UD parser front-end
Results
UD_English/en-ud-test.conllu
UD_Finnish/fi-ud-test.conllu
UD_Swedish/sv-ud-test.conllu
language #trees #confs %cov’d %int’d
English 2077 45 98 75
Finnish 648 20 91 60
Finnish* 648 0 81 59
Swedish 1219 35 94 68
Swedish* 1219 0 84 63
Demo and End-user Applications
> parse “the cat sees us” | visual_dep -output=conll -file=ud.labels
1 the the_Det DET Det _ 2 det _ _2 cat cat_CN NOUN CN _ 3 nsubj _ _3 sees see_TV VERB TV _ 0 root _ _4 us we_Pron PRON Pron _ 3 dobj _ _
GF2UD
UD2GF
$ ud2gf
-lEng -t10000 -k3000 -a1 -g1 -Dscamifgtn
-CUDTranslate.labels,UDTranslateEng.labels
treebanks/UD_English/en-ud-test.conllu
https://github.com/GrammaticalFramework/gf-contrib/tree/master/ud2gf
UD pipelinesSyntaxNet : Google’s parser -- not lemmatizer
Stanford CORENLP -- no morphological analysis
UDPipe
Inhouse Graph-based Parsing pipeline
Questions ?
Non-local Abstract mappingsExpression patterns correspond to sub-trees or multi-level rules in GF Abstract Syntax
Have higher precedence that the corresponding local rule for the top function
But, we could get rid of these non-local mappings by re-engineering the RGL Abstract Syntax quite easily
Could result in a increase of grammar size
(PredVP ? (PassV2 ?)) nsubjpass head
PredVP nsubj head
(PredSCVP ? (PassV2 ?)) csubjpass head
PredSCVP csubj head
Non-local Abstract mappings
Some sources of non-universal configurationsComplV2 : V2 -> NP -> VP head (dobj | iobj | nmod)
ComplVV : VV -> VP -> VP aux head | head xcomp
| head mark xcomp
ExistNP : NP -> Cl “there is”, “det finns”,...
Syncategoramatic words introduced by non-local rule
The same expression patterns from Non-local Abstract rules are used, to specify relabelling operations
(UseCl ? PNeg ?) head {“not”, “n’t”} neg head
(UseCl ? ? ?) head {*} aux head
Auxiliaries for passive voice constructions?
(UseCl ? ? (PredVP ? (PassV2 ?))) auxpass head
Non-Local Concrete mappings
top related