gf2ud and ud2gfschool.grammaticalframework.org › 2017 › slides › prasanth-gf... ·...

Post on 03-Jul-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

GF2UD and UD2GFUD: Universal Dependencies

Prasanth Kolachina GF Summer school, 2017

the black cat sees us today

le chat noir nous voit aujourd’hui

dependency parser

ud2gf

GF

gf2ud

Universal Dependencies

Principles of Design● UD needs to be satisfactory on linguistic analysis grounds for individual

languages.● UD needs to be good for linguistic typology, i.e., providing a suitable basis for

bringing out cross-linguistic parallelism across languages and language families.

● UD must be suitable for rapid, consistent annotation by a human annotator.● UD must be suitable for computer parsing with high accuracy.● UD must be easily comprehended and used by a non-linguist …. (API

grammar)● UD must support well downstream language understanding tasks (relation

extraction, reading comprehension, machine translation, ...).

Mission of Grammatical FrameworkThe mission of GF is to formalize

the grammars of the world

and

make them available for computer applications.

A community-driven effort to annotate multilingual treebanks

Cross-lingual consistency in annotations across languages

17 Part-of-Speech tags ; 40 dependency labels ; morphological features

Annotated corpora released every 6 months;

Ongoing V2

50 Languages, 70 Treebanks

Universal Dependencies

Predication

nsubj csubj

dobj iobj ccomp xcomp

advmod nmodadvcl mark cop

det nummodamod apposneg nmod

acl case

conj cc punct

root dep

list dislocated parataxis remnant reparandum

Other

Clausal Predicates

Noun dependents

Coordination

Unknowns

Adverbials

nsubjpass csubjpass

auxpass

Passive voice

Auxiliary verbs and negation

aux neg

Copulas and special marker

compound mwe name

Compounding

nsubj

dobj iobj

cop

det amod

Clausal Predicates

Noun dependents

Auxiliary verbs and negation

aux neg

Copulas

Structures in GF

the black cat sees us

Rationaledependencies GF

parsing robustness robust brittle

parsing speed fast slow

semantics loose compositional

generation ? accurate

the black cat sees us today

le chat noir nous voit aujourd’hui

dependency parser

ud2gf

GF

gf2ud

the black cat sees us today

le chat noir nous voit aujourd’hui

dependency parser

ud2gf

GF

sem∃!A.(cat(A) & MODIFIER(black,A)& (∃ B.(see(B) & SUBJECT(B)=A & OBJECT(B) = we & MODIFIER(today,B))))

GF2UD grammatical roles to arguments and hide functions

Dependency configurationPredVP nsubj headComplTV head dobjDetCN det headAdjCN amod head

nsubj

dobjdet

amod

Dependency configurationPredVP nsubj headComplTV head dobjDetCN det headAdjCN amod head

nsubj

dobjdet

amod

nsubj

dobjdet

amod

nsubj

dobjdet

amod

nsubj

dobjdet

amod

nsubj

dobjdet

amod

nsubj

dobjdet

amod

the

black cat

sees

us

POS configuration

Det DETAP ADJCN NOUNTV VERBPron PRON

nsubj

dobjdet

amod

the

black cat

sees

us

le chat noir nous voit

Syncategorematic words

- pinpointing a difference in the ways of thinking: - dependency grammar is about words, - GF is about meanings

categorematic: word with its own category and function

fun cat_CN : CNlin cat_CN = “cat”

syncategorematic: word that is “between categories”

fun ComplAP : AP -> VPlin ComplAP ap = “is” ++ AP

No semantics (fun) of its own. Not an argument. No label.

adding default labels

we get UD wants

Other syncategorematic words

- negation words- tense auxiliaries- infinitive marks- (sometimes) prepositions

Extended dependency configurationabstract local

abstract | concrete local | nonlocal

- more complicated, not universal+ less work than rewriting the grammar anyway+ UD is still undergoing changes

Concrete configs

UseComp in English

UseComp head {“is”, “was”, “be”, “are”} cop head

In Swedish

UseComp head {“ar”, “var”, “vara”, “varit”} cop head

Local Concrete configurationsMappings defined on linearization of an abstract function for a specific language

These are necessary because of the ``level of abstraction’’ in GF abstract syntax

The mappings specify re-labelling operations

relabel an existing edge with new label

modify an existing edge by changing the head and adding a new label

These operations match a set of words, or a record field or match anything

Demo ?

> parse “the cat sees us” | visual_dep -output=conll -file=ud.labels

1 the the_Det DET Det _ 2 det _ _2 cat cat_CN NOUN CN _ 3 nsubj _ _3 sees see_TV VERB TV _ 0 dep _ _4 us we_Pron PRON Pron _ 3 dobj _ _

UD2GF

1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod

1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod

tree

root see VERB _ 4 nsubj cat NOUN _ 3

det the DET _ 1

amod black ADJ _ 2

dobj we PRON _ 5

advmod today ADV _ 6

1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod

tree

root see VERB _ 4 nsubj cat NOUN _ 3

det the DET _ 1

amod black ADJ _ 2

dobj we PRON _ 5

advmod today ADV _ 6

lexicon

see_V2 “see”

cat_N “cat”

the_Det “the”

black_A “black”

we_Pron “we”

today_Adv “today”

1 the the DET _ 3 det2 black black ADJ _ 3 amod3 cat cat NOUN _ 4 nsubj4 sees see VERB _ 0 root5 us we PRON _ 4 dobj6 today today ADV _ 4 advmod

tree

root see VERB _ 4 nsubj cat NOUN _ 3

det the DET _ 1

amod black ADJ _ 2

dobj we PRON _ 5

advmod today ADV _ 6

lexically annotated tree

root see_V2 V2 4 nsubj cat_N N 3

det the_Det Det 1

amod black_A A 2

dobj we_Pron Pron 5

advmod today_Adv Adv 6

lexicon

see_V2 “see”

cat_N “cat”

the_Det “the”

black_A “black”

we_Pron “we”

today_Adv “today”

tree

root see_V2 V2 4 nsubj cat_N N 3

det the_Det Det 1

amod black_A A 2

dobj we_Pron Pron 5

advmod today_Adv Adv 6

Postorder traversal: subtrees before their head

Invariant: every node has a valid GF tree

Goal: total GF tree at root

A node is done when no more functions apply

tree

root see_V2 V2 4 nsubj cat_N N 3

det the_Det Det 1

amod black_A A 2

dobj we_Pron Pron 5

advmod today_Adv Adv 6

tree

root see_V2 V2 4 nsubj (UseN 3) [cat_N] CN 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

endo

ModCN 2 3

tree

root see_V2 V2 4 nsubj (UseN 3) [cat_N] CN 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

when an endocentric function applies, use it first

exo

DetCN 1 3

tree

root see_V2 V2 4 nsubj (ModCN 2 3) [(UseN 3),cat_N] CN 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

exo

DetCN 1 3

tree

root see_V2 V2 4 nsubj (ModCN 2 3) [(UseN 3),cat_N] CN 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

tree

root see_V2 V2 4 nsubj (DetCN 1 3) [(ModCN 2 3),(UseN 3),cat_N] NP 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

tree

root (PredVP 3 4) [(AdvVP 4 6),(ComplV2 4 5),see_V2] VP 4 nsubj (DetCN 1 3) [(ModCN 2 3),(UseN 3),cat_N] NP 3

det the_Det Det 1

amod (PositA 2) [black_A] AP 2

dobj (UsePron 5) [we_Pron] NP 5

advmod today_Adv Adv 6

Root node contains a complete GF tree

Problems

Ambiguity

There can be several candidate Functions and Categories.

Incompleteness

The tree may have nodes not referenced from the AST.

Problems and solutions

Ambiguity

There can be several candidate Functions and Categories.

Maintain a list of trees at each node, not just one tree.

Incompleteness

The tree may have nodes not referenced from the AST.

Auxiliary rules for syntcategorematic words.

Backup functions attached as adverbial modifiers to AST nodes.

STRING: Fast and friendly service , they know my order when I walk in the door !

root NOUN service_N : N [] {} (4) 4 amod ADJ fast_A : A [] {} (1) 1 cc CONJ "and" : Conjand_ [and_Conj : Conj] {} (2) 2 conj ADJ friendly_A : A [] {} (3) 3 punct PUNCT "," : Comma_ [] {} (5) 5 parataxis VERB know_VQ : VQ [know_VS : VS, know_V2 : V2, know_V : V] {} (7) 7 nsubj PRON they_Pron : Pron [theyFem_Pron : Pron] {} (6) 6 dobj NOUN order_N : N [] {} (9) 9 nmod:poss PRON i_Pron : Pron [] {} (8) 8 advcl VERB walk_V2 : V2 [walk_V : V] {} (12) 12 mark ADV when_Subj : Subj [when_IAdv : IAdv] {} (10) 10 nsubj PRON i_Pron : Pron [iFem_Pron : Pron] {} (11) 11 nmod NOUN door_N : N [] {} (15) 15 case ADP in_Prep : Prep [] {} (13) 13 det DET DefArt : Quant [] {} (14) 14 punct PUNCT StringPN "!" : PN [StringPunct "!" : Punct] {} (16) 16

Eng: fast and friendly service "!" [ they know my order when I walk in the door ]Fin: nopea ja ystävällinen palvelu "!" [ he tuntevat minun järjestykseni kun minä kävelen ovessa ]Swe: snabb och vänlig tjänst "!" [ de känner min ordning när jag går i dörren ]

PARSED: 16/16 WITHOUT BACKUP: 6/16

PARSER OUTPUT IN CONLL FORMAT:

1 if if SCONJ SCONJ _ 4 mark _ _2 a a DET DET Definite=Ind|PronType=Art 3 det _ _3 man man NOUN NOUN Number=Sing 4 nsubj _ _4 owns own VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 8 advcl _ _5 a a DET DET Definite=Ind|PronType=Art 6 det _ _6 donkey donkey NOUN NOUN Number=Sing 4 dobj _ _7 it it PRON PRON Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 8 nsubj _ _8 beats beat VERB VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _9 he he PRON PRON Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 8 dobj _ _

Experiments

Analysing and converting UD treebanks- English, Finnish, Swedish

Connecting GF generation to UD parser front-end

Results

UD_English/en-ud-test.conllu

UD_Finnish/fi-ud-test.conllu

UD_Swedish/sv-ud-test.conllu

language #trees #confs %cov’d %int’d

English 2077 45 98 75

Finnish 648 20 91 60

Finnish* 648 0 81 59

Swedish 1219 35 94 68

Swedish* 1219 0 84 63

Demo and End-user Applications

> parse “the cat sees us” | visual_dep -output=conll -file=ud.labels

1 the the_Det DET Det _ 2 det _ _2 cat cat_CN NOUN CN _ 3 nsubj _ _3 sees see_TV VERB TV _ 0 root _ _4 us we_Pron PRON Pron _ 3 dobj _ _

GF2UD

UD2GF

$ ud2gf

-lEng -t10000 -k3000 -a1 -g1 -Dscamifgtn

-CUDTranslate.labels,UDTranslateEng.labels

treebanks/UD_English/en-ud-test.conllu

https://github.com/GrammaticalFramework/gf-contrib/tree/master/ud2gf

UD pipelinesSyntaxNet : Google’s parser -- not lemmatizer

Stanford CORENLP -- no morphological analysis

UDPipe

Inhouse Graph-based Parsing pipeline

Questions ?

Non-local Abstract mappingsExpression patterns correspond to sub-trees or multi-level rules in GF Abstract Syntax

Have higher precedence that the corresponding local rule for the top function

But, we could get rid of these non-local mappings by re-engineering the RGL Abstract Syntax quite easily

Could result in a increase of grammar size

(PredVP ? (PassV2 ?)) nsubjpass head

PredVP nsubj head

(PredSCVP ? (PassV2 ?)) csubjpass head

PredSCVP csubj head

Non-local Abstract mappings

Some sources of non-universal configurationsComplV2 : V2 -> NP -> VP head (dobj | iobj | nmod)

ComplVV : VV -> VP -> VP aux head | head xcomp

| head mark xcomp

ExistNP : NP -> Cl “there is”, “det finns”,...

Syncategoramatic words introduced by non-local rule

The same expression patterns from Non-local Abstract rules are used, to specify relabelling operations

(UseCl ? PNeg ?) head {“not”, “n’t”} neg head

(UseCl ? ? ?) head {*} aux head

Auxiliaries for passive voice constructions?

(UseCl ? ? (PredVP ? (PassV2 ?))) auxpass head

Non-Local Concrete mappings

top related