prague dependency treebank 1.0 functional generative description
TRANSCRIPT
![Page 1: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/1.jpg)
Prague Dependency Treebank 1.0
Functional Generative Description
![Page 2: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/2.jpg)
Functional Generative Description
theoretical framework based on the findings of European structural linguistics, esp. of the classical Prague School
methodological requirements of a formal description
levels:
tectogrammatical (underlying) representations (TRs) with dependency-based syntax
morphemics
phonemics and phonetics
TRs (see Sgall, Hajičová and Panevová 1986; formally specified by Petkevič, also in a declarative way)
![Page 3: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/3.jpg)
The Language Layers
Phonemic, Morphonological, Morphemic, Analytical (surface syntax), Tectogrammatical (deep syntax).
![Page 4: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/4.jpg)
Dependency tree
My younger brother arrived there yesterday.
Linearized form, one-to-one relation:
((I)Appurt (younger)Rstr brother)Act arrive.Pret.Indic (Dir there) (Temp yesterday)
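The one-to-one relation between the tree and its linearized form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class and the `linearize` function are hypothetical names, and the convention that a left dependent carries its functor after the closing parenthesis while a right dependent carries it after the opening parenthesis is read off the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str          # lexical label plus grammatemes, e.g. "arrive.Pret.Indic"
    functor: str = ""   # relation to the head; left empty for the root
    left: List["Node"] = field(default_factory=list)   # dependents preceding the head
    right: List["Node"] = field(default_factory=list)  # dependents following the head


def linearize(node: Node) -> str:
    """Write a tree in bracketed form, functor on the head-facing parenthesis."""
    parts = [f"({linearize(d)}){d.functor}" for d in node.left]
    parts.append(node.label)
    parts += [f"({d.functor} {linearize(d)})" for d in node.right]
    return " ".join(parts)


brother = Node("brother", "Act", left=[Node("I", "Appurt"), Node("younger", "Rstr")])
sentence = Node(
    "arrive.Pret.Indic",
    left=[brother],
    right=[Node("there", "Dir"), Node("yesterday", "Temp")],
)
print(linearize(sentence))
# ((I)Appurt (younger)Rstr brother)Act arrive.Pret.Indic (Dir there) (Temp yesterday)
```

The sketch covers plain dependency only; coordination, which needs a further dimension, is not handled.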
![Page 5: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/5.jpg)
Dependency Tree
labels - lexical meanings (abstract symbols) with indices
functors - subscripts at parentheses oriented towards head
grammatemes - values of morphological categories: Tense, Modality, Number, Definiteness, etc.
projectivity
valency
arguments (inner participants) and adjuncts (circumstantials or 'free modifications')
obligatory and optional with a given head, deletable or not
![Page 6: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/6.jpg)
Dependency Tree
Arguments/participants of verbs
Actor/Bearer (underlying subject)
Objective (Patient, underlying direct object)
Addressee (underlying indirect object)
Effect ('second' object: to choose so. as sth.)
Origin (to make sth. out of sth.)
Adjuncts
Locative, several Directional and Temporal modifications
Condition, Means, Manner, etc.
![Page 7: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/7.jpg)
Dependency Tree
Complementations dependent mainly on nouns
Arguments (inner participants)
Material (Partitive): two baskets of sth.
Identity: the river Danube; the notion of operator
Adjuncts (free modifications)
Possession (Appurtenance): my table; Jim's brother
Restrictive: rich man
Descriptive: the Swedes, who are a Scandinavian nation
![Page 8: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/8.jpg)
Dependency Tree
syntactic grammatemes
Loc, Dir - in, on, under, between...
Regard - with, without
operational (testable) criteria for distinguishing arguments from adjuncts, from each other
deletability (dialogue test)
![Page 9: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/9.jpg)
Simplified valency frames
read V Act Addr Obj
change V Act Obj Orig Eff
give V Act Addr Obj
brother N Appurt
man N
glass N Material
full A Material
obligatory complementations in blue
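As a rough illustration, the frames above can be held in a small lookup table and compared with the functors actually found under a head. The names `VALENCY_FRAMES` and `match_frame` are hypothetical. The obligatory/optional distinction was shown by colour on the slide and cannot be recovered from the transcript, so the sketch does not model it.

```python
# Frames copied from the slide; which slots are obligatory is not recoverable here.
VALENCY_FRAMES = {
    ("read", "V"): ["Act", "Addr", "Obj"],
    ("change", "V"): ["Act", "Obj", "Orig", "Eff"],
    ("give", "V"): ["Act", "Addr", "Obj"],
    ("brother", "N"): ["Appurt"],
    ("man", "N"): [],
    ("glass", "N"): ["Material"],
    ("full", "A"): ["Material"],
}


def match_frame(lemma, pos, functors):
    """Split observed functors into frame slots filled, frame slots left
    unfilled, and functors outside the frame (free modifications)."""
    frame = VALENCY_FRAMES[(lemma, pos)]
    filled = [f for f in frame if f in functors]
    unfilled = [f for f in frame if f not in functors]
    outside = [f for f in functors if f not in frame]
    return filled, unfilled, outside


print(match_frame("give", "V", ["Act", "Obj", "Temp"]))
# (['Act', 'Obj'], ['Addr'], ['Temp'])
```

An unfilled slot is not necessarily an error: it may be optional, or obligatory but deleted on the surface.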
![Page 10: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/10.jpg)
Topic-focus articulation
contextual boundness
main verb CB/NB (T/F)
dependents to the left/right
communicative dynamism
left-right (mother, sisters, transitive) partial ordering
underlying word order
left-right linear ordering
left-to-right order of nodes together with the index T or (prototypically) F indicates the TFA of the sentence (of the TR)
[Figure: dependency tree of the example sentence with nodes indexed T (topic) or F (focus)]
![Page 11: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/11.jpg)
Topic-focus articulation
TFA - one of the basic aspects of underlying structures
[Figure: dependency tree of the example sentence with nodes indexed T (topic) or F (focus)]
![Page 12: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/12.jpg)
Complex sentence
a subordinated (dependent) clause (i.e. its main verb) depends on a word contained in its governing clause
My brother, whom you know, arrived there yesterday.
![Page 13: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/13.jpg)
Complex sentence
function words (synsemantic) are viewed as function morphemes, syntactically fixed to certain lexical (autosemantic) words - prepositions and articles to nouns, conjunctions and auxiliaries to verbs
Martin came there late, since he had to accompany his sick mother.
![Page 14: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/14.jpg)
Complex sentence
Martin arrived late to the session, since he had to accompany his sick mother.
schematically (morphemes):
Martin arrive.ed late to the session since he have.ed to accompany he.s sick mother.
dot - close connection of morphemes ('semes')
![Page 15: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/15.jpg)
deleted items restored
order of items - difference between 'underlying' and surface (morphemic) word order
transductive components - Panevová, Oliva, Borota
coordination (multidimensional)
Jim and Mary, who have two children, went to Boston.
the linearized notation is adequate:
((Jim Mary)Conj ((who)Act have (Pat (two)Rstr children)))Act went (Dir Boston)
structures close to Boolean, i.e. no complex 'innate properties' specific for natural language are needed.
![Page 16: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/16.jpg)
Prague Dependency Treebank - corpus annotation
an intermediate level - 'analytical' representations
dependency trees, not always projective
nodes for all word tokens, even for punctuation marks
tectogrammatical tree: coordinating conjunction as the head
![Page 17: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/17.jpg)
Prague Dependency Treebank 1.0
Morphological Layer
![Page 18: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/18.jpg)
ANNOTATED CORPORA
PDT version 1.0, 2000
(1996 - 2000)
(currently) ver. 2
Penn Treebank, release 3, 1999
(1989 - 1999)
PropBank (currently)
![Page 19: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/19.jpg)
The Levels in PDT
Morphemic Analytical Tectogrammatical
![Page 20: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/20.jpg)
TAG SETs
Czech - ambiguous inflective language: nový, nového, novému, novém, novým, nová, nové, novou, nových, novým, novými, … novější, novějšího, novějšímu, novějším, …, nejnovější, nejnovějšího, nejnovějšímu, nejnovějším, … nejnovějších, nejnovějším, …
English - language with poor inflection: work, works, worked, working
![Page 21: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/21.jpg)
![Page 22: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/22.jpg)
TEXT SOURCES
Lidové noviny
Mladá Fronta Dnes
Vesmír
Českomoravský Profit
...taken from Czech National Corpus
'88, '89 WSJ articles
Air Travel Information System transcripts
Brown Corpus
Switchboard transcripts
![Page 23: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/23.jpg)
ANNOTATION STRATEGY - Penn Treebank
TEXT
Ken Church's stochastic tagger,
Eric Brill's transformation tagger
corrections by annotator (GNU Emacs Lisp based package)
![Page 24: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/24.jpg)
ANNOTATION STRATEGY - PDT
Automatic Morphological Analyzer (AMA)
two independent annotators; Linux, Win tools
differences resolved by third annotator
comparison with the current AMA; manual resolution; Win tools
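The double-annotation step can be pictured with a short sketch: two annotators tag the same tokens independently, and only the positions where they differ are passed on for resolution. The function name `disagreements` and the sample tags are illustrative assumptions, not part of the PDT tools.

```python
def disagreements(tags_a, tags_b):
    """Return (position, tag_a, tag_b) for every token the two annotators tagged differently."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same token sequence")
    return [(i, a, b) for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]


annotator_1 = ["NNIS1-----A----", "RR--4----------"]
annotator_2 = ["NNIS1-----A----", "RR--6----------"]
print(disagreements(annotator_1, annotator_2))
# [(1, 'RR--4----------', 'RR--6----------')]
```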
![Page 25: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/25.jpg)
INTERNAL FORMAT
SGML coding, csts dtd
word/tag(|tag)*
![Page 26: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/26.jpg)
SAMPLES
<s id="ln95040:020-p1s1">
<f>Pokus<l>pokus<t>NNIS1-----A----
<f>o<l>o<t>RR--4----------
<f>zázrak<l>zázrak<t>NNIS4-----A----
<d>.<l>.<t>Z:-------------
The/DT envelope/NN arrives/VBZ in/IN the/DT mail/NN ./.
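The Czech tags in the sample are positional: each of the 15 characters encodes one morphological category, and `-` means "not applicable". The slides do not name the positions; the list below follows the PDT positional tagset as commonly documented and should be read as an assumption. `decode_tag` is a hypothetical helper.

```python
# Position names are an assumption based on the PDT positional tagset,
# not something stated on the slides.
POSITIONS = [
    "POS", "SUBPOS", "GENDER", "NUMBER", "CASE",
    "POSSGENDER", "POSSNUMBER", "PERSON", "TENSE", "GRADE",
    "NEGATION", "VOICE", "RESERVE1", "RESERVE2", "VAR",
]


def decode_tag(tag):
    """Map a 15-character positional tag to its non-empty categories."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


print(decode_tag("NNIS1-----A----"))
# {'POS': 'N', 'SUBPOS': 'N', 'GENDER': 'I', 'NUMBER': 'S', 'CASE': '1', 'NEGATION': 'A'}
```

For "Pokus" this reads as a noun, masculine inanimate, singular, case 1 (nominative), affirmative.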
![Page 27: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/27.jpg)
CONVERSION
SGML coding -> word/tag (pdt2wsj.pl)
SGML coding -> word/lemma/tag (pdt2wsjFLT.pl)
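The kind of conversion involved can be sketched as follows. This is not the code of pdt2wsj.pl or pdt2wsjFLT.pl, which is not shown in the slides; it is a minimal Python illustration that handles only the `<f>`/`<d>`, `<l>` and `<t>` elements visible in the sample, and `csts_to_slash` is a hypothetical name.

```python
import re

# One token: <f> (word form) or <d> (punctuation), then <l> lemma, then <t> tag.
TOKEN_RE = re.compile(r"<[fd]>([^<]*)<l>([^<]*)<t>([^<]*)")


def csts_to_slash(line, with_lemma=False):
    """Convert CSTS-like tokens to word/tag, or word/lemma/tag if with_lemma is set."""
    out = []
    for form, lemma, tag in TOKEN_RE.findall(line):
        fields = [form, lemma, tag] if with_lemma else [form, tag]
        out.append("/".join(fields))
    return " ".join(out)


sample = (
    "<f>Pokus<l>pokus<t>NNIS1-----A----"
    "<f>o<l>o<t>RR--4----------"
    "<d>.<l>.<t>Z:-------------"
)
print(csts_to_slash(sample))
print(csts_to_slash(sample, with_lemma=True))
```

Real CSTS data contains further elements and several candidate tags per token, which this sketch ignores.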
![Page 28: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/28.jpg)
DATA SIZE

| | # word tokens | # sentences |
|---|---|---|
| PDT 1.0 | 1 730K | 112K |
| Penn Treebank release 3 | 4 600K | 350K |
![Page 29: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/29.jpg)
DATA SETs of MORPHOLOGICALLY ANNOTATED DATA (# tokens/sentences)
for tagging only
training data 1 470K/95K
development test data 130K/8K
evaluation test data 127K/8K
for parsing (preprocessing step)
training data 475K/29K
development test data 130K/8K
evaluation test data 127K/8K
![Page 30: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/30.jpg)
TOOLS
Automatic Morphological Analyser/Generator of Czech
HMAnalyze.pl, HMGenerate.pl
Dictionary: CZE_a
Remote Access
Czech Taggers
HMM
Exponential
![Page 31: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/31.jpg)
Prague Dependency Treebank 1.0
Analytical Layer in PDT
![Page 32: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/32.jpg)
Introduction
Input: morphologically tagged sentences
Graph Editor: “user-friendly” software
Output: ATS structure
"surface" syntax tree structure
nodes labelled by the analytical functions
![Page 33: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/33.jpg)
Analytical Functions
Pred - Predicate, if it depends on the tree root
Sb - Subject
Obj - Object
Adv - Adverbial
Atv - Complement
AtvV - Complement, if one governor is present
Atr - Attribute
Pnom - Nominal predicate's nominal part, depends on the copula "to be"
AuxV - Auxiliary verb "to be"
Coord - Coordination node
Apos - Apposition node
AuxR - Reflexive particle, which is neither Obj nor AuxT (passive)
AuxT - Reflexive particle, lexically bound to the verb
![Page 34: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/34.jpg)
Analytical Functions
AuxP - Preposition or a part of compound preposition
AuxC - Subordinate conjunction
AuxO - (Superfluously) referring particle or emotional particle
AuxZ - Rhematizer or another node acting on another constituent
AuxX - Comma, but not the main coordinating comma
AuxG - Other graphical symbols not classified as AuxK
AuxY - Other words, such as particles without a specific syntactic function, parts of lexical idioms, etc.
AuxS - Sentence holder (the only added root to the tree)
AuxK - Punctuation at the end of the sentence or direct speech or citation clause
ExD - Ellipsis handling: function for nodes which pseudo-depend on a node on which they would not depend if there were no ellipsis
AtrAtr, AtrAdv, AdvAtr, AtrObj, ObjAtr + *_Co, *_Pa, *_Ap
![Page 35: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/35.jpg)
Two stages (chronologically)
(A) manual "analytic" annotation (ATS)
training data for (B)(a)
(B) (a) semiautomatic procedure (Collins' parser)
(b) manual correcting of (B)(a)
![Page 36: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/36.jpg)
Constraints and limitations
any string has a node of its own
word-form, punctuation mark, etc.
AuxV, AuxP, AuxC, AuxX, AuxG…
reflecting the coordination and apposition relations
so called third dimension of the graph in the plain tree
(X_Co, X_Ap, X_Pa, where X is one of analytic functions, such as Sb, Obj, Adv, etc.)
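A small sketch shows how such a composite label can be taken apart again. `split_afun` is a hypothetical helper. The glosses for `Co` and `Ap` follow the slide; reading `Pa` as "parenthesis" is an assumption based on PDT usage, since the slide does not spell it out.

```python
# "Pa" glossed as parenthesis is an assumption; the slide only lists the suffix.
SUFFIXES = {
    "Co": "member of a coordination",
    "Ap": "member of an apposition",
    "Pa": "part of a parenthesis",
}


def split_afun(afun):
    """Split e.g. 'Sb_Co' into ('Sb', 'Co'); plain functions give (afun, None)."""
    base, sep, suffix = afun.rpartition("_")
    if sep and suffix in SUFFIXES:
        return base, suffix
    return afun, None


print(split_afun("Sb_Co"))   # ('Sb', 'Co')
print(split_afun("Obj"))     # ('Obj', None)
```

Keeping the suffix on the label lets a plain tree record which dependents are members of a coordination or apposition and which modify the construction as a whole.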
![Page 37: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/37.jpg)
Constraints and limitations
no missing nodes (on the surface) can be added
analytic function ExD is used
relations between semi-automatic and manual procedure
80% of edges are established correctly automatically
![Page 38: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/38.jpg)
Project organization
team consisting of 5-6 annotators
handbook for ATS structure annotation
100 000 sentences on ATS
tectogrammatical annotation follows
![Page 39: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/39.jpg)
Projectivity/Non-projectivity/Surface Order A(B, C)
[Figure: tree diagrams of head A with dependents B and C in different surface orders]
![Page 40: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/40.jpg)
Projectivity/Non-projectivity/Surface Order A(B(C))
[Figure: tree diagrams of the chain A, B, C in different surface orders]
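Whether a given surface order of the chain A(B(C)) is projective can be checked mechanically. The sketch below uses the usual definition, which the slides do not state explicitly and which is therefore an assumption: a tree is projective if, for every edge, each word lying between the head and the dependent in surface order is dominated by that head. `is_projective` and the head-index encoding are illustrative choices.

```python
def is_projective(heads):
    """heads[i] is the surface position of the head of word i, or -1 for the root."""

    def dominated_by(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        while node != -1:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = sorted((dep, head))
        # Every word strictly between head and dependent must hang under the head.
        if not all(dominated_by(k, head) for k in range(lo + 1, hi)):
            return False
    return True


# A(B(C)): C depends on B, B depends on A.
print(is_projective([2, 0, -1]))  # surface order B C A -> True
print(is_projective([2, -1, 1]))  # surface order C A B -> False
```

In the second order, A stands between C and its head B without depending on B, so the edge from B to C crosses the root.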
![Page 41: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/41.jpg)
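První restituční zákon českého parlamentu se do sněmovních lavic může vrátit jako bumerang.
The first restitution law of the Czech parliament may return to the parliamentary benches like a boomerang.
[Figure: analytical tree of the sentence; visible labels: Adv, AuxT]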
![Page 42: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/42.jpg)
Prague Dependency Treebank 1.0
From the Analytical towards the Tectogrammatical layer
![Page 43: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/43.jpg)
Introduction
ATS annotation
nodes: word forms, punctuation, graphical symbols
edges: surface relations
TGTS annotation
nodes: autosemantic words, deletions
edges: deep layer functions
![Page 44: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/44.jpg)
Annotation process
Input Czech sentence
Tokenization
Morphological tagging and lexical disambiguation
Syntactic parsing and analytic function assignment -> ATS (PDT 1.0)
Tree structure pruning
Attribute assignments -> TGTS
![Page 45: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/45.jpg)
Transition procedure
deterministic procedure operating on trees
macro language for Graph Editor (perl)
automatic changes & tools for annotators
Requirements
new attributes for tectogrammatical layer
ATS is recoverable from TGTS
automatized to a maximally high degree
![Page 46: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/46.jpg)
New attributes
trlemma - lemma of the original node or lemma composed of joined nodes
morphological grammatemes
gender, number, degree of comparison, tense, aspect, iterativeness, verbal modality, deontic modality, sentence modality
position of the node
functor, topic-focus articulation, syntactic grammateme, type of relation (dependency, coordination, apposition), phraseme, deletion, quoted word, direct speech, coreference, antecedent
![Page 47: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/47.jpg)
Tree Structure Pruning
U toho, kdo začíná opravdu od nuly, není daňový výnos pro stát podstatný.
For those, who start actually at zero, the tax outcome for the state is not substantial.
![Page 48: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/48.jpg)
Tree Structure Pruning
U toho, kdo začíná opravdu od nuly, není daňový výnos pro stát podstatný.
For those, who start actually at zero, the tax outcome for the state is not substantial.
REG
![Page 49: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/49.jpg)
Verbal Nodes
•… entrepreneurs should have (their) taxes …
•… podnikatelé by měli mít daně …
PRED
verbmod=CDN
deontmod=HRT
![Page 50: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/50.jpg)
Attribute Assignments
prepositions stored as fw attribute
quoted words
clause in quotes -> DSP
one pair of quotes in the sentence -> DSPP
string in quotes -> QUOT
gender, number, tense, degcmp, aspect
default values
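The first item, folding a preposition into an attribute of the word it introduces, can be sketched on a toy tree. This is only an illustration of the idea, not the actual Graph Editor macro: the `Node` class and `prune_prepositions` are hypothetical, and only the simple case of an AuxP node with a single dependent is handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    form: str
    afun: str
    children: List["Node"] = field(default_factory=list)
    fw: Optional[str] = None  # function word absorbed during pruning


def prune_prepositions(node):
    """Remove AuxP nodes that have exactly one dependent; the dependent takes
    the preposition's place in the tree and records it in its fw attribute."""
    kept = []
    for child in node.children:
        prune_prepositions(child)
        if child.afun == "AuxP" and len(child.children) == 1:
            dependent = child.children[0]
            dependent.fw = child.form
            kept.append(dependent)
        else:
            kept.append(child)
    node.children = kept
    return node


# "začíná od nuly" (starts from zero): od is AuxP, nuly hangs under it.
verb = Node("začíná", "Atr", [Node("od", "AuxP", [Node("nuly", "Adv")])])
prune_prepositions(verb)
print(verb.children[0].form, verb.children[0].fw)  # nuly od
```

Because the preposition survives as an attribute, the analytical tree stays recoverable from the pruned one, in line with the requirement that ATS be recoverable from TGTS.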
![Page 51: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/51.jpg)
Macros for Annotators
keyboard shortcuts (in Graph editor)
structure changes
hide/recover nodes
merge nodes
add new nodes
functor assignments
![Page 52: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/52.jpg)
Manual annotation
structure checking
functors
deletions of obligatory modifications
feedback for formulating the handbook for annotators
![Page 53: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/53.jpg)
Prague Dependency Treebank 1.0
Tectogrammatical Layer
![Page 54: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/54.jpg)
![Page 55: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/55.jpg)
[Figure: tectogrammatical tree with nodes indexed T (topic) or F (focus)]
![Page 56: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/56.jpg)
Jirka se včera opil do němoty a Honza dneska.
George himself yesterday drank to silence and Honza today.
![Page 57: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/57.jpg)
Attributes of Coreferential relations (only in MC)
attribute - values
coref - the lemma of the antecedent
corsnt - NIL: in the same sentence; PREV1 ... PREVi: position of the sentence which includes the antecedent
antec - the functor of the antecedent
grammatical coreference
![Page 58: Prague Dependency Treebank 1.0 Functional Generative Description](https://reader036.vdocuments.us/reader036/viewer/2022081421/5697bf941a28abf838c8feb5/html5/thumbnails/58.jpg)
Example
Honza slíbil přijít včas.
Honza promised to come in time.
coref: Honza
corsnt: NIL
cornum: 1
antec: ACT
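How the `corsnt` value might be used to locate the antecedent's sentence can be sketched as follows. The reading of `PREVi` as "i sentences before the current one" is an assumption; the slides only say that it gives the position of the sentence containing the antecedent. `antecedent_sentence` is a hypothetical helper.

```python
def antecedent_sentence(current_index, corsnt):
    """Return the index of the sentence holding the antecedent.

    Assumes NIL means the current sentence and PREVi means i sentences back.
    """
    if corsnt == "NIL":
        return current_index
    if corsnt.startswith("PREV") and corsnt[4:].isdigit():
        return current_index - int(corsnt[4:])
    raise ValueError(f"unexpected corsnt value: {corsnt!r}")


# The example above: the antecedent Honza is in the same sentence.
example = {"coref": "Honza", "corsnt": "NIL", "cornum": "1", "antec": "ACT"}
print(antecedent_sentence(7, example["corsnt"]))  # 7
print(antecedent_sentence(7, "PREV2"))            # 5
```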