![Page 1: REFERENTIAL CHOICE: FACTORS AND MODELING Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik aakibrik@gmail.com Night Whites](https://reader035.vdocuments.us/reader035/viewer/2022062619/5516d5a35503468e338b46db/html5/thumbnails/1.jpg)
REFERENTIAL CHOICE:
FACTORS AND MODELING
Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik
Night Whites SPb, February 28, 2014
Referential choice in discourse
When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including:
Full noun phrase
• Proper name (e.g. Peter)
• Description = common noun (with or without modifiers) (e.g. the tzar)
• Mix: Peter the Great
Reduced NP, particularly a third person pronoun (e.g. he)
Example
The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window. Once inside, she spends nearly four hours Ø measuring and diagramming each room in the 80-year-old house, Ø gathering enough information to Ø estimate what it would cost to rebuild it. She snaps photos of the buckled floors and the plaster that has fallen away from the walls.
Legend (color coding on the slide): Description, Proper name, Pronoun, Zero (Ø)
Research question
How is referential choice made?
Why is this question important?
Reference is among the most basic cognitive operations performed by language users
Reference constitutes the lion’s share of all information in natural communication
Consider text manipulation according to the method of Biber et al. 1999: 230-232
Referential expressions marked in green
The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.
Referential expressions removed
The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.
Referential expressions kept
The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.
Types of referential devices: levels of granularity
We mostly concentrate on the two upper levels in this hierarchy
REG tradition: most attention to varieties of descriptive full NPs
Multi-factorial character of referential choice
Multiple factors of referential choice
Properties of the discourse context
• Distance to antecedent: along the linear discourse structure (Givón)
• Distance to antecedent: along the hierarchical discourse structure (Fox, Kibrik)
• Antecedent role (Centering theory)
Properties of the referent
• Referent animacy (Dahl)
• Protagonisthood (Grimes)
• ...
Cognitive multi-factorial model of referential choice
Factors of referential choice (discourse context, referent’s properties) → Referent activation in working memory → Referential choice
Rhetorical distance
Distance along the hierarchical discourse structure between the current point in discourse, where referential choice is to be made, and the antecedent
Measured in elementary discourse units, roughly equaling clauses
Rhetorical structure theory (RST) by Mann and Thompson
Very important factor
RST Discourse Treebank corpus (Marcu et al.)
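To make the contrast between linear and hierarchical distance concrete, here is a minimal sketch. It is an illustration only: the toy tree, the node names, and the edge-counting measure are all assumptions, and the measure is a simplification, not the RST-based rhetorical distance computed in the study.

```python
# Hypothetical toy discourse tree: each node maps to its parent.
# Leaves "e1".."e4" stand for elementary discourse units (EDUs);
# "n1", "n2" and "root" are internal spans. None of this is corpus data.
PARENT = {
    "e1": "n1", "e2": "n1",
    "e3": "n2", "e4": "n2",
    "n1": "root", "n2": "root",
    "root": None,
}
EDU_ORDER = ["e1", "e2", "e3", "e4"]  # linear order of EDUs in the text


def ancestors(node):
    """Return the chain from a node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def linear_distance(anaphor_edu, antecedent_edu):
    """Number of EDUs between the two points along the linear text."""
    return abs(EDU_ORDER.index(anaphor_edu) - EDU_ORDER.index(antecedent_edu))


def tree_distance(anaphor_edu, antecedent_edu):
    """Number of edges on the tree path between the two EDUs (assumed measure)."""
    up_from_antecedent = {n: i for i, n in enumerate(ancestors(antecedent_edu))}
    for steps, node in enumerate(ancestors(anaphor_edu)):
        if node in up_from_antecedent:
            return steps + up_from_antecedent[node]
    raise ValueError("nodes do not share a common ancestor")


print(linear_distance("e3", "e2"), tree_distance("e3", "e2"))
print(linear_distance("e2", "e1"), tree_distance("e2", "e1"))
```

In this toy tree, two adjacent units that belong to different spans are close linearly but farther apart hierarchically, which is the kind of difference the two distance factors are meant to capture.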
Example of a rhetorical graph from RST Discourse Treebank
RefRhet and MoRA
RST Discourse Treebank + our annotation = RefRhet corpus
Subcorpus RefRhet 3 (2013-2014)
Annotation scheme MoRA (Moscow Referential Annotation)
RefRhet 3
64 texts
6294 markables
1852 anaphor-antecedent pairs
475 pronouns
1377 full NPs
• 706 descriptions
• 671 proper names
Candidate factors of ref. choice
Some values are drawn from the MoRA annotation
Some others are computed automatically
Figure labels: Factor-predicted variable, Discourse context
Windows of the MMAX2 program
Some properties of the MoRA scheme
Wide range of activation factors and their values
• E.g. multiple values of the “grammatical role” factor
Annotation of groups: complex markables serving as antecedents
• and-coordinate
• or-coordinate
• prepositional (children with their parents)
• discontinuous
A discontinuous group
Tasks for machine learning
Candidate factors: All potential parameters implemented in corpus annotation
Factor-predicted variable: Form of referential expression (np_form)
Two-way task: Full NP vs. pronoun
Three-way task: Definite description vs. proper name vs. pronoun
Accuracy maximization: Ratio of correct predictions to the overall number of instances
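The accuracy measure and the majority-class baseline for the two-way task can be sketched as follows. The counts (1377 full NPs, 475 pronouns) are the ones reported for RefRhet 3; the function and label names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Ratio of correct predictions to the overall number of instances."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Two-way task labels rebuilt from the RefRhet 3 counts:
# 1377 full NPs and 475 pronouns among 1852 anaphor-antecedent pairs.
actual = ["full_np"] * 1377 + ["pronoun"] * 475

# Baseline: always predict the most frequent referential option.
majority_label = Counter(actual).most_common(1)[0][0]
baseline = accuracy([majority_label] * len(actual), actual)

print(majority_label, f"{baseline:.1%}")
```

Any learned model is only useful to the extent that it beats this baseline, since always guessing "full NP" is already right in about three cases out of four.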
Machine learning methods (Weka, a data mining system)
Logical algorithms
• Decision trees (C4.5)
• Decision rules (JRip)
Logistic regression
Compositions (ensemble methods)
• Boosting
• Bagging
Quality control – the cross-validation method
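Cross-validation as a quality-control step can be sketched in a few lines. This is a plain-Python illustration, not the Weka procedure used in the study: the fold construction, the trivial majority classifier, and the synthetic labels are all assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, folds[held_out]


def train_majority_classifier(train_labels):
    """Stand-in for a real learner: always predicts the most frequent label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def cross_validate(features, labels, k=10):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for train, test in k_fold_splits(len(labels), k):
        model = train_majority_classifier([labels[i] for i in train])
        correct = sum(model(features[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Synthetic data, invented for illustration only.
labels = ["full_np"] * 75 + ["pronoun"] * 25
features = [None] * len(labels)
mean_accuracy = cross_validate(features, labels, k=10)
print(f"{mean_accuracy:.2f}")
```

The point of the design is that every instance is used for testing exactly once and never by a model that saw it during training, so the reported accuracy is not inflated by memorization.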
Results of machine learning on RefRhet 3 and MoRA
| Algorithm | Accuracy, two-way | Accuracy, two-way (2012) | Accuracy, three-way |
|---|---|---|---|
| Baseline (frequency of the most common ref. option) | 74.4% | 74.4% | 37.9% |
| Logistic regression | 87.2% | | 71.3% |
| Decision tree algorithm | 93.7% | 86.1% | 74.0% |
| Bagging | 89.4% | 88.0% | 76.1% |
| Boosting | 89.5% | 86.2% | 74.0% |
Non-categorical referential choice (Kibrik 1999)
Cognitive plane: graded variable (referent activation, from min to max)
Linguistic plane: binary variable (full NP, e.g. Peter, vs. pronoun, e.g. he)
Non-categorical referential choice
In many instances, more than one referential option can be used
Referential choice is less than fully categorical (cf. Belz & Varges 2007, van Deemter et al. 2012: 173–179)
In the intermediate activation instances, both the original text author and the algorithm:
• more or less randomly make a categorical decision at the linguistic plane
• make decisions that do not always have to coincide
Therefore, no model can predict the actual referential choice with 100% accuracy
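The mapping from a graded activation value to a categorical form, with an intermediate zone where both forms are admissible, can be sketched as below. The 0 to 1 activation scale and the two thresholds are hypothetical values picked for the example; the slides do not give numeric cut-offs.

```python
def admissible_forms(activation, low=0.4, high=0.7):
    """Return the set of referential forms acceptable at a given activation.

    `activation` is assumed to lie on a 0..1 scale; `low` and `high` are
    hypothetical thresholds bounding the intermediate zone.
    """
    if not 0.0 <= activation <= 1.0:
        raise ValueError("activation must be between 0 and 1")
    if activation < low:
        return {"full_np"}
    if activation > high:
        return {"pronoun"}
    return {"full_np", "pronoun"}  # intermediate activation: either form works


def prediction_is_acceptable(predicted_form, activation):
    """A prediction counts as acceptable if it is among the admissible forms."""
    return predicted_form in admissible_forms(activation)


for value in (0.1, 0.5, 0.9):
    print(value, sorted(admissible_forms(value)))
```

Under this view, a model that predicts a pronoun where the author wrote a full NP is not necessarily wrong if the referent's activation falls in the intermediate zone.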
Experiment: Understanding (allegedly non-categorical) referential expressions
9 texts, in which the algorithms have diverged in their prediction from the original referential choice
9 original texts (proper name) and 9 altered texts (pronoun) distributed between 2 experimental lists
60 participants
1 experimental question + 2 control questions
If the instances of divergence are explained by intermediate referent activation, the accuracy in experimental questions should not be lower than the accuracy in control questions
Experiment: results
Control questions – 84%
Questions to proper names – 84%
Questions to pronouns – 75%
If we exclude questions #2 and #5, then the accuracy for questions to pronouns is 80%, not differing significantly from control and PN questions
In general, the algorithm diverges from the original in the places where that is acceptable, that is, where referent activation is intermediate
Non-categorical referential choice
Sometimes referential choice allows more than one option
A proper model of referential choice must account for this property of human speakers
Our modeling procedures actually conform to this requirement
Further studies
Explore logistic regression’s ability to evaluate the certainty of prediction and attempt to correlate that with human assessment of non-categorical referential choice, as well as with the theoretical notion of intermediate referent activation
Cheap data modeling
Secondary referential options, such as demonstrative descriptions
Genres and referential choice
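The first item, using logistic regression to gauge how certain a prediction is, might look roughly like this. The feature names, weights, and bias are invented for illustration and do not come from the trained models; only the logistic function itself is standard.

```python
import math


def pronoun_probability(features, weights, bias):
    """Logistic regression output: estimated probability of the 'pronoun' class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def certainty(probability):
    """0.0 when the model is undecided (p = 0.5), 1.0 when it is fully certain."""
    return abs(probability - 0.5) * 2


# Hypothetical features: [rhetorical distance, antecedent is subject (0/1)].
weights = [-1.2, 0.8]  # assumed values, not fitted to any corpus
bias = 1.0             # assumed value

for features in ([1, 1], [2, 0], [4, 0]):
    p = pronoun_probability(features, weights, bias)
    print(features, f"p(pronoun)={p:.2f}", f"certainty={certainty(p):.2f}")
```

Low-certainty instances would then be the natural candidates to compare against human judgments of interchangeable referential options.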
Conclusions
Multi-factorial approach
Corpus large enough for machine-learning modeling
Results of prediction close to theoretical maximum
Account of the non-deterministic character of referential choice
This approach can be applied to a wide range of other linguistic choices
Thank you
for your attention