stefan evert, ims - uni stuttgart brigitte krenn, Öfai wien ims the significance of result...
![Page 1: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/1.jpg)
Stefan Evert, IMS - Uni Stuttgart
Brigitte Krenn, ÖFAI Wien
The Significance of Result Differences
![Page 2: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/2.jpg)
Why Significance Tests?
• everybody knows we have to test the significance of our results
• but do we really?
• evaluation results are valid for
  • data from a specific corpus
  • extracted with specific methods
  • for a particular type of collocations
  • according to the intuitions of one particular annotator (or two)
![Page 3: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/3.jpg)
Why Significance Tests?
• significance tests are about generalisations
• basic question: "If we repeated the evaluation experiment (on similar data), would we get the same results?"
• influence of source corpus, domain, collocation type and definition, annotation guidelines, ...
![Page 4: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/4.jpg)
Evaluation of Association Measures
![Page 5: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/5.jpg)
Evaluation of Association Measures
![Page 6: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/6.jpg)
A Different Perspective
• pair types are described by contingency tables (O11, O12, O21, O22) → coordinates in 4-D space
• O22 is redundant because O11 + O12 + O21 + O22 = N
• can also describe a pair type by its joint and marginal frequencies (f, f1, f2) → coordinates in 3-D space
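The change of coordinates can be sketched in a few lines of Python. This is only an illustration: the slides do not define f, f1 and f2, so the sketch assumes the common convention f = O11, f1 = O11 + O12 (row sum) and f2 = O11 + O21 (column sum). The class names and the example counts are hypothetical.

```python
from typing import NamedTuple


class Contingency(NamedTuple):
    """Observed 2x2 table of a pair type."""
    o11: int
    o12: int
    o21: int
    o22: int


class Frequencies(NamedTuple):
    """Joint frequency f, marginals f1 and f2, sample size n."""
    f: int
    f1: int
    f2: int
    n: int


def to_frequencies(t: Contingency) -> Frequencies:
    # Assumed convention: f1 is the first row sum, f2 the first column sum.
    return Frequencies(f=t.o11, f1=t.o11 + t.o12, f2=t.o11 + t.o21, n=sum(t))


def to_contingency(fr: Frequencies) -> Contingency:
    # O22 follows from the other cells because all four cells sum to N.
    o12 = fr.f1 - fr.f
    o21 = fr.f2 - fr.f
    o22 = fr.n - fr.f - o12 - o21
    return Contingency(fr.f, o12, o21, o22)


# Made-up counts, for illustration only.
table = Contingency(o11=30, o12=70, o21=120, o22=99_780)
freqs = to_frequencies(table)
print(freqs)
print(to_contingency(freqs) == table)
```

For a fixed sample size N, the triple (f, f1, f2) carries the same information as the full table, which is why three coordinates suffice.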
![Page 7: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/7.jpg)
A Different Perspective
• data set = cloud of points in three-dimensional space
• visualisation is "challenging"
• many association measures depend on O11 and E11 only (MI, gmean, t-score, binomial)
• projection to (O11, E11) → coordinates in 2-D space (ignoring the ratio f1 / f2)
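A minimal sketch of the projection, under assumptions the slides do not spell out: E11 is taken to be the expected joint frequency under independence, f1 · f2 / N, and MI and t-score use their usual textbook forms (MI with a base-2 logarithm, which is one of several conventions). Function names and counts are hypothetical.

```python
import math


def expected_frequency(f1: int, f2: int, n: int) -> float:
    """E11 under independence (assumed definition): f1 * f2 / N."""
    return f1 * f2 / n


def mutual_information(o11: float, e11: float) -> float:
    """MI as log2(O11 / E11); the log base is an assumption."""
    return math.log2(o11 / e11)


def t_score(o11: float, e11: float) -> float:
    """t-score as (O11 - E11) / sqrt(O11)."""
    return (o11 - e11) / math.sqrt(o11)


# Made-up pair type: f = O11 = 30, f1 = 100, f2 = 150, N = 100,000.
f, f1, f2, n = 30, 100, 150, 100_000
e11 = expected_frequency(f1, f2, n)
mi = mutual_information(f, e11)
t = t_score(f, e11)
print(f"O11 = {f}, E11 = {e11:.3f}, MI = {mi:.2f}, t-score = {t:.2f}")
```

Because E11 depends only on the product f1 · f2, two pair types with the same product and the same O11 land on the same point of the 2-D plot, whatever their ratio f1 / f2.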
![Page 8: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/8.jpg)
The Parameter Space of Collocation Candidates
![Page 9: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/9.jpg)
The Parameter Space of Collocation Candidates
![Page 10: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/10.jpg)
The Parameter Space of Collocation Candidates
![Page 11: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/11.jpg)
The Parameter Space of Collocation Candidates
![Page 12: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/12.jpg)
The Parameter Space of Collocation Candidates
![Page 13: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/13.jpg)
N-best Lists in Parameter Space
• N-best list for an AM includes all pair types where score ≥ c (threshold c obtained from data)
• {score ≥ c} describes a subset of the parameter space
• for a sound association measure, the isoline {score = c} is the lower boundary (because scores should increase with O11 for a fixed value of E11)
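The relation between an N-best list and its threshold c can be sketched as follows. This is an illustration with hypothetical names and made-up (O11, E11) points; the t-score formula is the usual textbook form and is assumed here rather than taken from the slides.

```python
import math


def t_score(o11: float, e11: float) -> float:
    """Assumed textbook form of the t-score."""
    return (o11 - e11) / math.sqrt(o11)


def n_best(candidates, score, n):
    """Return the n highest-scoring candidates and the threshold c.

    c is the score of the last candidate that made it into the list,
    so every member of the list satisfies score >= c. With tied scores
    the set {score >= c} can be larger than n.
    """
    ranked = sorted(candidates, key=lambda cand: score(*cand), reverse=True)
    best = ranked[:n]
    c = score(*best[-1])
    return best, c


# Made-up candidates as (O11, E11) points in the parameter space.
candidates = [(30, 0.15), (5, 4.0), (12, 2.5), (2, 0.01), (50, 45.0)]
best, c = n_best(candidates, t_score, n=3)
print(best, round(c, 3))
```

Every point on or above the isoline {score = c} belongs to the list, which is what makes the isoline a boundary of the N-best region in the (O11, E11) plane.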
![Page 14: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/14.jpg)
N-Best Isolines in the Parameter Space
MI
![Page 15: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/15.jpg)
N-Best Isolines in the Parameter Space
MI
![Page 16: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/16.jpg)
N-Best Isolines in the Parameter Space
t-score
![Page 17: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/17.jpg)
N-Best Isolines in the Parameter Space
t-score
![Page 18: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/18.jpg)
95% Confidence Interval
![Page 19: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/19.jpg)
99% Confidence Interval
![Page 20: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/20.jpg)
95% Confidence Interval
![Page 21: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/21.jpg)
Comparing Precision Values
• number of TPs and FPs for 1000-best lists

| tbl | t-score | frequency |
|-----|---------|-----------|
| TPs | 322     | 283       |
| FPs | 678     | 717       |

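
The precision values behind these counts, together with a confidence interval for each, can be computed as in the sketch below. It assumes an exact two-sided (Clopper-Pearson) binomial interval at the 95% level; the slides do not state which interval construction was used, and all function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def solve_decreasing(func, target: float) -> float:
    """Bisection for func(p) = target, where func decreases in p."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, conf: float = 0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    alpha = 1.0 - conf
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2)
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# TP counts from the table above; each list has 1000 candidates.
results = {}
for name, tps in [("t-score", 322), ("frequency", 283)]:
    low, high = clopper_pearson(tps, 1000)
    results[name] = (tps / 1000, low, high)
    print(f"{name}: precision {tps / 1000:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

Intervals of this kind treat each list as a separate sample. They do not take into account that the two 1000-best lists share many candidates.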
![Page 22: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/22.jpg)
McNemar's Test
+ = in 1000-best list
– = not in 1000-best list
• ideally: all TPs in 1000-best list (possible!)
• H0: differences between AMs are random

| tbl    | – t-score | + t-score |
|--------|-----------|-----------|
| – freq | 610       | 46        |
| + freq | 7         | 276       |

![Page 23: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/23.jpg)
McNemar's Test
+ = in 1000-best list
– = not in 1000-best list
> mcnemar.test(tbl)
• p-value < 0.001: highly significant

| tbl    | – t-score | + t-score |
|--------|-----------|-----------|
| – freq | 610       | 46        |
| + freq | 7         | 276       |

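
For readers without R at hand, the computation can be sketched in plain Python. The sketch assumes the chi-squared form of McNemar's test with continuity correction (the default of R's `mcnemar.test`) and uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math


def mcnemar(table, correct: bool = True):
    """McNemar's chi-squared test for a 2x2 table of paired outcomes.

    Only the two discordant cells matter: here, TPs found by exactly
    one of the two association measures.
    """
    b = table[0][1]  # - freq, + t-score
    c = table[1][0]  # + freq, - t-score
    diff = abs(b - c) - (1 if correct else 0)  # continuity correction
    statistic = diff ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: - freq, + freq; columns: - t-score, + t-score (as on the slide).
tbl = [[610, 46],
       [7, 276]]
statistic, p_value = mcnemar(tbl)
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.2g}")
```

The 610 + 276 TPs on which both lists agree do not enter the statistic. The test asks only whether the 46 versus 7 split among the disagreements is compatible with chance.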
![Page 24: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/24.jpg)
Significant Differences
![Page 25: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/25.jpg)
Significant Differences
![Page 26: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/26.jpg)
Significant Differences
= significant = relevant (2%)
![Page 27: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/27.jpg)
Lowest-Frequency Data: Samples
• Too much data for full manual evaluation → random samples
• AdjN data
  • 965 pairs with f = 1 (15% sample)
  • manually identified 31 TPs (3.2%)
• PNV data
  • 983 pairs with f < 3 (0.35% sample)
  • manually identified 6 TPs (0.6%)
![Page 28: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/28.jpg)
Lowest-Frequency Data: Samples
• Estimate proportion p of TPs among all lowest-frequency data
• Confidence set from binomial test
• AdjN: 31 TPs among 965 items
  • p ≤ 5% with 99% confidence
  • at most 320 TPs
• PNV: 6 TPs among 983 items
  • p ≤ 1.5% with 99% confidence
  • there might still be 4200 TPs !!
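The bounds above can be approximated with a short sketch. It assumes a one-sided exact binomial upper bound at the 99% level (the slides do not say whether a one-sided or two-sided confidence set was used), and it recovers the size of the full lowest-frequency data from the stated sample fractions (15% and 0.35%). Function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def upper_bound(k: int, n: int, conf: float = 0.99) -> float:
    """Largest p not rejected by a one-sided exact binomial test.

    Solves P(X <= k | p) = 1 - conf by bisection; the CDF decreases in p.
    """
    lo, hi = k / n, 1.0 - 1e-12
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > 1.0 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# (TPs in sample, sample size, sample fraction) from the slides.
samples = {"AdjN": (31, 965, 0.15), "PNV": (6, 983, 0.0035)}
bounds = {}
for name, (tps, size, fraction) in samples.items():
    p_max = upper_bound(tps, size)
    population = size / fraction  # estimated number of lowest-frequency pairs
    bounds[name] = p_max
    print(f"{name}: p <= {p_max:.2%}, up to about {p_max * population:.0f} TPs")
```

The PNV bound on p is small, but it applies to a far larger population, so the absolute number of TPs that cannot be ruled out is much higher than for AdjN.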
![Page 29: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/29.jpg)
N-best Lists for Lowest-Frequency Data
• evaluate 10,000-best lists
• to reduce manual annotation work, take 10% sample from each list (i.e. 1,000 candidates for each AM)
• precision graphs for N-best lists
  • up to N = 10,000 for the PNV data
• 95% confidence estimates for precision of best-performing AM (from binomial test)
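One way to turn such a sample into a precision graph is sketched below. It assumes a simple random sample without replacement over the whole 10,000-best list, and it estimates the precision of each N-best list from the sampled candidates with rank ≤ N. The sampling scheme, the function names and the annotation labels are all hypothetical; in particular the labels are a synthetic stand-in for manual annotation, not real data.

```python
import random


def draw_sample(list_size: int, fraction: float, seed: int = 0):
    """Draw a random sample of ranks (1-based) from a ranked list."""
    rng = random.Random(seed)
    k = round(list_size * fraction)
    return sorted(rng.sample(range(1, list_size + 1), k))


def estimated_precision(annotations: dict, n: int):
    """Estimate the precision of the n-best list from sampled candidates.

    annotations maps the rank of each sampled candidate to True (TP) or
    False (FP). Only sampled candidates with rank <= n are used.
    """
    labels = [is_tp for rank, is_tp in annotations.items() if rank <= n]
    if not labels:
        return None  # no sampled candidate falls inside this n-best list
    return sum(labels) / len(labels)


sampled_ranks = draw_sample(10_000, 0.10)

# Synthetic stand-in for manual annotation (NOT real data): pretend that
# exactly the candidates in the upper half of the ranking are TPs.
annotations = {rank: rank <= 5_000 for rank in sampled_ranks}

for n in range(1_000, 10_001, 1_000):
    print(n, estimated_precision(annotations, n))
```

Each point of the resulting graph rests on roughly N / 10 annotated candidates, so the estimates for small N are the least reliable ones.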
![Page 30: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/30.jpg)
Random Sample Evaluation
![Page 31: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/31.jpg)
Random Sample Evaluation
![Page 32: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS The Significance of Result Differences](https://reader034.vdocuments.us/reader034/viewer/2022052321/55141078550346dd488b5049/html5/thumbnails/32.jpg)
Random Sample Evaluation