June 5, 2009 Automated Suggestions for Miscollocations
Automated Suggestions for Miscollocations
Anne Li-E Liu
David Wible
Nai-lung Tsao
Overview
• Introduction
• Methodology
• Experimental Results
• Conclusion
Introduction
• Our study focuses on how to find suggestions for miscollocations automatically.
• In this paper, only verb-noun collocations and miscollocations are considered.
Introduction
• Howarth’s (1998) investigation of collocations found in L1 and L2 writers’ writing.
• Granger’s (1998) analysis of adverb-adjective collocations.
• Liu’s (2002) lexical semantic analysis of the verb-noun miscollocations in the English Taiwan Learner Corpus.
Introduction
Projects using learner corpora in analyzing and categorizing learner errors:
• NICT JLE (Japanese Learner English) Corpus
• The Chinese Learner English Corpus (CLEC)
• English Taiwan Learner Corpus (or TLC) (Wible et al., 2003).
An example
• She tries to improve her students’ problems.
1. solve
2. pose
3. tackle
4. grapple
5. alleviate
6. overcome
7. exacerbate
8. compound
9. beset
10. resolve
reduce
V collocates from Collocation Explorer
Method
• Three features of collocate candidates are used:
1. Word association strength,
2. Semantic similarity
3. Intercollocability (Cowie and Howarth, 1996).
Resource
• 84 VN miscollocations in TLC (Liu, 2002).
Training data: 42 Testing data: 42
• Two knowledge resources: BNC, WordNet
• Two human evaluators.
Word Association Strength
• Mutual Information (Church et al. 1991)
• Two purposes:
1. All suggested correct collocations have to be identified as collocations.
2. The higher the word association strength, the more likely the candidate is to be a correct substitute for the wrong collocate.
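As a rough illustration of the word association feature, the sketch below computes pointwise mutual information for verb-noun pairs from co-occurrence counts. The counts are invented toy numbers rather than BNC figures, and the function name `pmi` and the base-2 logarithm are assumptions made for this example only.

```python
import math

def pmi(pair_count, verb_count, noun_count, total_pairs):
    """Pointwise mutual information (base 2) of a verb-noun pair.

    All counts are assumed to come from one collection of
    verb-noun pairs containing total_pairs items.
    """
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    p_pair = pair_count / total_pairs
    p_verb = verb_count / total_pairs
    p_noun = noun_count / total_pairs
    return math.log2(p_pair / (p_verb * p_noun))

# Hypothetical counts: (pair count, verb count, noun count)
TOTAL = 1_000_000
counts = {
    ("solve", "problem"): (800, 2_000, 10_000),
    ("improve", "problem"): (5, 5_000, 10_000),
}
scores = {pair: pmi(c, v, n, TOTAL) for pair, (c, v, n) in counts.items()}
```

With these toy counts, "solve problem" receives a clearly higher score than "improve problem", which is the behaviour the second purpose above relies on.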
Semantic Similarity
• A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu, 2002).
• The synsets of WordNet are treated as nodes in a graph, and similarity is measured by graph-theoretic distance.
*say a story → tell a story: synonymous relation
*say a story → think of a story: hypernymy relation
Semantic Similarity
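$$
\mathrm{sim}(w_1, w_2) = \max_{s_i \in \mathrm{synsets}(w_1),\; s_j \in \mathrm{synsets}(w_2)} \left( 1 - \frac{\mathrm{dis}(s_i, s_j)}{2 \times \max(L_{s_i}, L_{s_j})} \right)
$$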
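The similarity score takes, over all synset pairs of the two words, the maximum of one minus the synset distance divided by twice the larger of the two synset levels. A minimal sketch on a toy hypernym tree follows. It assumes that dis is the path length in edges and that L is the depth of a synset with the root at level 1; the slide does not spell these out. The tree, the word-to-synset mapping and all function names are hypothetical and are not WordNet data.

```python
# Toy hypernym tree: child synset -> parent synset (hypothetical)
PARENT = {
    "communicate": "act",
    "tell": "communicate",
    "say": "communicate",
    "think": "act",
    "think_of": "think",
}

# Hypothetical mapping from words to their synsets
SYNSETS = {
    "say": ["say", "tell"],
    "tell": ["tell"],
    "think of": ["think_of"],
}

def path_to_root(synset):
    """Return the list of nodes from synset up to the root."""
    path = [synset]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def level(synset):
    """Depth of a synset; the root has level 1 (assumption)."""
    return len(path_to_root(synset))

def dis(s1, s2):
    """Number of edges on the path between two synsets."""
    p1, p2 = path_to_root(s1), path_to_root(s2)
    index_in_p2 = {node: i for i, node in enumerate(p2)}
    for i, node in enumerate(p1):
        if node in index_in_p2:  # lowest common ancestor
            return i + index_in_p2[node]
    raise ValueError("synsets are not in the same tree")

def sim(w1, w2):
    """Maximum over synset pairs of 1 - dis / (2 * max level)."""
    return max(
        1 - dis(s1, s2) / (2 * max(level(s1), level(s2)))
        for s1 in SYNSETS[w1]
        for s2 in SYNSETS[w2]
    )
```

In this toy setup, two words that share a synset score 1.0, and the score drops as the path between their closest synsets grows.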
Intercollocability
• Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of the shared meaning.
Example collocations: convey point, get across the message, express concern, convey feeling, communicate concern
Cluster: convey message, get across point, express concern, communicate feeling
Intercollocability
• Collocations in a cluster show a certain degree of intercollocability.
express one’s concern / condolences
Cluster: convey message, get across point, express concern, communicate feeling
express / communicate + concern / feeling ?
Intercollocability
She tries to *improve her students’ problems.
Starting point: *improve problem
• improve: 52 noun collocates
• problem: 86 verb collocates
Do any of the 86 verbs co-occur with the 52 nouns?
resolve/improve + situation, matter, way
reduce/improve + quality, efficiency, effectiveness
resolve problem, reduce problem
Intercollocability
• The cluster is partially created, and the link between improve, resolve and reduce is developed by virtue of the overlapping noun collocates.
[Diagram: improve, resolve and reduce connected through the noun collocates situation, matter, problem, way, quality, efficiency, effectiveness]
Intercollocability
Quantify intercollocability
The number of shared collocates
shared collocate (resolve, improve) = 3
shared collocate (reduce, improve) = 3
The more shared collocates a verb has with the wrong verb, the more likely this verb is a good candidate.
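The shared-collocate count can be sketched with plain set intersection. The noun sets below contain only the nouns named in the *improve problem example, not the full 52 noun collocates, and the function names are hypothetical.

```python
# Noun collocates per verb, limited to the nouns shown in the example
NOUN_COLLOCATES = {
    "improve": {"situation", "matter", "way",
                "quality", "efficiency", "effectiveness"},
    "resolve": {"situation", "matter", "way", "problem"},
    "reduce": {"quality", "efficiency", "effectiveness", "problem"},
}

def shared_collocates(candidate, wrong_verb, collocates=NOUN_COLLOCATES):
    """Number of noun collocates the candidate shares with the wrong verb."""
    return len(collocates[candidate] & collocates[wrong_verb])

def rank_candidates(wrong_verb, noun, collocates=NOUN_COLLOCATES):
    """Verbs that collocate with the noun, most shared collocates first.

    Ties are broken alphabetically, which is an arbitrary choice here.
    """
    candidates = [
        verb for verb, nouns in collocates.items()
        if verb != wrong_verb and noun in nouns
    ]
    return sorted(
        candidates,
        key=lambda verb: (-shared_collocates(verb, wrong_verb, collocates), verb),
    )

ranking = rank_candidates("improve", "problem")
```

On this reduced data both resolve and reduce share three noun collocates with improve, matching the counts on the slide.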
Integrate the 3 features
• The probabilistic model
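$$
P(S_c \mid F_{c,m}) = \frac{P(F_{c,m} \mid S_c)\, P(S_c)}{P(F_{c,m})} \approx \frac{P(S_c) \prod_{f \in F_{c,m}} P(f \mid S_c)}{\prod_{f \in F_{c,m}} P(f)}
$$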
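A minimal sketch of how such a model could score one candidate, assuming the features are treated as independent so that the score is P(S_c) multiplied by P(f | S_c) / P(f) for each feature level f. The probability tables and the prior are invented toy values, the function name `score` is hypothetical, and the sketch assumes no probability is zero.

```python
import math

def score(levels, prior, p_level_given_correct, p_level):
    """Approximate P(S_c | F) as P(S_c) * prod_f P(f | S_c) / P(f).

    levels maps a feature name to the discretized level observed
    for the candidate; sums of logs are used for numerical stability.
    """
    log_score = math.log(prior)
    for feature, lvl in levels.items():
        log_score += math.log(p_level_given_correct[feature][lvl])
        log_score -= math.log(p_level[feature][lvl])
    return math.exp(log_score)

# Hypothetical toy probabilities for two features and two levels each
PRIOR = 0.05
P_GIVEN_CORRECT = {"MI": {1: 0.1, 4: 0.4}, "SS": {2: 0.1, 5: 0.5}}
P_LEVEL = {"MI": {1: 0.4, 4: 0.2}, "SS": {2: 0.4, 5: 0.1}}

strong = score({"MI": 4, "SS": 5}, PRIOR, P_GIVEN_CORRECT, P_LEVEL)
weak = score({"MI": 1, "SS": 2}, PRIOR, P_GIVEN_CORRECT, P_LEVEL)
```

Candidates would then be sorted by this score, so a candidate whose feature levels are more typical of correct suggestions than of candidates in general ranks higher.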
Training
• Probability distribution of word association strength
MI values are mapped to 5 levels (<1.5, 1.5~3.0, 3.0~4.5, 4.5~6, >6)
P( MI level )
P(MI level | Sc)
Training
• Probability distribution of semantic similarity
Similarity scores are mapped to 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
P(SS level )
P(SS level | Sc)
Training
• Probability distribution of intercollocability
The normalized number of shared collocates is mapped to 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
P(SC level )
P(SC level | Sc)
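The three training steps share one pattern: map a raw feature value to one of five levels, then estimate P(level) and P(level | Sc). A minimal sketch follows, assuming the probabilities are plain relative frequencies over training candidates and that a value on a boundary falls into the higher level; the slides state neither detail. The training pairs and function names are hypothetical.

```python
import bisect
from collections import Counter

MI_EDGES = [1.5, 3.0, 4.5, 6.0]    # <1.5, 1.5~3.0, 3.0~4.5, 4.5~6, >6
UNIT_EDGES = [0.2, 0.4, 0.6, 0.8]  # 0.0~0.2, ..., 0.8~1.0 (SS and SC)

def to_level(value, edges):
    """Map a raw value to a level from 1 to 5."""
    return bisect.bisect_right(edges, value) + 1

def estimate(examples, edges):
    """Estimate P(level) and P(level | S_c) by relative frequency.

    examples is a list of (value, is_correct) pairs and must
    contain at least one correct candidate.
    """
    all_levels = Counter(to_level(v, edges) for v, _ in examples)
    correct_levels = Counter(to_level(v, edges) for v, ok in examples if ok)
    n_all = sum(all_levels.values())
    n_correct = sum(correct_levels.values())
    p_level = {l: all_levels[l] / n_all for l in range(1, 6)}
    p_given_correct = {l: correct_levels[l] / n_correct for l in range(1, 6)}
    return p_level, p_given_correct

# Hypothetical MI values for four training candidates
training = [(0.7, False), (2.0, False), (5.0, True), (7.0, True)]
p_mi, p_mi_given_correct = estimate(training, MI_EDGES)
```

Levels that never occur in training get probability zero under this estimate, so a real system would likely need some form of smoothing before the probabilities are multiplied or divided.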
Experiments
• Different combinations of the three features.
Models  Feature(s) considered
M1      MI (Mutual Information)
M2      SS (Semantic Similarity)
M3      SC (Shared Collocates)
M4      MI + SS
M5      MI + SC
M6      SS + SC
M7      MI + SS + SC
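The seven models are simply all non-empty subsets of the three features. As a small sketch, they can be enumerated in the same order as the table; the `MODELS` dictionary is a hypothetical name used only for this example.

```python
from itertools import combinations

FEATURES = ("MI", "SS", "SC")

# Enumerate subsets by size so the numbering matches M1 to M7
MODELS = {}
for size in range(1, len(FEATURES) + 1):
    for combo in combinations(FEATURES, size):
        MODELS[f"M{len(MODELS) + 1}"] = combo
```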
Results
K-Best  M1     M2 (SS)  M3     M4     M5     M6 (SS+SC)  M7 (MI+SS+SC)
1       16.67  40.48    22.62  48.81  29.76  55.95       53.75
2       36.90  53.45    38.10  60.71  44.05  63.10       67.86
3       47.62  64.29    50.00  71.43  59.52  77.38       78.57
4       52.38  67.86    63.10  77.38  72.62  80.95       82.14
5       64.29  75.00    72.62  83.33  78.57  83.33       85.71
6       65.48  77.38    75.00  85.71  83.33  84.52       88.10
7       67.86  77.38    77.38  86.90  86.90  86.90       89.29
8       70.24  80.95    82.14  86.90  89.29  88.10       91.67
9       72.62  83.33    85.71  88.10  92.86  90.48       92.86
10      76.19  86.90    88.10  88.10  94.05  90.48       94.05
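The slides do not define how the K-Best figures are computed. One plausible reading, sketched below purely as an assumption, is the percentage of miscollocations for which at least one acceptable suggestion appears among the top K candidates. The rankings are the M7 columns shown later for *reach purpose and *pay time; the sets of acceptable verbs are hypothetical stand-ins for the evaluators' judgments.

```python
def k_best_hit_rate(rankings, acceptable, k):
    """Percentage of items with an acceptable suggestion in the top k.

    This metric definition is an assumption, not taken from the slides.
    """
    hits = sum(
        1 for item, ranked in rankings.items()
        if any(suggestion in acceptable[item] for suggestion in ranked[:k])
    )
    return 100.0 * hits / len(rankings)

# M7 rankings from the example slides
rankings = {
    "reach purpose": ["achieve", "account", "trade", "fulfill", "serve"],
    "pay time": ["spend", "waste", "devote", "invest", "date"],
}
# Hypothetical judgments of which verbs count as acceptable
acceptable = {
    "reach purpose": {"fulfill"},
    "pay time": {"spend"},
}

rate_at_1 = k_best_hit_rate(rankings, acceptable, 1)
rate_at_4 = k_best_hit_rate(rankings, acceptable, 4)
```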
Results (cont.)
The K-Best suggestions for *get knowledge.
K-Best M2 M6 M7
1 aim obtain acquire
2 generate share share
3 draw develop obtain
4 obtain generate develop
5 develop acquire gain
The K-Best suggestions for *reach purpose.
K-Best M2 M6 M7
1 achieve achieve achieve
2 teach account account
3 explain trade trade
4 account treat fulfill
5 trade allocate serve
The K-Best suggestions for *pay time.
K-Best M2 M6 M7
1 devote spend spend
2 spend invest waste
3 expend devote devote
4 spare date invest
5 invest waste date
Conclusion
• A probabilistic model is used to integrate the three features.
• The early experimental results show the potential of this research.
Future work
• Applying such mechanisms to other types of miscollocations.
• Miscollocation detection will be one of the main points of this research.
• A larger number of miscollocations should be included in order to verify our approach.
Thank you!
Q & A
Anne Li-E Liu
David Wible
Nai-Lung Tsao