June 5, 2009 Automated Suggestions for Miscollocations
Automated Suggestions for Miscollocations
Anne Li-E Liu
David Wible
Nai-lung Tsao
Overview
• Introduction
• Methodology
• Experimental Results
• Conclusion
Introduction
• Our study focuses on how to find suggestions for miscollocations automatically.
• In this paper, only verb-noun collocations and miscollocations are considered.
Introduction
• Howarth’s (1998) investigation of collocations found in L1 and L2 writers’ writing.
• Granger’s (1998) analysis of adverb-adjective collocations.
• Liu’s (2002) lexical semantic analysis of the verb-noun miscollocations in English Taiwanese Learner Corpus.
Introduction
Projects using learner corpora in analyzing and
categorizing learner errors:
• NICT JLE (Japanese Learner English) Corpus
• The Chinese Learner English Corpus (CLEC)
• English Taiwan Learner Corpus (or TLC) (Wible et al., 2003).
An example
• She tries to improve her students’ problems.
1. solve
2. pose
3. tackle
4. grapple
5. alleviate
6. overcome
7. exacerbate
8. compound
9. beset
10. resolve
reduce
Verb collocates of "problem" from Collocation Explorer
Method
• Three features of collocate candidates are used:
1. Word association strength
2. Semantic similarity
3. Intercollocability (Cowie and Howarth, 1996)
Resource
• 84 VN miscollocations in TLC (Liu, 2002).
Training data: 42 Testing data: 42
• Two knowledge resources: BNC, WordNet
• Two human evaluators.
Word Association Strength
• Mutual Information (Church et al., 1991)
• Two purposes:
1. All suggested correct collocations have to be identified as collocations.
2. The higher the word association strength of a candidate, the more likely it is to be a correct substitute for the wrong collocate.
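A minimal sketch of how a mutual information score could be computed for verb-noun pairs, assuming the pointwise form log2(P(v, n) / (P(v) P(n))) estimated from raw pair counts. The function name `pmi_table` and the toy counts are illustrative assumptions; they are not taken from the BNC or from the authors' implementation.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return {(verb, noun): pointwise mutual information in bits}.

    Probabilities are plain relative frequencies over the observed
    verb-noun pairs (an assumption; no smoothing is applied).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    noun_counts = Counter(n for _, n in pairs)
    total = len(pairs)
    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_vn = count / total
        p_v = verb_counts[verb] / total
        p_n = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_vn / (p_v * p_n))
    return scores


# Invented toy counts, for illustration only.
toy_pairs = (
    [("solve", "problem")] * 4
    + [("solve", "puzzle")] * 1
    + [("improve", "quality")] * 4
    + [("improve", "problem")] * 1
)
scores = pmi_table(toy_pairs)
```

In this toy data "solve problem" gets a positive score and "improve problem" a negative one, which matches the intuition that a stronger association makes a better substitute.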
Semantic Similarity
• A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu, 2002).
• The synsets of WordNet are treated as nodes in a graph, and similarity is measured by graph-theoretic distance.
*say a story → tell a story (synonymous relation)
*say a story → think of a story (hypernymy relation)
Semantic Similarity
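\[
\mathrm{sim}(w_1, w_2) = \max_{s_i \in \mathrm{synsets}(w_1),\; s_j \in \mathrm{synsets}(w_2)} \left( 1 - \frac{\mathrm{dis}(s_i, s_j)}{2 \times \max(L_{s_i}, L_{s_j})} \right)
\]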
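The similarity measure can be illustrated on a toy hypernym tree. This sketch rests on two assumptions: dis(si, sj) is read as the number of edges on the path between two synsets, and L is read as the level of a synset in the tree, with the root at level 1. The node labels and the word-to-synset mapping are hypothetical and do not reproduce WordNet.

```python
# Hypothetical toy hypernym tree: synset -> parent synset (root has None).
PARENT = {
    "express": None,
    "state": "express",
    "tell": "express",
    "recount": "tell",
}

# Hypothetical mapping from words to the synsets they belong to.
SYNSETS_OF = {
    "say": ["state", "tell"],
    "tell": ["tell", "recount"],
    "recount": ["recount"],
}


def path_to_root(synset):
    """List of nodes from the synset up to the root, inclusive."""
    path = [synset]
    while PARENT[synset] is not None:
        synset = PARENT[synset]
        path.append(synset)
    return path


def level(synset):
    """Level of a synset in the tree; the root is at level 1 (assumed)."""
    return len(path_to_root(synset))


def dis(s1, s2):
    """Number of edges on the tree path between two synsets."""
    steps_from_s1 = {node: i for i, node in enumerate(path_to_root(s1))}
    for j, node in enumerate(path_to_root(s2)):
        if node in steps_from_s1:  # first common ancestor
            return steps_from_s1[node] + j
    raise ValueError("synsets are not in the same tree")


def sim(w1, w2):
    """Maximum over all synset pairs of 1 - dis / (2 * max level)."""
    return max(
        1 - dis(si, sj) / (2 * max(level(si), level(sj)))
        for si in SYNSETS_OF[w1]
        for sj in SYNSETS_OF[w2]
    )
```

Here `sim("say", "tell")` is 1.0 because the two words share a synset in the toy data, while `sim("say", "recount")` is lower because the closest synsets are one edge apart.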
Intercollocability
• Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of shared meaning.
Examples: convey point, get across the message, express concern, convey feeling, communicate concern
Cluster: convey message, get across point, express concern, communicate feeling
Intercollocability
• Collocations in a cluster show a certain degree of intercollocability.
express one’s concern / condolences
Cluster: convey message, get across point, express concern, communicate feeling
express, communicate + concern, feeling ?
Intercollocability
She tries to *improve her students’ problems.
*improve problem
improve: 52 noun collocates
problem: 86 verb collocates
Starting point: does any of the 86 verbs co-occur with the 52 nouns?
resolve/improve + situation, matter, way
reduce/improve + quality, efficiency, effectiveness
resolve problem
reduce problem
Intercollocability
• The cluster is partially created, and the link between improve, resolve and reduce is developed by virtue of the overlapping noun collocates.
[Diagram: the verbs improve, resolve and reduce linked to the nouns situation, matter, problem, way, quality, efficiency and effectiveness]
Intercollocability
Quantify intercollocability
The number of shared collocates
shared collocate (resolve, improve) = 3
shared collocate (reduce, improve) = 3
The more shared collocates a verb has with the wrong verb, the more likely this verb is a good candidate.
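Counting shared collocates amounts to a set intersection. The sketch below uses the nouns from the improve/resolve/reduce example; the entry for "pose" and the helper names `shared_collocates` and `rank_candidates` are hypothetical additions for illustration, and real collocate sets would come from a corpus such as the BNC.

```python
# Noun collocates per verb. The improve/resolve/reduce entries follow
# the example in the slides; the "pose" entry is invented as a contrast.
NOUN_COLLOCATES = {
    "improve": {"situation", "matter", "way",
                "quality", "efficiency", "effectiveness"},
    "resolve": {"situation", "matter", "way", "problem"},
    "reduce": {"quality", "efficiency", "effectiveness", "problem"},
    "pose": {"problem", "threat"},
}


def shared_collocates(verb, wrong_verb, collocates=NOUN_COLLOCATES):
    """Number of noun collocates the two verbs have in common."""
    return len(collocates[verb] & collocates[wrong_verb])


def rank_candidates(wrong_verb, noun, collocates=NOUN_COLLOCATES):
    """Verbs that collocate with the noun, most shared collocates first.

    Ties are broken alphabetically so the order is deterministic.
    """
    candidates = [
        verb for verb, nouns in collocates.items()
        if noun in nouns and verb != wrong_verb
    ]
    return sorted(
        candidates,
        key=lambda v: (-shared_collocates(v, wrong_verb, collocates), v),
    )
```

For *improve problem, `rank_candidates("improve", "problem")` puts reduce and resolve (3 shared collocates each) ahead of pose (0 in this toy data).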
Integrate the 3 features
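• The probabilistic model
\[
P(S_c \mid F_{c,m}) = \frac{P(F_{c,m} \mid S_c)\, P(S_c)}{P(F_{c,m})} \approx \frac{P(S_c) \prod_{f \in F_{c,m}} P(f \mid S_c)}{\prod_{f \in F_{c,m}} P(f)}
\]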
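A sketch of how such a model could score one candidate, assuming the features are treated as independent so that the score is P(Sc) times the product of P(f | Sc), divided by the product of P(f). Sc is taken here to mean "the candidate is a correct suggestion", which is an assumption, and all probability values are invented for illustration.

```python
from math import prod


def candidate_score(levels, p_given_sc, p_level, p_sc):
    """Score a candidate from its discretized feature levels.

    levels:     {feature name: level} for this candidate
    p_given_sc: {feature name: {level: P(level | Sc)}}
    p_level:    {feature name: {level: P(level)}}
    p_sc:       prior P(Sc)
    """
    numerator = p_sc * prod(p_given_sc[f][lvl] for f, lvl in levels.items())
    denominator = prod(p_level[f][lvl] for f, lvl in levels.items())
    return numerator / denominator


# Invented toy distributions over two levels per feature.
P_GIVEN_SC = {"MI": {"high": 0.5, "low": 0.1},
              "SS": {"high": 0.6, "low": 0.2}}
P_LEVEL = {"MI": {"high": 0.2, "low": 0.4},
           "SS": {"high": 0.3, "low": 0.5}}
P_SC = 0.1

strong = candidate_score({"MI": "high", "SS": "high"}, P_GIVEN_SC, P_LEVEL, P_SC)
weak = candidate_score({"MI": "low", "SS": "high"}, P_GIVEN_SC, P_LEVEL, P_SC)
```

Because the independence assumption is only an approximation, the result is best read as a ranking score rather than a calibrated probability. Leaving a feature out of `levels` gives the reduced feature combinations.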
Training
• Probability distribution of word association strength
MI values are discretized into 5 levels (<1.5, 1.5~3.0, 3.0~4.5, 4.5~6.0, >6.0)
P(MI level)
P(MI level | Sc)
Training
• Probability distribution of semantic similarity
Similarity scores are discretized into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
P(SS level)
P(SS level | Sc)
Training
• Probability distribution of intercollocability
The normalized number of shared collocates is discretized into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
P(SC level)
P(SC level | Sc)
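The discretization and the two distributions per feature could be estimated along these lines. This is a sketch under stated assumptions: a value that falls exactly on a boundary is assigned to the upper level (the slides do not say which side it belongs to), probabilities are plain relative frequencies without smoothing, and the training samples are invented.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of levels 0..3; level 4 is everything above the last edge.
MI_EDGES = [1.5, 3.0, 4.5, 6.0]
UNIT_EDGES = [0.2, 0.4, 0.6, 0.8]  # for SS and normalized SC scores


def to_level(value, edges):
    """Map a raw score to a level index 0..len(edges)."""
    return bisect_right(edges, value)


def estimate(samples, edges):
    """Estimate P(level) and P(level | Sc) by relative frequency.

    samples: list of (raw score, is_correct_suggestion) pairs; at least
    one sample must be a correct suggestion.
    """
    n_levels = len(edges) + 1
    all_counts = Counter(to_level(v, edges) for v, _ in samples)
    sc_counts = Counter(to_level(v, edges) for v, ok in samples if ok)
    n_all = len(samples)
    n_sc = sum(sc_counts.values())
    p_level = [all_counts[k] / n_all for k in range(n_levels)]
    p_level_given_sc = [sc_counts[k] / n_sc for k in range(n_levels)]
    return p_level, p_level_given_sc


# Invented (MI value, is correct suggestion) training samples.
toy_samples = [(0.7, False), (2.0, False), (5.0, True), (7.2, True), (6.5, False)]
p_mi, p_mi_given_sc = estimate(toy_samples, MI_EDGES)
```

With only 42 training miscollocations some levels may be empty, so a real implementation would probably need smoothing to avoid zero probabilities.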
Experiments
• Different combinations of the three features.
Models Feature(s) considered
M1 MI (Mutual Information)
M2 SS (Semantic Similarity)
M3 SC (Shared Collocates)
M4 MI + SS
M5 MI + SC
M6 SS + SC
M7 MI + SS + SC
Results
K-Best  M1 (MI)  M2 (SS)  M3 (SC)  M4 (MI+SS)  M5 (MI+SC)  M6 (SS+SC)  M7 (MI+SS+SC)
1       16.67    40.48    22.62    48.81       29.76       55.95       53.75
2       36.90    53.45    38.10    60.71       44.05       63.10       67.86
3       47.62    64.29    50.00    71.43       59.52       77.38       78.57
4       52.38    67.86    63.10    77.38       72.62       80.95       82.14
5       64.29    75.00    72.62    83.33       78.57       83.33       85.71
6       65.48    77.38    75.00    85.71       83.33       84.52       88.10
7       67.86    77.38    77.38    86.90       86.90       86.90       89.29
8       70.24    80.95    82.14    86.90       89.29       88.10       91.67
9       72.62    83.33    85.71    88.10       92.86       90.48       92.86
10      76.19    86.90    88.10    88.10       94.05       90.48       94.05
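One plausible reading of a K-Best figure is the percentage of test miscollocations for which at least one acceptable correction appears among the top K suggestions. The sketch below computes that quantity under this assumption. The ranked lists are the M2 suggestions shown in this deck for *get knowledge and *pay time, but the sets of acceptable corrections are hypothetical, since the evaluators' judgments are not given here.

```python
def k_best_rate(suggestions, acceptable, k):
    """Percentage of items with an acceptable verb in the top k.

    suggestions: {miscollocation: ranked list of suggested verbs}
    acceptable:  {miscollocation: set of verbs judged correct}
    """
    hits = sum(
        1
        for item, ranked in suggestions.items()
        if any(verb in acceptable[item] for verb in ranked[:k])
    )
    return 100.0 * hits / len(suggestions)


# Ranked M2 suggestions as listed in the slides.
m2_suggestions = {
    "get knowledge": ["aim", "generate", "draw", "obtain", "develop"],
    "pay time": ["devote", "spend", "expend", "spare", "invest"],
}

# Hypothetical judgments, for illustration only.
acceptable = {
    "get knowledge": {"acquire", "gain", "obtain"},
    "pay time": {"spend", "devote"},
}

rate_at_1 = k_best_rate(m2_suggestions, acceptable, k=1)
rate_at_4 = k_best_rate(m2_suggestions, acceptable, k=4)
```

Under these toy judgments the rate rises from 50.0 at K=1 to 100.0 at K=4, since "obtain" only appears in fourth place for *get knowledge.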
Results (cont.)
The K-Best suggestions for *get knowledge.
K-Best M2 M6 M7
1 aim obtain acquire
2 generate share share
3 draw develop obtain
4 obtain generate develop
5 develop acquire gain
The K-Best suggestions for *reach purpose.
K-Best M2 M6 M7
1 achieve achieve achieve
2 teach account account
3 explain trade trade
4 account treat fulfill
5 trade allocate serve
The K-Best suggestions for *pay time.
K-Best M2 M6 M7
1 devote spend spend
2 spend invest waste
3 expend devote devote
4 spare date invest
5 invest waste date
Conclusion
• A probabilistic model to integrate features.
• The early experimental results show the potential of this research.
Future work
• Applying such mechanisms to other types of miscollocations.
• Miscollocation detection will be one of the main points of this research.
• A larger number of miscollocations should be included in order to verify our approach.
Thank you!
Q & A
Anne Li-E Liu
David Wible
Nai-Lung Tsao