Coh-Metrix: An Automated Measure of Text Cohesion
Danielle S. McNamara, Yasuhiro Ozuru, Max Louwerse, Art Graesser
Coh-Metrix Investigators
Co-PIs and Senior Researchers: Max Louwerse, Art Graesser, Zhiqiang Cai, Randy Floyd, Xiangen Hu, Vasili Rus
Postdocs & Staff: Rachel Best, David Dufty, Christian Hempelman, Tenaha O’Reilly, Yasuhiro Ozuru
Many students
Coh-Metrix
• Coh-Metrix v1.2 analyzes texts on many different dimensions of cohesion and language
– Input text on a web site
– Outputs 12 primary measures and over 200 additional measures
Graesser, McNamara, Louwerse, & Cai, 2004
Prior Research
• Increasing text cohesion improves memory for text content.
– Increasing argument overlap between sentences
• Most plastics are good insulators. So are clothes you wear, like sweaters and coats.
• Most plastics are good insulators. Other good insulators are the clothes you wear, like sweaters and coats.
– Adding connectives
• For example, most plastics are good insulators.
• because, consequently, so that, in addition, however
– Adding headers and topic sentences
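The connective-based revision listed above can be quantified by counting connectives in a text. The following is a minimal sketch, not the Coh-Metrix implementation: the function name `connective_incidence` is hypothetical, the word list contains only the five connectives named on the slide, and the per-1000-words normalization is an assumption.

```python
import re

# Only the connectives named on the slide; a real inventory would be much larger.
CONNECTIVES = ("because", "consequently", "so that", "in addition", "however")


def connective_incidence(text, connectives=CONNECTIVES, per=1000):
    """Occurrences of the listed connectives per `per` words.

    The per-1000-words scaling is an assumption made for this sketch.
    """
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    if n_words == 0:
        return 0.0
    # Whole-word (or whole-phrase) matches only, so "so that" is counted as one unit.
    hits = sum(
        len(re.findall(r"\b" + re.escape(c) + r"\b", lowered))
        for c in connectives
    )
    return per * hits / n_words


sample = "Plastics are good insulators because they trap heat. However, metals are not."
print(connective_incidence(sample))  # 2 connectives in 12 words, scaled to 1000 words
```

A count like this sees only explicit connectives; it says nothing about whether the connective is used appropriately.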
Prior Research
• Increasing text cohesion improves memory for text content.
• Text cohesion is particularly crucial for low-knowledge readers.
• Decreasing text cohesion helps high-knowledge readers process the text more actively and understand it at a deeper level.
– McNamara, Kintsch, Songer, & Kintsch (1996, C&I)
– McNamara & Kintsch (1996, DP)
– McNamara (2001, CJEP)
Cohesion and Coherence
• Research points to the need to consider text difficulty in terms of text cohesion and coherence.
– Cohesion is a property of the text.
– Coherence is a property of the reader’s mental representation.
• We need automated measures of cohesion and coherence.
Current Method: Readability Measures
• E.g., Flesch-Kincaid Grade Level
• Based on the work of Rudolf Flesch in the 1940s
• Scores range from 0-12 to predict grade appropriateness
• Measure based on surface characteristics
– sentence length
– word length
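To make the "surface characteristics" point concrete, here is a minimal sketch of a Flesch-Kincaid Grade Level calculation. The coefficients are the standard published ones for the formula and do not come from the slides. The syllable counter is a crude vowel-run heuristic assumed for illustration, and both function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of vowels (heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Grade level from average sentence length and average syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


print(flesch_kincaid_grade("The cat sat. The dog ran."))
```

Nothing in the calculation looks at how sentences relate to one another, so two texts with identical sentence and word lengths get the same score regardless of cohesion.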
Goals of Coh-Metrix Tool
• Analyze texts on many different dimensions of cohesion and language
– Input text on a web site
– Outputs over 200 measures
• Focus primarily on deeper levels of meaning and cohesion, unlike standard readability formulas
• Tailor texts to students (K12, college) with different world knowledge and abilities
Computational Linguistics Modules
Lexicons
Morpho-semantics
Part-of-speech tagging
Syntactic parsing
Latent Semantic Analysis
Pattern classifiers
Corpora norms
Heart Disease Text (McNamara et al., 1996)
[Figure: Flesch-Kincaid Grade Level (axis 5.8 to 7.4, easy to hard) and argument overlap (axis 0 to 0.7) across four cohesion conditions: high local/high global, high local/low global, low local/high global, low local/low global.]
Any disorder that stops the heart from supplying blood to the body is a threat to life. Heart disease is such a disorder.
Any disorder that stops the blood supply is a threat to life. Heart disease is very common
[Figure: Flesch-Kincaid Grade Level (axis 7.4 to 8.6, easy to hard) and argument overlap (axis 0 to 0.7) for high- vs. low-cohesion versions. Data labels: 7.8 and 8.4 (F-K); 0.26 and 0.45 (argument overlap).]
Cohesion and Readability Scores for 19 pairs of passages examined in 12 published studies
Beck et al. (1984)
Beck et al. (1991)
Britton and Gulgoz (1989)
Cataldo & Oakhill (2000)
Kintsch (1990)
Lehman & Schraw (2002)
Linderholm et al. (2000)
Loxterman et al. (1994)
McNamara (2001)
McNamara et al. (1996)
Vidal-Abarca et al. (2000)
Voss & Silfies (1996)
List of Cohesion Publications
[Figure: Argument overlap (adjacent, unweighted; axis 0.0 to 1.0) by text code, low vs. high cohesion (coded 1.00 and 2.00).]
Linderholm et al. 2000 Mademoiselle Germaine (Easy Text)
McNamara et al. 1996Mammal Text, Exp. 1
Lehman & Schraw 2002The Quest for the Northwest Passage
No differences
causal, particle to verb ratio; causal connectives; LSA sentence to sentence; noun overlap
clarification connectives; causal, particle to verb ratio; causal connectives; pronoun incidence
What variables showed a greater than 50% difference in favor of the cohesive text?
Overall Results
• The 20 variables showing the largest differences were co-reference measures.
• Argument overlap measures showed the largest differences in comparison to noun and stem overlap measures
– Argument overlap includes pronouns
• They skied all day. They were tired.
– Regardless of whether overlap was counted at distances of 1, 2, or 3 sentences
– Adjacent overlap showed the largest difference
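The overlap measure described above can be sketched as the proportion of sentence pairs, a fixed distance apart, that share at least one argument (noun or pronoun). This is an illustrative assumption about the definition, not the Coh-Metrix implementation. The function name `argument_overlap` is hypothetical, and the per-sentence argument sets are supplied by hand because extracting them would need a part-of-speech tagger.

```python
def argument_overlap(sentence_args, distance=1):
    """Proportion of sentence pairs `distance` apart that share an argument.

    `sentence_args` is a list of sets, one per sentence, each holding that
    sentence's nouns and pronouns in lowercase. distance=1 compares adjacent
    sentences.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    pairs = list(zip(sentence_args, sentence_args[distance:]))
    if not pairs:
        return 0.0
    shared = sum(1 for first, second in pairs if first & second)
    return shared / len(pairs)


# "They skied all day. They were tired." The pronoun "they" links the sentences.
skiing = [{"they", "day"}, {"they"}]
print(argument_overlap(skiing))  # 1.0

# "Most plastics are good insulators. So are clothes you wear, ..."
low = [{"plastics", "insulators"}, {"clothes", "you", "sweaters", "coats"}]
# "Most plastics are good insulators. Other good insulators are the clothes ..."
high = [{"plastics", "insulators"}, {"insulators", "clothes", "you", "sweaters", "coats"}]
print(argument_overlap(low), argument_overlap(high))  # 0.0 1.0
```

Passing `distance=2` or `distance=3` gives the wider-window variants mentioned in the list; a noun-only measure would drop the pronouns from each set and miss the link in the skiing example.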
Other Significant Variables
• Type-Token Ratio for Nouns (L>H)
• Higher level constituents per sentence (H>L)
• Ratio of causal particles and causal verbs (p<.06; H>L)
• Causal connectives (p<.07; H>L)
• Celex, log Freq, min in sentence (p<.08; L>H)
• Average Words per Sentence (p<.08; H>L)
• LSA, sentence to sentence (p<.11; H>L)
[Figure: Type-token ratio, low- vs. high-cohesion texts (axis 0.65 to 0.74).]
Indicates that the high-cohesion texts did not add new information
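A type-token ratio is the number of distinct words (types) divided by the total number of words (tokens), so a text that keeps reusing the same nouns scores lower. The sketch below assumes the nouns have already been extracted; the function name `type_token_ratio` and the example noun lists are hypothetical.

```python
def type_token_ratio(tokens):
    """Distinct word types divided by total tokens; 0.0 for empty input."""
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Made-up noun lists: the second repeats nouns instead of introducing new ones.
varied_nouns = ["heart", "disease", "blood", "body"]
repeated_nouns = ["heart", "disease", "heart", "blood", "heart", "disease"]
print(type_token_ratio(varied_nouns))    # 1.0
print(type_token_ratio(repeated_nouns))  # 0.5
```

This matches the direction of the reported difference: a lower noun type-token ratio for high-cohesion texts is what one would expect if the revisions repeat existing nouns rather than add new ones.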
[Figure: Constituents, low- vs. high-cohesion texts (axis 1.8 to 2.5).]
[Figure: Causal Ratio, low- vs. high-cohesion texts (axis 0 to 1).]
[Figure: Causal Connectives, low- vs. high-cohesion texts (axis 0 to 25).]
[Figure: Celex Min Log, low- vs. high-cohesion texts (axis 0.80 to 1.05).]
[Figure: Number of Words, low- vs. high-cohesion texts (axis 0 to 800).]
Descriptive Statistics
N     Minimum   Maximum   Mean    Std. Deviation
38    101.0     1390.0    590.8   381.9
[Figure: LSA, low- vs. high-cohesion texts (axis 0.00 to 0.30). Data labels: 0.21 and 0.27.]
ANNOUNCING THE RELEASE OF
Coh-Metrix 1.1
Current Goals
• Examine cohesion measures by grade level for TASA and complete textbooks.
• Conduct empirical studies to further examine the effects of text cohesion for adults.
• Conduct experiments to establish the effects of cohesion for young children.
– e.g., currently conducting comprehension and eye-tracking studies with 3rd-5th grade children.
What will Coh-Metrix achieve?
• Enhance education by giving educators better tools for choosing textbooks
• Help publishers more appropriately tailor books to target age groups
• Help writers improve the cohesion of their writing
• Help researchers better understand the hidden properties of text