
Coh-Metrix: An Automated Measure of Text Cohesion

Danielle S. McNamara, Yasuhiro Ozuru, Max Louwerse, Art Graesser


Coh-Metrix Investigators

Co-PIs and Senior Researchers: Max Louwerse, Art Graesser, Zhiqiang Cai, Randy Floyd, Xiangen Hu, Vasili Rus

Postdocs & Staff: Rachel Best, David Dufty, Christian Hempelman, Tenaha O’Reilly, Yasuhiro Ozuru

Many students


Coh-Metrix

• Coh-Metrix v1.2 analyzes texts on many different dimensions of cohesion and language
  – Input text on a web site
  – Outputs 12 primary measures and over 200 additional measures

Graesser, McNamara, Louwerse, & Cai, 2004


Prior Research

• Increasing text cohesion improves memory for text content.
  – Increasing argument overlap between sentences
    • Most plastics are good insulators. So are clothes you wear, like sweaters and coats.
    • Most plastics are good insulators. Other good insulators are the clothes you wear, like sweaters and coats.
  – Adding connectives
    • For example, most plastics are good insulators.
    • because, consequently, so that, in addition, however
  – Adding headers and topic sentences
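The connective manipulation above can be made concrete with a small counting sketch. This is only an illustration, not the Coh-Metrix implementation: the function name `connective_incidence`, the short `CONNECTIVES` list (taken from the slide's own examples), and the per-1,000-words scaling are all assumptions of this sketch.

```python
import re

# Connectives taken from the slide's examples; a real tool would use a much
# larger lexicon. This list is an assumption of the sketch.
CONNECTIVES = ["because", "consequently", "so that", "in addition",
               "however", "for example"]


def connective_incidence(text, connectives=CONNECTIVES):
    """Return connectives per 1,000 words (scaling chosen for this sketch)."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    if n_words == 0:
        return 0.0
    # Count whole-word (or whole-phrase) matches for each connective.
    hits = sum(
        len(re.findall(r"\b" + re.escape(c) + r"\b", lowered))
        for c in connectives
    )
    return 1000.0 * hits / n_words


with_connective = "For example, most plastics are good insulators."
without_connective = "Most plastics are good insulators."
print(connective_incidence(with_connective) > connective_incidence(without_connective))
```

Matching on whole phrases keeps multi-word connectives such as "so that" from being counted as separate words.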


Prior Research

• Increasing text cohesion improves memory for text content.
• Text cohesion is particularly crucial for low-knowledge readers.
• Decreasing text cohesion helps high-knowledge readers process the text more actively and understand it at a deeper level.
  – McNamara, Kintsch, Songer, & Kintsch (1996, C&I)
  – McNamara & Kintsch (1996, DP)
  – McNamara (2001, CJEP)


Cohesion and Coherence

• Research points to the need to consider text difficulty in terms of text cohesion and coherence.
  – Cohesion is a property of the text.
  – Coherence is a property of the reader’s mental representation.

• We need automated measures of cohesion and coherence.


Current Method: Readability Measures

• E.g., Flesch-Kincaid Grade Level
• Based on the work of Rudolf Flesch in the 1940s
• Scores range from 0-12 to predict grade appropriateness
• Measure based on surface characteristics
  – sentence length
  – word length
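To show how little a readability formula looks at, here is a minimal sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The formula itself is the published one; the sentence splitter and the vowel-group syllable counter are rough assumptions of this sketch (real tools use better tokenizers or pronunciation dictionaries), and the function names are illustrative only.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent when another vowel group exists.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level from sentence length and word length only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


simple = "The cat sat. The dog ran."
longer = ("Most plastics are good insulators. Other good insulators are "
          "the clothes you wear, like sweaters and coats.")
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(longer))
```

Nothing in the computation refers to how sentences relate to one another, which is the gap the cohesion measures are meant to fill.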


Goals of Coh-Metrix Tool

• Analyze texts on many different dimensions of cohesion and language
  – Input text on a web site
  – Outputs over 200 measures

• Focus primarily on deeper levels of meaning and cohesion, unlike standard readability formulas

• Tailor texts to students (K12, college) with different world knowledge and abilities


Computational Linguistics Modules

Lexicons

Morpho-semantics

Part-of-speech tagging

Syntactic parsing

Latent Semantic Analysis

Pattern classifiers

Corpora norms


Heart Disease Text (McNamara et al., 1996)

[Figure: Flesch-Kincaid Grade Level (axis 5.8 to 7.4, marked from easy to hard) and argument overlap (axis 0 to 0.7) for four cohesion conditions: high local/high global, high local/low global, low local/high global, low local/low global.]

Any disorder that stops the heart from supplying blood to the body is a threat to life. Heart disease is such a disorder.

Any disorder that stops the blood supply is a threat to life. Heart disease is very common


[Figure: Flesch-Kincaid Grade Level (axis 7.4 to 8.6, marked from easy to hard; data labels 7.8 and 8.4) and argument overlap (axis 0 to 0.7; data labels 0.26 and 0.45) for high-cohesion versus low-cohesion texts.]

Cohesion and Readability Scores for 19 pairs of passages examined in 12 published studies


List of Cohesion Publications

Beck et al. (1984)
Beck et al. (1991)
Britton & Gulgoz (1989)
Cataldo & Oakhill (2000)
Kintsch (1990)
Lehman & Schraw (2002)
Linderholm et al. (2000)
Loxterman et al. (1994)
McNamara (2001)
McNamara et al. (1996)
Vidal-Abarca et al. (2000)
Voss & Silfies (1996)


[Figure: Argument overlap (adjacent, unweighted), axis 0.0 to 1.0, plotted by text code for low- versus high-cohesion versions (legend values 1.00 and 2.00).]


[Figure: Argument overlap (adjacent, unweighted), axis 0.0 to 1.0, plotted by text code for low- versus high-cohesion versions (legend values 1.00 and 2.00), with three texts annotated.]

Linderholm et al. 2000, Mademoiselle Germaine (Easy Text)

McNamara et al. 1996, Mammal Text, Exp. 1

Lehman & Schraw 2002, The Quest for the Northwest Passage

No differences

causal, particle to verb ratio; causal connectives; LSA Sentence to Sentence; noun overlap

clarification connectives; causal, particle to verb ratio; causal connectives; pronoun incidence

What variables showed a greater than 50% difference in favor of the cohesive text?


Overall Results

• The 20 variables showing the largest differences were co-reference measures.

• Argument overlap measures showed the largest differences in comparison to noun and stem overlap measures
  – Argument overlap includes pronouns
    • They skied all day. They were tired.
  – Regardless of whether overlap was counted at distances of 1, 2, or 3 sentences
  – Adjacent overlap showed the largest difference
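One way to read the overlap measures above is as the proportion of sentence pairs, a fixed distance apart, that share at least one argument. The sketch below follows that reading; it is an assumption about the computation, not the tool's actual code. It also assumes the arguments (nouns and pronouns) of each sentence have already been extracted by a tagger, so they are written out by hand here, and `argument_overlap` and `gap` are hypothetical names.

```python
def argument_overlap(argument_sets, gap=1):
    """Proportion of sentence pairs `gap` apart that share an argument.

    `argument_sets` holds one set of arguments per sentence, in text order.
    gap=1 compares adjacent sentences; gap=2 or 3 looks further apart.
    """
    pairs = [
        (argument_sets[i], argument_sets[i + gap])
        for i in range(len(argument_sets) - gap)
    ]
    if not pairs:
        return 0.0
    # Unweighted: a pair counts once if it shares anything at all.
    return sum(1 for a, b in pairs if a & b) / len(pairs)


# "They skied all day. They were tired."
with_pronouns = [{"they", "day"}, {"they"}]   # argument overlap
nouns_only = [{"day"}, set()]                 # noun overlap
print(argument_overlap(with_pronouns), argument_overlap(nouns_only))
```

The skiing example shows why including pronouns matters: the two sentences are linked only by "they", so a noun-only count sees no overlap.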


Other Significant Variables

• Type-Token Ratio for Nouns (L>H)
• Higher level constituents per sentence (H>L)
• Ratio of causal particles and causal verbs (p<.06; H>L)
• Causal connectives (p<.07; H>L)
• Celex, log Freq, min in sentence (p<.08; L>H)
• Average Words per Sentence (p<.08; H>L)
• LSA, sentence to sentence (p<.11; H>L)


[Figure: Type-token ratio (axis 0.65 to 0.74) for low- versus high-cohesion texts.]

Indicates that the high-cohesion texts did not add new information
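The type-token ratio is the number of distinct words (types) divided by the total number of words (tokens), so repeating words lowers it. The sketch below computes it over all words with a simple regex tokenizer; the slide's measure is restricted to nouns, which would need a part-of-speech tagger, so treat this as an approximation with a hypothetical function name.

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words (all words, not only nouns)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


low_cohesion = ("Most plastics are good insulators. So are clothes you wear, "
                "like sweaters and coats.")
high_cohesion = ("Most plastics are good insulators. Other good insulators are "
                 "the clothes you wear, like sweaters and coats.")
print(type_token_ratio(high_cohesion) < type_token_ratio(low_cohesion))
```

On the insulator sentences from the earlier slide, the high-cohesion revision repeats "good insulators" rather than introducing new words, so its ratio is lower, in line with the L>H pattern reported for this measure.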


Constituents

[Figure: Higher level constituents (axis 1.8 to 2.5) for low- versus high-cohesion texts.]


Causal Ratio

[Figure: Causal ratio (axis 0 to 1) for low- versus high-cohesion texts.]


Connectives

[Figure: Causal connectives (axis 0 to 25) for low- versus high-cohesion texts.]


Celex

[Figure: Celex minimum log frequency (axis 0.80 to 1.05) for low- versus high-cohesion texts.]


Number of Words

[Figure: Number of words (axis 0 to 800) for low- versus high-cohesion texts.]

Descriptive Statistics

N    Minimum   Maximum   Mean    Std. Deviation
38   101.0     1390.0    590.8   381.9


LSA

[Figure: LSA, sentence to sentence (axis 0.00 to 0.30; data labels 0.21 and 0.27) for low- versus high-cohesion texts.]


ANNOUNCING THE RELEASE OF

Coh-Metrix 1.1


Current Goals

• Examining cohesion measures by grade level for TASA and complete textbooks.

• Conducting empirical studies to further examine the effects of text cohesion for adults.

• Conducting experiments to establish the effects of cohesion for young children.
  – e.g., currently conducting comprehension and eye-tracking studies with 3rd-5th grade children.


What will Coh-Metrix achieve?

• Enhance education by giving educators better tools for choosing textbooks

• Help publishers more appropriately tailor books to target age groups

• Help writers improve the cohesion of their writing

• Help researchers better understand the hidden properties of text