
Page 1: Computational Models of Text Quality

Computational Models of Text Quality

Ani Nenkova, University of Pennsylvania

ESSLLI 2010, Copenhagen

Page 2: Computational Models of Text Quality

The ultimate text quality application

Imagine your favorite text editor, with a spell-checker and grammar checker, but also functions that tell you:

"Word W is repeated too many times"
"Fill the gap is a cliché"
"You might consider using this more figurative expression"
"This sentence is unclear and hard to read"
"What is the connection between these two sentences?"
...

Page 3: Computational Models of Text Quality

Currently

It is our friends who give such feedback, often conflicting. We might agree that a text is good, but find it hard to explain exactly why.

Computational linguistics should have some answers, though it is far from offering a complete solution yet.

Page 4: Computational Models of Text Quality

In this course

We will overview research dealing with various aspects of text quality.

A unified approach does not yet exist, but many proposals have been tested on corpus data and integrated in applications.

Page 5: Computational Models of Text Quality

Current applications: education

Grading student writing: Is this a good essay? One of the graders of SAT and GRE essays is in fact a machine! [1] http://www.ets.org/research/capabilities/automated_scoring

Providing appropriate reading material: Is this text good for a particular user? Appropriate grade level; appropriate language competency in L2. [2,3] http://reap.cs.cmu.edu/

Page 6: Computational Models of Text Quality

Current applications: information retrieval

Particularly user-generated content: questions and answers on the web, blogs and comments. Searching over such content poses new problems [4]. What is a good question/answer/comment? http://answers.yahoo.com/

Relevant for general IR as well: of the many relevant documents, some are better written.

Page 7: Computational Models of Text Quality

Current applications: NLP

Models of text quality lead to improved systems [5] and offer possibilities for automatic evaluation [6].

Automatic summarization: select important content and organize it in a well-written text.

Language generation: select, organize and present content on document, paragraph, sentence and phrase level.

Machine translation

Page 8: Computational Models of Text Quality

Text quality factors

Interesting
Style (clichés, figurative language)
Vocabulary use
Grammatical and fluent sentences
Coherent and easy to understand

In most types of writing, well-written means clear and easy to understand. Not necessarily so in literary works.

Problems with clarity of instructions motivated a fair amount of early work.

Page 9: Computational Models of Text Quality

Early work: keep in mind these predate modern computers!

Common words are easier to understand: stentorian vs. loud; myocardial infarction vs. heart attack.

Common words are short. Standard readability metrics use:
- percentage of words not among the N most frequent
- average number of syllables per word

Syntactically simple sentences are easier to understand:
- average number of words per sentence

[Flesch-Kincaid, Automated Readability Index, Gunning-Fog, SMOG, Coleman-Liau]
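
As a concrete illustration of such metrics, here is a minimal Flesch-Kincaid sketch in Python; the syllable counter is a crude vowel-group heuristic, not a dictionary lookup:

```python
# Minimal sketch of a classic readability metric: Flesch-Kincaid grade level.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as the number of vowel groups (rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(flesch_kincaid_grade("Common words are short. Rare words are often long."))
```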

Page 10: Computational Models of Text Quality

Modern equivalents

Language models: word probabilities from a large collection
http://www.speech.cs.cmu.edu/SLM_info.html

Features derived from syntactic parse [2,7,8,9]:
- Parse tree height
- Number of subordinating conjunctions
- Number of passive voice constructions
- Number of noun and verb phrases

Page 11: Computational Models of Text Quality

Language models

Unigram and bigram language models are really just huge tables. Smoothing is necessary to account for unseen words.

$p(w) = \frac{n_w}{N}$

$p(w_1 \mid w_2) = \frac{n_{w_2 w_1}}{n_{w_2}}$
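
A sketch of these tables as code, with add-one smoothing as one simple way to handle unseen events:

```python
# Relative-frequency estimates p(w) = n_w / N and p(w1|w2) = n_{w2 w1} / n_{w2},
# with add-one smoothing.
from collections import Counter

def train_bigram_lm(tokens, vocab_size=None):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    v = vocab_size or len(unigrams)

    def p_unigram(w):
        return (unigrams[w] + 1) / (n + v)           # smoothed n_w / N

    def p_bigram(w1, w2):                            # p(w1 | w2): w2 precedes w1
        return (bigrams[(w2, w1)] + 1) / (unigrams[w2] + v)

    return p_unigram, p_bigram

p_uni, p_bi = train_bigram_lm("the cat sat on the mat".split())
print(p_uni("the"), p_bi("cat", "the"))   # p(the), p(cat | the)
```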

Page 12: Computational Models of Text Quality

Features from language models

Assessing the readability of text t consisting of m words, for intended audience class c:

Number of out-of-vocabulary words in the text with respect to the language model for c

Text likelihood and perplexity:

$L(t) = P(c)\,P(w_1 \mid c) \cdots P(w_m \mid c)$

$PP = 2^{H(t \mid c)}$

$H(t \mid c) = -\frac{1}{m} \log_2 P(t \mid c)$
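
A sketch computing these features, assuming `p_word` is a smoothed unigram model for class c (so unseen words still get nonzero probability) and `vocab` is its vocabulary:

```python
# OOV count, log likelihood, and perplexity PP = 2^{H(t|c)} under a
# class-conditional unigram model.
import math

def lm_features(tokens, p_word, vocab):
    oov = sum(1 for w in tokens if w not in vocab)
    log_l = sum(math.log2(p_word(w)) for w in tokens)   # log2 P(t | c)
    entropy = -log_l / len(tokens)                       # H(t | c)
    perplexity = 2 ** entropy                            # PP
    return {"oov": oov, "log_likelihood": log_l, "perplexity": perplexity}
```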

Page 13: Computational Models of Text Quality

Application to grade level prediction (Collins-Thompson and Callan, NAACL 2004) [10]


Page 15: Computational Models of Text Quality

Results on predicting grade level (Schwarm and Ostendorf, ACL 2005) [11]

Flesch-Kincaid Grade Level index: number of syllables per word, sentence length

Lexile: word frequency, sentence length

SVM: features from language models and syntax

Page 16: Computational Models of Text Quality

Models of text coherence

Global coherence: overall document organization

Local coherence: adjacent sentences

Page 17: Computational Models of Text Quality

Text structure can be learnt in an unsupervised manner from human-written examples from a domain.

Example subtopics in earthquake reports: location and time, magnitude, damage, relief efforts.

Page 18: Computational Models of Text Quality

Content model (Barzilay & Lee '04) [5]

Hidden Markov Model (HMM)-based:
- States: clusters of related sentences, "topics" (for earthquake reports: location and magnitude, casualties, relief efforts, ...)
- Transition prob.: sentence precedence in corpus
- Emission prob.: bigram language model

$p(\langle s_{i+1}, h_{i+1} \rangle \mid \langle s_i, h_i \rangle) = p_t(h_{i+1} \mid h_i) \cdot p_e(s_{i+1} \mid h_{i+1})$

The first factor is the transition from the previous topic; the second generates the sentence in the current topic.
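
The factorization can be written down directly; the transition and emission tables here are assumed given (in the paper they come from sentence clustering and EM re-estimation):

```python
# Probability of a document under the content-model decomposition above:
# a product of topic-transition and sentence-emission terms.
def content_model_score(sentences, topics, p_start, p_transition, p_emission):
    """p(s_1..s_n, h_1..h_n) = p_start(h_1) p_e(s_1|h_1)
       * prod_i p_t(h_{i+1}|h_i) p_e(s_{i+1}|h_{i+1})"""
    score = p_start[topics[0]] * p_emission(sentences[0], topics[0])
    for i in range(len(sentences) - 1):
        score *= p_transition[(topics[i], topics[i + 1])]
        score *= p_emission(sentences[i + 1], topics[i + 1])
    return score
```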

Page 19: Computational Models of Text Quality

Generating Wikipedia articles (Sauper and Barzilay, 2009) [12]

Articles on diseases and American film actors.

Create templates of subtopics.

Focus only on subtopic-level structure; use paragraphs from documents on the web.

Page 20: Computational Models of Text Quality

Template creation

Cluster similar headings: signs and symptoms, symptoms, early symptoms, ...

Choose k clusters: the average number of subtopics in that domain.

Find the majority ordering for the clusters.

Example templates:
Biography: Early life, Career, Personal life, Death
Diseases: Symptoms, Causes, Diagnosis, Treatment

Page 21: Computational Models of Text Quality

Extraction of excerpts and ranking

Candidates for a subtopic: paragraphs from the top 10 pages of search results.

Measure the relevance of candidates for that subtopic. Features: unigrams, bigrams, number of sentences, ...

Page 22: Computational Models of Text Quality

Need to control redundancy across subtopics

Integer Linear Program (sketched below):

Variables: one per excerpt (value 1 if chosen, 0 otherwise)

Objective: minimize the sum of the ranks of the chosen excerpts

Constraints:
- Cosine similarity between any selected pair <= 0.5
- One excerpt per subtopic

Illustration: candidate excerpts ranked 1-5 for each subtopic (causes, symptoms, diagnosis, treatment).
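
A sketch of this program with PuLP; any off-the-shelf ILP solver would do. Here `candidates` maps each subtopic to a rank-ordered list of excerpts and `cosine` is an assumed similarity function:

```python
# ILP for excerpt selection: one excerpt per subtopic, minimize total rank,
# forbid selecting pairs that are more than 0.5 cosine-similar.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

def select_excerpts(candidates, cosine):
    prob = LpProblem("excerpt_selection", LpMinimize)
    keys = [(t, i) for t, cands in candidates.items() for i in range(len(cands))]
    x = {k: LpVariable(f"x_{n}", cat=LpBinary) for n, k in enumerate(keys)}

    # Objective: sum of ranks (rank = 1-based position in the candidate list).
    prob += lpSum((i + 1) * x[t, i] for (t, i) in keys)

    # Exactly one excerpt per subtopic.
    for t, cands in candidates.items():
        prob += lpSum(x[t, i] for i in range(len(cands))) == 1

    # No two selected excerpts may exceed 0.5 cosine similarity.
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            (t1, i1), (t2, i2) = keys[a], keys[b]
            if cosine(candidates[t1][i1], candidates[t2][i2]) > 0.5:
                prob += x[t1, i1] + x[t2, i2] <= 1

    prob.solve()
    return {t: candidates[t][i] for (t, i) in keys if x[t, i].value() == 1}
```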

Page 23: Computational Models of Text Quality

Linguistic models of coherence (Halliday and Hasan, 1976) [13]

Coherent text is characterized by the presence of various types of cohesive links that facilitate text comprehension:

Reference and lexical reiteration: pronouns, definite descriptions, semantically related words

Discourse relations (conjunction): I closed the window because it started raining.

Substitution (one) or ellipsis (do)

Page 24: Computational Models of Text Quality

Referential coherence

Centering theory: tracking the focus of attention across adjacent sentences [14, 15, 16, 17]

Syntactic form of references: particularly first and subsequent mention [18, 19], pronominalization

Lexical chains: identifying and tracking topics within a text [20, 21, 22, 23]

Page 25: Computational Models of Text Quality

Discourse relations

Explicit vs. implicit:

- Signaled by a discourse connective: I stayed home because I had a headache.

- Inferred without the presence of a connective: I took my umbrella. [Because] The forecast was for rain in the afternoon.

Page 26: Computational Models of Text Quality

Lexical chains

Often discussed as a cohesion indicator and implemented in systems, but not used in text quality tasks.

Find all words that refer to the same topic; find the correct sense of the words.

LexChainer tool: http://www1.cs.columbia.edu/nlp/tools.cgi [23]

Applications: summarization, IR, spell checking, hypertext construction.

John bought a Jaguar. He loves the car.
LC = {jaguar, car, engine, it}
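
A toy chainer in this spirit, using WordNet synonymy and shared direct hypernyms; real chainers such as those cited above also disambiguate senses (requires nltk and its wordnet data via nltk.download("wordnet")):

```python
# Greedy lexical chaining: add each word to the first chain containing a
# WordNet-related word (shared synset or shared direct hypernym).
from nltk.corpus import wordnet as wn

def related(w1, w2):
    s1 = set(wn.synsets(w1, pos=wn.NOUN))
    s2 = set(wn.synsets(w2, pos=wn.NOUN))
    if s1 & s2:                       # synonyms / identical senses
        return True
    h1 = {h for s in s1 for h in s.hypernyms()}
    h2 = {h for s in s2 for h in s.hypernyms()}
    return bool((h1 & s2) | (h2 & s1) | (h1 & h2))

def lexical_chains(words):
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, c) for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains

# "car", "automobile", "truck" chain together; "tea" starts its own chain.
print(lexical_chains(["car", "automobile", "truck", "tea"]))
```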

Page 27: Computational Models of Text Quality

Centering theory ingredients (Grosz et al., 1995) [14]

Deals with local coherence: what happens to the flow from sentence to sentence. Does not deal with the global structuring of the text (paragraphs/segments).

Defines coherence as an estimate of the processing load required to "understand" the text.

Page 28: Computational Models of Text Quality

Processing load

Upon hearing a sentence, a person:
- expends cognitive effort to interpret the expressions in the utterance
- integrates the meaning of the utterance with that of the previous sentence
- creates some expectations about what might come next

Page 29: Computational Models of Text Quality

Example

(1) John met his friend Mary today.
(2) He was surprised to see her.
(3) He thought she was still in Italy.

Form of referring expressions: anaphora needs to be resolved; a discourse entity is "created" at first mention with a full noun phrase.

Creating expectations

Page 30: Computational Models of Text Quality

Creating and meeting expectations

(1) a. John went to his favorite music store to buy a piano.
    b. He had frequented the store for many years.
    c. He was excited that he could finally buy a piano.
    d. He arrived just as the store was closing for the day.

(2) a. John went to his favorite music store to buy a piano.
    b. It was a store John had frequented for many years.
    c. He was excited that he could finally buy a piano.
    d. It was closing just as John arrived.

Page 31: Computational Models of Text Quality

Interpreting pronouns

a. Terry really goofs sometimes.

b. Yesterday was a beautiful day and he was excited about trying out his new sailboat.

c. He wanted Tony to join him on a sailing expedition.

d. He called him at 6am.

e. He was sick and furious at being woken up so early.


Page 32: Computational Models of Text Quality

Basic centering definitions

Centers of an utterance: the set of entities serving to link that utterance to the other utterances in the discourse segment that contains it.

Not words or phrases themselves, but semantic interpretations of noun phrases.

Page 33: Computational Models of Text Quality

Types of centers

Forward-looking centers Cf: an ordered set of entities, what we could expect to hear about next. Ordered by salience as determined by grammatical function: Subject > Indirect object > Object > Others.

John gave the textbook to Mary. Cf = {John, Mary, textbook}

Preferred center Cp: the highest-ranked forward-looking center. There is a high expectation that the next utterance in the segment will be about Cp.

Page 34: Computational Models of Text Quality

Backward-looking center

There is a single backward-looking center, Cb(U), for each utterance other than the segment-initial one.

The backward-looking center of utterance Un+1 connects it with one of the forward-looking centers of Un.

Cb(Un+1) is the most highly ranked element of Cf(Un) that is also realized in Un+1.

Page 35: Computational Models of Text Quality

Centering transitions ordering

                        | Cb(Un+1) = Cb(Un), or Cb(Un) undefined | Cb(Un+1) != Cb(Un)
Cb(Un+1) = Cp(Un+1)     | continue                               | smooth-shift
Cb(Un+1) != Cp(Un+1)    | retain                                 | rough-shift
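
The table translates directly into code:

```python
# Transition classification from the table above; cb_prev is None when
# Cb(Un) is undefined.
def centering_transition(cb_prev, cb_cur, cp_cur):
    same_cb = cb_prev is None or cb_prev == cb_cur
    if cb_cur == cp_cur:
        return "continue" if same_cb else "smooth-shift"
    return "retain" if same_cb else "rough-shift"

# Terry example: Cb stays Terry and Cb = Cp, so the transition is a continue.
print(centering_transition("Terry", "Terry", "Terry"))
```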

Page 36: Computational Models of Text Quality

Centering constraints

There is precisely one backward-looking center Cb(Un)

Cb(Un+1) is the highest-ranked element of Cf(Un) that is realized in Un+1


Page 37: Computational Models of Text Quality

Centering rules

If some element of Cf(Un) is realized as a pronoun in Un+1, then so is Cb(Un+1).

Transitions are not equally preferred: continue > retain > smooth-shift > rough-shift.

Page 38: Computational Models of Text Quality

Centering analysis

Terry really goofs sometimes.
Cf = {Terry}, Cb undefined

Yesterday was a beautiful day and he was excited about trying out his new sailboat.
Cf = {Terry, sailboat}, Cb = Terry, continue

He wanted Tony to join him in a sailing expedition.
Cf = {Terry, Tony, expedition}, Cb = Terry, continue

He called him at 6am.
Cf = {Terry, Tony}, Cb = Terry, continue

Page 39: Computational Models of Text Quality

He called him at 6am.
Cf = {Terry, Tony}, Cb = Terry, continue

Tony was sick and furious at being woken up so early.
Cf = {Tony}, Cb = Tony, smooth-shift

He told Terry to get lost and hung up.
Cf = {Tony, Terry}, Cb = Tony, continue

Of course, Terry hadn't intended to upset Tony.
Cf = {Terry, Tony}, Cb = Tony, retain

Page 40: Computational Models of Text Quality

Rough shifts in evaluation of writing skills (Miltsakaki and Kukich, 2002)

Automatic grading of essays by E-rater:
- Syntactic variety: represented by features that quantify the occurrence of clause types
- Clear transitions: cue phrases in certain syntactic constructions
- Existence of main and supporting points
- Appropriateness of the vocabulary content of the essay

What about local coherence?

Page 41: Computational Models of Text Quality

Essay score model

Human score available. E-rater prediction available. Percentage of rough-shifts in each essay: analysis done manually.

Negative correlation between the human score and the percentage of rough-shifts.

Page 42: Computational Models of Text Quality

Linear multi-factor regression

Approximate the human score as a linear function of the E-rater prediction and the percentage of rough-shifts.

Adding rough-shifts significantly improves the model of the score: a 0.5 improvement on the 1 to 6 scale.

How easy/difficult would it be to fully automate the rough-shift variable?

Page 43: Computational Models of Text Quality

Variants of centering and application to information ordering

Karamanis et al., 2009 [16] is the most comprehensive overview of variants of centering theory, and an evaluation of centering in a specific task related to text quality.

Page 44: Computational Models of Text Quality

Information ordering task

Given a set of sentences/clauses, what is the best presentation? Take a newspaper article and jumble the sentences: the result will be much more difficult to read than the original.

Negative examples are constructed by randomly permuting the original.

We need criteria for deciding which of two orderings is better; centering would definitely be applicable.

Page 45: Computational Models of Text Quality

Centering variations

Continuity (NOCB = lack of continuity): Cf(Un) and Cf(Un+1) share at least one element

Coherence: Cb(Un) = Cb(Un+1)

Salience: Cb(U) = Cp(U)

Cheapness (fulfilled expectations): Cb(Un+1) = Cp(Un)

Page 46: Computational Models of Text Quality

Metrics of coherence

M.NOCB: no continuity

M.CHEAP: expectations not met

M.KP: sum of the violations of continuity, cheapness, coherence and salience

M.BFP: seeks to maximize transitions according to Rule 2

Page 47: Computational Models of Text Quality

Experimental methodology

Gold-standard ordering: the original order of the text (object description, news article). Assume that other orderings are inferior.

Classification error rate: the percentage of orderings that score better than the gold-standard, plus 0.5 times the percentage of orderings that score the same.
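
As a quick sketch, assuming scores are oriented so that higher means better under the metric:

```python
# Classification error rate of a coherence metric: a permutation that scores
# better than the original counts fully, a tie counts half.
def classification_error_rate(gold_score, permutation_scores):
    better = sum(1 for s in permutation_scores if s > gold_score)
    ties = sum(1 for s in permutation_scores if s == gold_score)
    return (better + 0.5 * ties) / len(permutation_scores)
```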

Page 48: Computational Models of Text Quality

Results

NOCB gives the best results: significantly better than the other metrics, consistently across three different corpora: museum artifact descriptions (2), news, airplane accidents.

M.BFP is the second-best metric.


Page 50: Computational Models of Text Quality

Entity grid (Barzilay and Lapata, 2005, 2008) [17]

Inspired by centering. Tracks entities across adjacent sentences, as well as their syntactic positions.

Much easier to compute from raw text: Brown Coherence Toolkit
http://www.cs.brown.edu/~melsner/manual.html

Page 51: Computational Models of Text Quality

Entity grid: applications

Several applications, with very good results:

Information ordering

Comparing the coherence of pairs of summaries

Distinguishing readability levels (child vs. adult); improves over Petersen & Ostendorf [3]

Page 52: Computational Models of Text Quality

Entity grid example

1. [The Justice Department]S is conducting an [anti-trust trial]O against [Microsoft Corp.]X with [evidence]X that [the company]S is increasingly attempting to crush [competitors]O.

2. [Microsoft]O is accused of trying to forcefully buy into [markets]X where [its own products]S are not competitive enough to unseat [established brands]O.

3. [The case]S revolves around [evidence]O of [Microsoft]S aggressively pressuring [Netscape]O into merging [browser software]O.

4. [Microsoft]S claims [its tactics]S are commonplace and good economically.

5. [The government]S may file [a civil suit]O ruling that [conspiracy]S to curb [competition]O through [collusion]X is [a violation of the Sherman Act]O.

6. [Microsoft]S continues to show [increased earnings]O despite [the trial]X.

Page 53: Computational Models of Text Quality

Entity grid representation


Page 54: Computational Models of Text Quality

16 entity grid features

The probability of each type of transition in the text, with four syntactic distinctions: S, O, X, _ (subject, object, other, absent).
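
A minimal sketch of the feature computation, assuming the grid has already been built (one row per sentence, one column per entity):

```python
# Probability of each of the 16 length-2 transitions in an entity grid with
# cells in {"S", "O", "X", "_"}.
from collections import Counter
from itertools import product

def transition_features(grid):
    counts = Counter()
    for col in range(len(grid[0])):
        for row in range(len(grid) - 1):
            counts[(grid[row][col], grid[row + 1][col])] += 1
    total = sum(counts.values())
    return {t: counts[t] / total for t in product("SOX_", repeat=2)}

# Toy grid: 3 sentences x 2 entities.
print(transition_features([["S", "O"], ["O", "_"], ["S", "X"]]))
```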

Page 55: Computational Models of Text Quality

Type of reference and information ordering (Elsner and Charniak, 2008) [19]

Entity grid features are not concerned with how an entity is mentioned: discourse-old vs. discourse-new.

Kent Wells, a BP senior vice president, said on Saturday during a technical briefing that the current cap, which has a looser fit and has been diverting about 15,000 barrels of oil a day to a drillship, will be replaced with a new one in 4 to 7 days.

The new cap will take 4 to 7 days to be installed, and in case the new cap is not effective, Mr. Wells said engineers were prepared to replace it with an improved version of the current cap.

Page 56: Computational Models of Text Quality

The probability of a given sequence of discourse-new and discourse-old realizations gives a further indication about ordering.

Similarly, pronouns should have reasonable antecedents.

Adding both models to the entity grid improves performance on the information ordering task.

Page 57: Computational Models of Text Quality

Sentence ordering

Given n sentences (output from a generation or summarization system), find the most coherent ordering among the n! permutations.

With local coherence metrics (adjacent sentence flow), finding the best ordering is NP-complete, by reduction from the Traveling Salesman Problem.
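
For tiny inputs the task can be written down directly; `pair_score` is an assumed local coherence function over adjacent sentences, and the TSP reduction explains why this brute force does not scale:

```python
# Score all n! orders with an adjacent-pair coherence function; keep the best.
from itertools import permutations

def best_ordering(sentences, pair_score):
    def total(order):
        return sum(pair_score(a, b) for a, b in zip(order, order[1:]))
    return max(permutations(sentences), key=total)
```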

Page 58: Computational Models of Text Quality

Word co-occurrence model (Lapata, ACL 2003; Soricut and Marcu, 2005) [23,24]

The idea comes from statistical machine translation alignment models, which learn word translation probabilities such as P(fish | poisson) from parallel text:

John went to a restaurant. / John est allé à un restaurant.
He ordered fish. / Il ordonna de poisson.
The waiter was very attentive. / Le garçon était très attentif.
...

For ordering, treat each sentence and its successor as a "parallel" pair, learning co-occurrence probabilities such as P(ordered | restaurant), P(waiter | ordered), P(tip | waiter):

John went to a restaurant.
He ordered fish.
The waiter was very attentive.
John gave him a huge tip.
...

These probabilities can then score a new adjacent pair:

We ate at a restaurant yesterday.
We also ordered some take away.

Page 59: Computational Models of Text Quality

Discourse (coherence) relations

Only recently have empirical results shown that discourse relations are predictive of text quality (Pitler and Nenkova, 2008).

Page 60: Computational Models of Text Quality

PDTB discourse relation annotations

The largest corpus of annotated discourse relations: http://www.seas.upenn.edu/~pdtb/

Four broad classes of relations: Contingency, Comparison, Temporal, Expansion

Both explicit and implicit relations are annotated.

Page 61: Computational Models of Text Quality

Implicit and explicit relations

(E1) He is very tired because he played tennis all morning.
(E2) He is not very strong but he can run amazingly fast.
(E3) We had some tea in the afternoon and later went to a restaurant for a big dinner.

(I1) I took my umbrella this morning. [because] The forecast was for rain.
(I2) She is never late for meetings. [but] He always arrives 10 minutes late.
(I3) She woke up early. [afterwards] She had breakfast and went for a walk in the park.

Page 62: Computational Models of Text Quality

What is the relative importance of factors in determining text quality?

Competent readers (native English speakers): graduate students at Penn.

Wall Street Journal texts: 30 texts ranked on a scale of 1 to 5.
- How well-written is this article?
- How well does the text fit together?
- How easy was it to understand?
- How interesting is the article?

Page 63: Computational Models of Text Quality

Several judgments for each text; the final quality score was the average.

Scores range from 1.5 to 4.33, with mean 3.2.

Page 64: Computational Models of Text Quality

Which of the many indicators will work best? Usually research studies focus on only one or two.

How do the indicators combine?

Metrics: correlation coefficient; accuracy of pairwise ranking prediction.

Page 65: Computational Models of Text Quality

Correlation coefficients between assessor ratings and different features


Page 66: Computational Models of Text Quality

Baseline measures

Average characters/word: r = -.0859 (p = .6519)
Average words/sentence: r = .1637 (p = .3874)
Max words/sentence: r = .0866 (p = .6489)
Article length: r = -.3713 (p = .0434)

Page 67: Computational Models of Text Quality

Vocabulary factors

Language model probability of the article, with model M estimated from the PTB (WSJ) or from general news (NEWS):

$\prod_w p(w \mid M)^{C(w)}$

$\sum_w C(w) \log p(w \mid M)$

Page 68: Computational Models of Text Quality

Correlations with the 'well-written' assessment

Log likelihood, WSJ: r = .3723 (p = .0428)
Log likelihood, NEWS: r = .4497 (p = .0127)
Log likelihood with length, WSJ: r = .3732 (p = .0422)
Log likelihood with length, NEWS: r = .6359 (p = .0002)

Page 69: Computational Models of Text Quality

Syntactic features

Average parse tree height: r = -.0634 (p = .7439)
Average number of noun phrases per sentence: r = .2189 (p = .2539)
Average SBARs: r = .3405 (p = .0707)
Average number of verb phrases per sentence: r = .4213 (p = .0228)

Page 70: Computational Models of Text Quality

Elements of lexical cohesion

Average cosine similarity between adjacent sentences: r = -.1012 (p = .5947)
Average word overlap between adjacent sentences: r = -.0531 (p = .7806)
Average noun+pronoun overlap: r = .0905 (p = .6345)
Average # pronouns/sentence: r = .2381 (p = .2051)
Average # definite articles: r = .2309 (p = .2196)

Page 71: Computational Models of Text Quality

Correlation with 'well-written' score: entity grid transitions

Prob. of S-S transition: r = -.1287 (p = .5059)
Prob. of S-O transition: r = -.0427 (p = .8261)
Prob. of S-X transition: r = -.1450 (p = .4529)
Prob. of S-N transition: r = .3116 (p = .0999)
Prob. of O-S transition: r = .1131 (p = .5591)
Prob. of O-O transition: r = .0825 (p = .6706)
Prob. of O-X transition: r = .0744 (p = .7014)
Prob. of O-N transition: r = .2590 (p = .1749)

Page 72: Computational Models of Text Quality

Prob. of X-S transition: r = .1732 (p = .3688)
Prob. of X-O transition: r = .0098 (p = .9598)
Prob. of X-X transition: r = -.0655 (p = .7357)
Prob. of X-N transition: r = .1319 (p = .4953)
Prob. of N-S transition: r = .1898 (p = .3242)
Prob. of N-O transition: r = .2577 (p = .1772)
Prob. of N-X transition: r = .1854 (p = .3355)
Prob. of N-N transition: r = -.2349 (p = .2200)

Page 73: Computational Models of Text Quality

Well-writtenness and discourse

Log likelihood of discourse relations: r = .4835 (p = .0068)
# of discourse relations: r = -.2729 (p = .1445)
Log likelihood of relations with # of relations: r = .5409 (p = .0020)
# of relations with # of words: r = .3819 (p = .0373)
Explicit relations only: r = .1528 (p = .4203)
Implicit relations only: r = .2403 (p = .2009)

Page 74: Computational Models of Text Quality

Summary: significant factors

Log likelihood of discourse relations: r = .4835
Log likelihood, NEWS: r = .4497
Average verb phrases per sentence: r = .4213
Log likelihood, WSJ: r = .3723
Number of words: r = -.3713

Page 75: Computational Models of Text Quality

Text quality prediction as ranking

Take every pair of texts with ratings differing by 0.5.

Features are the differences of feature values for the two texts.

Task: predict which of the two articles has the higher text quality score.
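
A sketch of the setup, with scikit-learn's logistic regression standing in for the ranker (the exact learner is not specified here):

```python
# Pairwise ranking: each pair is represented by the difference of the two
# texts' feature vectors; the label says which text rated higher.
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_pairs(features, scores, min_gap=0.5):
    X, y = [], []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if i != j and scores[i] - scores[j] >= min_gap:
                X.append(features[i] - features[j])  # difference of feature values
                y.append(1)                          # text i ranks higher
                X.append(features[j] - features[i])
                y.append(0)
    return np.array(X), np.array(y)

# features: (n_texts, n_features) array; scores: assessor ratings.
# X, y = make_pairs(features, scores)
# ranker = LogisticRegression().fit(X, y)
```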

Page 76: Computational Models of Text Quality

Prediction accuracy (10-fold cross-validation)

None (majority class): 50.21%
Number of words: 65.84%
ALL: 88.88%
Grid only: 79.42%
Log likelihood of discourse relations: 77.77%
Average VPs/sentence: 69.54%
Log likelihood, NEWS: 66.25%

Page 77: Computational Models of Text Quality

Findings

Complex interplay between features.

Entity grid features are not significantly correlated with the 'well-written' score, but are very useful for the ranking task.

Discourse information is very helpful, but here we used gold-standard annotations; developing an automatic classifier is underway.

Page 78: Computational Models of Text Quality

Implicit and explicit discourse relations

Class         Explicit   Implicit
Comparison    69%        31%
Contingency   47%        53%
Temporal      80%        20%
Expansion     42%        58%

Page 79: Computational Models of Text Quality

Sense classification based on connectives only (four-way classification):

Explicit relations only: 93% accuracy

All relations (implicit + explicit): 75% accuracy

Implicit relations are the real challenge.

Page 80: Computational Models of Text Quality

Explicit discourse relations, tasks (Pitler and Nenkova, 2009) [25]

Discourse vs. non-discourse use:
I will be happier once the semester is over.
I have been to Ohio once.

Relation sense (contingency, comparison, temporal, expansion):
I haven't been to Paris since I went there on a school trip in 1998. [Temporal]
I haven't been to Antarctica since it is very far away. [Contingency]

Page 81: Computational Models of Text Quality

Penn Discourse Treebank

The largest available annotated corpus of discourse relations: Penn Treebank WSJ articles, 18,459 explicit discourse relations, 100 connectives.

"although" is used as a discourse connective 91% of the time, vs. 3% for "or".

Page 82: Computational Models of Text Quality

Discourse usage experiments

Positive examples: discourse connectives. Negative examples: the same strings in the PDTB, unannotated.

10-fold cross-validation, Maximum Entropy classifier.
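
A sketch of the classification setup; the feature names are assumptions in the spirit of the paper's syntactic features, and scikit-learn's logistic regression stands in for the MaxEnt learner:

```python
# Classify each occurrence of a connective string as discourse vs.
# non-discourse use, from the connective identity plus syntactic context.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def featurize(token):
    # token: assumed dict with the connective string and parse-tree context
    return {"connective": token["string"].lower(),
            "self_category": token["self_category"],
            "parent_category": token["parent_category"]}

# examples: list of token dicts; labels: 1 = discourse use, 0 = not.
# clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
# clf.fit([featurize(t) for t in examples], labels)
```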

Page 83: Computational Models of Text Quality

Discourse Usage Results



Page 85: Computational Models of Text Quality

Sense disambiguation: Comparison, Contingency, Expansion, or Temporal?

Features                   Accuracy
Connective                 93.67%
Connective + Syntax        94.15%
Interannotator agreement   94%

Page 86: Computational Models of Text Quality

Tool

Automatic annotation of discourse use and sense of discourse connectives:

Discourse Connectives Tagger
http://www.cis.upenn.edu/~epitler/discourse.html

Page 87: Computational Models of Text Quality

Is there hope to have a usable tool soon?

Early studies on unannotated data gave reason for optimism, but when recently tested on the PDTB, their performance was poor: accuracy on contingency, comparison and temporal is below 50%.

What about implicit relations?

Page 88: Computational Models of Text Quality

It is not easy to infer from combined results how early systems performed on implicits; as we saw, one can get reasonable overall performance by doing nothing for explicits.

Early work classified implicits and explicits together:
- relations within the same sentence [26]
- the Graphbank corpus doesn't distinguish implicit and explicit [27]

Page 89: Computational Models of Text Quality

Classify on large unannotated corpus


Page 90: Computational Models of Text Quality

Experiments with the PDTB

Pitler et al., ACL 2009 [31]: a wide variety of features to capture semantic opposition and parallelism.

Lin et al., EMNLP 2009 [32]: (lexicalized) syntactic features.

Results improve over baselines and give a better understanding of the features, but the classifiers are not yet suitable for application in real tasks.

Page 91: Computational Models of Text Quality

Word pairs as features (Marcu and Echihabi, 2002) [28]

The most basic feature for implicits: pairs of words, one from each text span.

Span 1: I am a little tired
Span 2: there is a 13 hour time difference

Features: I_there, I_is, ..., tired_time, tired_difference
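
The feature extraction itself is just a cross product:

```python
# Word-pair features: the cross product of the tokens in the two spans.
from itertools import product

def word_pair_features(span1, span2):
    return {f"{w1}_{w2}" for w1, w2 in product(span1.split(), span2.split())}

print(sorted(word_pair_features("I am a little tired",
                                "there is a 13 hour time difference"))[:5])
```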

Page 92: Computational Models of Text Quality

The recent explosion of country funds mirrors the "closed-end fund mania" of the 1920s, Mr. Foot says, when narrowly focused funds grew wildly popular.

They fell into oblivion after the 1929 crash.

Intuition: with large amounts of data, we will find semantically related pairs.

Page 93: Computational Models of Text Quality

Meta error analysis of prior work

Using just content words reduces performance (but has a steeper learning curve): Marcu and Echihabi, 2002 [28]

Nouns and adjectives don't help at all: Lapata and Lascarides, 2004 [33]

Filtering out stopwords lowers results: Blair-Goldensohn et al., 2007 [29]

Page 94: Computational Models of Text Quality

Word pairs experiments (Pitler et al., 2009) [31]

Page 95: Computational Models of Text Quality

Function words have the highest information gain.

But... didn't we remove the connective?

Page 96: Computational Models of Text Quality

“but” signals “Not-Comparison” in synthetic data


Page 97: Computational Models of Text Quality

Results: word pairs

Features: pairs of words from the two text spans.

What doesn't work: training on synthetic implicits.

What really works: using synthetic implicits for feature selection, then training on the PDTB.

Page 98: Computational Models of Text Quality

Best results: f-scores (baseline in parentheses)

Comparison    21.96 (17.13)
Contingency   47.13 (31.10)
Expansion     76.41 (63.84)
Temporal      16.76 (16.21)

Comparison/Contingency baseline: synthetic-implicits word pairs.
Expansion/Temporal baseline: real-implicits word pairs.

Page 99: Computational Models of Text Quality

Further experiments using context

The results above come from classifying each relation independently (Naive Bayes, MaxEnt, AdaBoost). Since context features were helpful, we tried a CRF.

6-way classification, word pairs as features:
Naive Bayes accuracy: 43.27%
CRF accuracy: 44.58%

Page 100: Computational Models of Text Quality

Do we need more coherence factors? (Louis and Nenkova, 2010) [34]

If we had perfect co-reference and discourse relation information, would we be able to explain local discourse coherence?

Our recent corpus study indicates the answer is NO: 30% of adjacent sentences in the same paragraph in the PDTB neither share an entity nor have an implicit comparison, contingency or temporal relation.

Lexical chains?

Page 101: Computational Models of Text Quality

References

[1] Burstein, J. & Chodorow, M. (in press). Progress and new directions in technology for automated essay evaluation. In R. Kaplan (Ed.), The Oxford handbook of applied linguistics (2nd Ed.). New York: Oxford University Press.

[2] Heilman, M., Collins-Thompson, K., Callan, J., and Eskenazi, M. (2007). Combining Lexical and Grammatical Features to Improve Readability Measures for First and Second Language Texts. Proceedings of the Human Language Technology Conference. Rochester, NY.

[3] S. Petersen and M. Ostendorf, “A machine learning approach to reading level assessment,” Computer, Speech and Language, vol. 23, no. 1, pp. 89-106, 2009

[4] Finding High Quality Content in Social Media, Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne, ACM Web Search and Data Mining Conference (WSDM), 2008

[5] Regina Barzilay and Lillian Lee, Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization, HLT-NAACL 2004: Proceedings of the Main Conference, pp. 113-120, 2004


Page 102: Computational Models of Text Quality

References

[6] Emily Pitler, Annie Louis and Ani Nenkova, Automatic Evaluation of Linguistic Quality in Multi-Document Summarization, Proceedings of ACL 2010

[7] Schwarm, S. E. and Ostendorf, M. 2005. Reading level assessment using support vector machines and statistical language models. In Proceedings of ACL 2005.

[8] Jieun Chae, Ani Nenkova: Predicting the Fluency of Text with Shallow Structural Features: Case Studies of Machine Translation and Human-Written Text. In Proceedings of EACL 2009: 139-147

[9] Charniak, E. and Johnson, M. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of ACL 2005.

[10] K. Collins-Thompson and J. Callan. (2004). A language modeling approach to predicting reading difficulty. Proceedings of HLT/NAACL 2004.

[11] Sarah E. Schwarm and Mari Ostendorf. Reading Level Assessment Using Support Vector Machines and Statistical Language Models. In Proceedings of ACL, 2005.


Page 103: Computational Models of Text Quality

References

[12] Automatically generating Wikipedia articles: A structure-aware approach, C. Sauper and R. Barzilay, ACL-IJCNLP 2009

[13] Halliday, M. A. K., and Ruqaiya Hasan. 1976. Cohesion in English. London: Longman

[14] B. Grosz, A. Joshi, and S. Weinstein. 1995. Centering: a framework for modelling the local coherence of discourse. Computational Linguistics, 21(2):203-226

[15] E. Miltsakaki and K. Kukich. 2000. The role of centering theory’s rough-shift in the teaching and evaluation of writing skills. In Proceedings of ACL’00, pages 408– 415.

[16] Karamanis, N., Mellish, C., Poesio, M., and Oberlander, J. 2009. Evaluating centering for information ordering using corpora. Comput. Linguist. 35, 1 (Mar. 2009), 29-46.

[17] Regina Barzilay, Mirella Lapata, "Modeling Local Coherence: An Entity-based Approach”, Computational Linguistics, 2008.

[18] Ani Nenkova, Kathleen McKeown: References to Named Entities: a Corpus Study. HLT-NAACL 2003


Page 104: Computational Models of Text Quality

References

[19] Micha Elsner, Eugene Charniak: Coreference-inspired Coherence Modeling. ACL (Short Papers) 2008: 41-44

[20] Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Comput. Linguist. 17, 1 (Mar. 1991), 21-48.

[21] Regina Barzilay and Michael Elhadad, "Text summarizations with lexical chains”, In Inderjeet Mani and Mark Maybury, editors, Advances in Automatic Text Summarization. MIT Press, 1999.

[22] Silber, H. G. and McCoy, K. F. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Comput. Linguist. 28, 4 (Dec. 2002), 487-496.

[23] Mirella Lapata, Probabilistic Text Structuring: Experiments with Sentence Ordering, Proceedings of ACL 2003.

[24] Discourse generation using utility-trained coherence models, R. Soricut & D. Marcu, COLING-ACL 2006


Page 105: Computational Models of Text Quality

References

[25] Emily Pitler and Ani Nenkova. Using Syntax to Disambiguate Explicit Discourse Connectives in Text. Proceedings of ACL, short paper, 2009

[26] Radu Soricut and Daniel Marcu. 2003. Sentence Level Discourse Parsing using Syntactic and Lexical Information. Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference (HLT/NAACL-2003)

[27] Ben Wellner, James Pustejovsky, Catherine Havasi, Roser Sauri and Anna Rumshisky. Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources. In Proceedings of the 7th SIGDIAL Workshop on Discourse and Dialogue

[28] Daniel Marcu and Abdessamad Echihabi (2002). An Unsupervised Approach to Recognizing Discourse Relations. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002)

[29] Sasha Blair-Goldensohn, Kathleen McKeown, Owen Rambow: Building and Refining Rhetorical-Semantic Relation Models. HLT-NAACL 2007: 428-435


Page 106: Computational Models of Text Quality

References

[30] Sporleder, C. and Lascarides, A. 2008. Using automatically labelled examples to classify rhetorical relations: An assessment. Nat. Lang. Eng. 14, 3 (Jul. 2008), 369-416.

[31] Emily Pitler, Annie Louis, and Ani Nenkova. Automatic Sense Prediction for Implicit Discourse Relations in Text. Proceedings of ACL, 2009.

[32] Ziheng Lin, Min-Yen Kan and Hwee Tou Ng (2009). Recognizing Implicit Discourse Relations in the Penn Discourse Treebank. In Proceedings of EMNLP

[33] Lapata, Mirella and Alex Lascarides. 2004. Inferring Sentence-internal Temporal Relations. In Proceedings of the North American Chapter of the Assocation of Computational Linguistics, 153-160.

[34] Annie Louis and Ani Nenkova, Creating Local Coherence: An Empirical Assessment, Proceedings of NAACL-HLT 2010
