Lexical Simplification

Upload: eric-carr

Post on 17-Dec-2015

Lexical Simplification

▪ A subtask of text simplification
▪ Replaces words or short phrases with simpler variants in a context-aware fashion
▪ Motivation: to reach a wider range of readers with limited vocabulary
  ▪ Children
  ▪ People with low literacy levels or cognitive disabilities
  ▪ Second-language learners

Involved Processes

▪ Identification of complex words or phrases
▪ Substitute lookup
  ▪ Synonyms from a thesaurus
  ▪ Distributional similarity
▪ Context-based ranking
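The three processes above can be sketched as a small pipeline. This is a minimal illustration, not the method of any cited paper: the frequency table, thesaurus, context cues, and threshold are all made-up toy data, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy resources, invented for illustration only.
# Higher frequency is taken to mean "more familiar to readers".
FREQ: Dict[str, float] = {
    "hypertension": 2.0, "high blood pressure": 9.0,
    "obesity": 3.0, "excessive weight": 7.0, "fatness": 6.0,
    "risk": 50.0, "factors": 40.0, "include": 60.0,
}
THESAURUS: Dict[str, List[str]] = {
    "hypertension": ["high blood pressure"],
    "obesity": ["fatness", "excessive weight"],
}
# Words whose presence in the sentence supports a given substitute.
CONTEXT_CUES: Dict[str, Set[str]] = {
    "high blood pressure": {"risk", "factors"},
    "excessive weight": {"risk", "factors"},
    "fatness": set(),
}
COMPLEX_THRESHOLD = 5.0  # assumed cut-off for "complex"


def is_complex(word: str) -> bool:
    """Step 1: flag words that are rarer than the threshold."""
    return FREQ.get(word, 0.0) < COMPLEX_THRESHOLD


def lookup_substitutes(word: str) -> List[str]:
    """Step 2: thesaurus candidates that are more frequent than the word."""
    base = FREQ.get(word, 0.0)
    return [c for c in THESAURUS.get(word, []) if FREQ.get(c, 0.0) > base]


def rank(candidates: List[str], context: Set[str]) -> List[str]:
    """Step 3: prefer candidates supported by the context, then by frequency."""
    def score(cand: str):
        overlap = len(CONTEXT_CUES.get(cand, set()) & context)
        return (overlap, FREQ.get(cand, 0.0))
    return sorted(candidates, key=score, reverse=True)


def simplify(sentence: str) -> str:
    """Lower-case, split on whitespace, and replace complex words."""
    tokens = sentence.lower().split()
    context = set(tokens)
    out = []
    for tok in tokens:
        candidates = lookup_substitutes(tok) if is_complex(tok) else []
        out.append(rank(candidates, context)[0] if candidates else tok)
    return " ".join(out)


print(simplify("Hypertension risk factors include obesity"))
```

Here "fatness" and "excessive weight" are both more frequent than "obesity", so the choice between them is made by the context score rather than by the thesaurus alone.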

Examples

▪ Technical medical language
  ▪ Hypertension risk factors include obesity,...
  ▪ High blood pressure risk factors include excessive weight,...
▪ Legal language
  ▪ The Products transacted through the Service are...
  ▪ The Products managed through the Service are...
▪ Low-literacy readers
  ▪ Hitler committed terrible atrocities during the Second World War
  ▪ Hitler committed terrible cruelties during the Second World War

Related Approaches

▪ Knowledge-based approach
  ▪ Uses a thesaurus or WordNet
  ▪ Hard to capture all simplification contexts
▪ Lexical simplification as paraphrasing
  ▪ Paraphrasing does not deal with complexity reduction specifically
▪ Lexical simplification as machine translation
  ▪ Requires a complex-simple parallel corpus
  ▪ Wikipedia and Simple Wikipedia corpora
    ▪ Not comparable

Wikipedia: Resource for Lexical Simplification

▪ Simple English Wikipedia (SEW)
  ▪ An edition of the normal, or Complex, English Wikipedia (CEW) written in simpler constructs with a restricted vocabulary
  ▪ A Wikipedia for children, low-literacy readers, second-language readers, etc.
  ▪ 121,095 content pages
  ▪ Semi-parallel to its complex counterpart

Resource: "For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia", Yatskar et al.

Wikipedia: Evolution of an Article

[Figure: an article evolves from Version 1 to Version 2 and on to Version n, with edits applied between successive versions]

Edit Model

▪ A Wikipedia article evolves from one version to the next through different types of edits
  ▪ fix: correction of grammar or factual content
  ▪ simplify: simplification of lexical items or phrases
  ▪ no-op: no edit
  ▪ spam: removal of spam

Edit Model

▪ Edits in SEW versions are a mix of different edit types
▪ The task: separate the simplification edits from the other edits

Edit Model

Definitions
▪ An article in a Wikipedia corresponds to a title
▪ $\vec{d}$: the sequence of article versions caused by successive edits to the article
▪ $A \in \vec{d}$: a word or phrase $A$ for which there is a version in $\vec{d}$ that contains $A$
▪ Lexical edit instance $A \to a$: $A$ in one version was changed to $a$ in the next
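One way to collect lexical edit instances from a sequence of article versions is to diff each pair of successive versions and keep only short word-level replacements. The sketch below uses Python's standard `difflib` for the alignment; the tokenization by whitespace, the `max_len` cut-off, and the function names are assumptions made for illustration, not the extraction procedure of the cited work.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Edit = Tuple[str, str]  # (original phrase, replacement phrase)


def lexical_edits(old: str, new: str, max_len: int = 3) -> List[Edit]:
    """Return short phrase replacements between two versions of a text."""
    old_tokens, new_tokens = old.split(), new.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits: List[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only substitutions; pure insertions and deletions are skipped.
        if tag == "replace" and (i2 - i1) <= max_len and (j2 - j1) <= max_len:
            edits.append((" ".join(old_tokens[i1:i2]),
                          " ".join(new_tokens[j1:j2])))
    return edits


def edits_for_article(versions: List[str]) -> List[Edit]:
    """Collect edit instances over every pair of successive versions."""
    collected: List[Edit] = []
    for old, new in zip(versions, versions[1:]):
        collected.extend(lexical_edits(old, new))
    return collected


versions = [
    "Hitler committed terrible atrocities during the second World War",
    "Hitler committed terrible cruelties during the second World War",
]
print(edits_for_article(versions))
```

Repeated edit instances are kept on purpose, since the probability estimates later in the deck are based on how often an edit is observed.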

Edit Model

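▪ $P(o \mid A)$: probability that operation $o$ is applied to $A$
▪ $P(a \mid A, o)$: probability of $A$ being modified to $a$ under operation $o$
▪ $P(a \mid A)$: probability that a phrase $A$ is edited to $a$
▪ Our interest: the probability of $a$ under the simplification edit operation
  ▪ Estimate $P(a \mid A, \text{simplify})$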

Edit Model: Simplifying Assumptions

▪ For the sake of simplicity, discard spam edits
▪ A no-op edit leaves the phrase unchanged, so it contributes nothing when $a \neq A$
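The effect of these two assumptions can be written out with the law of total probability. The notation is assumed for this sketch: $A$ is the original phrase, $a$ its replacement, and $o$ ranges over the four edit operations. The first line holds if the operations are treated as mutually exclusive; the second follows once spam edits are discarded and a no-op is taken to produce no change.

```latex
\begin{align}
P(a \mid A)
  &= \sum_{o \,\in\, \{\text{fix},\ \text{simplify},\ \text{no-op},\ \text{spam}\}}
     P(o \mid A)\, P(a \mid A, o) \\
  &\approx P(\text{fix} \mid A)\, P(a \mid A, \text{fix})
     + P(\text{simplify} \mid A)\, P(a \mid A, \text{simplify}),
  \qquad a \neq A .
\end{align}
```

Only the fix and simplify terms remain, which is why the following slides concentrate on estimating those two operations.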

Edit Model: Probability Estimates

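▪ Assumption: occurrences of simplification in ComplexEW are negligible in comparison to fixes
  ▪ Only fix edits occur in ComplexEW
▪ $f_C(A)$: the fraction of articles in ComplexEW containing $A$ in which $A$ is modified
▪ Probability estimate for the fix edit: $P_C(\text{fix} \mid A) \approx f_C(A)$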
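The fraction described above can be computed directly once each article is represented by its versions and its extracted edit instances. The following is a minimal sketch under that assumption; the `Article` container and the function names are hypothetical, and phrase matching is done naively on whitespace-separated text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Article:
    versions: List[str]
    # Edit instances (original phrase, replacement) seen across versions.
    edits: List[Tuple[str, str]] = field(default_factory=list)


def contains(article: Article, phrase: str) -> bool:
    """True if any version of the article contains the phrase."""
    return any(f" {phrase} " in f" {version} " for version in article.versions)


def modification_fraction(phrase: str, articles: List[Article]) -> float:
    """Fraction of articles containing `phrase` in which it is modified."""
    containing = [art for art in articles if contains(art, phrase)]
    if not containing:
        return 0.0
    modified = sum(
        1 for art in containing
        if any(src == phrase and dst != phrase for src, dst in art.edits)
    )
    return modified / len(containing)


articles = [
    Article(["terrible atrocities happened", "terrible cruelties happened"],
            [("atrocities", "cruelties")]),
    Article(["atrocities were recorded"]),
    Article(["many atrocities", "many atrocities occurred"]),
    Article(["the atrocities", "the atrocities ended"]),
    Article(["no such word here"]),
]
print(modification_fraction("atrocities", articles))
```

Run over ComplexEW articles, this value would serve as the estimate of the fix probability, because simplifications there are assumed to be negligible.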

Edit Model: Probability Estimates

▪ $f_S(A)$: the fraction of articles in SimpleEW containing $A$ in which $A$ is modified
▪ Assumption: the probability of any particular fix operation being applied in SimpleEW is proportional to that in ComplexEW
  ▪ The SimpleEW fix rate might be dampened because already-edited ComplexEW articles are copied over
▪ Modifications in SimpleEW are fix + simplify edits
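If modifications in SimpleEW are a mix of fix and simplify edits, and the SimpleEW fix rate is proportional to the ComplexEW one, a simplification rate can be obtained by subtraction. The sketch below is one reading of that idea, not a formula taken from the slides: the proportionality constant `alpha` and the clamping to the range 0 to 1 are assumptions.

```python
def estimate_simplify_prob(f_simple: float, f_complex: float,
                           alpha: float = 1.0) -> float:
    """Estimate how often a phrase is simplified in SimpleEW.

    f_simple:  fraction of SimpleEW articles containing the phrase
               in which it is modified (fix + simplify edits).
    f_complex: the same fraction measured in ComplexEW (fix edits only).
    alpha:     assumed proportionality constant between the fix rates.
    """
    fix_in_simple = alpha * f_complex       # assumed SimpleEW fix rate
    remainder = f_simple - fix_in_simple    # what is left is simplification
    return min(1.0, max(0.0, remainder))    # keep the result a probability


print(estimate_simplify_prob(0.4, 0.1))
print(estimate_simplify_prob(0.4, 0.1, alpha=0.5))
```

A value of `alpha` below 1 models the dampened fix rate mentioned above, which leaves a larger share of the SimpleEW modifications attributed to simplification.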

Edit Model: Probability Estimates

probability that A is changed to a different word in SimpleEW

Estimate of

Estimate of

Edit Model: Probability Estimates

Estimate of
▪ Fix operations are estimated from ComplexEW

Edit Model: Probability Estimates

Estimate of
▪ Both fix and simplify edits are considered to occur in SimpleEW

Edit Model: Probability Estimates

Lexical Simplification is Contextual

Resource: Putting it Simply: a Context-Aware Approach to Lexical Simplification, Biran et al.

Self Study