cloze procedure : a new tool for measuring readability · 2021. 1. 12. · test is for gauging a...

DEVOTED TO RESEARCH STUDIES IN THE FIELD OF MASS COMMUNICATIONS

FALL 1953I

"Cloze Procedure": A New ToolFor Measuring Readability

BY WILSON L. TAYLOR*

Here is the first comprehensive statement of a research methodand its theory which were introduced briefly during a workshopat the i953 AEI convention. included are findings from threepilot studies and two experiments in which "doze procedure"results are compared with those oj two readability formulas.

~ "CLOZE PROCEDURE" IS A NEW PSY-

chological tool for measuring the effec-tiveness of communication. The meth-od is straightforward; the data areeasily quantifiable; the findings seem tostand up.

At the outset, this tool was lookedon as a new approach to "readability."It was so used in three pilot studies andtwo experiments, the main findings ofwhich are reported here.

*The writer is particularly obligated to Prof.Charles E. Osgood, University of I11inois, andMelvin R. Marks, Personnel Research Section,A.G.O., Department of the Army, for instigatingand assisting in the series of efforts that yieldedthe notion of "cloze procedure." Both are ex-perimental psychologists. Among others who haveadvised, encouraged or otherwise aided are theseof the University of 11IInols: Prof. Lee J. Cron-bach, educational psychologist and statistician;Dean Wilbur Schramm, Division of Communica-tions; Prof. Charles E. Swanson, Institute ofCommunications Research, and George R. Klare,psychologist, both of whom have authored arti-cles on readability; and several journalism teach-ers who lent their classes. Kalmer E. Stordahland Clifford M. Christensen, until recently. re-search associates of the Institute, also contributed.

415

First, the results of the new methodwere repeatedly shown to conform withthe results of the Flesch and Dale-Challdevices for estimating, readability. Thenthe scope broadened, and cloze proce-dure was pitted against those standardformulas.

If future research substantiates theresults so far, this tool seems likely tohave a variety of applications, boththeoretical and practical, in other fieldsinvolving communication functions.

THE "CLOZE UNIT"

At the heart of the procedure is afunctional unit of measurement tenta-tively dubbed a "cloze." It is pro-nounced like the verb "close" and is de-rived from "closure." The last term isone gestalt psychology applies to thehuman tendency to complete a familiarbut not-quite-finished pattern-to "see"a broken circle as a whole one, for ex-ample, by mentally closing up the gaps.

416 JOURNALISM QUARTERLY

One can complete the broken circlebecause its shape or pattern is so famil-iar that, although much of it actually ismissing, it can be recognized anyway.

The same principle applies to lan-guage. Given "Chickens cackle and--- quack," almost anyone can in-stantly supply "ducks." If that wordreally is the same as the one omitted,the person scores one cloze unit forcorrectly closing the gap in the lan-guage pattern.

Note that the sentence pattern is acomplex one made up of many sub-patterns. One must know not only themeanings (i.e., patterns of symbol-meaning relationships) and forms (pat-terns of letters) of all the five words,but' also the meanings of given combi-nations of them-plus the fact that thesentence structure seems to demand aterm parallel to "cackle" but associatedwith ducks instead of chickens. In otherwords, one must guess what the muti-lated sentence means as a whole, thencomplete its pattern to fit that wholemeaning.

A cloze unit may be defined as: Anysingle occurrence of a successful at-tempt to reproduce accurately a partdeleted from a "message" (any lan-guage product) by deciding, from thecontext that remains, what the missingpart should be.

Cloze procedure may be defined as:A method of intercepting a messagefrom a "transmitter" (writer or speak-er), mutilating its language patterns bydeleting parts, and so administering itto "receivers" (readers or listeners)that their attempts to make the patternswhole again potentially yield a consid-erable number of cloze units.

HOW THE METHOD WORKS

As defined, the concept of cloze pro-cedure involves both oral and writtencommunication and does not specifyany particular kind of "part" for dele-tion. The research on which this report

is based, however, employed only read-ing materials and deleted only words.

In practice, the readabilities of twoor more passages of about equal lengthwere contrasted, for any given popula-tion, by:

1. Deleting an equal number ofwords from each passage by some es-sentially random counting-out system.Such a system was based on a table ofrandom numbers or else it simplycounted out every nth word (every fifthone, for example) without any regardfor the functions or meanings of spe-cific words.

2. Reproducing each mutilated pas-sage with a blank of some standardlength (so the length would not influ-ence the guessing) in place of everymissing word.

3. Giving copies of all reproducedpassages to all subjects-or to equalnumbers of randomly selected subjects-in a sample group representative ofthe population in question.

4. Asking all subjects to try to fill inall blanks by guessing, from the contextof remaining words, what the missingwords should be.

5. Totaling for each passage sepa-rately the number of times originalwords were correctly replaced, and con-sidering these totals as readabilityscores.

6. Contrasting the doze totals of thevarious passages. The passage with thehighest score was considered "mostreadable," the one with the second-highest score next-most readable, etc.-pending the outcome of statistical testsof the significance of the differences ob-served.

SOME DISTINCTIONS

Cloze procedure is neither just an-other readability formula nor just an-other form of the familiar sentence-completion test.

Not a FormulaThe cloze method is not a formula

at all.Neither in theory nor practice does it

resemble current "element counting"

"Cloze Procedure" 417

devices (Flesch, Dale-Chall, etc.) whichassume a high correlation between easeof comprehension and the frequency ofoccurrence of selected kinds of lan-guage elements-short or commonwords, short or simple sentences, cer-tain parts of speech, the active voice,"concrete" terms and such.

Cloze procedure counts no such ele-ments. It seems, however, to measurewhatever effects elements actually mayhave on readability. And it does so atthe same time that it is also taking ac-count of the influences of many otherfactors readability formulas ignore.

Typically, the formulas are insensi-tive to a particular population's previ-ous knowledge of the topic being dis-cussed. They cannot allow for the ef-fects of non-idiomatic uses of commonwords, nonsense combinations ofwords, awkward and confusing sen-tence structure or pronouns withoutdefinite antecedents. And the basic as-sumptions of formulas may be directlycontradicted.

"Respectability," despite its six syl-lables and high level of abstraction, ismuch easier for the average reader than"erg."

"He came in smiling and sat down"is not approximately two or three timesas difficult as "He came in. He wassmiling. He sat down."

"I came like Water, and like Wind 1go" from the Rubaiyat makes no senseto second-graders even if they do knowall the words.

One can think of cloze procedureas throwing all potential readability in-fluences in a pot, letting them interact,then sampling the result.

The procedure also might be likenedto a polling method with experimentalcontrols. It asks members of a popula-tion sample to demonstrate how wellthey understand the meaning of a muti-lated version of what some writer wroteby having them "vote" on what the

missing words should be. The passagewhose deleted words are most often"written in" on the "ballot" is electedmost readable.

More precisely, the cloze methodseems to deal with more-or-Iess parallelsets of meaning-pattern relationships.Different persons may express the samemeaning in somewhat differing ways,and the same language patterns mayhave differing meanings for differentpeople. Cloze procedure takes a meas-ure of the likeness between the pat-terns a writer has used and the patternsthe reader is anticipating while he isreading.

Not a Sentence-Completion TestObviously, cloze procedure is some-

thing like this familiar form of exami-nation. It is similar in that the subjectis presented with incomplete sentencesand there are blanks to be filled in fromcontext.

But the typical sentence-completiontest is for gauging a person's knowl-edge of specific and more or less inde-pendent points of information, hencethe words to be deleted are pre-evalu-ated and selected accordingly. And forevery new topic, some well-versed per-son must construct and tryout a newtest based on another set of informa-tion.

For one thing, cloze procedure dealswith contextually interrelated series ofblanks, not isolated ones.

For another, the cloze method doesnot deal directly with specific meaning.Instead, it. repeatedly samples the ex-tent of likeness between the languagepatterns used by the writer to expresswhat he meant and those possibly dif-ferent patterns which represent read-ers' guesses at what they think thewriter meant.

However, because it counts instancesof language-usage correspondence rath-


er than meanings themselves, the clozeunit seems to classify as a common de-nominator of communication success;and with it the readabilities of materialson totally different topics can be com-pared directly.

For this sort of contrast, an essen-tially random deletion of words seemsrequired. And this makes the task ofactually picking out what words to de-lete purely clerical-and so simple thatanyone who can count to ten can do itfor any sort of material, regardless ofits topic or difficulty.

SOME THEORETICAL CONSIDERATIONS

The main contributions to the notionof cloze procedure have come from theconcepts of "total language context,"Osgood's "dispositional mechanisms"and statistical random sampling.

1. Total Language ContextFor more than half a century, experi-

menters have been reporting findingsthat may be interpreted as showing thatlanguage behavior depends on "totalcontext."

The results indicate that the ability toidentify, learn, recognize, remember orproduce any language "symbol" (ele-ment or pattern) depends heavily on thevariable degrees to which it is associatedwith everything else by larger andmeaningful (familiar) overall combina-tions.t

The total context of any language be-havior includes everything that tends tomotivate, guide, assist or hinder thatbehavior. It includes verbal factors-grammatical skills and multitudes ofsymbols-and non-verbal ones such asfears, desires, past experience and in-telligence.

"I heard a -- bark" is likely toelicit "dog" both because that word ishabitually associated with "bark" and

1 For a comprehensive summary of these con-tributions to communication theory, the reader isreferred to George A. MUler's Language andCommunication (New York: McGraw-Hili, 1951).

because it fits in with past experiencewith noisy dogs. If the verbal context isenlarged to "For the first time, I hearda --- bark," the impulse to supply"dog" may be reduced by common-sense; the subject may ask himself:"Who is this guy that has never heard adog? Could he be referring to someother animal?" And if the precedingsentence has mentioned a voyage to thePribilof Islands, the reader may drawon past knowledge to produce "seal."Quite recently, Marks and Taylor re-

ported an experiment in which the in-fluences of varying intensities of bothverbal and non-verbal contextual fac-tors on the generation of language ele-ments were shown to be measurable byquantitative methods."

2. Dispositional MechanismsThe notion of cloze procedure was

"sparked" by implications of Osgood'slearning theory of communication. Herelates the "redundancies" and "transi-tional probabilities" of language to thedevelopment of "dispositional mechan-isms" that play a large part in bothtransmitting and receiving messages,"

Redundancy-"Man coming" meansthe same as "A man is coming this waynow." The latter, which is more likeordinary English, is redundant; it indi-cates the singular number of the subjectthree times (by "a," "man," and "is"),the present tense twice ("is coming"and "now"), and the direction of actiontwice ("coming" and "this way"). Suchrepetitions of meaning, such internalties between words, make it possible toreplace "is," "this," "way," or "now,"should anyone of them be missed.

• M. R. Marks and Wilson L. Taylor, "The Bf-fect of Goal and Contextual Constraints UponMeaningfulness of Language," paper presentedby Marks at annual meeting of American Psycho-logical Association, Chicago, 1951; summary inAmerican Psychologist, 6:325 (1951).

• For a fuller explanation of this topic, thereader Is referred to Charles E. Osgood:(a) "The Nature and Measurement of Meaning,"Psychological Bulletin, 49: 197-237 (May 1952);(b) A Theory of Language Behavior (tentativetitle), monograph in preparation, Institute ofCommunications Research, University of Illinois;(c) Method and Theory In Experimental Psychol-ogy (New York: Oxford, 1953).

"Gloze Procedure" 419

Transitional Probabilities - Somewords are more likely than others toappear in certain patterns or sequences."Merry Christmas" is a more probablecombination than "Merry birthday.""Please pass the ---" is more oftencompleted by "salt" than by "sodiumchloride" or "blowtorch." Some transi-tions from one word to the next are,therefore, more probable than others.

Dispositional Language Habits-Inlearning "to think in" a language, anindividual develops an enormous num-ber of complex verbal skill patterns-"bundles of skill sequences"-whichstand for innumerable kinds and shadesof meaning and tend to become so au-tomatic that they "run themselves off"in pertinent situations. These habits re-flect the redundancies and transitionalprobabilities of the language patternsthese skills involve.

Out of his personal experiences andcircumstances, each human develops hisown set of these habits. To the extentthat his set corresponds to the sets ofothers in his culture, he can communi-cate easily; he and they have learnedsimilar meaning-language relationships-the same patterns of symbols go withthe same meanings. But any two sets oflanguage mechanisms can differ consid-erably within the same culture; one manbecomes more disposed to run off par-ticular sequences than another mandoes. To the same extent, the relatedsets of redundancies and transitionalprobabilities can differ also.

Habits of expression take over mostof the work of translating an individ-ual's meaning into an organized seriesof language symbols for transmission toothers. Likewise, his habits of readingor listening cause him to anticipatewords, almost automatically, when heis receiving messages. When he sees thestart of a phrase that looks familiar, heimmediately tends to complete it in hisown way even when the written phraseactually ends differently.

When words come in sequences thatbest fit the existing receiving habits of areader, he understands with little effort.

When the symbols appear in less famil-iar sequences, comprehension is slowerand less sure. And sufficiently improb-able patterns seem like nonsense; theydo not stand for anything in his experi-ence.

3. Random DeletionA random deletion method (or an

every-nth equivalent) which ignores thedifferences between specific words ap-pears to be not only defensible but ra-tionally inescapable when doze proce-dure is used for contrasting readabili-ties.

The main reasons for this view relateto two questions most often asked bythose who have seen the data of thisreport.

Question 1: How can a random sys-tem play fair when some words areeasier to replace than others?

Obviously, one is more likely to beable to supply "an" in "A quinidine is-- alkaloid isomeric . . ." than toguess "$6,425" in "The city councilvoted for a new swimmingpool." Yet the former example is farmore difficult reading.

The answer is that if enough wordsare struck out at random, the blankswill come to represent proportionatelyall kinds of words to the extent thatthey occur. The matter boils down to"How many blanks are enough?"-aproblem to be settled by experiment.

Somewhat the same principle is in-volved in the substitution of a moreconvenient every-nth system for a ran-dom one. For several blanks, an every-nth system might tend to fall in withthe "rhythm" of an author's style andtake out mostly nouns, or mostly arti-cles. . . . The answer is that rhythmsbreak, and nth deletions, if continuedlong enough, start taking out other partsof speech and, eventually, yield theequivalent of random deletion. Again,the practical question is "How manyare enough?"


Question 2: Wouldn't a deletion sys-tem be more sensitive and more reliableif it dealt only with words classified,say, by their "importance to meaning"or their familiarity as gauged by tabledfrequencies of use?

The answer seems to be "No."For one thing, specified words or

kinds of words may not occur equallyoften in different materials. That factitself may be a readability factor, andits effect can be measured only by amethod that operates independently.

An attempt to restrict a counting-outsystem to "important" words (nounsand verbs, for example, as against arti-cles and conjunctions) may find thatone of two equally long passages con-tains twice as many "importants" as theother! What then?

Because the effect of such a differenceneeds to be included in-not excludedfrom-the results, it seems necessary tolet the occurrences of' presumably im-portant _words be represented propor-tionately in deletions.

What has just been stated about de-leting only "important" words applieswith equal force to varying degrees offamiliarity.

Also, it should be remembered thatdoze procedure deals only with wordsas they actually occur in larger pat-terns which stand for particular mean-ings at the time they are transmitted orreceived. The result is that infrequentlyused words may not be hard to replaceat all; and supposedly unimportantwords may become extremely so.

Most Americans can effortlessly sup-ply "tipped" and "lady" in "The politeold gentleman always his hatwhen he met a ---." The ability todo so has very little to do with the fre-quency with which those words, consid-ered individually, occur in the language;"the," which is hundreds of times morecommon, simply doesn't fit in eitherblank. The kind of frequency that mostmatters here is frequency-in-context.

An article can be more important to

meaning than any other word in its sen-tence, and harder to replace than thewords in the previous example. "Youwant to know what the wolf did to thesheep? He killed-- sheep." Note that"sheep," a noun and the object of theverb, matters hardly at all in its secondappearance. Also, if the missing word is"a," it would be quite difficult to guesscorrectly-because "the," "some," "ev-ery," "many," "no" or some finite num-ber could fit too.

"SCORES"-NOT FREQUENCIES

For the purpose of statistical analy-sis, cloze data are treated as "truescores" throughout this report.

This is in conformance with theopinion of Lee J. Cronbach. He saidthe nature of cloze results satisfies theassumptions for scores, but not thosefor chi-square frequency tests becausesuccessive cloze units cannot be con-sidered independent.

If, in "Then he took off his hat,""he" and "his" both were blanked out"getting the first right would probablymean getting "his" right too; just as"she" would go with "her."

At first sight, cloze results appearedto be frequencies (the mere number oftimes missing words were correctly re-placed). But "correctly" implies anunderlying qualitative continuum ofrelative rightness ranging from a com-pletely inappropriate word, throughpoor, medium and good synonyms, tothe exact word left out. Whether or notonly precise matches are counted doesnot affect the existence of such a con-tinuum.

GEN;BRAL PROCEDURE; PLAN OF REPORT

For clarity's sake, it seems best topresent the report of Experiment 1completely-specific purposes, proce-dure and results-before starting onExperiment 2. However, some proce-dural aspects were common to all re-search designs used.


For deletion purposes, a "word" wasdefined by the white spaces with whichan author had separated it from otherwords.

Contractions like "don't"; abbrevia-tions like "Mr." or "U.N."; figures like"1,006"; hyphenated combinations like"self-conscious"-all these were con-sidered as single words. But "self con-scious," written without the hyphen,counted as two.

All passages were 175-plus words inlength; the sentence in which the 175thword occurred was reproduced in full.

Only "mechanical" deletion systems,random or every-nth, were employed.

After mutilation, each version of apassage was reproduced on separatetypewriter-size sheets with an under-lined blank 10 spaces long in place ofeach missing word.

The sheets for anyone subject wereassembled in a predetermined order andstapled together with an explanatoryforeword. Such assemblies were ran-domly assigned to subjects.

Pilot studies used small groups ofsubjects, a dozen or less, chosen on acasual "handy and willing" basis. Thesubjects were not supervised and notime limits were set.

The experiments, however, usedlarger groups made up exclusively ofjuniors and seniors in University ofIllinois journalism courses. The admin-istrator read the foreword aloud andanswered questions regarding it beforesubjects began work. Each subject wasallowed from 10 to 15 minutes for fill-

ing in the blanks of each mutilatedpassage he received.

EXPERIMENT 1PURPOSES AND PROCEDURE

Three passages borrowed directlyfrom Flesch's How to Test Readability 4were used in Experiment 1 and the twopilot studies which preceded it. Listed ina footnote 5 are the original sources ofthe passages, the pages on which theyare found in the Flesch book, the "read-ing ease" scores given by that author,and the Dale-Chall scores computed bythis experimenter,"

The main purpose here was to see ifcloze scores also would rank these pas-sages as the formulas do. If the rankswere maintained, the differences amongthe cloze scores would be tested forstatistical significance.

The research designs of this experi-ment and its pilot studies also soughtanswers to several methodological ques-tions. It was asked whether significantdifferences in relative cloze scoreswould accompany variations in:

(1) Word-deletion systems-a. Random deletion vs. the more

convenient system of counting out everynth word.

b. Counting out every fifth word, forexample, vs. every tenth.

c. Counting out "few" (16) wordsper passage vs. "many" (35) words.(2) Presentation orders-

Presenting .mutilated passages in oneorder vs. presenting them in another.(3) Scoring methods-

Scoring as correct only those fill-ins----

• Rudolf Flesch, How to Test Readability (NewYork: Harper, 1951).

• Source-Author & WorkR.E. Score;

Page InterpretedDale-Chall

"raw" Gr. level

James Boswell, Life of Johnson 12 89-"easy" 6.4 7th-8thJulian Huxley, Man Stands Alone 16 68-"standard" 7.1 9th-10thHenry James The Ambassadors 22 47-"dlfficult" 9.2 13th-15th(NOTE: Dale:Chall scores yield lower-instead of higher-values for "easier" and are interpreted in

terms of school-grade levels.)• Method described in pamphlet: Edgar Dale and Jeanne S. Chall, A Formula for Predicting Read-

ability (Ohio State University, Bureau of Educational Research); reprints two articles from Educa-tional Research Bulletin. 27: 11-20, 37-54 (Jan., Feb. 1948).


that precisely matched original wordsvs. the more tedious process of judgingsynonyms and allowing half for each"good enough" one.

To explore the word-deletion ques-tions, each passage was mutilated bythree different systems: "Every fifth,""every tenth" and "random 10 per-cent."

In Pilot Study 1, the initial word,then every fifth word thereafter, wasknocked out until 35 words, about 20percent of the total number of originalones, were missing.

Both Pilot Study 2 and Experiment 1contrasted random and every-nth sys-tems of deleting about I0 percent oftotal wordage.

One system was based on a table ofrandom numbers; the other counted outevery tenth word. Each system deleted16 original words from each passage.

(Almost entirely different sets ofwords were taken out by the differentdeletion methods. There was no overlapat all between the every-nth systems.The random 10 percent system took outonly two of the same words deleted byeach of the every-nth ones.)

To discover possible order effects, allsix orders of three passages (abc, acb,bac, bca, cab, and cba) were equallyrepresented in each pilot study and inthe experiment.

The scoring-method question was dis-posed of after data of Pilot Study 2were evaluated in both ways and theresults were compared.

In summary, these hypotheses weredeveloped and tested by Experiment 1and its pilot studies:

1. Cloze scores would rank thethree passages in the same order as thetwo standard formulas.

2. For any condition of data gath-ering, the overall difference among thescores of the three passages would yielda significant value of F, as computed byanalysis of variance, double classifica-tion.

Because cloze data for the three pas-sages would be correlated, such compu-tation also _would yield an F for theoverall difference among subjects.

3. The relationship between thecloze scores for the three passageswould remain essentially the same de-spite the use of different word-deletionsystems or the specific words theycounted out, despite different presenta-tion orders, and despite different scor-ing methods.

Pilot Studies 1 and 2 used 6 and 12subjects, respectively. Experiment 1used 24 subjects. In every case, a sub-ject received all three passages.

RESULTS

This summary concerns itself mainlywith the findings of Experiment 1,which tested hypotheses inferred fromthe empirical findings of the precedingpilot studies. However, certain pilot-study results also are presented.

Some findings of both studies appearin Table 1 with reference to the "exist-ence" and discriminatory powers ofcloze scores as related to various meth-ods of gathering them. Further, PilotStudy 2 was itself the experimental testof the hypothesis, drawn from the firststudy, that the evaluation and scoringof synonyms would not be profitable(See Table 2).

1. "Existence" of Gloze Measureof Readability

Entries in Table 1 show how con-sistently cloze scores ranked the threeselected passages in the same order ofreadability as do the Flesch and Dale-Chall formulas. One could assume thatcloze procedure and the formulas weremeasuring the same thing.

2. Power of DiscriminationFor every experimental condition en-

tered in Table 1, analysis of varianceof the overall difference among the


cloze scores of the three passages yield-ed an F which was very highly signifi-cant, to above the 0.1 percent level ofconfidence.

All figures in the first five columns ofTable 1 (this excludes validation entriesfrom Experiment 2) involve correlateddata; that is, all subjects were given allthree passages, hence the type of analy-sis of variance called for ("double clas-

sification") discriminated among sub-jects as well as passages.

All but one of the between-subject F'swere significant to the 5 percent level orhigher.

3. Methodological FindingsResults tending to answer questions

about methodology appear in both Ta-ble 1 and Table 2.

TABLE IMaintenance of Ranks of Three Passages; Significance of

Overall Difference among Clozes

Pl/otStudy1 2

Experiment 1R-IO% Ev-lO Total

Experiment 2(validation)

182514.3%

243210%

CONDITIONS OF DATA GATHERING12 12 1216 16 1610% 10% 10%

S's per passage:.. . . . . . 6Blanks per passage: 35Deletion frequency: ... 20%

TOTAL CLOZE SCORES PER PASSAGERanked by Formulas

1. Bos-J: ••.•..... 136 118 113 87 200 1862. J. Hux: •.••..•. 87 97 80 71 151 1553. H. Jas: ......... 63 72 63 53 116 135

OVERALL DIFFERENCE WITHIN EACH GROUP OF THREE*Between Passages-

Computed F: •••• 00 57.855 15.447 60.455 9.867 42.151 9.418**Deg. Freedom: •• o' 2&10 2&22 2&22 2&22 2&46 2&69P less than: ........ .001 .001 .001 .001 .001 .001

Between Subjects-Computed F: ...... 12.159 2.531 5.735 1.872 3.327Deg. Freedom: 5&10 11&22 11&22 11&22 23&46P less than: ........ .001 .05 .001 (not)*** .001

*Significance of overall difference among scores in each condition computed by analysis of variance.• • • "Experiment 2's data for each passage were based on relatively separate groups of subjects,hence data considered uncorrelated and computation was by "simple classification"-one-dimensional-analysis of variance, for passages only....•••F of 1.872 falls to reach the 5 percent level of confi-dence.

This table shows (1) that the readability rank order given three passages by both theFlesch and Dale-Chall formulas was maintained by cloze scores obtained under a vari-ety of conditions; (2) that the overall difference between the scores of the differentpassages was significant in all conditions to above the 0.1 percent level of confidence;(3) that the two findings just stated were verified by subsequent data from Experiment2. Also, it was discovered that most conditions yielded F's which also discriminatedamong subjects. In the breakdown of Experiment 1 results, the fact that the every-tenthdeletion pattern did not yield a significant between-subjects F is evidence that it was a lessefficient pattern than its corresponding random 10 percent one, but the findings of bothpatterns were qualitatively similar, hence they were considered combinable for furtheranalysis.


a. Deletion Systems-Table 1 showsthat variations in totals of blanks perpassage, amounts of deletion rangingfrom 10 percent to 20 percent, randomand every-nth systems, and almost en-tirely different sets of specific wordsknocked out-all were accompanied byresults that consistently upheld both the"existence" of cloze procedure as areadability measure and its power todiscriminate between passages.

Qualitatively, then, all conditionsyielded the same results. There ap-peared, however, some quantitative dif-

ferences-that is, some conditions weremore "efficient" than others, particular-ly with regard to the discovery that theanalysis also discriminated among sub-jects.

The 35 blanks and every-fifth deletionof Pilot Study 1 discriminated better be-tween its six subjects than did PilotStudy 2, based on only 16 blanks, be-tween its 12 subjects. (Perhaps the for-mer group was simply more heterogene-ous, but it also may be that the degreeof significance found depends on plentyof blanks or on greater intensities ofdeletion.)

TABLE 2Cloze Data en Two Methodological Problems

(A)

Total287

1.00

353.51.00

Bos-I118

.41139.5

.39

WHEN SYNONYMS ARE CONSIDERED

From Pilot Study 2 Data(12 Subjects, 10 percent deletion, 16 blanks per passage)

PassagesI. Hux H. las.

97 72.34 .25

122.5 91.5.35 .26

Precise Matches Only: Score:p:*

Matches Plus Synonyms: •. . . • . • . . . . . . • . . .Score:en-count allowed for each of 133 judged p:

as "good enough")----

*Proportlons of total score associated with each passage.

This shows that allowing half-scores for "good enough" synonyms and adding thosecounts to the precise-match scores raised the scores somewhat but did not improve dis-crimination between passages. The proportions are almost identical.

(B)COMPARISON OF PRESENTATION ORDERS

From Data in Experiment 1(24 Subjects, 10 percent deletion, 16 blanks per passage;Random 10 percent and Every-10th results combined)

Total Cloze Scores for EveryPlace in Presentation Order

lst 2nd 3rd TotalPassagesBos-I: ••••• I •••• 65 66 69 200J.Hux: .......... 48 52 51 151H.Jas: ••••••• 1 •• 41 36 39 116

Totals: 154 154 159 467For anyone passage, Le., anyone row, the subtotals for each of the three places

differ only slightly. Order effects were assumed to be unimportant.


In the breakdown of Experiment 1results between random and every-nthsystems, the random 10 percent yieldeda highly significant between-subject F,put the every-tenth data, while qualita-tively similar, failed to reach the 5 per-cent level. (It seems that the contrastshould be interpreted as meaning thatrandom and every-nth systems will givemore nearly equivalent results if morethan 16 blanks are deleted per passage.)

b. Synonyms-Table 2 entries indi-cate that the more tedious method ofjudging synonyms as "good enough" tobe allocated half-counts yielded slightlylarger total scores for the passages, butthe degree of differentiation was virtu-ally identical to scoring only precisematches.

Note the similarity in the propor-tions of the two kinds of totals brokendown between passages.

c. Presentation Orders-Table 2also shows that the orders in which thethree passages were presented had vir-tually no effect on their scores. (It wasassumed, therefore, that order effectsare unimportant.)

EXPERIMENT 2

PURPOSES AND PROCEDURE

The list of passages for this experi-ment was expanded to eight by addingfive more to the three already used.

Two considerations governed thechoice of additional passages. Desiredwere (1) a list of passages that seemedto be distributed fairly evenly over along hard-to-easy range and (2) the in-clusion of materials which, it wasthought, would show that cloze proce-dure could "handle" passages whichneither of the standard formulas could.

Both considerations were served bythe choice ofpassages from:

Erskine Caldwell, Georgia Boy 7Gertrude Stein, Geography and Plays 8James Joyce, Einnegans Wake 9

In the experimenter's opinion, thefirst of these is by far the easiest to readof all the passages used, and the othertwo are next-hardest and hardest, re-spectively. It was expected that whatwere subjectively assumed to be the"true" readabilities of all three selec-tions would be under-rated or over-rated by either or both formulas.

The Stein selection, although itseemed comparatively unintelligible, wascharacterized by words both short andfamiliar and by short sentences. . . . Itwas thought that both formulas wouldover-rate its readability inasmuch as theFlesch device counts only the numberof syllables per word and the number ofwords per sentence, and the Dale-Challmethod counts only the number ofwords per sentence and the number ofwords not on a list of familiar words.

The other two passages were expect-ed to "fool" the Flesch formula morethan the Dale-Chall one.

It was thought that the Caldwell se-lection would be under-rated by theFlesch method because it contains ratherlong sentences consisting largely of in-dependent clauses connected by "and,"and because this fact would not be off-set by consideration of the familiarityof the vocabulary.

Flesch's formula was expected toover-rate the Joyce selection's readabil-ity. That passage is made up of rela-tively short words and sentences, butmany of the words appear in no diction-ary.

The two selections still needed tobring the total to eight were chosenafter trying out four other passages inFlesch's book on readability testing.Those four, together with the placeswhere they are reprinted, were from

• From short story, "My Old Man and PrettySooky," passage beginning "My old man pickedup one morning •.." p. 80 (New York: Avon,1946).

8 Passage beginning "So the main is seen andthe green is green," pp. 84-85 (Boston: FourSeas, 1922).

• Passage beginning "But who comes yond withpire on poletop?" p. 244 (New York: Viking,1943).


Swift's Gulliver's Travels, p. 14;Charles Dickens' Bleak House, P: 15;William James' Psychology, P: 19; andArticle I, Section 10, U. S. Constitu-tion, p. 28.

Altogether, 10 passages, these fourand the six already earmarked for Ex-periment 2, were used in Pilot Study 3.Each of the six subjects received copiesof all and was allowed to take themhome and complete them at leisure.

All passages used in this pilot studyand the final experiment were mutilatedby still another deletion system; everyseventh word was counted out untileach passage had 25 blanks.

Analysis of the pilot study's resultsshowed that the Swift and Dickens se-lections tended to distribute themselvesconveniently with respect to the sixpassages already chosen, hence the onesfrom William James and the Constitu-tion were discarded.

The eight passages thus chosen forExperiment 2 then were ranked by theirmedian cloze scores, and the order inwhich they fell was considered a predic-tion of how larger groups would rankthem.

Additional Flesch and Dale-Challscores were computed, and the ranks inwhich they fell also were considered aspredictions.

These were the chief hypothesestested by Experiment 2:

1. The main results of Experiment 1would be validated.

2. The readability order of eightpassages, as empirically determined byPilot Study 3, would dependably pre-dict the rank order of the cloze scoresgiven them by other and larger samplesof subjects.

3. Cloze procedure would handlematerials which either or both of thestandard formulas could not.

4. Cloze predictions would be more

successful than those of the two formu-las in guessing the ranks of final clozescores.

5. The overall difference among thecloze scores of the various passageswould be found significant when sub-mitted to analysis of variance.

Because only half an hour was avail-able for the administration of Experi-ment 2, each subject received only twoof the eight passages. Each of the selec-tions Pilot Study 3 indicated as amongthe four easier ones was paired witheach of those which seemed to beamong the four harder, the easier al-ways preceding the harder in assembly.

The 16 kinds of pairs were represent-ed four times each in the final data, towhich a total of 72 subjects contrib-uted. Each passage was administered to18 subjects.

The final experiment's data also weresubjected to certain exploratory at-tacks:

1. To see how finely the cloze meth-od seemed to discriminate betweenreadability levels, passages adjacent infinal rank were taken two at a time,and the significance of the differencebetween their mean scores was com-puted by the t-test for small samples.

2. What may be called the "internalconsistency" of cloze results was at-tacked from the standpoints both ofpassages and of subjects by "fractiona-tion" methods.

The total cloze score for each pas-sage was broken down into subtotalsshowing the aggregate score for thatpassage for the first five blanks, the sec-ond five blanks, etc., until all fifths of25 blanks were represented. The pas-sages then were ranked by these sub-totals for each five-blank series. Thenthe degree of overall rank correlationfor all passages for all five of thesefifths of total blanks was computed.

Then the question was asked: Didsubjects who were higher in total cloze


scores also tend to make consistentlyhigher scores for fractions of the 25-blank series? The subjects for each pas-sage were divided into thirds. and theperformances of those in the upperthird-as judged by total scores-werecompared with the performances ofthose in the lower third.

RESULTS

Data pertinent to all findings underthis heading, except for the "validation"one disposed of in the next paragraph,are shown in Tables 3 and 4.

1. Validation of Experiment l'sFindings

Entered in Table I are certain ofExperiment 2's results that verify the"existence" and power-of-discriminationfindings for the three passages used inExperiment I and its pilot studies.

Note that Experiment 2 used still an-other deletion system on the passagesby Boswell, Huxley and Henry James.Its every-seventh method took out a setof 25 words that overlapped the 10 per-cent systems, random and every-tenth,only two words each; and it struck outonly five of the 35 words which theevery-fifth system deleted.

2. Predictive Success of Pilot Study 3Table 3 shows that Pilot Study 3's

median scores for six subjects rankedthe eight passages in almost exactly thesame way as did the performances ofthe IS-subject groups used in Experi-ment 2.

The rank difference coefficient of .98exceeds the 0.1 percent level of confi-dence. The Boswell and Huxley pas-sages in the middle of the distributionchanged places, but Table 4 shows thatthose two passages did not receive sig-nificantly different cloze totals anyway.

Note that although the same six sub-jects participated in the predictions forall eight passages (hence Pilot Study 3'sdata were correlated), in the experi-ment that followed, each passage wasrated by the performance of a relativelyseparate and independent group of sub-

jects. Experimental conditions allowedonly two passages to be administered toanyone subject, and the experiment'sdesign was such as to reduce correlationbetween the results of any two passagesto the minimum of 28 percent; that is,only five of the 18 subjects who ratedany passage also were among the 18who rated any other passage. Experi-ment 2's scores for different passageswere, therefore, assumed to be inde-pendent.

3. Handling of Certain MaterialsIt seems indisputable that cloze pro-

cedure came closer than either of thestandard formulas to ranking properlythe relative readability levels of certainpassages.

The Stein passage, for example, isranked first by both formulas. TheFlesch method concludes it is "veryeasy" reading, and the Dale-Chall scorerates it as within the comprehensionlevel of fourth or fifth grade schoolchildren!

Otherwise, the two formulas did notperform very similarly. The Dale-Challranks for the Caldwell and Joyce selec-tions appear to make much more sensethan the Flesch ratings of them.

4. Gloze Predictions Better forPopulation Used

It follows from the findings just setdown that previous cloze results weremore successful than those of the twostandard formulas in predicting theranks of future cloze results for thepopulation used.

Although the rank correlation, .70,between the Flesch and Dale-Chall pre-dictions is moderately significant (toabove the 5 percent level of confidence),neither of those sets of predictions cor-relate significantly either with cloze pre-dictions or with final cloze results.

The lower part of Table 3 demon-strates how the removal from consider-ation of the data from any one-or anytwo-of the three handpicked passages(Stein, Joyce and Caldwell) wouldchange the measures of the predictiveaccuracies of the formulas.


5. Significance of Difference AmongPassages

Table 4 shows that analysis of vari-ance, simple classification, yielded an Ffor the overall difference among thescores of various passages which is sig-nificant far above the 0.1 percent level.

Because each passage was adminis-tered to a relatively distinct group of 18subjects, Bartlett's homogeneity of vari-ance test was used. Its non-significantfinding justifies the assumption that thedifferences between passage scores neednot be attributed to mere differences be-tween the groups.

TABLE 3Cloze, Flesch & Dale-Chall Predictions Compared with

Respect to Experiment 2 Cloze Results(Every-7th; 25 Blanks Per Passage; 6 S's Predict for Groups of 18.)

RANK-ORDER PREDICTIONSCloze Standard Formulas Exp. 2 Results

Pas- P. Study 3 Flesch Dale-Chall Cloze FInalsages Mdn. RO R.E. RO "Raw" RO Totals RO

Caldw 19.5 1.0 79 4.5 5.6 2.0 336 1.0Dickn ........ 16 2.0 69* 6.0 6.7 4.0 263 2.0Bos-J ........ 12 3.0 89* 2.0 6.4 3.0 186 3.0J. Hux ....... 11 4.0 68* 7.0 7.1 6.0 155 5.0Swift ••• f •••• 10.5 5.0 80* 3.0 7.0 5.0 170 4.0H.Jas .......• 9 6.0 47* 8.0 9.2 7.5 135 6.0Stein ......... 8.5 7.0 96 1.0 5.0 1.0 123 7.0Joyce ........ 3 8.0 79 4.5 9.2 7.5 49 8.0

RANK-DIFFERENCE CORRELATION COEFFICIENTS

Between Predictions Success of PredictionsCloze vs, Flesch -.12 Cloze vs. Final. .•...........Cloze vs. Dale-Chall . . . . . . . . . . . .44 Flesch vs. Final .Flesch vs. Dale-Chall.. . . . . . . . . . .70 Dale-Chall vs. Final. .

.98**-.02

.46

Effects of Including Joyce, Caldwell, and Stein Passages

(As determined by making removals indicated, re-ranking remaining selections,and recomputing coefflcients.)

PredIctIons w/o w/o w/o w/o w/o w/ovs. Final: Joyce Caldw SteIn J.&C. J.&S. C.&S •

Cloze .96***Flesch -.04Dale-Chall .21

.96-.04

.40

.96.33.92

.94***-.03

.09

.94

.64

.94

•94.71.93

·These five R.E.'s reported by Flesch himself; remaining R.E.'s and all Dale-Chall scores com-puted by experimenter.

··Slgnificant; P less than .001.···Slight reduction in cloze prediction coefficients reflects shortening series of passages-from 8 to 7.

then to 6.

The ranks of Pilot Study 3 scores from six subjects are shown to have predicted almostperfectly how larger groups of subjects would rank eight passages. The doze predictionssuffered only one reversal; it occurred in mid-scale and involved two passages whosescores were not significantly different anyway (see Table 4). The two standard formulasstumbled in their predictions. Both rated the Stein passage as most readable. The Fleschformula had a bad time with the Caldwell and Joyce passages; it rated them equally as"fairly easy."


TABLE 4Significance of Differences between Means

of Passages Adjacent in RankData from Experiment 2

(18 Subjects and 25 Blanks Per Passage)

Passages Total p. (AUN's =18) Slgnlf. DID.In Final Cloze of Betw, Adjac,Rank O. Score 450 Means S.D. Means

Itt" p

Caldw •••• 0 •••• 0 •• 336 .747 18.67 1.5447.068 .001

Dickn ............. 263 .584 14.61 1.8006.719 .001

Bos-J ............. 186** .413 10.33 1.9151.179 .25

Swift 0 •• ••••••••• • 170** .378 9.44 2.4541.035 .31

J.Hux ............ 155** .344 8.61 2.2141.364 .18

H.Jas ............. 135** .300 7.50 2.522.853 .40

Stein .............. 123** .273 6.83 2.0346.990 .001

Joyce •••••••• ,., o' 49 .108 2.72 1.3258.474' .001

(Zero) ............. (0) (.000) (0.00)Bartlett's Homog. of Var.: Chi-square = 13.582 w/7 df, P .10 to .05; not significant;

group heterogeneity rejected.Analysis of Var., "Simple": F = 100.183 w/7&136 df, P less than .001; difference

among passages highly significant.

·Proportlon observed cloze score was of total possible perfect score: 18 x 2S = 450.··Comparlson of alternate pairs of passages yielded these additional results:

fet" PBos·] ys. s. Hux 2.421 .03Swift vs, H. 1as 2.274 .031. Hux vs, Stein 2.442 .02

Each passage was administered to a relatively separate group of 18 subjects, henceanalysis assumed the means to be non-correlated. Actually, because each subject ratedtwo passages, there was a fractional correlation between the subjects involved with anypair of passages; five subjects would be in both groups of 18. It was assumed legitimateto ignore this 28 percent overlap. . . . All P's shown are "two-tailed"; under justifyingexperimental conditions, such P's as these could be halved and significance levels raised.

EXPLORATORY FINDINGS

1. Differences between AdjacentPassages

Table 4 shows the results of t-testanalysis of the differences between themean scores of passages adjacent infinal rank.

The sizes of the P's indicate that the

Caldwell passage scored significantlyhigher than the Dickens one, the Dick-ens one significantly higher than themiddle five (Boswell, Swift, Huxley,Henry James and Stein), and thatwhole group higher than the Joyce se-lection. Also, the last was significantlyabove hypothetical zero.


TABLE 5Clozes, Net and Accumulated, for Fifths of 25 Blanks

1st 5

lligh 8~: 127 (127)

Low 8's: 78 (78)

2nd 5

119 (246)

76 (154)

3rd 5

102 (348)

69 (223)

45th 5

118 (466)

85 (308)

5th 5

104 (570)

60 (368)

Within the group of five, no passagescored significantly higher than the oneadjacent in rank, but those in alternateranks were associated with moderatelysignificantly different scores. (See foot-notes of Table 4.)

The P's in the table are all "two-tailed." That means that when experi-mental conditions justify the process-which passage is the more readable isalready known, therefore the only ques-tion is how much-such P's could behalved and the level of significance moreeasily reached.

2. Internal ConsistencyThe overall rank correlation between

the performances of all eight passages,as expressed by their score subtotals onfive-blank fifths of 25 blanks, wascomputed by a method described byKendall.t? It yielded a "W," or "coeffi-cient of concordance," of .56, which issignificant to above the I percent levelof confidence.

This suggests that the passages wouldhave been ranked in about the sameway even if only, say, half as manywords had been deleted in each pas-sage.

This "w" of .56 is the equivalent ofan "average rho" of .455, which couldotherwise be computed by finding theten possible rank difference coefficientsbetween five sets of rankings, taken twoat a time, and calculating the mean.

Further investigation into passageconsistency considered larger fractionsof 25-blank totals. When data associated

,. Maurice G. Kendall, The Advanced Theoryof Statistics, Vol. 1 (London: Griffin & Co.,1948).

with the 13th blank were discarded andresults for the remaining 24 blanks weredivided into thirds, the three ranks ofsubtotals yielded a higher W, of .68.And when all data were divided intodovetailed halves corresponding to odd-and even-numbered blanks (with thedata of the 25th blank split betweenthem) the two sets of rankings yieldeda rank correlation coefficient of .95.

To investigate subject consistency,the total scores of those six subjectswho scored highest on every passagewere broken down to correspond tofive-blank fifths of 25 blanks. Then thesubtotals for the different passages werecombined. The same was done for thesix who scored lowest on every passage.The contrasting pairs of subtotals areshown in Table 5.

Pushing this line of inquiry farther,the aggregate dozes of the. "highs" and"lows" for each passage were examinedblank by blank as they accumulated. Inthe case of every passage, the cumula-tive score of the high scorers quicklyseparated from that of the low group-usually within the first five blanks-after which the groups continued todiverge.

SUMMARY AND CONCLUSIONSIt was assumed that the relative read-

abilities of two or more written pas-sages of about equal length could becontrasted by mechanically deleting anequal number of words in each, thentotaling, for each passage, the numberof times subjects succeeded in repro-ducing missing words.


1. Ranked Relative ReadabilityUnder a variety of data-gathering

conditions, cloze scores consistentlyranked three selected passages in thesame way that the Flesch and Dale-Chall readability formulas do. If theformulas accurately ranked them, clozeprocedure did too.

2. Statistically Useful ContrastsAlso found was evidence that the

cloze method could discriminate effec-tively between different levels of reada-bility-that is, was sufficiently sensitiveto yield statistically contrastable scores.

The overall difference among thescores of the three passages most usedwas found to be highly significant underall data-gathering conditions.

Likewise, the overall difference amongthe scores of eight passages used in thefinal experiment was highly significant.

Further, exploratory analysis of thefinal experiment's data showed thatsome of the individual differences be-tween the scores of passages adjacent inrank, taken two at a time, were highlysignificant. And all differences betweenthose alternate in rank were at leastmoderately significant.

3. Reliability and PredictionThe mere repetitive ranking of three

passages in the same way was roughevidence of the reliability of cloze re-sults. More dramatic was the findingthat the median scores of six subjectson eight passages ranked those passagesin almost precisely the same way as didsubsequent scores of relatively inde-pendent groups of 18 subjects, onegroup to a passage.

Further evidence of the cloze meth-od's reliability as a way to comparepassages was the exploratory "internalconsistency" finding that a significantoverall rank correlation existed betweenthe ways in which passages ranked oneach five-blank fifth of Experiment 2's2S-blank series.

4. Range of ApplicabilityCloze procedure, in both a pilot study

and the final experiment, assessed theassumed "true" readabilities of passagesby Caldwell, Gertrude Stein and JamesJoyce more adequately than either orboth of the standard formulas. Bothformulas, for example, seemed to errbadly in ranking the Stein passage asmost readable of the eight employed.

However, the point here is not thatthe formulas did poorly, for that waspreordained. Those three passages werehandpicked because their characteristicsviolated certain common assumptions.about what sorts of language elementscorrelate highly with readability. Thereal point is that cloze procedure ap-peared to handle those passages ade-quately.

5. MethodologyAnalyses of cloze scores gathered by

contrasting methods yielded qualita-tively similar and quantitatively signifi-cant differences for passages despite:

a. Different mutilation systems-ran-dom vs, every nth; deletion densitiesvarying from 10 percent to 20 percentof original words; totals of blanks vary-ing from 16 to 3S per passage; and al-most entirely different sets of specificwords blanked out.

b. The orders in which two or morepassages were administered to the samesubject.

c. Ignoring vs. allowing for syno-nyms in final scoring.

However, some methods seemed tobe relatively more efficient quantitative-ly. More needs to be determined aboutthe comparative advantages of randomvs, every-nth systems; various intensitiesof deletion; minimum sizes of popula-tion samples, etc.

6. Between SubjectsThe possibility of using cloze proce-

dure for contrasting the reading abilitiesof different individuals-as opposed to

432 J9URNALISM QUARTERLY

the readabilities of different passages-was clearly suggested by the between-SUbject F's associated with finding be-tween-passage F's in the correlated dataof Pilot Studies 1 and 2 and Experi-ment 1.

This suggestion also is supported bythe results of the "internal consistency"exploration of the fractionated per-formances of high- and low-scoring in-dividuals in Experiment 2.

DISCUSSIONEvaluation

At present, the more outstandingcharacteristics of cloze procedure ap-pear to be: The fact that it worked, its"mechanical" simplicity, the range ofinfluences it involves, and its apparentpossibilities for future use.

All hypotheses developed and testedseemed to stand up. Of course, the con-clusions do not yet bear much general-izing-they apply only to the specificpassages employed-but they were con-sistent.

To avoid misunderstanding, it is em-phasized that the observed contrasts inthe readability of passages are not to beextendedeither to the works from whichthe selections were drawn, or to thestyles of the authors themselves. Tomake conclusions about the works orthe authors would require the use ofadequately representative samples ofmaterials.

The practical advantages of a proce-dure that is simple and straightforward,that does not involve "experts" or diffi-cult administrative judgments, are ob-vious.

The discovery that presentation ordershad no important effects on the score ofa passage tends toward simplification ofexperimental designs.

The finding that the trouble of evalu-ating and scoring synonyms was unprof-itable was most welcome. It appearedthat the problem of "coder reliability"

that so plagues "content analysis" could.be avoided.

Because various kinds of deletion sys-tems yielded corresponding results, mostof the questions concerning them seemto boil down to using enough blanksand determining, by further experiment,whether an every-fifth system is more orless efficient than, for example, an every-tenth or every-15th one. Or how manyevery-nth blanks are necessary to yielda dependable equivalent to a randomsystem that takes out enough words.

Potentially important, it seems, is thefact that a cloze score appears to be ameasure of the aggregate influences ofall factors which interact to affect thedegree of correspondence between thelanguage patterns of transmitter and re-ceiver. As such, its potential usefulnessis by no means confined either to read-ability or to the reading abilities ofindividuals.

It seems that the effect of any manip-ulable factor, or any combination oftwo or three of them, might be meas-ured by properly controlled experimentswhich contrast the cloze performancesof groups equated or homogeneous withrespect to other factors.

As it now stands, cloze procedure notonly needs research to make it as effi-cient as possible but it needs to be bet-ter validated and the range of its pos-sible applications explored.'

More research is under way. Some ofit is directed at the problem of validat-ing cloze scores as measures of com-prehension-that is what the term"readability" seems to imply. Tentativeresults indicate that cloze scores corre-late highly with the scores of tests de-signed to measure comprehension andgeneral intelligence.

Regarding "Formulas"

It is not the purpose of this report todisparage the Flesch, Dale-Chall andother formulas. The two named were


chosen for experiment because they areamong the more prominent in the field.

Also, it must be admitted that theseand most other formulas have certainadvantages over cloze procedure.

They are easier and quicker to apply.Their use does not require word dele-tion, the reproduction of materials, ex-perimental controls, and representativepopulation samples. (They do, ofcourse, have similar material-samplingproblems.)

Also, for what may be called "stand-ard" materials, these formulas seemreasonably accurate-the occurrencesof the elements they choose to countusually do correlate better than chancewith such criteria of validity as compre-hension test scores and lists of gradedreadings.

And they are "reliable." With rela-tively little training, different users ofthe same formula get virtually identicalresults for the same materials. Also, theresults of different formulas have often

been shown to correlate significantly.Their disadvantages, however, are

too important to be overlooked.There seems no positive way to iden-

tify in advance which materials are"standard" enough to be handled by aparticular formula.

"Reliability" isn't everything; a for-mula can be reliably wrong. Anyoneusing either the Flesch or Dale-Challformula on the eight passages in Ex-periment 2 would have to conclude thatthe Stein selection is the "easiest."

Also, it seems that a readabilitygauge might well be flexible enough toapply not only to a generalized "aver-age American," for example, but to par-ticular populations too. It is a little un-reasonable that a single readabilityscore for an article on cattle breedingshould apply alike to residents of Texas"cow country" and metropolitan Brook-lyn. In such cases, it appears that theuser of a formula might employ dozeprocedures to check up on his results.

"The editorial page [should be] the instrument through which a news-paper seeks to influence public opinion in what its publisher and editors,its reporters and deskmen consider to be the right direction.

"To do this effectively, a newspaper must work at the task steadily andconsistently, not be forever jumping from one crusadeto another. Afterall, a newspaper, like an individual, only has so many basic ideals andprinciples. But by applying them to various situations as they arise, a news-paper can indoctrinate its readers with its own beliefs and obtain their

.acceptance in the community."We do not like it if the people of Toledo ask what stand the Blade is.

going to take on a sharp, given issue. If we have done our work well andmade our principles known, they should know how we will apply them toany particular problem.

"And so when a storm of controversy breaks out over the admission ofNegroes to public housing projects where segregation had been practiced,it is not so much what we say then that will calm the furor and lead to theright solution. It is what we have been saying on our editorial page foryears which counts when such a crisis comes."-MICHAEL BRADSHAW,editor, the Toledo Blade, in address at 1953 AEJ convention.

cloze procedure : a new tool for measuring readability · 2021. 1. 12. · test is for gauging a...

Documents