An Investigation of the Selective Deletion Cloze Test as a Valid Measure of Grammar-Based Proficiency in
Second Language Learning
Gregory S. Hadley and John E. Naaykens
December 1997
Introduction
Few issues in the field of second language research have been as contentious as cloze testing.
Over the years, opinions in the TEFL academic community have been divided over the
applicability of cloze tests to the second language classroom. Some contend that cloze tests
measure a language learner's overall communicative ability in the target language (Hanania
and Shikhani 1986). Others maintain that cloze tests assess only the most basic aspects of second
language learning and reading comprehension (Shanahan, Kamil and Tobin 1982). Still others
support a moderate position. Ikeguchi (1995), who quotes Bachman (1990:86-89), states that
cloze testing:
    hold[s] potential for measuring aspects of students' written grammatical competence,
    "knowledge of vocabulary, morphology, syntax, and phonology," as well as textual
    competence, "knowledge of cohesive and rhetorical properties of text" in second language
    (p. 167).
Some years earlier, Bachman (1982:61-70) reported that certain types of cloze tests, such as
the selective deletion cloze, can be used to investigate a subject's knowledge of written
discourse items such as context cohesion, syntax and strategic textual comprehension.
Alderson (1979) adds that cloze testing correlates more closely with grammar tests than with
reading tests, and according to Bowen et al. (1985:376), the selective deletion cloze is ideal for
testing vocabulary and grammar. Claims such as these should prompt us to find out for
ourselves whether cloze tests, such as the selective deletion cloze, can measure a subject's knowledge
of grammar. Would students with higher scores on a selective deletion cloze test also score
higher on a criterion-referenced examination designed to measure grammatical competency?
Purpose
We will consider this question as we review a 1996 study conducted at Niigata University. The
purpose of this study was to investigate whether the selective deletion cloze correlates highly with
traditional, grammar-based tests. Many language teachers in the national university system opt
for criterion-referenced tests (C-RTs) which attempt to measure grammatical knowledge
(Garland 1996). Putting aside the issue of whether language teachers should focus primarily on
grammatical proficiency, a selective-deletion cloze test, if proven to be a valid measure of
grammatical competency, might provide a time-saving method of examination which is both fair to
students and easier for teachers to grade. Before turning to the findings of this study, however, we
provide a brief history for those new to cloze testing.
Cloze Testing: An Overview
Cloze testing was first introduced by W.L. Taylor (1953), who developed it as a reading test
for native speakers. He derived the term "cloze" from a gestalt concept which holds that an
individual will be able to complete a task only after its pattern has been discerned:
    A cloze unit may be defined as: any single occurrence of a successful attempt to reproduce
    accurately a part deleted from a 'message' (any language product), by deciding from the
    context that remains, what the missing part should be (p. 416).
Cloze tests consist of a text (usually two or three paragraphs) which has had words or parts
of words deleted from it. Test subjects must draw on their knowledge of the language in
order to write appropriate words in the blanks (see Table One).
    Ours was the marsh (1) ______, down by the river, (2) ______, as the river wound, (3) ______
    miles of the sea. (4) ______ first most vivid and (5) ______ impression of the identity (6) ______
    things, seems to me (7) ______ have been gained on (8) ______ memorable raw afternoon
    towards (9) ______. At such a time (10) ______ found out for certain, (11) ______ this bleak place
    overgrown (12) ______ nettles was the churchyard; (13) ______ that Philip Pirrip, late (14) ______
    this parish, and the (15) ______ wife of the above, (16) ______ dead and buried.

    These are the words to choose from:
    I   were   that   My   to   within   with   a   of   broad   twenty   and   Georgiana   of   evening   country
Table 1: Example of a Fixed-Rate Cloze Test.
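A fixed-rate cloze of the kind shown in Table 1 can be produced mechanically: a lead-in is left intact and every nth word after it is blanked. The following is a minimal Python sketch of that idea, not the procedure used to build Table 1. The function name `fixed_rate_cloze`, its parameters, the sample sentence and the naive handling of punctuation and sentence boundaries are all assumptions made for illustration.

```python
import string

def fixed_rate_cloze(text, n=5, lead_in_sentences=1):
    """Blank every n-th word after leaving the first sentences intact.

    Hypothetical helper: sentence ends are detected naively from ., ! or ?
    so abbreviations such as "W.L." would be miscounted.
    """
    words = text.split()

    # Find the index of the first word that follows the intact lead-in.
    start = 0
    sentences_seen = 0
    for i, word in enumerate(words):
        if sentences_seen == lead_in_sentences:
            break
        if word.rstrip("\"'").endswith((".", "!", "?")):
            sentences_seen += 1
        start = i + 1

    gapped, key = [], []
    for i, word in enumerate(words):
        position = i - start + 1  # 1-based position after the lead-in
        if i >= start and position % n == 0:
            # Keep surrounding punctuation; blank only the word itself.
            core = word.strip(string.punctuation) or word
            key.append(core)
            gapped.append(word.replace(core, f"({len(key)}) ______", 1))
        else:
            gapped.append(word)
    return " ".join(gapped), key

sample = ("Cloze tests are simple. Every fifth word after the first "
          "sentence is replaced by a numbered blank here.")
gapped_text, answer_key = fixed_rate_cloze(sample, n=5)
```

Here `answer_key` holds the deleted words in order, which is what an exact-word marking key would need; changing `n` changes the deletion rate and hence the likely difficulty.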
There are at least five main types of cloze tests available to language teachers: the
fixed-rate deletion, the selective deletion (also known as the rational cloze), the
multiple-choice cloze, the cloze elide and the C-test (Ikeguchi 1995; Weir 1990; Klein-Braley
and Raatz 1984).
In the fixed-rate deletion, after one or two sentences, every nth word is deleted. Usually
every fifth or seventh word is deleted, but Brown (1983) suggests that longer texts with every
eleventh or fifteenth word deleted can be used with subjects who have a lower level of language
proficiency. Multiple-choice cloze tests provide the subjects with several possible items to
choose from for each blank. The cloze elide inserts words which do not belong in the text, and
requires the subjects to identify the incorrect words and write appropriate items in their
place. The C-test consists of deleting only part of every second word in a text, and asks
subjects to complete each truncated word. In the selective deletion or rational cloze, the tester
chooses which items he or she wishes to delete from the text. The goal for teachers using this
test is not only to fine-tune the level of difficulty of the text, but also to measure the knowledge
of specific grammatical points and vocabulary items. Let us now consider whether the selective
deletion cloze truly is a reliable measure of grammatical knowledge.
Subjects
One group (see Table Two) from Niigata University was selected for this study. As Table
Two shows, all were native Japanese speakers, mostly first-year Science majors.
No special criteria were used in selecting or excluding the subjects, nor was the group
tested on its English proficiency level before entering the course. However, classroom
experience with the subjects led us to believe that most group members had limited speaking,
listening and writing skills, typical of a Japanese university first-year EFL
class (cf. Wadden 1993).
    English 1B, Niigata University, 1996-1997
    Language                    Japanese
    Age                         18 (82%), 19 (18%)
    Sex                         Male (55%), Female (45%)
    Department                  Science (91%), Education (9%)
    Skill Level                 False Beginners
    Total Number of Subjects    22
Table 2
Materials
Interchange Two (Richards et al. 1993) was used as the primary text. The selective deletion
cloze was created from one of the general interest reading texts in the first chapter of the
course book (Richards et al. 1993:7, see Table Three). While the subjects had read the text
several months earlier, we were fairly certain that very few, if any, of the students had read the
text again since that time. The cloze test consisted of a 133-word passage with 25 blanks,
meaning that roughly 19% of the total text was deleted. Test-retest reliability was checked on two
separate occasions for this particular cloze. With a probability of less than one percent that the results
were due to chance (p < .01), the reliability coefficients for this cloze test reached a moderate level
(r_xx = +.56 and +.60).
    There are many things people remember about the sixties. Some people remember it for
    mini-skirts, the Beatles, hippies and the flower children. It was a time when young people
    "owned" the world and thought that anything was possible. In art, fashion, and music, the big
    names were often in their early twenties, and some of them were already millionaires! The
    sixties was a time when young people used to do whatever things they wanted. "Don't trust
    anyone over 30!" they said. In the arts, people like Andy Warhol created "pop art." And
    fashions changed, too. The mini-skirt became popular, and then the "unisex" look followed.
    Young people started wearing blue jeans everywhere -- to school, fancy restaurants, and
    concerts. Many of them had very long hair and wore lots of rings, beads and bracelets.
Table 3: Text Selected for Use in this Study. Adapted from Richards et al. (1993:7)
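To make the construction concrete, the sketch below blanks tester-chosen words in a passage and reports the proportion of text deleted. It is an illustration only: the function `selective_cloze`, the choice of word positions and the use of just the opening two sentences of the Table 3 passage are assumptions, not the authors' tooling.

```python
import string

def selective_cloze(text, delete_at):
    """Blank the words at the tester-chosen 0-based positions in `delete_at`.

    Returns the gapped text, the answer key and the deletion rate.
    Hypothetical helper written for illustration.
    """
    words = text.split()
    chosen = set(delete_at)
    gapped, key = [], []
    for i, word in enumerate(words):
        core = word.strip(string.punctuation)  # keep commas, quotes, etc.
        if i in chosen and core:
            key.append(core)
            gapped.append(word.replace(core, "______", 1))
        else:
            gapped.append(word)
    return " ".join(gapped), key, len(key) / len(words)

opening = ("There are many things people remember about the sixties. "
           "Some people remember it for mini-skirts, the Beatles, hippies "
           "and the flower children.")
gapped_text, answer_key, rate = selective_cloze(opening, delete_at=[11, 15, 18])

# Deletion rate for the full test described above: 25 blanks in 133 words.
full_test_rate = 25 / 133
```

Because the tester picks the positions, the same function can target particular grammatical items (articles, auxiliaries, conjunctions) rather than whatever word happens to fall on a fixed interval.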
Procedure
The cloze test (see Figure One) was administered to the subjects twice, with a
period of two weeks between administrations. During the second administration, a grammar-based
test created by the textbook designers was also given to the subjects (Richards et al. 1993:168-172). The
instructions were given to the students verbally and in written form, both in English and
Japanese, to facilitate a clear understanding of the task. On each occasion, the cloze tests were
collected after 20 minutes. One significant difference, however, was that the first
test was administered during a regular class session, while the second was given during the
midterm test. While this is certainly not standard practice when studying the validity of a
certain test design, allowing this procedure provided an opportunity to find out how the cloze test
would function under a variety of classroom conditions.
    Reading and Vocabulary
    Student Name: ______
    Student Number: ______
    Instructions: Fill out the blanks below with the correct words.
    [Instructions repeated in Japanese]

    There are many things people remember about the sixties. Some people ______ it for mini-
    skirts, ______ Beatles, hippies ______ the flower children. It ______ a time ______
    young people "owned" the ______ and thought that anything was ______.
    In art, fashion, and music, the big names ______ often ______ their early twenties, and
    some ______ them ______ already millionaires! The ______ was ______ time
    when young people ______ to do whatever ______ they wanted. "Don't ______
    anyone over 30!" they said. In ______ arts, people like Andy Warhol created "pop
    ______." And fashions changed, too. ______ mini-skirt ______ popular ______
    then ______ "unisex" look followed. Young people started ______ blue ______
    everywhere -- to school, fancy ______ and concerts. Many of them had very long hair
    and wore lots of rings, beads and bracelets.
Figure 1: Selective Deletion Cloze Test Designed for this Study.
Analysis
The tests were graded by two scorers. The classroom teacher graded the grammar-based
tests using the key provided in the teacher's manual (Richards et al. 1993:189-190), while a
native English-speaking TEFL lecturer graded the cloze tests using the Semantically Acceptable
Word (SEMAC) Method. Typically, cloze tests can be graded using either the Exact Word or
SEMAC scoring method. In the exact word method, the cloze test blanks must be completed
with the exact word that appeared in the original text. Correct answers receive 1 point, while any
other response receives no points. SEMAC scoring allows subjects to write answers which are
grammatically and lexically appropriate, although not the original words deleted from the text.
For the purposes of this experiment, it did not matter whether the exact word method or
SEMAC method was used, since the two correlate highly with each other (cf. Owen et al.
1996; Hadley and Naaykens, in press). However, SEMAC scoring may require a subjective
judgement by the scorer. To keep the cloze test scores from being influenced by personal
knowledge of the subjects, an evaluator unacquainted with the subjects was chosen. Before
grading the tests, the blind evaluator was given a manuscript of the complete text, and
instructed to allow any words in the cloze that were either synonymous with the originals or lexically
and grammatically correct. Mistakes in historical accuracy and minor spelling errors were
ignored. If it was difficult to ascertain whether an answer was acceptable or not, it was scored
as incorrect.
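The difference between the two scoring methods can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the marking procedure used in the study: the function names, the sample answers and the `acceptable` table of alternatives are hypothetical, and in practice the SEMAC judgements are made by a human rater, not looked up in a list. Case is ignored here as a simplification; tolerance for minor spelling errors is not modelled.

```python
def score_exact(responses, original):
    """Exact Word scoring: 1 point only when the original word is restored."""
    return sum(r.strip().lower() == o.lower()
               for r, o in zip(responses, original))

def score_semac(responses, original, acceptable):
    """SEMAC scoring: the original word or any alternative judged acceptable.

    `acceptable` maps a blank's 0-based index to the set of alternatives a
    rater has accepted for that blank (hypothetical data structure).
    """
    total = 0
    for i, (r, o) in enumerate(zip(responses, original)):
        answer = r.strip().lower()
        if answer == o.lower() or answer in acceptable.get(i, set()):
            total += 1
    return total

# Three blanks from the passage with one student's (invented) answers.
original = ["remember", "world", "became"]
responses = ["Remember", "planet", "was"]
rater_accepts = {1: {"planet"}, 2: {"was"}}

exact_points = score_exact(responses, original)
semac_points = score_semac(responses, original, rater_accepts)
```

Any answer not matched by either route scores zero, which mirrors the rule above that doubtful answers were marked incorrect.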
After the scores were totaled, all of the data were analyzed using the VAR Grade for
Windows 2.0 software package (Revie 1997). The analysis was set up as a
directional one-tailed test which used the Pearson r correlation coefficient. The cloze test
scores were correlated with the scores of the grammar-based test, resulting in a correlation
coefficient of +.72 (see Figure Two).
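For readers who wish to reproduce this kind of analysis without a grading package, the Pearson r can be computed directly from paired scores. The sketch below uses only the Python standard library; the function names and the six pairs of scores are invented for illustration and are not the study's data. The second function applies the conventional t transformation of r, assuming n - 2 degrees of freedom; for the reported r = +.72 and n = 22 it gives a t of roughly 4.64, and r squared is about .52.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def t_from_r(r, n):
    """t statistic for a correlation r from n pairs (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Invented scores for six learners: cloze (out of 25) and grammar test.
cloze_scores = [9, 12, 15, 18, 21, 24]
grammar_scores = [33, 38, 36, 44, 47, 49]

r = pearson_r(cloze_scores, grammar_scores)
t_reported = t_from_r(0.72, 22)  # using the r and n reported in the text
```

A one-tailed test then compares the t value (or r itself) with the tabled critical value for the chosen probability level.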
[Scatterplot: Grammar Test scores on the horizontal axis (32.0 to 50.0) against Selective Deletion Cloze scores on the vertical axis (6.0 to 24.0).]
Figure 2: Correlated Scores of Grammar-Based Test and Selective Deletion Cloze. n = 22, r = .72.
According to Brown (1993:132-141), at p < .005, the critical level of significance for a group
of 22 is approximately +.51 (see also Fisher and Yates 1963). Since the observed coefficient of +.72
exceeds this value, the correlation between the grammar-based test and the selective-deletion cloze is
statistically significant.
Implications for Language Teachers
It would be foolhardy for language teachers to change their testing practices completely
on the basis of this one study. However, the findings of this research suggest that
selective-deletion cloze tests could be used in place of or alongside grammar-based language
tests. If careful consideration is given to the design of the selective-deletion cloze, it has a high
potential for reliability, even under less than desirable testing conditions. It may be even more
reliable than tests to which our learners are frequently exposed: tests which have been thrown
together late at night by language teachers under the pressure of several deadlines.
Conservative use of the selective deletion cloze could provide teachers with a time-saving
method of testing their learners. Learners could be assured that, despite the brevity of the test,
their level of grammatical competence in the target language is being, to a certain degree,
reliably measured. Both teacher and learners might then be freed from the unnecessary
amount of time normally spent on testing, and more time could be dedicated to studying the
target language.
Conclusion
It is hoped that language teachers will begin experimenting with cloze testing as a viable
alternative to the traditional tests which are normally administered in university language
classrooms. Even if some are uncertain about the reliability and validity of the selective
deletion cloze for use as a C-RT, it could still be used as a quick measure to see if the learners
are making progress in the course.
This study opens avenues for future research. For example, to what extent would a
selective-deletion cloze correlate with a test measuring oral proficiency, or with a listening
proficiency test? If such scores did consistently correlate highly, would this suggest that cloze
tests can measure more than just grammatical competence in second language learning? These
are just a few of the many questions which deserve further investigation as we continue our
search for innovative and effective methods of second language testing.
References
Alderson, J.C. (1979). "The cloze procedure and proficiency in English as a second language." TESOL
Quarterly, 13, 219-226.
Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. (1982). "The trait structure of cloze test scores." TESOL Quarterly, 16, 61-70.
Bowen, J.D., Madsen, H., and Hilferty, A. (1985). TESOL: Techniques and Procedures. Rowley, MA:
Newbury House Publishers.
Brown, J.D. (1983). "A closer look at the cloze: Validity and reliability." In J.W. Oller, Jr. (Ed.), Issues in
Language Testing Research (pp. 237-250). Rowley, MA: Newbury House.
Brown, J.D. (1993). Understanding Research in Second Language Learning. New York: Cambridge
University Press.
Brown, J.D. and Yamashita, S. (Eds.) (1995). Language Testing in Japan. Tokyo: The Japan Association
for Language Teaching.
Fisher, R.A. and Yates, F. (1963). Statistical Tables for Biological, Agricultural and Medical Research.
London: Longman.
Garland, V. (1996). "Teaching techniques and learning styles in Japanese universities." Journal of
Cross-Cultural Studies, 6, 73-96.
Hadley, G. and Naaykens, J. (In press). "Testing the Test: Comparing SEMAC and Exact Word Scoring on
the Selective Deletion Cloze." Korea TESOL Journal.
Hanania, E. and Shikhani, M. (1986). "Interrelationships among three tests of language proficiency:
Standardized ESL, cloze and writing." TESOL Quarterly, 20, 97-109.
Ikeguchi, C. (1995). "Cloze testing options for the classroom." In J.D. Brown and S. Yamashita (Eds.),
Language Testing in Japan (pp. 166-178). Tokyo: The Japan Association for Language Teaching.
Klein-Braley, C. and Raatz, U. (1984). "A survey of research on the C-test." Language Testing, 1, 134-146.
Oller, J.W., Jr. (Ed.) (1983). Issues in Language Testing Research. Rowley, MA: Newbury House.
Owen, C., Reeves, J. and Widener, S. (1996). Testing. Birmingham, UK: University of Birmingham.
Revie, D. (1997). VAR Grade for Windows 2.0: Grading Tools for Teachers. Thousand Oaks, CA: VARed
Software.
Richards, J., Hull, J., and Proctor, S. (1993). Interchange 2: English for International Communication. New
York: Cambridge University Press.
Richards, J., Hull, J., and Proctor, S. (1993). Interchange 2: English for International Communication:
Teacher's Manual. New York: Cambridge University Press.
Shanahan, T., Kamil, M.L., and Tobin, A. (1982). "Cloze as a measure of intersentential comprehension."
Reading Research Quarterly, 17, 229-255.
Taylor, W.L. (1953). "Cloze procedure: A new tool for measuring readability." Journalism Quarterly, 30,
415-433.
Wadden, P. (Ed.) (1993). A Handbook for Teaching English at Japanese Colleges and Universities. New
York: Oxford University Press.
Weir, C. (1990). Communicative Language Testing. Hemel Hempstead: Prentice Hall International Ltd.