

Psychometric Issues in the ELL Assessment

and Special Education Eligibility

JAMAL ABEDI

University of California, Davis

Assessments in English that are constructed and normed for native English speakers may not provide valid inferences about the achievement of English language learners (ELLs). The linguistic complexity of test items that is not related to the content of the assessment may increase measurement error, thus reducing the reliability of the assessment. Language factors that are not relevant to the content being assessed may also be a source of construct-irrelevant variance and negatively impact the validity of the assessment. More important, the results of these tests used as the criteria for identification and classification of ELL students, particularly those at the lower end of the English proficiency spectrum, may be misleading. Caution must be exercised when the results of these tests are used for special education eligibility, particularly in placing ELL students with lower English language proficiency in the learning/reading disability category. This article discusses psychometric issues in the assessment of English language learners and examines the validity of classifying ELL students, with a focus on the possibility of misclassifying ELL students as students with learning disabilities.

The policy of including English language learners (ELLs) and students with disabilities (SD) in assessments is not only necessary for reliable, valid, and fair assessment of all students; it is also congressionally mandated. The recent No Child Left Behind Act (2001), which is the reauthorization of the Elementary and Secondary Education Act (the Improving America's Schools Act), and the 1997 amendments to the Individuals with Disabilities Education Act (IDEA) require that all students be included in national and state assessments. The term all in the legislation means every student—including those with disabilities and limited English proficiency. The intent is to provide appropriate assessment based on the same high standards and ensure that all students are part of the indicators used for school accountability (Thurlow & Liu, 2001).

Many instructional decisions could have grave consequences for ELLs if their knowledge and skills are not appropriately assessed. Although the increasing level of attention to the inclusion and assessment of these students is encouraging, not enough work has been


Teachers College Record, Volume 108, Number 11, November 2006, pp. 2282–2303. Copyright © by Teachers College, Columbia University. ISSN 0161-4681.


done to examine the issues and improve the quality of instruction and assessment for these students. For example, Ortiz (2002) indicated that ''students learning English are likely to fail when they do not have access to effective bilingual education or English as a second language (ESL) programs'' (p. 41). Lack of access to effective education will also affect their assessment results. Research has clearly demonstrated that assessments designed and normed mainly for native English speakers may not be as reliable and valid for ELL students.

The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999) elaborated on this issue:

For all test takers, any test that employs language is, in part, a measure of their language skills. This is of particular concern for test takers whose first language is not the language of the test. Test use with individuals who have not sufficiently acquired the language of the test may introduce construct-irrelevant components to the testing process. In such instances, test results may not reflect accurately the qualities and competencies intended to be measured. . . . Therefore it is important to consider language background in developing, selecting, and administering tests and in interpreting test performance. (p. 91)

This article will discuss the impact of linguistic factors on the assessment and classification of ELL students. Among the major threats to the validity of classifying ELL students is the indistinct line between ELL students at the lower levels of English proficiency and students with learning disabilities.

We will first explain the psychometric issues in the assessment of ELLs and then discuss the possibility of misclassification—due to the use of inappropriate assessment tools—of ELL students as having a learning disability. Jenkins and O'Connor (2002) indicated that students with reading and/or learning disabilities are not proficient in reading and writing skills. ELL students, particularly those at the lower end of the English proficiency spectrum, also lack such proficiency in English. Although a comprehensive and valid diagnostic approach can distinguish students with reading/learning disabilities from ELL students, distinguishing between these two groups of students can be a daunting task.

PSYCHOMETRIC ISSUES IN THE ASSESSMENT OF ELL STUDENTS

Literature on the assessment of English language learners suggests that students' content-based assessment outcomes may be confounded by


language background variables. ELLs generally perform lower than non-ELLs on content-based assessments such as math, science, and social sciences—a strong indication that English language proficiency affects instruction and assessment. Research also shows that ELL students' assessment outcomes suffer from lower reliability and validity; that is, language factors may be a source of measurement error in the content-based assessment of ELL students and may impact the reliability of the test. Language factors may also be a source of variance not relevant to the construct of such assessments (Messick, 1994) and may affect the test's construct validity.

In the assessment of ELL students, the results of studies suggested that unnecessary linguistic complexity is a nuisance variable that introduces a source of measurement error and is considered a construct-irrelevant factor in the assessment (Abedi & Leon, 1999; Abedi, Leon, & Mirocha, 2003). Such nuisance variables may seriously confound the outcome of assessment in content-based areas (Abedi, Lord, & Hofstetter, 1998; Cocking & Chipman, 1988; De Corte, Verschaffel, & DeWin, 1985; Kintsch & Greeno, 1985; Trenholme, Larsen, & Parker, 1978; Lepik, 1990; Mestre, 1988; Munro, 1979; Noonan, 1990; Orr, 1987; Rothman & Cohen, 1989; Spanos, Rhodes, Dale, & Crandall, 1988).

The results of a series of experimentally controlled studies by researchers at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) on the impact of language on the test performance of ELLs have clearly demonstrated that (1) ELLs have difficulty with linguistically complex test items, and (2) reducing the linguistic complexity of test items narrows the performance gap between ELL and non-ELL students in content-based areas such as math and science (see, for example, Abedi & Lord, 2001; Abedi, Lord, & Plummer, 1997; Abedi, Lord, Hofstetter, & Baker, 2000; Abedi et al., 1998; Abedi, Lord, Kim-Boscardin, & Miyoshi, 2000). Studies demonstrating these two findings are summarized below.

Analyses of test data from four locations nationwide (Abedi et al., 2003) found a large performance gap between ELL and non-ELL students in reading and writing, areas that have a substantial amount of language demand. The performance gap was lower for science and lowest for math problem solving, for which the test items were less linguistically challenging for ELL students. The performance gap virtually disappeared in math computation, for which the language demands of the test items were minimal.

By reducing the impact of language factors on content-based test performance, the validity and reliability of assessments can be improved, resulting in fairer assessments for all students—including ELLs and


students with disabilities. To minimize the impact of language factors and consequently reduce the performance gap between ELL and other students, language modification of assessment tools may be a viable option. For example, when math test items were modified to reduce the level of linguistic complexity, over 80% of the middle school students who were interviewed preferred the linguistically modified version over the original English version of the test items (see Abedi et al., 1997). Abedi et al. (1998), in a study of 1,394 eighth-grade students in schools with high enrollments of Spanish speakers, showed that modification of the language of the items contributed to improved performance on 49% of the items; the students generally scored higher on shorter problem statements.

Another study (Abedi & Lord, 2001) of 1,031 eighth-grade students in Southern California found small but significant differences in the scores of students in low- and average-level math classes taking the linguistically modified version of the test. Among the linguistic features that appeared to contribute to the differences were low-frequency vocabulary, conditional clauses, and passive-voice verb constructions (for a description of the linguistic features below and for a discussion of the nature of and rationale for the modifications, see Abedi et al., 1997).

Beattie, Grise, and Algozzine (1983) found positive results in modifying tests for students with a learning disability. Math, reading, and writing tests were modified in the following ways: hierarchical progression of difficulty; unjustified arrangement of sentences; vertical arrangement of bubbles; placement of passages in shaded boxes; examples set off from test items; and arrows and stop signs in the corner of pages to indicate continuing and ending pages.

Tindal, Anderson, Helwig, Miller, and Glasgow (2000) used ''simplified language'' as a test accommodation for students with a learning disability, which they argued could also be used for ELLs. Results indicated that simplifying the language did not significantly affect test performance. However, the authors noted that their simplification process was perhaps too limited, which suggested the need for future studies to expand the simplification process.

Another study of 422 students in eighth-grade science classes (Abedi, Lord, Kim-Boscardin, & Miyoshi, 2000) compared performance on National Assessment of Educational Progress (NAEP) science items in three test formats: one booklet in original format (no accommodation); one booklet with English glosses and Spanish translations in the margins; and one booklet with a customized English dictionary at the end of the test booklet. The customized dictionary, which was used for the first time by Abedi and his colleagues, included only the non-content words that appeared in the test items. Because it addressed students' language needs, English learners scored highest on the customized dictionary accommodation (their mean score for the customized dictionary was 10.18 on a 20-item test, as compared with means of 8.36 and 8.51 for the other two formats).


In a study of the impact of accommodation on eighth-grade students in math, Abedi, Lord, Hofstetter, and Baker (2000) applied four different types of accommodation: a linguistically modified English version of the test; standard NAEP items with glossary only; extra time only; and glossary plus extra time. Students were also tested using standard NAEP items with no accommodation. Among these accommodations, linguistic modification of test items was the only accommodation that reduced the performance gap between ELL students and non-ELL students. Because the non-ELL students in this study, who were among the low-performing student population, also benefited from linguistic modification of test items, clarifying the language of assessment may be helpful not only to ELL students but to SD students as well.

The summary of research above suggests that reducing the linguistic complexity of assessment materials helps ELL students and low-performing native English speakers provide a more valid picture of what they know and can do. All students can benefit from instructional materials that are easier to understand (i.e., material without unnecessary linguistic complexity). Similarly, all students can better understand assessments that clearly convey the message related to the concept being assessed.

LINGUISTIC FEATURES THAT MAY IMPACT COMPREHENSION

The results of CRESST research led to the identification of linguistic features that have greater effects on ELL student performance. These features slow down students, make misinterpretation more likely, and add to students' cognitive load, thus interfering with concurrent tasks. Indexes of language difficulty include unfamiliar words, long phrases in questions, complex sentences, and conditional and adverbial clauses. Other linguistic features that may cause difficulty for readers include long noun phrases, relative clauses, prepositional phrases, abstract versus concrete presentation of a problem, passive voice, and negation. Below is a brief description of some of these features, along with some illustrative examples. A few references are added for each of the features. For a detailed description of these features, see Abedi, Courtney, and Goldberg (2000).

UNFAMILIAR WORDS

Assessments containing unfamiliar words are more difficult for ELL students than those with more familiar words. Some words, word pairs, or groups of words still unfamiliar to ELLs might be used in a test item. They are unnecessary if they are not essential to the concept being tested.

Idioms are words, phrases, or sentences that cannot be understood literally. Many proverbs, slang phrases, phrasal verbs, and common sayings


cannot be decoded by ELLs. On the other hand, words that are high on a general frequency list for English are likely to be familiar to most readers because they are often encountered. Following are a few examples of unfamiliar words used in assessments (Abedi et al., 1997; Adams, 1990; Chall, Jacobs, & Baldwin, 1990; Dale & Chall, 1948; Gathercole & Baddeley, 1993; Klare, 1974).

Circle the clumps of eggs in the illustration.
Patty expects that each tomato plant in her garden will bear 24 tomatoes.
In the last census, 80% of the households had one or more wage-earners.

LONG PHRASES IN QUESTIONS

Long questions are used less frequently than short questions. Complex question types might have an opening phrase or clause that either replaces or postpones the question word (Adams, 1990).

At which of the following times should Ed feed the parking meter?
Of the following bar graphs, which represents the data?
According to the passage above, where do sea turtles lay their eggs?

COMPLEX SENTENCES

A complex sentence contains a main clause and one or more subordinating (dependent) clauses. Subordinating words include because, when, after, although, if, and since (more on if under Conditional Clauses; Botel & Granowsky, 1974; Hunt, 1965, 1977; Wang, 1970).

Because she wants to stay in touch, Peggy frequently ______.
When she came home, he ______ the letter.
Although the ship was ______, she was calm.

LOGICAL CONNECTORS: CONDITIONAL/ADVERBIAL CLAUSES

Conditional clauses and adverbial clauses are among the sources contributing to the linguistic complexity of assessments. Logical connectors are adverbial expressions that allow a listener or reader to infer connections between two structures. They include dependent words (subordinating conjunctions; see above). In mathematics, they often include conditional ''if-then'' statements. Some take the form of complex sentences (Celce-Murcia & Larsen-Freeman, 1983; Haiman, 1985; Shuard & Rothery, 1984; Spanos et al., 1988).


Adverbial clauses:

When the barber was finished with the haircut, he took the customer's money.
While he was listening to music, he did his homework.

Conditional clauses:

As long as you bring your own bedding, you can stay with us.
Given that a is a positive number, what is –a?
If one pint will fill 2 cups, how many cups can be filled from 8 pints? (vs. One pint will fill 2 cups. Eight pints will fill _____ cups.)
In Jean's class, there are twice as many boys as girls. If there are 10 girls in the class, how many boys and girls are there in the class?

LONG NOUN PHRASES

Nouns sometimes work together to form one concept, such as pie chart or bar graph. Sometimes adjectives and nouns work together to create meaning: high school diploma, income tax return. To further complicate interpretation, strings of adjectives and nouns create subjects and objects: freshwater pond, long-term investment, new word processing program (Celce-Murcia & Larsen-Freeman, 1983; Halliday & Martin, 1993; Just & Carpenter, 1980; King & Just, 1991; MacDonald, 1993; Spanos et al., 1988).

A loaded trailer truck weighs 26,643 kilograms. When the trailer truck is . . .
Of the following number pairs, which is the dimension of a 100-square-foot room?
To become next year's tennis team captain, how many votes will Sandra need?

RELATIVE CLAUSES

A relative clause is an embedded clause that provides additional information about the subject or object it follows. Because relative clauses are less frequent in spoken English than in written English, some students may have had limited exposure to them. Words that lead a relative clause include that, who, and which. Note: Often that is omitted from a relative clause. When possible, relative clauses should be removed or recast (Pauley & Syder, 1983; Schachter, 1983).


A bag that contains 25 marbles . . . (vs. One bag has 25 marbles. A second . . .)
Joe found the student who had loaned him the book.

PREPOSITIONAL PHRASES

Prepositional phrases work as adjectives or adverbs to modify nouns, pronouns, verbs, adverbs, or adjectives. When they occur before question words, between the subject and the verb, or in strings, they can be especially confusing to ELLs (Orr, 1987; Slobin, 1968; Spanos et al., 1988).

Which of the following is the best approximation of the area of the shaded rectangle in the figure above if the shaded square represents one unit of area?

ABSTRACT (VS. CONCRETE) PRESENTATION OF PROBLEM

Respondents show better performance when assessment questions are presented in concrete rather than abstract terms. Information presented in narrative structures tends to be understood and remembered better than information presented in expository text (Cummins, Kintsch, Reusser, & Weimer, 1988; Lemke, 1986).

The weights of two objects were measured vs. The clerk weighed two suitcases.

PASSIVE VOICE

Assessments containing passive-voice constructions are more difficult for ELL students to follow. In active voice, the subject is the one performing an action. In passive voice, the one receiving the action is in the subject position. Often the ''actor'' is not stated (Abedi et al., 1997; Celce-Murcia & Larsen-Freeman, 1983; Forster & Olbrei, 1973).

He was given a ticket vs. The officer gave him a ticket.
Girls' ears were pierced in infancy vs. Parents pierced infant girls' ears.
When comparisons were made, the amounts in each jar had been reduced.

NEGATION

Studies suggest that sentences containing negations (e.g., no, not, none, never) are harder to comprehend than affirmative sentences. Several types of negative forms are confusing to ELLs (Mestre, 1988):


Proper double negative:

Not all the workers at the factory are not male.
It's not true that all the workers at the factory are not male.

POSSIBLE IMPACT OF LANGUAGE FACTORS ON RELIABILITY OF ASSESSMENT

The unnecessary linguistic complexity of test items can be a source of measurement error and can reduce the reliability of the tests. Because true-score variance (σ²T) in classical test theory is defined as the observed-score variance (σ²X) minus the error variance (σ²E), any increase in the size of the error variance directly affects (reduces) the size of the true-score variance (Allen & Yen, 1979; Linn & Gronlund, 1995; Salvia & Ysseldyke, 1998) and consequently decreases the reliability of the assessment. In a perfectly reliable test, the error variance (σ²E) would be zero; therefore, the true-score variance (σ²T) would be equal to the observed-score variance.

However, in measurement involving human subjects, there is always an error component. Appropriate evaluation of the measurement error is important in any type of assessment, whether in the traditional multiple-choice approach or in performance-based assessments (Linn, 1995; see also AERA, APA, & NCME, 1999). Many different sources (e.g., occasion, task, test administration conditions) may contribute to measurement error in traditional assessment instruments for all students. It is important to note, however, that the unnecessary linguistic complexity of test items as a source of measurement error differentially impacts the performance of groups of students with different levels of English proficiency. The linguistic complexity factor particularly affects the performance of ELL students because the common characteristic of these students is their need in the area of English language. Thus, there is an interaction between students' ELL status and their underlying measurement model.
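To make this relationship concrete, the classical test theory decomposition can be written out; the numbers in the comments are purely illustrative and are not drawn from the studies cited here.

```latex
% Classical test theory: observed-score variance decomposes into
% true-score variance plus error variance, and reliability is the
% true-score share of the observed-score variance.
\sigma^2_X = \sigma^2_T + \sigma^2_E
\qquad\Longrightarrow\qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}

% Hypothetical illustration: with \sigma^2_T = 80 and \sigma^2_E = 20,
% reliability is 80/100 = .80. If construct-irrelevant linguistic
% complexity adds 20 more units of error variance (\sigma^2_E = 40),
% reliability drops to 80/120 \approx .67.
```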

In the classical approach to estimating the reliability of assessment tools, the level of contribution of different sources to measurement error may be indeterminable. Through the generalizability approach, one would be able to determine the extent of the variance that each individual source (such as occasion, task, item, scorer, and language factors) contributes to the overall measurement error (see Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991).
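As a sketch of how that decomposition works, a standard person × item × occasion random-effects design (a textbook formulation, not the specific design of the studies cited) partitions observed-score variance into separable components:

```latex
% Variance decomposition for a person x item x occasion (p x i x o) design:
\sigma^2(X_{pio}) = \sigma^2_p + \sigma^2_i + \sigma^2_o
  + \sigma^2_{pi} + \sigma^2_{po} + \sigma^2_{io} + \sigma^2_{pio,e}

% Generalizability coefficient for relative decisions, averaging over
% n_i items and n_o occasions; each error component shrinks as the
% corresponding facet is sampled more heavily.
E\rho^2 = \frac{\sigma^2_p}
  {\sigma^2_p + \dfrac{\sigma^2_{pi}}{n_i} + \dfrac{\sigma^2_{po}}{n_o}
   + \dfrac{\sigma^2_{pio,e}}{n_i n_o}}
```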

To estimate the reliability of the standardized achievement tests and to investigate their measurement error across different subgroups of students (e.g., ELLs versus non-ELLs), we considered different approaches. Because parallel forms or test-retest data were not available in many districts and


states across the nation, we decided to use an internal consistency approach for estimating the reliability of some of the commonly used standardized achievement tests. As mentioned, the linguistic complexity of test items may introduce another dimension into the assessment, the dimension of language. If this is the case, and because the internal consistency approach is extremely sensitive to multidimensionality of test items (see, for example, Abedi, 1996; Cortina, 1993), then the estimated reliability for ELL students may be considerably lower than the reliability for native speakers of English. One may also argue that language factors may create a restriction of range on content-based outcome measures, and the restriction of range may reduce the internal consistency coefficient. In other words, unnecessary linguistic complexity makes assessment more difficult for ELL students, thereby reducing their performance level. Lower performance then creates a restriction of range for the ELL performance distribution, and that in turn results in lower reliability for ELL students. These two explanations both relate to the impact of language factors on the assessment of ELLs, and both indicate the negative effects of the linguistic complexity of content-based assessment on ELL student performance. To illustrate this phenomenon, we present results of some analyses performed by CRESST researchers on the reliability of test items.1
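For readers who want to reproduce this kind of analysis, the sketch below computes coefficient alpha (the internal consistency index reported in Table 1 below) from an examinee-by-item score matrix. The data are simulated; none of the values correspond to the Stanford 9 results discussed here.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (examinees x items) matrix of item scores."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated example: 1,000 examinees answering 40 dichotomous items
# driven by a single ability factor. Adding noise (e.g., construct-
# irrelevant difficulty) to the responses lowers the resulting alpha.
rng = np.random.default_rng(0)
ability = rng.normal(size=(1000, 1))
responses = (ability + rng.normal(scale=1.5, size=(1000, 40)) > 0).astype(float)
print(round(cronbach_alpha(responses), 3))
```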

Table 1 presents internal consistency (alpha) coefficients for the Stanford 9 subscale scores for second- and ninth-grade students from one of our data sites. Data in Table 1 show two interesting trends: (1) as expected, the internal consistency coefficients for non-ELL students were higher than those for ELL students, and (2) the difference between ELL and non-ELL students was larger in the higher grades, in which content-based language is more complex.

Table 1. Internal Consistency Coefficients for ELL/Non-ELL Students in Grades 2 and 9

                    Grade 2                          Grade 9
Content Area        Non-ELL        ELL               Non-ELL        ELL
Reading             .914           .856              .876           .750
                    (n = 234,505)  (n = 101,399)     (n = 181,202)  (n = 52,720)
Math                .894           .881              .898           .802
                    (n = 249,000)  (n = 118,740)     (n = 183,262)  (n = 54,815)
Science                                              .805           .597
                                                     (n = 144,821)  (n = 40,255)
Social science                                       .806           .530
                                                     (n = 181,078)  (n = 53,925)

For students in second grade, the reading alpha coefficient was .914 for non-ELL students as compared with .856 for ELL students. In math, the alpha for


non-ELL students was .894, as compared with .881 for ELL students. As these data show, although alpha coefficients for ELL students were lower, the gap between ELL and non-ELL students was not large in second grade. For students in ninth grade, however, there was a larger gap between ELL and non-ELL students. In reading, for non-ELL students, the alpha coefficient was .876, as compared with .750 for ELL students. In math, the alpha for ninth-grade non-ELL students was .898, as compared with .802 for ELL students. In science, the alpha for non-ELLs was .805, as compared with .597 for ELLs. Finally, in social science, the alpha for non-ELLs was .806, as compared with .530 for ELLs.

As these data suggest, the differences between internal consistency coefficients for ELL and non-ELL subscale scores were substantially larger for students in higher grades (ninth grade) than for students in lower grades (second grade). These differences were statistically significant. The average alpha for students in second grade over all subject areas was .904 for non-ELL students, as compared with .869 for ELL students—a small relative difference of about 4% ((.904 − .869)/.869). For students in ninth grade, however, the average alpha for non-ELLs was .846, as compared with an average alpha of .670 for ELLs—a relative difference of about 26% ((.846 − .670)/.670). Comparing the 26% alpha difference for ninth-grade students with the 4% difference for second-grade students once again suggests that in a more linguistically complex environment, the difference between ELL and non-ELL students is more apparent.

VALIDITY

In content-based assessments such as math and science, the linguistic complexity of test items may introduce another dimension or construct in addition to the construct that is the target of assessment. This may be the case particularly for ELL students. The linguistic complexity factor in content-based assessment may be considered a source of construct-irrelevant variance because it is not conceptually related to the content being measured (Messick, 1994):

With respect to distortion of task performance, some aspects of the task may require skills or other attributes having nothing to do with the focal constructs in question, so that deficiencies in the construct-irrelevant skills might prevent some students from demonstrating the focal competencies. (p. 14)

The concept of ‘‘construct-irrelevant’’ applies to the situations in which aconstruct other than the construct targeted for assessment is involved. Forexample, when the linguistic structure of an assessment in a content area(e.g., math or science) is so complex that ELL students cannot adequately


understand the question, the English language becomes another construct that is measured by the test but is not relevant to the content being assessed. In other words, language interferes with the targeted content in the assessment. Linguistic complexity of test items as a possible source of construct-irrelevant variance may be a threat to the validity of achievement tests because it could be a source of measurement error in estimating the reliability of the tests. The construct-irrelevant variance may change the structural relationships among test items and between subscale scores, as well as the fit of the structural model. Because linguistic factors may have more influence on the assessment outcomes of ELL students, the structural relationships in ELL assessment outcomes may differ from those of English-only students.

To examine the hypothesis of differences between ELL and non-ELL students in the structural relationships of test items, a series of structural equation models was created. Fit indices were compared across ELL and non-ELL groups. The results generally indicated that the relationships between individual items, between items and the total test score, and between items and external criteria were stronger for non-ELL students than for ELL students.

Item parcels in each content area (e.g., reading, science, and math) were created. Each parcel was constructed to systematically contain items with varying degrees of item difficulty. Through this process, each parcel contained three categories of difficulty: easy, moderately difficult, and difficult items. The main reason for creating item parcels was to provide multiple measures for each of the content areas. For example, rather than having a single math score as the sum of all items in a math test, items were divided into several groups, or parcels. Each parcel of items provided a measure of math; therefore, we obtained as many measures of math as the number of parcels (for a detailed description of item parcels and ways to create them, see Cattell & Burdsal, 1975). A reading latent variable was constructed based on four such parcels. Similarly, item parcels and latent variables for science and math were created from the 48 math items and 40 science items by the same process. The correlations between the reading, math, and science latent variables were estimated. Models were tested on randomly selected subsamples to demonstrate the cross-validation of the results.
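A minimal sketch of the parceling idea follows: items are ranked by difficulty and dealt out round-robin so that each parcel mixes easy, moderately difficult, and difficult items. The function name, parcel count, and simulated data are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def difficulty_balanced_parcels(scores: np.ndarray, n_parcels: int) -> np.ndarray:
    """Group items into difficulty-balanced parcels and return the
    per-examinee parcel scores.

    scores: (examinees x items) matrix of 0/1 item responses.
    """
    difficulty = scores.mean(axis=0)        # proportion correct per item
    order = np.argsort(difficulty)          # items ordered hardest to easiest
    # Deal the ordered items round-robin so every parcel draws items
    # from each region of the difficulty distribution.
    parcels = [order[i::n_parcels] for i in range(n_parcels)]
    return np.column_stack([scores[:, p].sum(axis=1) for p in parcels])

# Illustrative use: 48 simulated math items reduced to 4 parcel scores,
# which can then serve as multiple indicators of a math latent variable.
rng = np.random.default_rng(1)
math_items = (rng.random((500, 48)) > rng.random(48)).astype(float)
print(difficulty_balanced_parcels(math_items, n_parcels=4).shape)  # (500, 4)
```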

Table 2 shows the results of the structural models for ninth-grade students from a large state. To examine the consistency of the results of these analyses over independent samples, students were randomly divided into two cross-validation samples, namely sample 1 and sample 2. The results obtained under these two independent samples were consistent. For example, the average factor loading across all four item parcels in each content area and across all three content areas was .795 for sample 1 and


.794 for sample 2 for non-ELL students. For ELL students, the average factor loading was .666 for sample 1 and .671 for sample 2. Similarly, there is a high level of consistency in average factor correlations across the two independent samples. The average factor correlation for non-ELLs was .830 for sample 1 and .827 for sample 2. For ELLs, the average factor correlation was .794 for sample 1 and .755 for sample 2. Once again, these data suggest a high level of consistency across the cross-validation samples.

Table 2 also compares the structural relationships of test items across the categories of ELL status. The data show major differences between ELL and non-ELL students. As the data in Table 2 suggest, correlations of item parcels with the latent factors (factor loadings) were consistently lower for ELL students than they were for non-ELL students. For example, the average factor loading across different content areas and multiple samples was .795 for non-ELLs as compared with .668 for ELLs, a substantial difference. Similarly, there was a large difference between ELLs and non-ELLs in average factor correlations. For non-ELLs, the average factor correlation was .829, as compared with .774 for ELLs—once again, a large difference.

In terms of fit indices, the structural models showed a good fit of the model to the data for both ELLs and non-ELLs. However, the trend of differences between ELLs and non-ELLs was also seen here, even though the difference was small. Models for non-ELLs had slightly higher fit as compared with the models for ELLs.

Table 2. Grade 9 Students From Data Site, Stanford 9 Reading, Math, and Science Structural Modeling Results (df = 51)

                                        Non-ELL (N = 22,782)      ELL (N = 4,872)
                                        Sample 1    Sample 2      Sample 1    Sample 2
Average factor loadings across the four parcels
  Reading comprehension                 .846        .846          .746        .749
  Math                                  .830        .830          .711        .724
  Science                               .708        .707          .541        .539
Average factor correlation across
  the three content areas               .830        .827          .794        .755
Goodness of fit
  Chi square                            488         446           152         158
  NFI                                   .997        .998          .992        .992
  NNFI                                  .997        .997          .993        .993
  CFI                                   .998        .998          .995        .995

Note. There was significant invariance for all constraints tested with the multiple-group model (Non-ELL/ELL).


The hypotheses of invariance of factor loadings and factor correlations between the ELL and non-ELL groups were tested. Specifically, we tested the invariance of (1) the correlations between parcel scores and a reading latent variable; (2) the correlations between parcel scores and a science latent variable; (3) the correlations between parcel scores and a math latent variable; and (4) the correlations between content-based latent variables across the ELL and non-ELL groups.

The null hypotheses for all these tests of invariance were rejected, suggesting that ELL and non-ELL students responded differently to the test items.
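Such invariance tests are commonly carried out as chi-square difference (likelihood ratio) tests between a constrained and an unconstrained multiple-group model; the article does not detail its exact procedure, so the sketch below shows only the generic mechanics, with placeholder fit statistics rather than the values behind Table 2.

```python
from scipy.stats import chi2

def chi_square_difference_test(chisq_constrained: float, df_constrained: int,
                               chisq_free: float, df_free: int) -> float:
    """p-value for H0: the constrained (invariant) model fits as well as
    the model that lets parameters differ across groups."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return chi2.sf(delta_chisq, delta_df)

# Placeholder illustration: constraining loadings to be equal across the
# ELL and non-ELL groups raises chi-square from 640 (df = 102) to
# 720 (df = 114); the small p-value leads to rejecting invariance.
p = chi_square_difference_test(720.0, 114, 640.0, 102)
print(f"p = {p:.3g}")
```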

ISSUES CONCERNING ELL CLASSIFICATION AND SPECIAL EDUCATION ELIGIBILITY

Researchers have expressed concerns over the validity of classification for ELL students. Because of the lack of a commonly accepted operational definition of the term ELL or LEP (limited English proficiency) and because of validity issues in the criteria used for such classification, large discrepancies have been reported in ELL classification practices across the nation (see, for example, Abedi, 2004, 2005; Abedi et al., 1997). Although problems in the classification of ELL students are very serious and affect both instruction and assessment for these students, a more serious problem is the possibility of ELL students at the lower end of the English proficiency distribution being misidentified as students with disabilities, because students' limitations in English may be interpreted as a sign of a learning (or reading) disability.

Artiles, Rueda, Salazar, and Higareda (2005) found that ELL students with lower levels of proficiency in L1 and L2 (first and second language, respectively) showed the highest rate of identification in the special education categories. The authors also indicated that more ELL students tend to be placed in the ''learning disability'' category than in ''language and speech impairment.'' Similarly, Artiles and Ortiz (2002) found a differential rate of overrepresentation of ELL students in special education programs. For example, based on their data, 26.5% of ELLs in Massachusetts, 25.3% in South Dakota, and 20.1% in New Mexico were placed in special education programs, as compared with less than 1% of ELLs in Colorado, Maryland, and North Carolina placed in similar programs. Rueda, Artiles, Salazar, and Higareda (2002) reported that over a 5-year period—1993–1994 to 1998–1999—the placement rate of Latino ELLs increased by 345%, while their overall population in the district increased by only 12%.

To examine this complex issue of classification of ELL students as it relates to eligibility for special education, we first discuss issues concerning the


validity of classification for ELL students and then elaborate on the criteria for placing ELL students in the learning disability category. It must be noted at this point, however, that issues concerning the classification of students with learning disabilities are beyond the scope of this article. The aim of this section is to discuss some of the technical issues concerning ELL students who are placed in the learning disability category. (For a thorough discussion of classification issues for students with learning disabilities, see Bradley, Danielson, & Hallahan, 2002.)

Based on Title IX, No. 25, in the No Child Left Behind Act (2001), an LEP student is defined as someone who (1) is aged 3–21; (2) is enrolled or preparing to enroll in an elementary or secondary school; (3) was not born in the United States or whose native language is not English; (4) is a Native American, Alaskan Native, or a resident of outlying areas; (5) comes from an environment in which a language other than English has had a significant impact on an individual's English language proficiency; (6) is migratory and comes from an environment where English is not the dominant language; and (7) has difficulties in speaking, reading, writing, or understanding the English language that may deny the individual the ability to meet the state's proficient level of achievement, to successfully achieve in classrooms where English is the language of instruction, or to participate fully in society (No Child Left Behind Act).

The above definition is based on information related to students' language background and their level of English proficiency. Researchers have expressed concerns about both of these sources of information. Information on students' language background is obtained from the Home Language Survey (HLS), and data on students' level of English proficiency are based on existing English language proficiency tests. Unfortunately, the validity of the HLS is often threatened because parents may give inconsistent information for a variety of reasons, including concerns over equity of opportunity for their children, citizenship issues, and the literacy of the parent (see Abedi, 2005). Research has also raised concerns about the validity of existing English language proficiency scores as a criterion for ELL classification. Reviewers of these tests found major differences in the content that these tests measure and in the alignment of their content to English as a Second Language (ESL) standards (see Abedi, 2005; Zehler, Hopstock, Fleischman, & Greniuk, 1994).

Let us assume that students are correctly being classified across the categories of ELL status (ELL/non-ELL). The next question would be to look into the validity of the criteria used for special education eligibility for these students. There are many different forms of disabilities with different needs and different characteristics. Based on data from the U.S. Department of Education, Office of Special Education and Rehabilitative Services, there are at least 12 categories of disabilities. Among these categories, however,


students with a learning disability are the largest group, constituting 46% of all students with disabilities (National Center for Education Statistics, 2000).

Jenkins and O'Connor (2002) summarized some of the techniques that have been used for identifying students with reading disabilities (RD). Students with reading and/or learning disabilities often leave elementary school with deficient reading and writing skills, which makes early identification and prevention important. The authors defined the foundation of reading as ''the ability to read words; the ability to comprehend language; and the ability to access background and topical knowledge relevant to specific texts'' (p. 100). A student with RD, they argued, has a weakness in one or more of these three foundation areas. However, during the early developmental stages of reading, ''word-level reading skill'' is the most salient characteristic, and difficulties in this area can signal an RD.

Jenkins and O'Connor (2002) provided guidance on what they consider research-based ''sensible actions'' for identifying children with a reading disability. Among the criteria suggested are assessing the prerequisite skills of letter naming and phonemic awareness early in kindergarten; watching children as they attempt to write or spell words for clues to their understanding of the alphabetic principle; and recording progress in letter and phonemic knowledge in ways that encourage closer monitoring of children who appear most at risk. It is therefore clear from the discussion above that language factors are among the most important criteria for classifying a student as having a learning disability.

Students with learning disabilities and ELL students (particularly those at the lower end of the English proficiency distribution) may have more difficulty with test items that have unfamiliar words and/or a complex linguistic structure. Thus, language factors that affect the performance of ELLs may also influence the performance of students with a learning disability. These similarities in language background characteristics and level of English proficiency may make ELL students with lower levels of English proficiency particularly vulnerable to misclassification as students with a learning and/or reading disability.

Earlier in this article, we presented results of data analyses that showed larger performance gaps between ELLs and non-ELLs in areas with greater levels of language demand. A similar trend can be seen for students with disabilities in general, and students with learning disabilities in particular. To demonstrate this trend, we present results of analyses of achievement test data for students identified as having disabilities or a learning disability. These results show that these students also have difficulty in content areas with higher levels of language demand.

A 4-year trend of student performance in reading and math was examined on the New York State Pupil Evaluation Program (PEP) test for the third and sixth grades from 1995 to 1998 (New York State Education


Department, 1999). In all 4 years and for both grade levels, the performance gap between the SD and non-SD students2 was much larger on the reading assessment than on the math assessment. For example, in 1995, the gap between the percentages of third-grade SD and non-SD students scoring above the state reference point was 51.6 percentage points on the reading assessment and 35.0 points on the math assessment. In 1998, the gap between the percentages of SD and non-SD students scoring above the state reference point was 46.6 percentage points on the reading assessment and 27.2 points on the math assessment. Similar PEP test performance gaps between SD students and non-SD students were seen in sixth grade. It is interesting to note that on a separate state assessment (the Regents Competency Test), sixth-grade performance gaps between SD and non-SD students in reading and math were much smaller. Far fewer SD students were tested on the Regents Competency Test than on the PEP tests. This discrepancy highlights the effect that testing only a small proportion of the SD population can have on the interpretation of results.

As part of a recent CRESST study, we examined the 1998 reading and math Stanford 9 test data for grades 3 and 11 for a state with a large student population. In each grade level, the gap in performance between the SD students and non-SD students was larger on the reading assessment than on the math assessment. For example, in grade 11, the gap between the mean normal curve equivalent (NCE) of SD/non-ELL and non-SD/non-ELL students was 23.2 on the reading assessment and 18.3 points on the math assessment. The gap between the mean NCE of SD/ELL students and non-SD/non-ELL students was 33.7 on the reading assessment and 23.1 points on the math assessment. The gaps were smaller in third grade, but again, the SD student population had more difficulty with the reading assessment than with the math assessment (Abedi et al., 2003).

In the same study, data from another state provided 1998 Stanford 9 test data for grades 3 and 10 in reading and math. In each grade level, the gap in performance between the SD students and non-SD/non-ELL students was larger on the reading assessment than on the math assessment. For example, in grade 10, the gap between the mean NCE of SD/non-ELL and non-SD/non-ELL students was 27.1 on the reading assessment and 21.2 points on the math assessment. The gap between the mean NCE of SD/ELL students and non-SD/non-ELL students was 39.8 on the reading assessment and 25.7 points on the math assessment. The gaps were smaller in third grade, but again, the SD population had more difficulty with the reading assessment than with the math assessment (Abedi et al., 2001).

An examination of New Jersey student performance in language arts, science, and math in 2001 on the Elementary and Grade School Proficiency Assessments (ESPA & GSPA) for fourth and eighth grades revealed that in each grade level, the gap in performance between the SD students and non-


SD students was larger on the language assessment than on the science and math assessments (New Jersey Department of Education, 2001). For example, in fourth grade, the gap between the percentages of SD and non-SD/non-ELL students scoring in the partially proficient category was 39.6 points on the language arts assessment, compared with 20.1 on the science assessment and 33.2 points on the math assessment. In eighth grade, the gap between the percentages of SD and non-SD/non-ELL students scoring in the partially proficient category was 56.9 points on the language arts assessment, compared with 42.5 on the science assessment and 52.7 points on the math assessment.

These findings once again clearly suggest that language factors not only influence the performance of ELL students but also affect the performance of students with disabilities, particularly those identified as having a learning disability.

DISCUSSION

Federal and state legislation calls for equal educational opportunity and the inclusion of all students in assessments. On the other hand, research on the assessment and accommodation of ELL students questions the fairness of assessments that are used for these students, particularly those assessments that are developed and normed for mainstream native English speakers.

Studies that were summarized in this article clearly show a large performance gap in content-based assessment outcomes between English language learners (ELLs) and native English speakers. However, this performance gap is not necessarily due to a lack of content knowledge; it may be due to students' lack of English proficiency. The confounding of language factors with content knowledge has raised concerns about the validity and authenticity of the available high-stakes assessment and accountability systems for ELLs, particularly those at lower levels of English proficiency.

Assessment tools that have complex linguistic structures may yield poor achievement outcomes for ELLs and SDs. The results of such assessments may not be as reliable and valid for ELLs and SDs as for non-ELL/non-SD students. Consequently, decisions made based on the results of these assessments may be problematic for ELL students and other subgroups of students with language barriers. In this article, based on the findings of experimentally controlled studies, we illustrated that the reliability of commonly used standardized assessments in content-based areas may be negatively affected by the complex linguistic structure of test items, a construct that is not the target of assessment. We have also discussed the linguistic complexity of test items as a source of construct-irrelevant variance that undermines the validity of assessment.


As we demonstrated in this article, there is a larger performance gap between ELLs and non-ELLs in areas with greater language demands. We also showed a similar trend for students with disabilities in general, and for students with learning disabilities in particular. Therefore, language factors that affect the performance of ELLs may also influence the performance of students with a learning disability. These similarities, combined with lower levels of English proficiency, may make ELL students at the lower end of the proficiency spectrum particularly vulnerable to misclassification as students with a learning and/or reading disability.

Thus, assessment results that are influenced by construct-irrelevant linguistic factors may not be valid criteria for the classification of ELL students. The situation becomes even more complex when ELL students are assessed for special education eligibility. Unfortunately, as we demonstrated in this article, the likelihood of misclassifying low-performing ELL students as students with a learning disability is not negligible. Care must be taken to increase the validity and authenticity of the criteria used to determine ELL students' eligibility for special education. Misclassification of ELL students, particularly misidentifying them as students with learning disabilities, may have very serious consequences: they may be placed in an inappropriate educational setting and subsequently receive an inappropriate curriculum.
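A simple simulation shows how this misclassification can arise mechanically. Suppose referral for a learning disability evaluation is triggered by a fixed low-achievement cutoff, and suppose ELL students' observed scores carry both a downward shift and extra error from language factors while their underlying ability distribution matches that of native speakers. All numbers below (means, penalty, error magnitudes, and the cutoff) are hypothetical assumptions chosen only to illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Both groups share the same true-ability distribution by construction.
true_ability = rng.normal(50.0, 10.0, n)

# Native speakers: ordinary measurement error only.
score_native = true_ability + rng.normal(0.0, 5.0, n)

# ELL students: a hypothetical downward shift plus extra error, both
# attributable to language factors rather than ability or disability.
language_penalty = 8.0
score_ell = true_ability - language_penalty + rng.normal(0.0, 7.0, n)

cutoff = 35.0  # hypothetical low-achievement criterion triggering referral
print(f"native speakers below cutoff: {(score_native < cutoff).mean():.1%}")
print(f"ELL students below cutoff:    {(score_ell < cutoff).mean():.1%}")
```

Under these assumptions, the ELL group falls below the cutoff at roughly three times the rate of the native-speaker group even though the two groups' true abilities are identically distributed.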

Based on the results of the multiple studies cited in this article that focus on the impact of language factors on the assessment of special needs student populations, we believe that if the education community truly wants no child left behind, serious attention must be given to the current assessment and classification system for English language learners and students with disabilities, particularly ELL students with lower levels of English proficiency.

Notes

1 Data were obtained from four different locations in the nation. For further detail regarding these sites, please see Abedi, Leon, and Mirocha (2003).

2 In the studies mentioned in the rest of this subsection, the population referred to as "non-SD students" does not include English language learners (ELLs); thus, the comparison group is less likely to be struggling with language.

References

Abedi, J. (1996). The interrater/test reliability system (ITRS). Multivariate Behavioral Research, 31, 409–417.

Abedi, J. (2004). The No Child Left Behind Act and English language learners: Assessment and accountability issues. Educational Researcher, 33, 4–14.


Abedi, J. (2005). Issues and consequences for English language learners. In J. L. Herman & E. H. Haertel (Eds.), Uses and misuses of data in accountability testing (pp. ). Malden, MA: Blackwell.

Abedi, J., Courtney, M., & Goldberg, J. (2000). Language modification of reading, science and math test items. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Leon, S. (1999). Impact of students' language background on content-based performance: Analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Leon, S., & Mirocha, J. (2001). Examining ELL and non-ELL student performance differences and their relationship to background factors: Continued analyses of extant data. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Leon, S., & Mirocha, J. (2003). Impact of student language background on content-based performance: Analyses of extant data (CSE Tech. Rep. No. 603). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14, 219–234.

Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students' NAEP math performance (CSE Tech. Rep. No. 478). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., Hofstetter, C., & Baker, E. (2000). Impact of accommodation strategies on English language learners' test performance. Educational Measurement: Issues and Practice, 19(3), 16–26.

Abedi, J., Lord, C., Kim-Boscardin, C., & Miyoshi, J. (2000). The effects of accommodations on the assessment of LEP students in NAEP (CSE Tech. Rep. No. 537). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Lord, C., & Plummer, J. (1997). Language background as a variable in NAEP mathematics performance (CSE Tech. Rep. No. 429). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Artiles, A. J., & Ortiz, A. (Eds.). (2002). English language learners with special needs: Identification, placement, and instruction. Washington, DC: Center for Applied Linguistics.

Artiles, A. J., Rueda, R., Salazar, J., & Higareda, I. (2005). Within-group diversity in minority disproportionate representation: English language learners in urban school districts. Exceptional Children, 71, 283–300.

Beattie, S., Grise, P., & Algozzine, B. (1983). Effects of test modifications on the minimum competency of learning disabled students. Learning Disability Quarterly, 6, 75–77.

Botel, M., & Granowsky, A. (1974). A formula for measuring syntactic complexity: A directional effort. Elementary English, 51, 513–516.

Bradley, R., Danielson, L., & Hallahan, D. P. (Eds.). (2002). Identification of learning disabilities: Research to practice. Hillsdale, NJ: Erlbaum.

Cattell, R. B., & Burdsal, C. A. (1975). The radial parcel double factoring design: A solution to the item-vs.-parcel controversy. Multivariate Behavioral Research, 10, 165–179.

Celce-Murcia, M., & Larsen-Freeman, D. (1983). The grammar book: An ESL/EFL teacher's course. Rowley, MA: Newbury House.

Chall, J. S., Jacobs, V. S., & Baldwin, L. E. (1990). The reading crisis: Why poor children fall behind. Cambridge, MA: Harvard University Press.


Cocking, R. R., & Chipman, S. (1988). Conceptual issues related to mathematics achievement of language minority children. In R. R. Cocking & J. P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 17–46). Hillsdale, NJ: Erlbaum.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.

Cummins, D. D., Kintsch, W., Reusser, K., & Weimer, R. (1988). The role of understanding in solving word problems. Cognitive Psychology, 20, 405–438.

Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27, 11–20, 28, 37–54.

De Corte, E., Verschaffel, L., & DeWin, L. (1985). Influence of rewording verbal problems on children's problem representations and solutions. Journal of Educational Psychology, 77, 460–470.

Forster, K. I., & Olbrei, I. (1973). Semantic heuristics and syntactic analysis. Cognition, 2, 319–347.

Gathercole, S. E., & Baddeley, A. D. (1993). Working memory and language. Hillsdale, NJ: Erlbaum.

Haiman, J. (1985). Natural syntax: Iconicity and erosion. New York: Cambridge University Press.

Halliday, M. A. K., & Martin, J. R. (1993). Writing science: Literacy and discursive power. Pittsburgh, PA: University of Pittsburgh Press.

Hunt, K. W. (1965). Grammatical structures written at three grade levels (Research Rep. No. 3). Urbana, IL: National Council of Teachers of English.

Hunt, K. W. (1977). Early blooming and late blooming syntactic structures. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, judging (pp. 69–90). Urbana, IL: National Council of Teachers of English.

Jenkins, J. R., & O'Connor, R. E. (2002). Early identification and intervention for young children with reading/learning disabilities. In R. Bradley, L. Danielson, & D. P. Hallahan (Eds.), Identification of learning disabilities: Research to practice (pp. 99–149). Hillsdale, NJ: Erlbaum.

Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329–354.

King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30, 580–602.

Kintsch, W., & Greeno, J. G. (1985). Understanding and solving word arithmetic problems. Psychological Review, 92, 109–129.

Klare, G. R. (1974). Assessing readability. Reading Research Quarterly, 10, 62–102.

Lemke, J. L. (1986). Using language in classrooms. Victoria, Australia: Deakin University Press.

Lepik, M. (1990). Algebraic word problems: Role of linguistic and structural variables. Educational Studies in Mathematics, 21, 83–90.

Linn, R. L. (1995). Assessment-based reform: Challenges to educational measurement. Princeton, NJ: Educational Testing Service.

Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Prentice Hall.

MacDonald, M. C. (1993). The interaction of lexical and syntactic ambiguity. Journal of Memory and Language, 32, 692–715.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

Mestre, J. P. (1988). The role of language comprehension in mathematics and problem solving. In R. R. Cocking & J. P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 200–220). Hillsdale, NJ: Erlbaum.

Munro, J. (1979). Language abilities and math performance. Reading Teacher, 32, 900–915.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).


National Center for Education Statistics. (2000). Digest of education statistics, 2000. Retrieved July 12, 2002, from http://nces.ed.gov/pubs2001/digest/dt053.html

New Jersey Department of Education. (2001). New Jersey statewide assessment reports. Retrieved July 11, 2002, from http://www.state.nj.us/njded/schools/achievement/2002/

New York State Education Department. (1999). 1999 pocketbook of goals and results for individuals with disabilities. Retrieved July 11, 2002, from http://www.vesid.nysed.gov/pocketbook/1999/

Noonan, J. (1990). Readability problems presented by mathematics text. Early Child Development and Care, 54, 57–81.

Orr, E. W. (1987). Twice as less: Black English and the performance of black students in mathematics and science. New York: W. W. Norton.

Ortiz, A. A. (2002). Prevention of school failure and early intervention for English language learners. In A. J. Artiles & A. A. Ortiz (Eds.), English language learners with special education needs (pp. 40–58). Washington, DC: Center for Applied Linguistics.

Pawley, A., & Syder, F. H. (1983). Natural selection in syntax: Notes on adaptive variation and change in vernacular and literary grammar. Journal of Pragmatics, 7, 551–579.

Rothman, R. W., & Cohen, J. (1989). The language of math needs to be taught. Academic Therapy, 25, 133–142.

Rueda, R., Artiles, A. J., Salazar, J., & Higareda, I. (2002). An analysis of special education as a response to the diminished academic achievement of Chicano/Latino students: An update. In R. R. Valencia (Ed.), Chicano school failure and success: Past, present, and future (2nd ed., pp. 310–332). London: Routledge/Falmer.

Salvia, J., & Ysseldyke, J. (1998). Assessment. Boston: Houghton Mifflin.

Schachter, P. (1983). On syntactic categories. Bloomington: Indiana University Linguistics Club.

Shavelson, R., & Webb, N. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

Shuard, H., & Rothery, A. (Eds.). (1984). Children reading mathematics. London: J. Murray.

Slobin, D. I. (1968). Recall of full and truncated passive sentences in connected discourse. Journal of Verbal Learning and Verbal Behavior, 7, 876–881.

Spanos, G., Rhodes, N. C., Dale, T. C., & Crandall, J. (1988). Linguistic features of mathematical problem solving: Insights and applications. In R. R. Cocking & J. P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 221–240). Hillsdale, NJ: Erlbaum.

Thurlow, M., & Liu, K. (2001). State and district assessments as an avenue to equity and excellence for English language learners with disabilities (LEP Projects Report 2). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Tindal, G., Anderson, L., Helwig, R., Miller, S., & Glasgow, A. (2000). Accommodating students with learning disabilities on math tests using language simplification. Eugene: University of Oregon, Research, Consultation, and Teaching Program.

Trenholme, B., Larsen, S. C., & Parker, R. M. (1978). The effects of syntactic complexity upon arithmetic performance. Learning Disability Quarterly, 1, 81–85.

Wang, M. D. (1970). The role of syntactic complexity as a determiner of comprehensibility. Journal of Verbal Learning and Verbal Behavior, 9, 398–404.

Zehler, A. M., Hopstock, P. J., Fleischman, H. L., & Greniuk, C. (1994). An examination of assessment of limited English proficient students. Arlington, VA: Development Associates, Special Issues Analysis Center.

JAMAL ABEDI is a professor at the Graduate School of Education of the University of California, Davis, and a research partner at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST). His interests include studies in the area of psychometrics focusing on the validity of assessment and accommodation for English language learners (ELLs), and research on the opportunity to learn for ELLs.
