Validation Studies & Cut Scores
January 21, 2007
EPSB Work Session
Purposes of Testing/Cut Scores
To meet statutory requirements and to select individuals who have a minimum level of academic proficiency & content knowledge to be presumed capable of delivering education to children in the public schools.
Testing partners with:
• Admission requirements into teacher preparation programs, including Praxis I tests, college admission exams, grade point averages, and other academic proficiency assessments
• Kentucky Teacher Internship Program (KTIP)
What decision are we making?
What purpose are we serving?
What’s the best way to make a decision?
Inferences
Two Types of Inferences
1. Select those with the highest levels of qualification
2. Select those who have minimum qualifications
Minimum = low
Inferences
Negative Consequences of increasing cut scores:
1. Reduce the number of qualified applicants for certification
2. Increase the number of emergency and conditional certificates
3. Create teacher shortages
4. Reduce institutional QPI scores
5. Disparate impact
In any case, the same people will be teaching.
Negative Consequences
Positive consequences of increasing cut scores:
1. Decrease the number of false positives
2. Perhaps marginally increase the quality of teaching
Positive Consequences
Research
When teacher basic skills test scores have been used as predictors of teacher performance, few studies have shown any strong relationship, and some studies have shown no relationship at all.
Studies that have attempted to relate content knowledge to teacher performance (most of them confined to mathematics and science) have shown modest results at best.
Standard Error of Measurement (SEM)
Methodologies for establishing cut scores
– Angoff
– Contrasting Groups
– Bookmark
– Yeager-Mills (Body of Work)
– Other
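As a reminder of why the SEM matters near a cut score, the sketch below uses the standard classical test theory relationship, SEM = SD × sqrt(1 − reliability). The standard deviation, reliability, and cut score are hypothetical values chosen for illustration, not figures from any Praxis test.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def cut_score_band(cut_score: float, sem: float, n_sems: float = 1.0):
    """Score range within n_sems SEMs of the cut score."""
    return (cut_score - n_sems * sem, cut_score + n_sems * sem)


# Hypothetical values for illustration only.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)
band = cut_score_band(cut_score=161, sem=sem)
print(f"SEM is about {sem:.1f}; band is about {band[0]:.1f} to {band[1]:.1f}")
```

Candidates whose observed scores fall inside the band cannot be confidently separated from the cut score, which is one reason small changes to a cut score can reclassify many test takers.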
Technical Considerations
• Results in determination of whether a test is valid for use in KY & provides a recommended cut score
• Most widely used
• Requires teacher judgments
• Held up in most research studies as the method that produces the most stable results
• Generally accepted by the courts & professionals in the field
Angoff Method
Validation Process/Cut Score Recommendation
• A panel of teachers, representative of the state and of each grade level and content area, is selected to review the items on each test.
• Teachers are asked to estimate the proportion of persons with minimally acceptable skills in the content area who would be expected to get each item right. (At least 70% of the items must be judged job relevant for the test to be deemed a valid measure of performance.)
• After all items have been rated, the judgments of the teachers are combined to recommend a cut score for the whole test.
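The panel process above can be sketched in a few lines. This is a minimal illustration, not the EPSB or ETS procedure: the slides say only that the judgments are "combined", so averaging each item across judges and summing the item averages is an assumption (it is the usual Angoff convention). The result is a raw-score cut; conversion to the reported scale is not shown. All ratings and relevance votes are made up.

```python
from statistics import mean


def job_relevance_share(item_is_relevant):
    """Share of items the panel judged job relevant."""
    return sum(item_is_relevant) / len(item_is_relevant)


def angoff_raw_cut_score(ratings):
    """ratings[j][i] is judge j's estimate of the proportion of minimally
    qualified candidates expected to answer item i correctly.

    Assumption: judgments are combined by averaging each item across
    judges, then summing the item averages."""
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)


# Hypothetical panel: 3 judges rating a 4-item test.
ratings = [
    [0.6, 0.8, 0.5, 0.7],
    [0.7, 0.9, 0.4, 0.6],
    [0.5, 0.7, 0.6, 0.8],
]
item_is_relevant = [True, True, True, False]  # hypothetical panel judgments

relevance = job_relevance_share(item_is_relevant)
test_is_valid = relevance >= 0.70  # the 70% job-relevance threshold
raw_cut = angoff_raw_cut_score(ratings)
print(f"relevance={relevance:.2f}, valid={test_is_valid}, raw cut is about {raw_cut:.1f}")
```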
Angoff Method Process
Based on Decision Rules applied since May 1999
Accept the recommendation of the validation panel unless:
a. The recommendation fell below the current passing score, or
b. The recommendation fell below the Southern Regional Education Board (SREB) average, or
c. The recommendation and SREB score fell below the 15th national percentile, or
d. The recommendation exceeded the 25th national percentile
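The exceptions a through d can be written as a simple check. This is a sketch under stated assumptions: the field names are hypothetical, the percentile rules are expressed by comparing scores against the scaled scores at the 15th and 25th national percentiles, and all numbers are invented. The slides do not say what score is adopted when an exception applies, so the function only reports which exceptions are triggered.

```python
from dataclasses import dataclass


@dataclass
class CutScoreInputs:
    # All field names are hypothetical; scores share one reporting scale.
    panel_recommendation: float
    current_passing_score: float
    sreb_average: float
    score_at_15th_percentile: float
    score_at_25th_percentile: float


def triggered_exceptions(x: CutScoreInputs) -> list:
    """Return the letters of the decision-rule exceptions that apply.

    An empty list means the panel recommendation is accepted."""
    reasons = []
    if x.panel_recommendation < x.current_passing_score:
        reasons.append("a")
    if x.panel_recommendation < x.sreb_average:
        reasons.append("b")
    if (x.panel_recommendation < x.score_at_15th_percentile
            and x.sreb_average < x.score_at_15th_percentile):
        reasons.append("c")
    if x.panel_recommendation > x.score_at_25th_percentile:
        reasons.append("d")
    return reasons


# Hypothetical inputs for illustration only.
accepted = triggered_exceptions(CutScoreInputs(155, 150, 152, 148, 160))
below_current = triggered_exceptions(CutScoreInputs(149, 150, 147, 145, 160))
too_high = triggered_exceptions(CutScoreInputs(165, 150, 152, 148, 160))
print(accepted, below_current, too_high)
```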
Cut Score Recommendations
Test                                                              Test Number  Validation Panel Recommended Cut Score  Regulatory Cut Score
Earth Science: Content Knowledge                                  0571         145                                     145
Principles of Learning & Teaching: Grades K-6                     0522         164                                     161
Principles of Learning & Teaching: Grades 5-9                     0523         166                                     161
Principles of Learning & Teaching: Grades 7-12                    0524         158                                     161
Speech Communication                                              0220         570                                     580
Education of Exceptional Students: Core Content Knowledge         0353         157                                     157
Elementary Education: Content Knowledge                           0014         148                                     148
Education of Exceptional Students: Mild to Moderate Disabilities  0542         165                                     172
Education of Exceptional Students: Core Content Knowledge         0353         157                                     157
Biology: Content Knowledge                                        0235         154                                     146
Chemistry: Content Knowledge                                      0245         161                                     147
Physics: Content Knowledge                                        0265         145                                     133
Tests Validated Since May 1999 & Corresponding Cut Scores
Challenges
Evidence must support that each test chosen is:
#1: valid for the purpose for which it is used
#2: anchored in reasonable expectations of job performance
#3: a reliable measure
#4: does not unfairly disadvantage members of demographic groups
ETS Assurance
The Educational Testing Service (ETS) employs many psychometricians and conducts many test studies that influence test development, maintenance, and revision, including bias reviews, DIF analysis, reliability coefficients, and calculation of p-values.
Recommended Framework
• Cut scores between the 15th – 25th percentiles, inclusive
• Greater than or equal to current cut score
• Comparable to SREB average cut score
• Use disparate impact estimates as indicators of possible program performance reviews, combined with other information
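One common way to produce a disparate impact estimate is to compare pass rates across groups at a given cut score. The sketch below is illustrative only: the group labels and scores are invented, and the 0.80 review threshold is the "four-fifths" rule of thumb from the EEOC Uniform Guidelines on Employee Selection Procedures, which the framework above does not itself specify.

```python
def pass_rate(scores, cut_score):
    """Fraction of candidates scoring at or above the cut score."""
    return sum(s >= cut_score for s in scores) / len(scores)


def impact_ratios(scores_by_group, cut_score):
    """Each group's pass rate divided by the highest group pass rate."""
    rates = {g: pass_rate(s, cut_score) for g, s in scores_by_group.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any passing candidates")
    return {g: rate / highest for g, rate in rates.items()}


# Hypothetical scores for two groups at a hypothetical cut score.
scores_by_group = {
    "group_a": [150, 155, 160, 165, 170],
    "group_b": [140, 150, 152, 162, 171],
}
ratios = impact_ratios(scores_by_group, cut_score=157)

# Assumption: flag ratios under 0.80 (four-fifths rule of thumb) for review.
flagged = [g for g, r in ratios.items() if r < 0.80]
print(ratios, flagged)
```

A flagged ratio is an indicator to be weighed with other information, consistent with the last bullet above; it is not by itself a legal finding.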
Legal Considerations
14th Amendment: Due Process and Equal Protection Clause
Section 1: “. . . nor shall any State deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.”
Section 5: The Congress shall have the power to enforce, by appropriate legislation, the provisions of this article.
Legal Considerations
Civil Rights Act of 1964
Title VII prohibits discrimination in employment on the basis of race, color, religion, national origin, or sex.
Title VI prohibits discrimination in federally funded programs or activities on the basis of race, color, or national origin.
Legal Considerations
Testing, in and of itself, is usually only determined to be discriminatory if it has a disparate impact on a protected class.
Prima Facie
Disparate Impact case “. . . established when: (1) plaintiff identifies a specific employment practice to be challenged; and (2) through relevant statistical analysis proves that the challenged practice has an adverse impact on a protected group.” Isabel v. City of Memphis, 404 F.3d 404, 411 (6th Cir. 2005).
Prima Facie
If the plaintiff meets this burden, the employer must show that the protocol in question has “a manifest relationship to the employment”, the so-called “business justification.” Griggs, 401 U.S. at 432, 91 S.Ct. 849.
Prima Facie
If the employer succeeds, the plaintiff must then show that other tests or selection protocols would serve the employer's interest without creating the undesirable discriminatory effect.
“An employer cannot be held liable for disparate impact if a legitimate business policy results in workforce disparities.” Bacon v. Honda of America Mfg., Inc., 370 F.3d 565, 579 (6th Cir. 2004).
Good Example of a Bad Example of “Business Justification”:
So we went in that little room there, and we looked at one another, and we knew we were playing with fire. We had all these pressures.... You knew you were putting both feet, both hands, in the middle of a philosophic war, a media war, a racial war ... Finally somebody said, well, what can we take to the people? At that point we forgot the university. We forgot everybody.... What kind of argument we can make that the people gon buy? And some soul in there said, well could we make the argument that the teachers ought to be smarter than half the students. And we looked around. We said, them old boys down there in Letohatchee will buy that. Everybody will buy it. We were all Alabamians. We all good old boys.
We said, we can sell that. Folks in Lowndes County will buy it. Folks up in Wilburn will buy it. Even sophisticates up there in them Birmingham Newspapers, that'll make sense that the teacher ought to be as smart as at least half the students she's teaching.
So [one of the steering committee members] was commissioned to go to his office and find out what the average ACT was for graduates, came back and said, I believe it's 16.4. So our big decision was whether to go to 17 or 16. And the only argument I think I recall them arguing for 16. Then we could go back out and say, looka here. Of course, this is also a fallacious argument because the student-the teacher never is as smart as half the students.... [But] that was the scientific basis of it gentlemen and lady. It was just that scientific.
Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530 (M.D.Ala. 1991).
Prior Litigation
Sharif by Salahuddin v. New York State Education Department, 709 F.Supp. 345 (S.D.N.Y. 1989)
Fields v. Hallsville Independent School District, 906 F.2d 1017 (5th Cir. 1990)
Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530 (M.D.Ala. 1991)
Association of Mexican-American Educators v. State of California, 231 F.3d 572 (9th Cir. 2000)
White v. Engler, 188 F.Supp.2d 730 (E.D.Michigan 2001)
Summary
Teacher testing is used to make inferences about future teacher performance
Inferences make sense only in the context of a decision
The decision of interest with teacher tests is whether an individual has a minimum level of academic proficiency and content knowledge
Summary
Cut scores are based on the relevance of test items to performance as a teacher in the appropriate content area, using a modified Angoff procedure
Cut scores are recommended through the application of an agreed-upon process but approved by the EPSB Board
Summary
A set of Decision Rules has been applied since May 1999.
A recommended framework suggests that:
Cut scores be:
– between the 15th & 25th percentiles
– greater than or equal to current cut scores
– comparable to SREB average cut scores
Disparate impact be used as a possible indicator of program concerns
Summary
Tests that have been scientifically tailored and vetted to measure a legitimate skill set related to the certificate holder's duties will withstand judicial scrutiny.
Questions?