How is Testing Supposed to Improve Schooling?
Edward Haertel
April 15, 2012
NCME Career Award Address
Vancouver, British Columbia
How Many Purposes… ?
[Figure: Number of Test Uses (vertical axis, 0 to 16) by Year of Talk (horizontal axis, 2004 to 2013)]
Purposes for Educational Testing
Measuring
◦ Learning: Instructional Guidance
◦ Learners: Student Placement and Selection
◦ Methods: Informing Comparisons Among Educational Approaches
◦ Actors: Educational Management
Influencing
◦ Directing Student Effort
◦ Focusing the System
◦ Shaping Public Perceptions
Measuring versus Influencing
Measuring
◦ Relies directly on informational content of specific test scores
Influencing
◦ Effects intended to flow from testing per se, independent of specific test results
    ◦ Deliberate efforts to raise test scores
    ◦ Changing perceptions or ideas
Example: Weekly Spelling Test
Measuring
◦ Note words often missed (guides reteaching)
◦ Assign grades
◦ Guide students’ review following testing
Influencing
◦ Motivate studying
◦ Convey importance of spelling proficiency
Leap from measuring to influencing
Arguments … claim … program will lead to improvements in school effectiveness and student achievement by focusing … attention … on demanding content.
Yet, the validity arguments … attend only to the descriptive part of the interpretive argument …. The validity evidence … tends to focus on scoring and generalization to the content domain for the test.
The claim that the imposition of the accountability requirements will improve the overall performance of schools and students is taken for granted.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 17-64).
Interpretive Argument
Scoring
◦ Alignment, DIF, scaling, norming, equating, …
Generalization
◦ Score precision, reliability, generalizability, …
Extrapolation
◦ Score as reflection of intended construct
Decision or Implication
◦ Use in guiding action or informing description
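As one concrete illustration of the Generalization step, the sketch below computes coefficient alpha, a common internal-consistency reliability estimate. It is a minimal sketch and not a method taken from the talk: the function name `cronbach_alpha` and the small score matrices are hypothetical, and it assumes complete data with one row per examinee and one column per item.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a matrix of item scores.

    scores: list of examinee rows, each a list of item scores
    (hypothetical layout; no missing data is assumed).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that rank examinees identically.
consistent = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
# Hypothetical data: two items that are unrelated to each other.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

A high alpha speaks only to score precision. It says nothing about the Extrapolation or Decision steps that follow in the interpretive argument.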
“Appropriate test use and sound interpretation of test scores are likely to remain primarily the responsibility of the test user.”
Standards for Educational and Psychological Testing, p. 111
Not our concern?
Process too linear?
Curriculum Framework
Test Specification
Item Writing
Forms Assembly
Tryout and revision
Administration
Scaling
Today’s Focus
Achievement tests taken by students
◦ Some attention to aptitude tests as well
◦ Exclude tests taken by teachers
◦ Include uses of student test scores to evaluate teachers
◦ Exclude testing for individual diagnosis of special needs
Testing and Prior Instruction
Curriculum-Dependent Test Question
◦ May assume prior knowledge and skills
◦ May probe reasoning with what is already known
◦ May “drill deeper,” testing application of concepts
Curriculum-Neutral Test Question
◦ Must include requisite information with item
◦ Must set up context in order to probe reasoning
◦ Often limited to testing knowledge of concept definitions
Seven Broad Purposes of Testing
Purpose | Primary Users | Construct | Linkage to Curriculum | Interpretation
Instructional Guidance | Teachers, Students | Narrow Achievement Targets | Strong | CR, Individual
Student Placement and Selection | School Administrators (and Others) | Aptitude, Achievement | Varies | Varies, Individual
Informing Comparisons Among Educational Approaches | School Administrators, Researchers | Achievement (curriculum-specific; curriculum-neutral) | Varies | NR, Group
Educational Management | Public, Elected Officials, Administrators | Achievement | Grows higher | NR (may look like CR), Group
Directing Student Effort | Students | Aptitude, Achievement | Varies (should be strong) | mostly CR, Individual
Focusing the System | Teachers, School Administrators | Achievement | Grows higher | NR (may look like CR), Group
Shaping Public Perceptions | Public, Elected Officials, Administrators | Achievement | Should be Strong | NR, Group
Instructional Guidance
Formative Assessment (informal)
◦ Scoring: Sound items adequately sampling domain?
◦ Generalization: Test scores with adequate precision?
◦ Extrapolation: Mastery extends beyond test per se?
◦ Decision or Implication: Used to adapt teaching work to meet learning needs?
Instructional Guidance
Formative Assessment (highly structured)
◦ Winnetka Plan
◦ Programmed Instruction approaches
◦ Benjamin Bloom’s Mastery Learning
◦ Pittsburgh LRDC’s IPI Math Curriculum
◦ Criterion-Referenced Testing movement
• Scoring
• Generalization
• Extrapolation
• Decision or Implication
Instructional Guidance
Formative Assessment (highly structured)
◦ Scoring: Questions mapped well to behavioral objectives
◦ Generalization: Multiple items highly redundant
◦ Extrapolation: ??? Assume decomposability, decontextualization
◦ Decision or Implication: Relied on cut scores, simple rules; insufficient attention to actual effects
Student Placement and Selection
IQ-based tracking
GATE programs
English Learner status (Entry / Exit)
MCTs / HSEEs
Advanced Placement / International Baccalaureate
SAT / ACT
…
IQ-Based Tracking
Rationale
◦ Teachers deliver uniform instruction to all students in a classroom
◦ Students learn at different rates
    ◦ Or, have different “capacities”
◦ Grouping students by ability will improve efficiency because all will receive content at a rate appropriate to their ability
    ◦ This will reduce wasted effort and frustration
IQ-Based Tracking
Context
◦ Increasing immigration (since late 19th century)
◦ Perceived success of Army Alpha
◦ Scientific School Management movement
◦ Prevailing hereditarian views
IQ-Based Tracking
Scoring
◦ Scores free from bias and distortion?
Generalization
◦ High correlations across forms and occasions
Extrapolation
◦ Assumed based on strong theory, some criterion-related validity evidence
Decision or Implication
◦ Largely unexamined
Student Placement and Selection
IQ-based tracking
GATE programs
English Learner status (Entry / Exit)
MCTs / HSEEs
Advanced Placement (AP) / International Baccalaureate (IB)
SAT / ACT
…
Comparing Educational Approaches
ESEA-mandated Project Head Start evaluations
Evaluations of NSF-sponsored science curricula
National Diffusion Network
What Works Clearinghouse
Both RCTs and Quasi-experimental research
Educational Management
Measuring Schools
◦ NCLB Adequate Yearly Progress (AYP) determinations
    ◦ Intervention for schools “in need of improvement”
Measuring Teachers
◦ “Value-Added” Models
“Measuring” purpose (Educational Management) is only part of the story. “Influencing” interacts with “measuring.”
“Value-Added” Models for Teacher Evaluation
Scoring
◦ May require vertical scaling
◦ Bias due to violations of model assumptions
Generalization
◦ Extra error due to student sampling and sorting
Extrapolation
◦ Score gains as proxy for teacher effectiveness / teaching quality broadly defined
Decision or Implication
◦ Largely unexamined
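To make the Scoring and Generalization concerns concrete, here is a deliberately simplified sketch: a mean gain score per teacher, with a standard error that reflects student sampling only. It is not an operational value-added model, which would typically use regression adjustments and more data. The function name `mean_gain_by_teacher`, the record layout, and the scores are all hypothetical, and the sketch assumes prior and current scores sit on a common (vertical) scale.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def mean_gain_by_teacher(records):
    """Average score gain per teacher, with a sampling standard error.

    records: iterable of (teacher_id, prior_score, current_score).
    Assumes both scores are on the same vertical scale (an assumption
    of this sketch, not something the data can guarantee).
    """
    gains = defaultdict(list)
    for teacher_id, prior, current in records:
        gains[teacher_id].append(current - prior)
    summary = {}
    for teacher_id, g in gains.items():
        # Standard error captures student sampling only; it ignores
        # non-random sorting of students to teachers. Undefined for n < 2.
        se = stdev(g) / sqrt(len(g)) if len(g) > 1 else None
        summary[teacher_id] = (mean(g), se)
    return summary


# Hypothetical records for two teachers with three students each.
records = [
    ("A", 40, 50), ("A", 55, 61), ("A", 62, 70),
    ("B", 45, 49), ("B", 58, 66), ("B", 70, 76),
]

for teacher_id, (avg, se) in sorted(mean_gain_by_teacher(records).items()):
    print(teacher_id, round(avg, 2), round(se, 2))
```

With classes this small, the standard errors are large relative to the difference between the two teachers' mean gains, which is the sampling problem listed under Generalization. Sorting of students and the Extrapolation and Decision steps are outside what this sketch can address.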
Influencing
Purposes of directing effort, focusing the system, and shaping perceptions rarely stand alone
◦ Direct use of test scores for measuring is always included
◦ Influencing purposes may nonetheless be more significant
Shaping Public Perceptions
"Test results can be reported to the press. … Based on past experience, policymakers can reasonably expect increases in scores in the first few years of a program … with or without real improvement in the broader achievement constructs that tests … are intended to measure."
R. L. Linn (2000, p. 4)
Attending to Influencing Purposes in Test Validation
Importance
◦ Influence as ultimate rationale for testing
◦ Place in the interpretive argument where unintended consequences arise
Challenge
◦ Purposes not clearly articulated
◦ Required data not available for years
◦ Required research methods unfamiliar
◦ Disincentives to look closely
◦ Expensive, may not matter
Clarity of Purpose
SBAC and PARCC Consortia must have:
“A theory of action that describes in detail the causal relationships between specific actions or strategies … and … desired outcomes …, including improvement in student achievement and college- and career-readiness.”
Availability of Data
Familiar problem in literature on program evaluation
◦ Plan ahead
◦ Attend to implementation cycle
◦ Do not ask for results too soon
Plan for “audit” tests?
Phased implementation?
Expanded Methods and Theories
Can we view testing phenomena through other disciplinary lenses?
Validation requires both empirical evidence and theoretical rationales
◦ Common sense gets us part way there
◦ Where does theory for “Influencing” purposes come from?
◦ What research methods can we borrow?
Costs and Incentives
Need increased investment in comprehensive validation
Need help from agents, agencies beyond test makers, test administrators
Need more explicit press for comprehensive validation in RFPs, public discourse
Thank you