Testing, Stakes, and Feedback in Student Achievement: A Meta Regression Analysis

Richard P. PHELPS & Mónica SILVA

© 2016, Richard P PHELPS, Monica SILVA. 18th Congress, World Assn. of Education Research, Eskisehir, Turkey, June 2016

An enormous research literature on the effects of testing

• But, assertions that all or most of it does not exist are common:

– e.g., OECD, World Bank, US National Research Council

– Some claims are made by those who oppose standardized testing and may be wishful thinking

– Others are “firstness” claims

Dismissive research reviews

• With a dismissive research literature review, a researcher assures all that no other researcher has studied the same topic

Firstness claims

• With a firstness claim, a researcher insists that he or she is the first to ever study a topic

Social costs are enormous

• Public, policy-makers do not understand the relative benefits of testing, and so:

– Do not test when beneficial
– Test when detrimental
– In general, test sub-optimally

• We cycle through pro-testing, anti-testing fads, instead of making adjustments toward optimal use

Meta-analysis

A method for summarizing research literature, with a single, comparable measure.

Background of the study:

• “The effect of testing on student achievement 1910-2010”. Phelps, R. (2012), International Journal of Testing,12(1), 21-43.

• Phelps analyzed about 700 separate source documents comprising 1,600 studies (quantitative, qualitative, and surveys)

• 2,000 other source documents reviewed and found incomplete or inappropriate

Criteria that guided search of studies to include in the Phelps (2012) meta-analyses

1. Studies in English language that found an effect from testing on student achievement or on teacher instruction…

Criteria that guided search of studies to include in the meta-analyses

2. Studies that addressed effects when:

• a test is newly introduced, or newly removed
• quantity of testing is increased or reduced
• test stakes are introduced or increased, or removed or reduced

Looking for studies to include in the meta-analyses

3. Database keyword search:

• ERIC
• OCLC
• Dissertation Abstracts
• EBSCO
• Google Scholar …etc.

4. Citation chain, or “ancestry” method

Number of studies of effects, by methodology type…

Methodology type                                 Number of studies   Number of effects
Quantitative                                     177                 640
Surveys and public opinion polls (US & Canada)   247                 813
Qualitative                                      245                 245
TOTAL                                            669                 1698

Measure of effect size for quantitative studies: Cohen’s d

$d = (\bar{Y}_E - \bar{Y}_C) / S_{pooled}$

$\bar{Y}_E$ = mean, experimental group

$\bar{Y}_C$ = mean, control group

$S_{pooled}$ = pooled standard deviation
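
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `cohens_d` and the two score lists are hypothetical, and the pooled standard deviation is assumed to be the usual degrees-of-freedom-weighted combination of the two group variances.

```python
import math

def cohens_d(experimental, control):
    """Standardized mean difference between two groups of scores."""
    n_e, n_c = len(experimental), len(control)
    mean_e = sum(experimental) / n_e
    mean_c = sum(control) / n_c
    # Unbiased sample variances of each group
    var_e = sum((y - mean_e) ** 2 for y in experimental) / (n_e - 1)
    var_c = sum((y - mean_c) ** 2 for y in control) / (n_c - 1)
    # Pooled standard deviation (assumed form: weighted by degrees of freedom)
    s_pooled = math.sqrt(((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2))
    return (mean_e - mean_c) / s_pooled

# Hypothetical scores, for illustration only
tested = [78, 82, 85, 90, 75]
untested = [70, 74, 80, 72, 69]
d = cohens_d(tested, untested)
print(round(d, 2))
```

A positive d means the experimental (tested) group scored higher than the control group, in pooled standard deviation units.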

Effect size: Cohen’s (1988) guidelines:

• d between 0.20 and 0.50: weak effect

• d between 0.50 and 0.75: medium effect

• d more than 0.75: strong effect

John Hattie, on educational achievement:

d of 0.5 ≈ one grade level
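
As a small sketch, the cut-offs quoted on this slide can be turned into a labeling helper. The function name `label_effect` is hypothetical. Two choices here are assumptions, since the slide does not settle them: a value exactly on a boundary is given the higher band up to 0.75, and values under 0.20 get a placeholder label.

```python
def label_effect(d):
    """Label an effect size using the thresholds quoted on the slide."""
    size = abs(d)  # the label depends on magnitude, not direction
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "weak"
    # Assumption: the slide gives no label below 0.20
    return "below the weak range"

for value in (0.10, 0.30, 0.55, 0.88):
    print(value, label_effect(value))
```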

Findings from Phelps (2012):

• Survey study effect sizes average >1.0

• Over 90% of qualitative studies positive

• For quantitative studies, univariate effect sizes positive and stronger when:

– Testing more frequently
– Testing with feedback
– Testing with stakes

Overall effect size for quantitative studies*:

• “Bare bones” calculation:

d ≈ +0.55 …a medium effect

• Bare bones effect size adjusted for measurement error

d ≈ +0.71 …a stronger effect

• Using same-study-author aggregation

d ≈ +0.88 …a strong effect

Source: Phelps, 2012.
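
To make the first two calculations concrete, here is a rough sketch under stated assumptions. It assumes that "bare bones" means a sample-size-weighted mean of the study effect sizes, and that the measurement-error adjustment divides by the square root of the outcome measure's reliability. The numbers, the reliability value, and the function names are all hypothetical, and the procedure actually used in Phelps (2012) may differ in detail (for example, by correcting each study separately).

```python
import math

def bare_bones_mean(effects, sample_sizes):
    """Sample-size-weighted mean effect size (assumed 'bare bones' form)."""
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total_n

def correct_for_unreliability(d, reliability):
    """Adjust d upward for measurement error in the outcome (assumed form)."""
    return d / math.sqrt(reliability)

# Hypothetical study-level values, for illustration only
effects = [0.30, 0.60, 0.80]
sample_sizes = [200, 100, 50]

mean_d = bare_bones_mean(effects, sample_sizes)
adjusted_d = correct_for_unreliability(mean_d, reliability=0.81)
print(round(mean_d, 3), round(adjusted_d, 3))
```

Because reliability is at most 1, the adjusted value is never smaller in magnitude than the unadjusted one, which matches the direction of the change reported on this slide.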

Quantitative studies data base:

- 171 source documents

- 640 studies (i.e., effect sizes)

- population coverage: 7 million

- 100 moderators, 27 coded for this study

Source documents included, by type

Geographic origin

– 154 USA
– 8 Canada
– 3 Multiple countries
– 1 Barbados, Belgium, China, Israel, Korea, Mexico, UK

Methodologies

– 115 Controlled trials, with random assignment
– 33 Multivariate (e.g., regression, SEM)
– 20 Pre-post (7 with shadow tests)
– 5 Post-test only

This study:

• Re-analyzes Phelps (2012) data set of quantitative studies to test the joint effects of selected moderators through meta-analytic regression

• Analogue to multiple regression analyses.

Meta-regression provides:

• An estimate of the contribution of testing frequency, stakes, and feedback to the prediction of achievement, after controlling for background moderators.

• A test of whether the model adequately explains the observed variability in effect sizes.

Methodology for model fitting:

• Weighted least-squares regression as outlined by Hedges & Olkin (1985)

• Moderators in the equation selected via univariate significance tests reported in Phelps (2012).
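
A minimal sketch of weighted least-squares meta-regression follows, assuming the common inverse-variance weighting and the large-sample variance formula for a standardized mean difference. It uses NumPy; the six studies, the two 0/1 moderators, and the function names are hypothetical and are not taken from the Phelps (2012) data set.

```python
import numpy as np

def d_variance(d, n_e, n_c):
    """Large-sample sampling variance of a standardized mean difference."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))

def wls_meta_regression(d, v, moderators):
    """Regress effect sizes on moderators with weights 1/v.

    Returns coefficients (intercept first), their standard errors,
    and the weighted residual sum of squares Q_E.
    """
    d = np.asarray(d, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)
    X = np.column_stack([np.ones(len(d)), np.asarray(moderators, dtype=float)])
    xtwx_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = xtwx_inv @ X.T @ (w * d)
    # With inverse-variance weights, (X'WX)^-1 is the coefficient covariance
    se = np.sqrt(np.diag(xtwx_inv))
    q_error = float(np.sum(w * (d - X @ beta) ** 2))
    return beta, se, q_error

# Hypothetical data: six studies, equal group sizes, two 0/1 moderators
d = np.array([0.20, 0.45, 0.60, 0.85, 0.30, 0.70])
n = np.array([40, 55, 30, 60, 80, 45])
stakes = [0, 0, 1, 1, 0, 1]
feedback = [0, 1, 0, 1, 0, 1]

v = d_variance(d, n, n)
beta, se, q_error = wls_meta_regression(d, v, np.column_stack([stakes, feedback]))
print(beta, se, q_error)
```

In this framework Q_E is compared with a chi-square distribution on k − p − 1 degrees of freedom (k studies, p moderators); a large value suggests the moderators leave real variability in effect sizes unexplained.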

Hierarchical Meta-Regression

Step I: Background moderators

– Alignment
– Timing of predictor
– Commercial test used
– Shadow test used
– Large-scale program
– Longitudinal study type

Step II: Test moderators

– Stakes
– Feedback
– Frequency
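
The two-step logic can be sketched as follows: fit the background moderators first, then add the test moderators and look at the gain in explained variance. This is a hedged illustration only. The weighted R² used here (one minus weighted residual over weighted total sum of squares) is an assumed definition of "variance explained", and the eight studies, weights, and moderator codes are hypothetical.

```python
import numpy as np

def weighted_r2(d, w, moderators):
    """Share of weighted effect-size variability explained by the moderators."""
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones(len(d)), np.asarray(moderators, dtype=float)])
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on rescaled data
    beta, *_ = np.linalg.lstsq(X * sw[:, None], d * sw, rcond=None)
    q_error = np.sum(w * (d - X @ beta) ** 2)
    q_total = np.sum(w * (d - np.average(d, weights=w)) ** 2)
    return 1.0 - q_error / q_total

# Hypothetical data: eight studies with inverse-variance weights
d = [0.9, 0.5, 0.4, 0.3, 0.6, 0.1, 0.8, 0.7]
w = [20, 35, 15, 40, 25, 30, 10, 45]
background = np.array([[1, 0], [0, 1], [1, 1], [0, 0],
                       [1, 0], [0, 1], [1, 1], [0, 0]])  # e.g. longitudinal, large-scale
test_mods = np.array([[1, 0], [1, 1], [0, 0], [0, 1],
                      [0, 1], [0, 0], [1, 1], [1, 0]])   # e.g. stakes, feedback

r2_step1 = weighted_r2(d, w, background)                          # Step I
r2_step2 = weighted_r2(d, w, np.hstack([background, test_mods]))  # Step II
delta_r2 = r2_step2 - r2_step1  # variance added by the test moderators
print(r2_step1, r2_step2, delta_r2)
```

Because the Step II model contains every Step I predictor, the gain can never be negative; the question is whether it is large enough to matter.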

Avoiding “Data Dependence”

Most source documents include multiple studies, which may be conducted on the same population or data set, so they are not independent of each other.

Solution: Run 6 meta-regressions:

• 1 with the largest effect size from each source document
• 1 with the smallest effect size from each source document
• 1 with a randomly chosen study from each document

each with and without remediation studies
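
The selection step can be sketched as below: reduce each source document to a single effect size under one of the three rules, so that no document contributes more than one data point to a regression. The function name, document IDs, and effect sizes are hypothetical, and the fixed seed is only there to make the random rule repeatable.

```python
import random
from collections import defaultdict

def one_effect_per_document(effects, rule, seed=0):
    """Keep one effect size per source document.

    effects: iterable of (document_id, effect_size) pairs
    rule: "largest", "smallest", or "random"
    """
    by_doc = defaultdict(list)
    for doc_id, d in effects:
        by_doc[doc_id].append(d)
    rng = random.Random(seed)
    pick = {"largest": max, "smallest": min, "random": rng.choice}[rule]
    return {doc_id: pick(ds) for doc_id, ds in by_doc.items()}

# Hypothetical effect sizes from three source documents
effects = [("doc_a", 0.20), ("doc_a", 0.65), ("doc_b", 0.40),
           ("doc_c", 0.90), ("doc_c", 0.10), ("doc_c", 0.55)]

largest = one_effect_per_document(effects, "largest")
smallest = one_effect_per_document(effects, "smallest")
randomly_chosen = one_effect_per_document(effects, "random")
print(largest, smallest, randomly_chosen)
```

Running each of the three selections once with and once without the remediation studies gives the six regressions.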

Results, hierarchical WLSR: % variance explained

Largest effect size studies

Step                       With remediation   Without remediation
1 Background factors       41.4               39.0
2 Test factors             8.6                11.4
Total variance explained   50.0               50.4

Random effect size studies

Step                       With remediation   Without remediation
1 Background factors       45.0               45.3
2 Test factors             8.8                9.5
Total variance explained   53.8               54.8

Smallest effect size studies

Step                       With remediation   Without remediation
1 Background factors       29.6               24.8
2 Test factors             7.8                11.8
Total variance explained   37.4               36.6

Results, hierarchical WLSR: direction of effects

Background moderator                  Largest effects   Random effects   Smallest effects
Longitudinal study                    +                 +                +
Timing of treatment, before outcome   +                 –                ns
Commercial test used                  +                 +                ns
Shadow test used                      –                 –                –
Alignment                             –                 –                +
Large-scale program                   –                 –                –

Test moderator                        Largest effects   Random effects   Smallest effects
Frequency (existence of testing)      +                 +                +
Stakes (consequences)                 +                 +                ns
Feedback                              +                 +                +

(ns = not significant)

Relative independent contribution of test factors (post hoc)

When entered last in the equation:

Test factor         Added variance explained
Testing frequency   > 6%
Stakes              4%
Feedback            1%

Conclusions

With background moderators controlled:

• Frequency, stakes, and feedback contribute significantly to the prediction of achievement gains

• Frequency is the strongest and most consistent predictor

• Stakes and feedback are less consistent predictors

• Substantial variability remains unexplained

Where do we go from here?

• Optimal combinations of frequency, stakes, feedback?

• Should tests be more frequent? How much more?

• Which stakes work best when and on which targets: students, teachers, schools?

• Which feedback works best when and on which targets?

• Others?

Where do we go from here?

• Would be nice if we could add more moderators

• Unfortunately, the moderators available are mostly determined by the original studies
