A Re-Evaluation of The Tennessee STAR Project

Alexander Lebedinsky, PhD, and Adam Pendry


Post on 08-Jan-2018



Page 1: A Re-Evaluation of The Tennessee STAR Project

A Re-Evaluation of The Tennessee STAR Project

Alexander Lebedinsky, PhD, and Adam Pendry

Page 2: A Re-Evaluation of The Tennessee STAR Project

What is The STAR Project?

Conducted in 1985, the Tennessee Student/Teacher Achievement Ratio (STAR) project was a randomized treatment and control group study. Its implementation was designed to isolate the treatment effect of smaller class sizes on student test scores.

Students & Teachers (students per class, by class type):

• Small: 13-17
• Regular: 22-25
• Regular + Aide: 22-25

Page 3: A Re-Evaluation of The Tennessee STAR Project

Further Details

• In total, 80 schools from across the state of TN were involved.
  – These schools voluntarily signed up and were not randomly chosen.
  – A school had to have a minimum of 57 students to be eligible (enough to fill one of each class type: 13 + 22 + 22 = 57).
• The first year saw 6,500 students distributed across 330 classrooms.

Page 4: A Re-Evaluation of The Tennessee STAR Project

Importance

• Mosteller (1995), The Tennessee Study of Class Size in the Early School Grades
  – Cited 554 times as of June 17, 2014

• Krueger (1999), Experimental Estimates of Education Production Functions
  – Cited 1282 times as of June 17, 2014

• Boozer and Cacciola (2001), Inside the ‘Black Box’ of Project STAR: Estimation of Peer Effects Using Experimental Data
  – Cited 112 times as of June 17, 2014

Page 5: A Re-Evaluation of The Tennessee STAR Project

Criticism

• Parental involvement and selection bias.
  – Because the students’ parents were informed about the experiment, some opted for their children to be in the treatment group.
  – This led to a selection bias towards the smaller class sizes.
• Attrition & Addition Rates
  – Throughout the school year students would enter and exit the project, leading to imperfect conditions.
  – Although not too prevalent in Kindergarten, this was far more common in later grades.

Page 6: A Re-Evaluation of The Tennessee STAR Project

Precursor to Our Research: ‘Bad Apples’

• How do bad apples affect classroom performance?

Page 7: A Re-Evaluation of The Tennessee STAR Project

Findings

• After calculating leave-out means to determine ‘bad apples’, we stumbled upon this.

newid  StudentScore  ClassAvg  ClassSd
    1         21.93      6.52     6.02
    2         20.35      6.62     6.27
    3         19.65      6.66     6.37
    4         11.86      7.15     7.05
    5          7.83      7.40     7.15
    6          6.84      7.46     7.15
    7          6.19      7.50     7.14
    8          5.40      7.55     7.13
    9          5.36      7.55     7.13
   10          5.17      7.57     7.12
   11          4.78      7.59     7.11
   12          3.89      7.65     7.09
   13          2.53      7.73     7.03
   14          1.91      7.77     7.00
   15          1.37      7.80     6.97
   16          1.07      7.82     6.95
   17          0.08      7.88     6.88
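The ClassAvg and ClassSd columns appear consistent with a leave-out mean and a leave-out sample standard deviation: each row summarizes the other 16 students in the class. A minimal Python sketch of that calculation follows, using the StudentScore column above. The function name `leave_out_stats` is illustrative, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev

# StudentScore column from the table above (one class of 17 students).
scores = [21.93, 20.35, 19.65, 11.86, 7.83, 6.84, 6.19, 5.40, 5.36,
          5.17, 4.78, 3.89, 2.53, 1.91, 1.37, 1.07, 0.08]


def leave_out_stats(values):
    """For each position, return (mean, sample sd) of all the OTHER values."""
    stats = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]  # drop the i-th student
        stats.append((mean(rest), stdev(rest)))
    return stats


stats = leave_out_stats(scores)
for newid, (score, (avg, sd)) in enumerate(zip(scores, stats), start=1):
    print(f"{newid:>2} {score:6.2f} {avg:5.2f} {sd:5.2f}")
```

Because each student is excluded from their own class average, a high-scoring student sees a lower leave-out mean than a low-scoring classmate, which is the pattern in the table.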

Page 8: A Re-Evaluation of The Tennessee STAR Project

Shifting Focus

• The previous findings inspired the question: is STAR truly random assignment?
  – We seek to measure the non-randomness through the use of the first two moments.
    • Two-sample t-tests.
      – Using leave-out means and unequal variances.
    • Ratio of variances.
      – Compare ratios of groups defined by the previous t-test statistics.

Page 9: A Re-Evaluation of The Tennessee STAR Project

Two Sample T-test & Leave-Out Means

[Figure: two panels showing the class averages in one example school (42.5, 56.4, 73.8, 75.8, 82.6, 73.3). First panel: leave-out mean of 71.1, t-stat -5.3. Second panel: leave-out mean of 63.1, t-stat 2.8.]

When we run the t-test, we compare the average of the excluded class with the leave-out mean of the rest of the school. In this example the t-stats are -5.3 and 2.8 respectively. With our cut-off of 3, the first class is flagged, while the second falls just short of being flagged.
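The comparison can be sketched in Python as a two-sample t-test with unequal variances (Welch), where each class is tested against the pooled students of all other classes in its school. This is a sketch under assumptions, not the authors' code: the slides do not say whether the test is run on student-level or class-level data, and the school data and the names `welch_t` and `flag_classes` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample, rest):
    """Two-sample t-statistic allowing unequal variances (Welch)."""
    se = sqrt(variance(sample) / len(sample) + variance(rest) / len(rest))
    return (mean(sample) - mean(rest)) / se


def flag_classes(school, cutoff=3.0):
    """Test each class against the leave-out sample (all other classes)."""
    t_stats = {}
    for name, scores in school.items():
        rest = [s for other, vals in school.items() if other != name
                for s in vals]
        t_stats[name] = welch_t(scores, rest)
    flagged = [name for name, t in t_stats.items() if abs(t) > cutoff]
    return t_stats, flagged


# Hypothetical school with three tiny classes (illustrative numbers only).
school = {"A": [10, 12, 14], "B": [20, 22, 24], "C": [20, 22, 24]}
t_stats, flagged = flag_classes(school)
print(t_stats, flagged)
```

In this toy school only class A exceeds the cut-off of 3 in absolute value, so it is the only class flagged.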

Page 10: A Re-Evaluation of The Tennessee STAR Project

Original Dataset Results: Means

• For a cut-off t-stat = 3
  – 36 classes were flagged across 17 schools.
  – 16 Small Classes.
    • 13/16 were positive outliers; thus 81.25% of these were more than 3 standard errors above the school leave-out mean.
  – 20 Regular and Regular with Aide Classes.
    • 15/20 were negative outliers; thus 75% of these were more than 3 standard errors below the school leave-out mean.

Page 11: A Re-Evaluation of The Tennessee STAR Project

Ratio of Variances

• Calculated as the ratio between school variance and class variance.
  – With the larger variance in the denominator.

• Assuming no selection bias.
  – We expect to see a ratio between class and school variance of 1.0.

• Assuming a selection bias.
  – We would expect classes to have a smaller variance and thus a ratio < 1.0.
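A small Python sketch of this ratio follows, assuming it is computed as class variance divided by school variance using sample variances. The scores are hypothetical and `variance_ratio` is an illustrative name.

```python
from statistics import variance


def variance_ratio(class_scores, school_scores):
    """Class variance over school variance (sample variances)."""
    return variance(class_scores) / variance(school_scores)


# Hypothetical school of eight students.
school_scores = [1, 2, 3, 4, 5, 6, 7, 8]

# A class drawn from across the distribution vs. one picked from the top end.
mixed_class = [2, 3, 6, 7]
selected_class = [5, 6, 7, 8]

mixed_ratio = variance_ratio(mixed_class, school_scores)
selected_ratio = variance_ratio(selected_class, school_scores)
print(round(mixed_ratio, 3), round(selected_ratio, 3))
```

The class picked from the top of the distribution has a much smaller ratio than the mixed class, which is the signature of selection described above.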

Page 12: A Re-Evaluation of The Tennessee STAR Project

Ratio of Variances

[Figure: test-score distributions (axis: Test Scores, 0 to 100) for a Small Class and a Regular Class.]

Assuming a selection bias where small classes were picked from the top end of the distribution and regular classes from the bottom end.

Page 13: A Re-Evaluation of The Tennessee STAR Project

Ratio of Variances

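[Figure: test-score distributions (axis: Test Scores, 0 to 100) of the Small and Regular classes, with the school variance ("School Var.") marked.]

Here we see the distributions of Small & Regular classes superimposed on the school.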

Page 14: A Re-Evaluation of The Tennessee STAR Project

Original Dataset Results: Variance Ratios

• Comparing the ratio of variances between the flagged and non-flagged classes:
  – Flagged classes had an average ratio of 0.85
  – Non-flagged classes had an average ratio of 0.97

• The combined results of both moments tend to suggest some type of selection bias occurring in the dataset.

Page 15: A Re-Evaluation of The Tennessee STAR Project

Simulation

• What if the STAR experiment were repeated one million times with built-in random assignment?
  – What proportion of classes are flagged?
  – What does the ratio of variances look like?
  – Are Smaller Classes more likely to be positive outliers?

Page 16: A Re-Evaluation of The Tennessee STAR Project

Simulation: Setup

• Using the original data.
  – Reshuffle the students within each of the 79 schools.
  – Re-calculate the t-stats to flag any outlying classes.
  – Repeat the entire operation one million times.
  – From the final results calculate expected proportion of flagged classes.
  – Compare the mean values for the variance ratios.
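The reshuffling loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the schools are randomly generated placeholders rather than STAR data, the iteration count is kept small, and the names `reshuffle`, `count_flagged`, and `simulate` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(sample, rest):
    """Two-sample t-statistic allowing unequal variances."""
    se = sqrt(variance(sample) / len(sample) + variance(rest) / len(rest))
    return (mean(sample) - mean(rest)) / se


def count_flagged(school, cutoff=3.0):
    """Number of classes whose |t| against the rest of the school > cutoff."""
    count = 0
    for name, scores in school.items():
        rest = [s for other, vals in school.items() if other != name
                for s in vals]
        if abs(welch_t(scores, rest)) > cutoff:
            count += 1
    return count


def reshuffle(school, rng):
    """Randomly reassign students to classes, keeping class sizes fixed."""
    pooled = [s for scores in school.values() for s in scores]
    rng.shuffle(pooled)
    shuffled, start = {}, 0
    for name, scores in school.items():
        shuffled[name] = pooled[start:start + len(scores)]
        start += len(scores)
    return shuffled


def simulate(schools, n_iter, seed=0):
    """Total flagged classes across all schools, for each iteration."""
    rng = random.Random(seed)
    return [sum(count_flagged(reshuffle(s, rng)) for s in schools)
            for _ in range(n_iter)]


# Placeholder data: 5 schools, each with one class of every type.
data_rng = random.Random(1)
schools = [{"small": [data_rng.gauss(50, 15) for _ in range(15)],
            "regular": [data_rng.gauss(50, 15) for _ in range(22)],
            "aide": [data_rng.gauss(50, 15) for _ in range(22)]}
           for _ in range(5)]

counts = simulate(schools, n_iter=200)
print("max flagged in any iteration:", max(counts))
print("mean flagged per iteration:", mean(counts))
```

Shuffling within each school keeps school-level differences intact while breaking any link between a student's score and their class assignment, so the distribution of `counts` approximates what random assignment alone would produce.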

Page 17: A Re-Evaluation of The Tennessee STAR Project

Simulation: Student Scores

• Given that we are using the original data, we had to be able to remove the treatment effect prior to re-shuffling.
  – To do this we use the residuals from Krueger’s (1999) original regression.
    • Using these values as a base ‘intelligence’ score for each student, we then added on the beta coefficient values of student characteristics to calculate the new percentile score.

• This method allows us to reshuffle the students in the absence of the treatment effect.

Page 18: A Re-Evaluation of The Tennessee STAR Project

Simulation: Student Scores

Student  SES  Female  Race  Score
      1    1       0     1     54
      2    0       1     1     34
      3    0       0     0     67
      4    1       0     1     23
      5    1       1     0     78

Each score combines the student's characteristics with a draw from the array of student residuals:

$$\text{StudentScore} = \alpha + \beta_1\,\text{SocioEconStatus} + \beta_2\,\text{Female} + \beta_3\,\text{Race} + \varepsilon$$

where $\varepsilon$ is the student's residual and

$\beta_1 = -13.208$, $\beta_2 = 4.705$, $\beta_3 = 9.540$
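The score construction can be illustrated with a short Python sketch. The three coefficients are the ones shown on this slide. The intercept is an assumption (54.62, the intercept of the original regression reported later in the deck), the residual values are made up, and `student_score` is an illustrative name.

```python
# Coefficients from the slide.
BETA_SES = -13.208
BETA_FEMALE = 4.705
BETA_RACE = 9.540

# Assumption: intercept of the original regression reported later in the deck.
ALPHA = 54.62


def student_score(ses, female, race, residual, alpha=ALPHA):
    """Treatment-free score: intercept + characteristics + student residual."""
    return (alpha + BETA_SES * ses + BETA_FEMALE * female
            + BETA_RACE * race + residual)


# (SES, Female, Race) for the five example students on the slide.
students = [(1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 0, 1), (1, 1, 0)]

# Hypothetical residuals; the real ones come from the original regression.
residuals = [3.0, -7.5, 12.0, -20.0, 15.0]

scores = [student_score(*chars, residual=res)
          for chars, res in zip(students, residuals)]
for i, score in enumerate(scores, start=1):
    print(f"Student {i} score = {score:.2f}")
```

No class-size term appears in the formula, so the resulting scores carry no treatment effect and can be reshuffled freely.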

Page 19: A Re-Evaluation of The Tennessee STAR Project

Simulation: Results

• When we held the treatment effect at 5.55.
  – The highest frequency of flagged classes was 8.

• Two times the treatment effect.
  – The highest frequency was 14.

• Three times the treatment effect.
  – The highest frequency was 32.

• Despite having a treatment effect three times larger than the original, we still did not see 36 flagged classes.

Page 20: A Re-Evaluation of The Tennessee STAR Project

Conclusion: Randomization

• Using a combination of the original data and a simulation:
  – Given that we did not see 36 flagged classes in the simulation of one million iterations, we can conclude that the probability of such an event happening under random assignment is virtually zero.

Page 21: A Re-Evaluation of The Tennessee STAR Project

Conclusion: Regression

Results of the original regression:

Variable               Parameter Estimate  Standard Error  P-Value
Intercept                           54.62            2.94   <.0001
Small Class Size                     5.55            0.76   <.0001
Regular Class Size                   0.22            0.73     0.77
White                                9.54            1.27   <.0001
Female                               4.70            0.60   <.0001
Socio-Economic Status              -13.21            0.73   <.0001
Teacher Race                        -1.09            1.21     0.37
Teacher Exp.                         0.27            0.06   <.0001
Teacher Degree                      -1.01            0.79     0.20

Results with flagged schools removed (the slide labels the last column "t Value", but the entries are p-values):

Variable               Parameter Estimate  Standard Error  P-Value
Intercept                           54.48            3.59   <.0001
Small Class Size                     2.92            0.96   0.0024
Regular Class Size                   1.05            0.92     0.25
White                               10.15            1.62   <.0001
Female                               4.41            0.76   <.0001
Socio-Economic Status              -13.86            0.87   <.0001
Teacher Race                        -0.13            2.23     0.95
Teacher Exp.                         0.16            0.08     0.05
Teacher Degree                      -1.19            0.93     0.20

Removing the flagged schools cuts the estimate of the treatment effect roughly in half, from 5.55 to 2.92.
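The size of the drop can be checked directly from the two Small Class Size estimates. The short Python calculation below uses only those two reported coefficients; the variable names are illustrative.

```python
# Small Class Size estimates from the two regressions.
original_effect = 5.55   # full dataset
trimmed_effect = 2.92    # flagged schools removed

reduction = (original_effect - trimmed_effect) / original_effect
print(f"Relative reduction in the treatment effect: {reduction:.1%}")
```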

Page 22: A Re-Evaluation of The Tennessee STAR Project

Further Investigation

• Test other variables for randomization.
  – Gender, race, socio-economic status, and age.

• Using simulation data.
  – Run a regression for each iteration and calculate a treatment effect beta.
    • Plot the distribution of the treatment effect for hypothesis testing.
  – Remove flagged classes from each simulation and see what the reduction in treatment effect is.
    • Compare with the reduction seen in the original data.