Running head: STATISTICS EXCEL PROJECT 1
Statistics Excel Project
Elle Parsons (Sauro) and Mackenzie Quartly
Seattle Pacific University
Interpreting & Apply Educational Research
Spring Quarter, 2012
This dataset was compiled in response to recent controversy over equity in
public school expenditures. While some argue that the prevailing system of
financing local schools is unfair, aggregate data reported in the media seems
to suggest, paradoxically, that school spending and academic performance are
statistically unrelated. The data was collected from the 50 states plus the
District of Columbia. The variables chosen for analysis were:
estimated average annual salary of teachers in public elementary and secondary
schools, percent of students in elementary and secondary schools who are
eligible for free or reduced-price lunch (socioeconomic status, or SES), and
average verbal SAT scores.
Part 1
The data collected from the 50 states and the District of Columbia
includes the estimated average annual salary of teachers in public elementary
and secondary schools, the percent of students in elementary and secondary
schools who are eligible for free or reduced-price lunch (SES), and average
verbal SAT scores. To analyze the distribution of each variable, we
summarized the data in histograms, box plots, and a table of descriptive
statistics (Table 1).
(Table 1. Descriptive Statistics for the Variables of Interest)
                          Estimated Average Salary    SES      Average Verbal SAT Score
N Valid                   51                          50       51
N Missing                 0                           1        0
Mean                      47679.08                    39.82    534.94
Median                    45575.00                    37.35    523.00
Std. Deviation            6942.01                     10.64    37.80
Skewness                  .60                         .62      .31
Std. Error of Skewness    .33                         .34      .33
Minimum                   35607.00                    17.70    482.00
Maximum                   61372.00                    67.50    610.00
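The standard errors of skewness in Table 1 can be reproduced from the sample sizes alone. The sketch below is a minimal illustration, assuming the table was produced with the usual large-sample formula for the standard error of skewness; the function name and the dictionary layout are ours, not part of the original analysis. It also prints the ratio of each skewness value to its standard error, which gives a rough sense of how strong the skew is relative to sampling noise.

```python
import math


def skewness_standard_error(n: int) -> float:
    """Standard error of sample skewness for a sample of size n (n >= 3)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# (skewness, valid N) pairs copied from Table 1
table1 = {
    "salary": (0.60, 51),
    "ses": (0.62, 50),
    "verbal_sat": (0.31, 51),
}

skew_ratios = {}
for name, (skew, n) in table1.items():
    se = skewness_standard_error(n)
    skew_ratios[name] = skew / se
    print(f"{name}: SE = {se:.2f}, skewness / SE = {skew_ratios[name]:.2f}")
```

The computed standard errors round to the .33, .34, and .33 shown in Table 1.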
In the histogram for estimated average annual salary of teachers
(Figure 1), the frequency distribution appears to be positively skewed. Most
of the values for estimated average salary are clustered between $40,000 and
$45,000. The positive skew is also supported by Table 1, which shows that the
mean estimated average salary of $47,679.08 is greater than the median
estimated average salary of $45,575.00; the skewness value is 0.60. There is
also a smaller cluster of values from around $55,000 to around $62,500, which
contributes to the positive skew of the distribution and to the mean being
higher than the median. Because of this, we can conclude that the data does
not follow a normal distribution. Figure 2 does not show any major outliers
affecting the skew. Table 1 also shows that a wide range of estimated average
salaries exists among the 50 states and the District of Columbia, with a
minimum of $35,607.00 (South Dakota) and a maximum of $61,372.00
(California).
(Figure 1: Estimated Average Salary Histogram)
(Figure 2: Estimated Average Salary Boxplot)
In the histogram for socioeconomic status (SES), as measured by the percent
of students eligible for free and reduced lunch (Figure 3), the frequency
distribution appears to be positively skewed. Most of the SES values are
clustered between 30% and 40%. Table 1 supports this conclusion: the mean SES
of 39.82% is greater than the median SES of 37.35%, and the skewness value is
0.62. There is also a smaller cluster of values from around 40% to around
50%, which contributes to the positive skew of the distribution. Because of
this, we can conclude that the data does not follow a normal distribution. In
Figure 4, there is an outlier at zero because no information is available for
Nevada. This value may have pulled the mean down, and the distribution could
have been more positively skewed without this outlier. Table 1 also shows
that a wide range of SES values exists among the 50 states and the District
of Columbia, with a minimum of 17.70% (New Hampshire) and a maximum of 67.50%
(Mississippi).
(Figure 3: Percentage of Students Eligible for Free/Reduced Lunch Histogram)
(Figure 4: Percentage of Students Eligible for Free/Reduced Lunch Boxplot)
In the histogram for verbal SAT scores (Figure 5), the frequency
distribution appears to be slightly positively skewed. The mean verbal SAT
score is 534.94 and the median is 523.00. Because the mean is greater than
the median, the distribution of average verbal SAT scores is positively
skewed, with a skewness value of 0.31. Because of this, we can conclude that
the data does not follow a normal distribution. Table 1 also shows that a
wide range of average verbal SAT scores exists among the 50 states and the
District of Columbia, with a minimum of 482.00 (Hawaii) and a maximum of
610.00 (North Dakota).
(Figure 5: Average Verbal SAT Score Histogram)
(Figure 6: Average Verbal SAT Score Boxplot)
Part 2
We have examined the distributions of average salary, socioeconomic
status as determined by free and reduced lunch, and average verbal SAT
scores. We now compare the four regions (West, Midwest, South, and
Northeast) to analyze any differences among them. For each variable, we
determined whether a statistically and/or practically significant difference
existed by running an Analysis of Variance, or ANOVA.
The first area we looked at was estimated average salary. Looking at Table 3, the p-value is
approximately .02, which is smaller than .05 but larger than .01. This data supports that a significant
difference in group means exists at the .05 level. Further examination of Table 3 shows a Partial Eta
Squared value of .18. This data supports that an effect size of .18 exists, which means that 18 percent of
the variance in estimated average salary is associated with the difference in regions. While this data
supports that an effect is present and that this effect is significant, it does not tell us where the effect
exists.
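The F ratio and Partial Eta Squared reported for region can be recomputed from the sums of squares and degrees of freedom in Table 3. The following is a minimal sketch using only the values printed in that table; the function name is ours and is not part of the original analysis.

```python
def anova_summary(ss_effect, ss_error, df_effect, df_error):
    """Return (F ratio, partial eta squared) for a one-way ANOVA effect."""
    ms_effect = ss_effect / df_effect   # mean square for the effect
    ms_error = ss_error / df_error      # mean square for the error term
    f_ratio = ms_effect / ms_error
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    return f_ratio, partial_eta_sq


# Sums of squares and degrees of freedom copied from Table 3
f_ratio, eta_sq = anova_summary(434910474.84, 1974666324.85, 3, 47)
print(f"F = {f_ratio:.2f}, partial eta squared = {eta_sq:.2f}")
```

Partial Eta Squared is simply the share of the combined effect and error variation that belongs to the effect, which is why it can be read as a proportion of variance.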
We can look at Table 4 (Tukey HSD) to compare the regions and see where the effect lies.
Examination of Table 4 shows that the only comparison that is significant at the .05 level is between the
Northeast and the South (p = .02). The comparison between the Northeast and the Midwest is marginal
(p = .05, with a confidence interval that just includes zero), and the comparison between the Northeast
and the West is not significant (p = .10). When comparing the other three regions (West, Midwest, and
South) to each other in terms of estimated average salary, a p-value of .92 or greater exists. This data
does not support that a significant difference exists among the West, Midwest, and South; the effect
appears to lie between the Northeast and the other regions, most clearly the South. It is important to
examine the direction of the difference between the Northeast and the other three regions. As we can
see in Table 2, the Northeast has a larger mean average salary of $53,864.89, while the other regions’
means range from $45,717.35 to $47,223.38.
(Table 2: Descriptive Statistics for average salary)
Descriptive Statistics
Dependent Variable: estimated ave salary 2005-2006
region       Mean        Std. Deviation    N
West         47223.38    5969.66           13
Midwest      46312.50    7302.27           12
South        45717.35    6086.72           17
Northeast    53864.89    6779.55           9
Total        47679.08    6942.01           51
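As a rough consistency check, the overall mean and the between-regions sum of squares can be rebuilt from the regional means and counts in Table 2. This is a sketch under the assumption that the ANOVA is a simple one-way design, so that the sum of squares for region equals the sum over regions of N times the squared distance of each regional mean from the overall mean; variable names are ours.

```python
# (mean, N) per region, copied from Table 2
regions = {
    "West": (47223.38, 13),
    "Midwest": (46312.50, 12),
    "South": (45717.35, 17),
    "Northeast": (53864.89, 9),
}

total_n = sum(n for _, n in regions.values())

# Overall mean is the N-weighted average of the regional means
grand_mean = sum(mean * n for mean, n in regions.values()) / total_n

# Between-regions sum of squares: N_j * (mean_j - grand_mean)^2, summed over regions
ss_between = sum(n * (mean - grand_mean) ** 2 for mean, n in regions.values())

print(f"N = {total_n}, grand mean = {grand_mean:.2f}, SS between = {ss_between:.2f}")
```

Because the regional means are rounded to two decimals, the result agrees with the region sum of squares in the ANOVA table only approximately.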
(Table 3: Test of Between Subject Effects)
Tests of Between-Subjects Effects
Dependent Variable: estimated ave salary 2005-2006
Source             Type III Sum of Squares    df    Mean Square     F       P      Partial Eta Squared
region             434910474.84               3     144970158.28    3.45    .02    .18
Error              1974666324.85              47    42014177.12
Total              118347597323.00            51
Corrected Total    2409576799.69              50
(Table 4: Multiple Comparisons of Estimated Average Salary)
Multiple Comparisons
estimated ave salary 2005-2006, Tukey HSD
(I) region    (J) region    Mean Difference (I-J)    Std. Error    Sig.    95% CI Lower Bound    95% CI Upper Bound
West          Midwest       910.88                   2594.81       .99     -6000.11              7821.88
West          South         1506.03                  2388.15       .92     -4854.55              7866.62
West          Northeast     -6641.50                 2810.72       .10     -14127.52             844.52
Midwest       West          -910.88                  2594.81       .99     -7821.87              6000.11
Midwest       South         595.15                   2443.89       1.00    -5913.89              7104.18
Midwest       Northeast     -7552.39                 2858.22       .05     -15164.94             60.16
South         West          -1506.03                 2388.15       .92     -7866.62              4854.55
South         Midwest       -595.15                  2443.89       1.00    -7104.18              5913.89
South         Northeast     -8147.54*                2672.01       .02     -15264.15             -1030.92
Northeast     West          6641.50                  2810.71       .10     -844.52               14127.52
Northeast     Midwest       7552.39                  2858.22       .05     -60.16                15164.94
Northeast     South         8147.54*                 2672.01       .02     1030.92               15264.15
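The confidence intervals in Table 4 offer a quick way to see which pairs of regions differ: a mean difference is significant at the .05 level when its 95% confidence interval does not contain zero. The sketch below applies that rule to the six unique pairs in Table 4; the helper name and list layout are ours.

```python
# (region I, region J, mean difference, CI lower bound, CI upper bound) from Table 4
comparisons = [
    ("West", "Midwest", 910.88, -6000.11, 7821.88),
    ("West", "South", 1506.03, -4854.55, 7866.62),
    ("West", "Northeast", -6641.50, -14127.52, 844.52),
    ("Midwest", "South", 595.15, -5913.89, 7104.18),
    ("Midwest", "Northeast", -7552.39, -15164.94, 60.16),
    ("South", "Northeast", -8147.54, -15264.15, -1030.92),
]


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


significant_pairs = [
    (region_i, region_j)
    for region_i, region_j, _diff, lower, upper in comparisons
    if excludes_zero(lower, upper)
]
print(significant_pairs)
```

The pairs flagged by this rule are the same ones marked with an asterisk in Table 4.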
In the next section, we looked at socioeconomic status as determined by percentage of students
eligible for free and reduced lunch. Looking at Table 6, the p-value rounds to .00, which is smaller than
both .05 and .01. This data supports that a significant difference in group means exists. Further
examination of Table 6 shows a Partial Eta Squared value of .49. This data supports that an effect size
of .49 exists, which means that 49 percent of the variance in the percentage of students eligible for free
and reduced lunch is associated with the difference in regions. While this data supports that an effect is
present and that it is significant, it does not tell us where the effect exists.
We can look at Table 7 (Tukey HSD) to compare the regions and see where the effects lie.
Examination of Table 7 shows that there is not a significant difference between the Midwest and the
West (p = .38) or between the Midwest and the Northeast (p = .54), but there is a significant difference
between the Midwest and the South (p = .00). When comparing the South to the other three regions, a
p-value of .01 or lower exists in every case. The West and the Northeast also differ significantly
(p = .03). This data supports that the significant differences lie primarily between the South and the
other three regions, in terms of socioeconomic status. It is important to look at the direction of these
differences. In Table 5, the South has a larger mean SES value of 49.14, while the other regions’ means
range from 29.78 to 39.57. In the South, the percentage of students eligible for free and reduced lunch
is higher than in the other three regions.
(Table 5: Descriptive Statistics: % of Students Eligible for Free/Reduced Lunch)
Descriptive Statistics
Dependent Variable: % of students eligible for free/reduced lunch 2006-07
region       Mean     Std. Deviation    N
West         39.57    8.60              12
Midwest      34.43    3.77              12
South        49.14    9.53              17
Northeast    29.78    7.02              9
Total        39.82    10.64             50
(Table 6: Tests of Between-Subject Effects)
Tests of Between-Subjects Effects
Dependent Variable: % of students eligible for free/reduced lunch 2006-07
Source             Type III Sum of Squares    df    Mean Square    F        P      Partial Eta Squared
region             2732.83                    3     910.94         14.87    .00    .49
Error              2817.70                    46    61.25
Total              84848.08                   50
Corrected Total    5550.53                    49
a. R Squared = .492 (Adjusted R Squared = .459)
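The R Squared and Adjusted R Squared values in the note under Table 6 follow from the sums of squares in the same table. The sketch below assumes the standard adjustment formula and treats the region effect as three predictors (its degrees of freedom); the function names are ours.

```python
def r_squared(ss_effect: float, ss_corrected_total: float) -> float:
    """Proportion of total variation accounted for by the effect."""
    return ss_effect / ss_corrected_total


def adjusted_r_squared(r2: float, n: int, n_predictors: int) -> float:
    """R squared penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Values copied from Table 6: region SS, corrected total SS, N = 50, region df = 3
r2 = r_squared(2732.83, 5550.53)
adj_r2 = adjusted_r_squared(r2, n=50, n_predictors=3)
print(f"R squared = {r2:.3f}, adjusted R squared = {adj_r2:.3f}")
```

With a single factor in the model, R Squared and Partial Eta Squared coincide, which is why the note's .492 matches the .49 in the table.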
(Table 7: Multiple Comparisons of % of Students Eligible for Free/Reduced Lunch)
Multiple Comparisons
% of students eligible for free/reduced lunch 2006-07, Tukey HSD
(I) region    (J) region    Mean Difference (I-J)    Std. Error    P      95% CI Lower Bound    95% CI Upper Bound
West          Midwest       5.14                     3.20          .38    -3.38                 13.66
West          South         -9.57*                   2.95          .01    -17.43                -1.70
West          Northeast     9.79*                    3.45          .03    .59                   18.99
Midwest       West          -5.14                    3.20          .38    -13.66                3.38
Midwest       South         -14.71*                  2.95          .00    -22.58                -6.84
Midwest       Northeast     4.65                     3.45          .54    -4.55                 13.85
South         West          9.57*                    2.95          .01    1.70                  17.43
South         Midwest       14.71*                   2.95          .00    6.84                  22.58
South         Northeast     19.36*                   3.23          .00    10.76                 27.96
Northeast     West          -9.79*                   3.45          .03    -18.99                -.59
Northeast     Midwest       -4.65                    3.45          .54    -13.85                4.55
Northeast     South         -19.36*                  3.23          .00    -27.96                -10.76
*. The mean difference is significant at the .05 level.
When looking at verbal SAT scores, in Table 8 we can see the mean percentage of all eligible
students taking the SAT in 2006-07, which ranges from 12.67 percent in the Midwest to 81.44 percent in
the Northeast, with an overall mean of 39.33 percent.
Table 9 shows that the average verbal SAT scores vary across the different regions. The West
has an average verbal SAT score of 528.69, the Midwest 576.50, the South 526.76, and the Northeast
504.00. This data supports the conclusion that the Midwest, with a mean verbal SAT score of 576.50,
scores higher than the other three regions.
It could be argued that because only 12.67 percent of eligible students in the Midwest take the
SAT, the Midwest’s test takers are a smaller and more selective group than the Northeast’s. Because
the Northeast has a high participation rate, this could present a bias (i.e., with so many students tested,
not all of them may be interested in getting good scores), while in the Midwest the students who take
the test may be very interested in doing well. In the Northeast, since most students take the SAT, the
sample could include all types of students, whereas in the Midwest only higher-performing students
may have taken the test. It would have been helpful to know students’ GPAs as a predictor of their SAT
scores, and/or how many students are in each grade in the different regions.
When thinking about the verbal SAT scores, we looked at Table 10, where the p-value rounds
to .00, which is smaller than a significance level of .05 or .01. This data supports that a significant
difference in the group means exists. Further examination of Table 10 shows a Partial Eta Squared value
of .43. This data supports that an effect size of .43 exists, which means that 43 percent of the variance
in average verbal SAT score can be attributed to region. While this data supports that an effect is
present and that it is significant, it does not tell us where the effect exists.
A post hoc test (Dunnett C, Table 11) allows us to compare the regions and determine where the
effect lies. Examination of Table 11 illustrates that when comparing the Midwest to each of the other
three regions, the mean difference in the average verbal SAT score is significant at the .05 level. This
data supports that the significant difference lies between the Midwest and the other regions, in terms of
average verbal SAT score. When comparing the three other regions (West, Northeast, and South) to
each other in terms of average verbal SAT score, a mean difference significant at the .05 level only exists
when comparing the West to the Northeast. Table 9 indicates that the average verbal SAT score for the
West is higher than that of the Northeast. Although this difference is significant, it is still smaller than
the differences between the Midwest and the other three regions, and seems to have less practical
significance. The biggest effect appears to lie between the Midwest and the other three regions.
(Table 8: Descriptive Statistics: % of Eligible Students Taking the Verbal SAT)
(Table 9: Descriptive Statistics for Average Verbal SAT Score)
Descriptive Statistics
Dependent Variable: average verbal SAT score 2005-06
region       Mean      Std. Deviation    N
West         528.69    24.87             13
Midwest      576.50    31.05             12
South        526.76    36.69             17
Northeast    504.00    10.48             9
Total        534.94    37.80             51
(Table 10: Tests of Between Subject Effects)
Tests of Between-Subjects Effects
Dependent Variable: average verbal SAT score 2005-06
Source             Type III Sum of Squares    df    Mean Square    F        P      Partial Eta Squared
region             30986.00                   3     10328.67       12.00    .00    .43
Error              40450.83                   47    860.66
Total              14665702.00                51
Corrected Total    71436.82                   50
a. R Squared = .434 (Adjusted R Squared = .398)
(Table 11: Multiple Comparisons of Average Verbal SAT Score)
Multiple Comparisons
average verbal SAT score 2005-06, Dunnett C
(I) region    (J) region    Mean Difference (I-J)    Std. Error    95% CI Lower Bound    95% CI Upper Bound
West          Midwest       -47.81*                  11.31         -81.68                -13.94
West          South         1.93                     11.26         -30.74                34.60
West          Northeast     24.69*                   7.73          1.37                  48.02
Midwest       West          47.81*                   11.31         13.94                 81.68
Midwest       South         49.74*                   12.63         12.65                 86.82
Midwest       Northeast     72.50*                   9.62          43.31                 101.69
South         West          -1.93                    11.26         -34.60                30.74
South         Midwest       -49.74*                  12.63         -86.82                -12.65
South         Northeast     22.77                    9.56          -5.02                 50.55
Northeast     West          -24.69*                  7.73          -48.02                -1.37
Northeast     Midwest       -72.50*                  9.62          -101.69               -43.31
Northeast     South         -22.77                   9.56          -50.55                5.02
*. The mean difference is significant at the .05 level.
Part 3
In the third part of this project we looked at scatter plots and regression equations for the
following pairs of variables: expenditure per student and verbal SAT scores, salary and verbal SAT scores,
and socioeconomic status and verbal SAT scores. Looking at these scatter plots provides us insight into
whether or not a relationship exists between variables, as well as the strength and direction of the
relationship.
Examining Table 13, the data suggests that a moderate negative relationship exists between
expenditure per pupil and verbal SAT score, with the Pearson r value being -0.42. Furthermore, the
p-value is approaching zero, which suggests it is a significant relationship. The scatter plot for expenditure
per pupil and verbal SAT score (Figure 7) also suggests that a negative relationship exists because as the
expenditure per pupil increases, the average verbal SAT score goes down. Table 14 shows how practically
significant the relationship is between expenditure per pupil and verbal SAT score. The R Square for
expenditure per pupil and verbal SAT score is .17, meaning that 17 percent of the variance in verbal SAT
score across the states can be attributed to expenditure per pupil. One may conclude that this R
Square is not very practically significant, since it only accounts for 17 percent of the variance; 83 percent
of the variance in verbal SAT score can be attributed to other variables.
Analysis of the data found in Table 15 indicates that the slope in the scatter plot for
expenditure per pupil and verbal SAT score is -.01; this suggests that as the expenditure per pupil
increases by one unit, the verbal SAT score decreases by .01 points.
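One common way to judge whether a Pearson r is significant is to convert it to a t statistic with n - 2 degrees of freedom. The sketch below applies that conversion to the expenditure and verbal SAT correlation (r = -0.42, n = 51) from Table 13. The function name is ours, and the critical value of 2.01 is an approximate two-tailed .05 cutoff for 49 degrees of freedom that we supply as an assumption; it is not taken from the original output.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


# Approximate two-tailed critical t at the .05 level for df = 49 (assumed value)
T_CRITICAL_DF49 = 2.01

t_value = correlation_t_statistic(-0.42, 51)
is_significant = abs(t_value) > T_CRITICAL_DF49
print(f"t = {t_value:.2f}, significant at .05: {is_significant}")
```

The same function can be applied to the other correlations reported in this part by changing r and n.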
(Table 12: Descriptive Statistics for Expenditure Per Pupil)
Descriptive Statistics
                               Mean         Std. Deviation
Expenditure/ pupil (1000$)     10,327.78    2,502.20
SAT score 2005-06 (verbal)     534.94       37.80
SAT score 2005-06 (math)       540.59       37.46
SAT score 2005-06 (writing)    525.37       37.63
(Table 13: Correlation Matrix for Expenditure Per Pupil and Verbal SAT Score)
                                           verbal SAT score    math SAT score    writing SAT score
                                           2005-06             2005-06           2005-06
Expenditure/ pupil 2005-06    Pearson r    -0.42**             -0.39**           -0.40**
                              p            0.00                0.00              0.00
                              n            51                  51                51
(Figure 7: Current Expenditure Per Pupil Histogram)
(Table 14: Model Summary Table for Expenditure Per Pupil and Verbal SAT Score)
Model    R      R Square    Std. Error of the Estimate
1        .39    .16         34.79
(Table 15: Coefficients Table for Expenditure Per Pupil and Average Verbal SAT Score)
Coefficients (a)
                                 Unstandardized Coefficients    Standardized Coefficients
Model                            B         Std. Error           Beta                         t        Sig.
1  (Constant)                    601.42    20.88                                             28.80    .00
   Expenditure/ pupil 2005-06    -.01      .00                  -.39                         -3.00    .00
a. Dependent Variable: average math SAT score 2005-06
Salary and SAT Scores: Examining Table 17, the data suggests that a moderate negative relationship
exists between estimated average salary and verbal SAT score, with the Pearson r value being -0.48.
Furthermore, the p-value is approaching zero, which suggests it is a significant relationship. However, it
should be noted that the scatter plot for estimated average salary and verbal SAT score (Figure 8) does
not show the pattern that the Pearson r value suggests. Table 18 provides insight into how practically
significant the relationship is between salary and verbal SAT score. The R Square for estimated average
salary and verbal SAT score is .23, meaning that 23 percent of the variance in verbal SAT score across
the states can be attributed to salary. One may conclude that this R Square is of limited practical
significance, since 77 percent of the variance in verbal SAT score can be attributed to other variables.
Analysis of the data found in Table 19 indicates that the slope in the scatter plot for estimated
average salary and verbal SAT score is negative and rounds to -.00; this suggests that as the
estimated average salary increases by one unit (one dollar), the verbal SAT score decreases by very
little. This small slope does not contradict the Pearson r value of -0.48: the slope depends on the units
in which salary is measured, while the standardized coefficient (Beta = -.48 in Table 19) matches the
Pearson r and can be described as a moderate negative relationship.
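The near-zero slope in Table 19 and the Pearson r of -0.48 can be reconciled with a short calculation. In a regression with one predictor, the unstandardized slope equals r multiplied by the ratio of the two standard deviations, so a predictor measured in dollars with a standard deviation in the thousands produces a very small slope per dollar. The sketch below uses r from Table 17 and the standard deviations from Table 16; the function and variable names are ours, and the per-$1,000 rescaling is added only for illustration.

```python
def slope_from_correlation(r: float, sd_x: float, sd_y: float) -> float:
    """Unstandardized simple-regression slope implied by r and the two SDs."""
    return r * sd_y / sd_x


r = -0.48              # Pearson r, salary vs. verbal SAT (Table 17)
sd_salary = 6942.01    # Std. Deviation of ave salary, in dollars (Table 16)
sd_verbal = 37.80      # Std. Deviation of verbal SAT score (Table 16)

slope_per_dollar = slope_from_correlation(r, sd_salary, sd_verbal)
slope_per_thousand = slope_per_dollar * 1000   # same slope per $1,000 of salary
r_squared = r ** 2                             # share of variance accounted for

print(f"slope per dollar   = {slope_per_dollar:.4f}")
print(f"slope per $1,000   = {slope_per_thousand:.2f}")
print(f"R squared from r   = {r_squared:.2f}")
```

Expressed per $1,000 of salary, the same relationship is a drop of roughly two and a half SAT points, which is easier to interpret than a per-dollar slope that rounds to zero.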
(Table 16: Descriptive Statistics for Average Salary and Verbal SAT Scores)
Descriptive Statistics
                             Mean         Std. Deviation
ave salary 2005-2006         47,679.08    6,942.01
writing SAT score 2005-06    525.37       37.63
verbal SAT score 2005-06     534.94       37.80
math SAT score 2005-06       540.59       37.46
(Table 17: Correlation Matrix for Average Salary and Verbal SAT Score)
                                                 average writing SAT    average verbal SAT    average math SAT
                                                 score 2005-06          score 2005-06         score 2005-06
estimated ave salary 2005-2006      Pearson r    -.45**                 -.48**                -.41**
                                    p            .00                    .00                   .00
                                    n            51                     51                    51
(Figure 8: Scatterplot for Average salary and Average Verbal SAT Score)
(Table 18: Model Summary Table for Average Salary & Verbal SAT Score)
Model    R      R Square    Std. Error of the Estimate
1        .48    .23         33.61
(Table 19: Coefficient Table for Average Salary & Average Verbal SAT Score)
Coefficients (a)
                           Unstandardized Coefficients    Standardized Coefficients
Model                      B         Std. Error           Beta                         t        Sig.
1  (Constant)              658.20    32.98                                             19.96    .00
   ave salary 2005-2006    -.00      .00                  -.48                         -3.78    .00
a. Dependent Variable: average verbal SAT score 2005-06
SES and SAT Scores: Examining Table 21, the data suggests that essentially no relationship
exists between percentage of students eligible for free/reduced-price lunch and verbal SAT score, with
the Pearson r value being a very slight positive .02. Furthermore, the p-value is 0.87, which suggests it is
not significant. The scatter plot for percentage of students eligible for free/reduced-price lunch and
verbal SAT score (Figure 9) also suggests a very weak relationship or pattern between the two
variables. Table 22 provides insight into how practically significant the relationship is between SES and
verbal SAT score. The R Square for percentage of students eligible for free or reduced-price lunch and
verbal SAT score is .00, meaning that essentially none of the variance in verbal SAT score across the
states can be explained by the percentage of students eligible for free/reduced-price lunch. One may
conclude that this R Square aligns with the very slight positive Pearson r value and is not practically
significant, since it does not account for any of the variance. This means that nearly 100 percent of the
variance in verbal SAT score may be explained by other variables.
Analysis of the data found in Table 23 indicates that the slope in the scatter plot for percent of
students eligible for free/reduced lunch and verbal SAT score is .09, which is not significant (p = .87);
this suggests that there is no relationship between free and reduced lunch and verbal SAT score.
As mentioned earlier, selection bias may have occurred because fewer students in the Midwest
take the SAT, and their scores may reflect only the highest-performing students. Furthermore, the
Midwest’s scores are then compared to those of other regions like the Northeast, which has a lower
average SAT score but a higher percentage of students taking the SAT. As a result, the correlation and
regression results may be somewhat inaccurate, which may be due to an error or a bias in the study.
(Table 20: Descriptive Statistics of % Eligible for Free/Reduced Lunch & Verbal SAT Score)
Descriptive Statistics
                                                         Mean      Std. Deviation
% of students eligible for free/reduced lunch 2006-07    39.82     10.64
writing SAT score 2005-06                                525.37    37.63
verbal SAT score 2005-06                                 534.94    37.80
math SAT score 2005-06                                   540.59    37.46
(Table 21: Correlation Matrix for % of Students Eligible for Free/Reduced Lunch & Verbal SAT Score)
                                                                      writing SAT score    verbal SAT score    math SAT score
                                                                      2005-06              2005-06             2005-06
% of students eligible for free/reduced lunch 2006-07    Pearson r    .08                  .02                 -.08
                                                         p            .58                  .87                 .57
                                                         n            50                   50                  50
(Figure 9: Scatterplot for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)
(Table 22: Model Summary Table for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)
Model    R      R Square    Std. Error of the Estimate
1        .02    .00         38.19
(Table 23: Coefficient Table for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)
Coefficients (a)
                                                            Unstandardized Coefficients    Standardized Coefficients
Model                                                       B         Std. Error           Beta                         t        Sig.
1  (Constant)                                               532.29    21.12                                             25.21    .00
   % of students eligible for free/reduced lunch 2006-07    .09       .51                  .02                          .17      .87
a. Dependent Variable: average verbal SAT score 2005-06