
Running head: STATISTICS EXCEL PROJECT 1

Statistics Excel Project

Elle Parsons (Sauro) and Mackenzie Quartly

Seattle Pacific University

Interpreting & Applying Educational Research

Spring Quarter, 2012


This dataset was compiled in response to recent controversy over equity in

public school expenditures. While some argue that the prevailing system of

financing local schools is unfair, aggregate data reported in the media seems

to suggest, paradoxically, that school spending and academic performance are

statistically unrelated. The data was collected from the 50 United States,

plus the District of Columbia. The variables chosen for analysis were:

estimated average annual salary of teachers in public elementary and secondary

schools, percent of students in elementary and secondary schools who are

eligible for free or reduced-price lunch (socioeconomic status, or SES), and

average verbal SAT scores.

Part I

The data collected from the 50 United States, plus the District of

Columbia, includes: estimated average annual salary of teachers in public

elementary and secondary schools, percent of students in elementary and

secondary schools who are eligible for free or reduced-price lunch (SES), and

average verbal SAT scores. To analyze the distributions of these variables,

the data are summarized in histograms, box plots, and a table (see below).

(Table 1. Descriptive Statistics for the Variables of Interest)

                         Estimated Average Salary   SES     Average Verbal SAT Score
N Valid                  51                         50      51
N Missing                0                          1       0
Mean                     47679.08                   39.82   534.94
Median                   45575.00                   37.35   523.00
Std. Deviation           6942.01                    10.64   37.80
Skewness                 .60                        .62     .31
Std. Error of Skewness   .33                        .34     .33
Minimum                  35607.00                   17.70   482.00
Maximum                  61372.00                   67.50   610.00

In the histogram for estimated average annual salary of teachers (Figure 1), the frequency distribution appears to be positively skewed. Most of the estimated average salaries are clustered between $40,000 and $45,000. The positive skew is also supported by Table 1, which shows that the mean estimated average salary of $47,679.08 is greater than the median of $45,575.00; the skewness value is 0.60. A smaller cluster of salaries between about $55,000 and $62,500 contributes to the positive skew and pulls the mean above the median. Because of this, we conclude that the data do not follow a normal distribution. As Figure 2 shows, there do not appear to be any major outliers affecting the skew. Table 1 also shows a wide range of estimated average salaries among the 50 states and the District of Columbia, from a minimum of $35,607.00 (South Dakota) to a maximum of $61,372.00 (California).


Figure 1: Estimated Average Salary Histogram


(Figure 2: Estimated Average Salary Boxplot)

In the histogram for socioeconomic status (SES), as measured by the percent of students eligible for free or reduced-price lunch (Figure 3), the frequency distribution appears to be positively skewed. Most of the SES values are clustered between 30% and 40%, with a smaller cluster between about 40% and 50% that contributes to the positive skew. Table 1 supports this conclusion: the mean SES of 39.82% is greater than the median SES of 37.35%, and the skewness value is 0.62. Because of this, we conclude that the data do not follow a normal distribution. Figure 4 shows an outlier at zero because no information was available for Nevada. If that missing value is plotted as zero, it pulls the apparent center of the distribution downward, so the distribution might have looked more positively skewed without it. Table 1 also shows a wide range of SES values among the 50 states and the District of Columbia, from a minimum of 17.70% (New Hampshire) to a maximum of 67.50% (Mississippi).

(Figure 3: Percentage of Students Eligible for Free/Reduced Lunch Histogram)


(Figure 4: Percentage of Students Eligible for Free/Reduced Lunch Boxplot)

In the histogram for verbal SAT scores (Figure 5), the frequency

distribution appears to be slightly positively skewed. The mean score for the

verbal SAT is 534.94 and the median score is 523.00. This shows that the

average verbal SAT score frequency distribution is positively skewed, with a

skewness level of 0.31. Because of this, we can conclude that the data did


not follow a normal distribution. Data presented in Table 1 for estimated

average verbal SAT scores shows that a wide range exists among the 50 United

States and District of Columbia, with the minimum average verbal SAT score

being 482.00 (Hawaii) and the maximum average verbal SAT score being 610.00

(North Dakota).
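The skewness values and their standard errors in Table 1 can be related with a simple ratio. The sketch below is only an illustration: it divides each skewness value by its standard error and compares the result with a cutoff of 1.96, a common rule of thumb that is an assumption here and was not part of the original analysis. The names `table1`, `CUTOFF`, and `skew_ratio` are hypothetical.

```python
# Skewness and standard error of skewness, copied from Table 1.
table1 = {
    "Estimated Average Salary": (0.60, 0.33),
    "SES": (0.62, 0.34),
    "Average Verbal SAT Score": (0.31, 0.33),
}

# Rule-of-thumb cutoff (an assumption, not taken from the original project).
CUTOFF = 1.96


def skew_ratio(skewness, std_error):
    """Return the skewness divided by its standard error."""
    return skewness / std_error


ratios = {name: skew_ratio(s, se) for name, (s, se) in table1.items()}

for name, ratio in ratios.items():
    status = "exceeds" if abs(ratio) > CUTOFF else "is within"
    print(f"{name}: ratio = {ratio:.2f} ({status} the +/-{CUTOFF} cutoff)")
```

Under this rule of thumb, a ratio well inside the cutoff suggests only mild skew, so the ratio is best read alongside the histograms and box plots rather than in place of them.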

Figure 5: Average Verbal SAT Score Histogram


Figure 6: Average Verbal SAT Score Boxplot

Part 2

We have examined the distributions of average salary, socioeconomic status as determined by free and reduced lunch eligibility, and average verbal SAT scores. Next, we compared these variables across four regions (West, Midwest, South, and Northeast) to analyze any differences among the regions. We determined whether a significant and/or practical difference existed for each variable by running an Analysis of Variance, or ANOVA.

The first variable we examined was estimated average salary. In Table 3, the p-value is approximately .02, which is smaller than .05 but larger than .01. This supports the conclusion that a significant difference in group means exists. Table 3 also shows a Partial Eta Squared value of .18, an effect size indicating that 18 percent of the variance in estimated average salary is associated with differences between regions. While these results show that an effect is present and statistically significant, they do not tell us where the effect lies.
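The F ratio and Partial Eta Squared reported in Table 3 follow directly from the sums of squares in the same table. The sketch below reproduces them using only the values printed in Table 3; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from Table 3.
ss_region = 434910474.84
ss_error = 1974666324.85
df_region = 3
df_error = 47

# Mean square = sum of squares divided by its degrees of freedom.
ms_region = ss_region / df_region
ms_error = ss_error / df_error

# F is the ratio of the between-groups to the within-groups mean square.
f_ratio = ms_region / ms_error

# Partial eta squared = SS_effect / (SS_effect + SS_error).
partial_eta_sq = ss_region / (ss_region + ss_error)

print(f"F({df_region}, {df_error}) = {f_ratio:.2f}")
print(f"Partial eta squared = {partial_eta_sq:.2f}")
```

In a one-way design like this one, the denominator of Partial Eta Squared equals the Corrected Total sum of squares, so the value can be read as the share of total variance associated with region.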

We can look at Table 4 (Tukey HSD) to compare the regions and see where the effect lies. Table 4 shows that the only pairwise difference that is significant at the .05 level is between the Northeast and the South (p = .02). The difference between the Northeast and the Midwest is borderline (p = .05, with a 95% confidence interval that just includes zero), and the difference between the Northeast and the West is not significant (p = .10). When the other three regions (West, Midwest, and South) are compared to each other, the p-values are .92 or greater, so the data do not support a significant difference among them; the effect appears to lie between the Northeast and the other regions, most clearly the South. It is also important to examine the direction of the difference. As Table 2 shows, the Northeast has the largest mean average salary, $53,864.89, while the other regions' means range from $45,717.35 to $47,223.38.

(Table 2: Descriptive Statistics for average salary)


Descriptive Statistics
Dependent Variable: estimated ave salary 2005-2006

region      Mean       Std. Deviation   N
West        47223.38   5969.66          13
Midwest     46312.50   7302.27          12
South       45717.35   6086.72          17
Northeast   53864.89   6779.55          9
Total       47679.08   6942.01          51

(Table 3: Test of Between Subject Effects)

Tests of Between-Subjects Effects
Dependent Variable: estimated ave salary 2005-2006

Source            Type III Sum of Squares   df   Mean Square    F      P     Partial Eta Squared
region            434910474.84              3    144970158.28   3.45   .02   .18
Error             1974666324.85             47   42014177.12
Total             118347597323.00           51
Corrected Total   2409576799.69             50

(Table 4: Multiple Comparisons of Estimated Average Salary)

Multiple Comparisons
estimated ave salary 2005-2006
Tukey HSD

                                                                    95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
West         Midwest      910.88                  2594.81      .99    -6000.11      7821.88
             South        1506.03                 2388.15      .92    -4854.55      7866.62
             Northeast    -6641.50                2810.72      .10    -14127.52     844.52
Midwest      West         -910.88                 2594.81      .99    -7821.87      6000.11
             South        595.15                  2443.89      1.00   -5913.89      7104.18
             Northeast    -7552.39                2858.22      .05    -15164.94     60.16
South        West         -1506.03                2388.15      .92    -7866.62      4854.55
             Midwest      -595.15                 2443.89      1.00   -7104.18      5913.89
             Northeast    -8147.54*               2672.01      .02    -15264.15     -1030.92
Northeast    West         6641.50                 2810.71      .10    -844.52       14127.52
             Midwest      7552.39                 2858.22      .05    -60.16        15164.94
             South        8147.54*                2672.01      .02    1030.92       15264.15

Next, we looked at socioeconomic status as measured by the percentage of students eligible for free or reduced-price lunch. In Table 6, the p-value is reported as .00 (rounded to two decimals), which is smaller than both .05 and .01. This supports the conclusion that a significant difference in group means exists. Table 6 also shows a Partial Eta Squared value of .49, an effect size indicating that 49 percent of the variance in the percentage of students eligible for free or reduced-price lunch is associated with differences between regions. While these results show that an effect is present and statistically significant, they do not tell us where the effect lies.

We can look at Table 7 (Tukey HSD) to compare the regions and see where the effects lie. Table 7 shows that the South differs significantly from each of the other three regions: the West (p = .01), the Midwest (p = .00), and the Northeast (p = .00). The West and the Northeast also differ significantly (p = .03). There is not a significant difference between the Midwest and the West (p = .38) or between the Midwest and the Northeast (p = .54). These results indicate that, in terms of socioeconomic status, the significant differences lie mainly between the South and the other regions. It is also important to look at the direction of these differences. In Table 5, the South has the largest mean SES, 49.14, while the other regions' means range from 29.78 to 39.57. In the South, the percentage of students eligible for free or reduced-price lunch is higher than in the other three regions.
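One way to read a multiple comparisons table such as Table 7 is to check whether each 95% confidence interval excludes zero: an interval that does not contain zero corresponds to a difference flagged as significant at the .05 level. The sketch below applies that check to the six unique region pairs, using the bounds printed in Table 7; the names `comparisons` and `excludes_zero` are hypothetical.

```python
# (region I, region J, lower bound, upper bound) copied from Table 7.
comparisons = [
    ("West", "Midwest", -3.38, 13.66),
    ("West", "South", -17.43, -1.70),
    ("West", "Northeast", 0.59, 18.99),
    ("Midwest", "South", -22.58, -6.84),
    ("Midwest", "Northeast", -4.55, 13.85),
    ("South", "Northeast", 10.76, 27.96),
]


def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


significant = [
    (i, j) for i, j, lower, upper in comparisons if excludes_zero(lower, upper)
]

for i, j in significant:
    print(f"{i} vs. {j}: interval excludes zero")
```

The pairs this check selects are the same ones marked with an asterisk in Table 7.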

(Table 5: Descriptive Statistics: % of Students Eligible for Free/Reduced Lunch)


Dependent Variable: % of students eligible for free/reduced lunch 2006-07

region      Mean    Std. Deviation   N
West        39.57   8.60             12
Midwest     34.43   3.77             12
South       49.14   9.53             17
Northeast   29.78   7.02             9
Total       39.82   10.64            50

(Table 6: Tests of Between-Subject Effects)


Dependent Variable: % of students eligible for free/reduced lunch 2006-07

Source            Type III Sum of Squares   df   Mean Square   F       P     Partial Eta Squared
region            2732.83                   3    910.94        14.87   .00   .49
Error             2817.70                   46   61.25
Total             84848.08                  50
Corrected Total   5550.53                   49

a. R Squared = .492 (Adjusted R Squared = .459)

(Table 7: Multiple Comparisons of % of Students Eligible for Free/Reduced Lunch)



% of students eligible for free/reduced lunch 2006-07
Tukey HSD

                                                                   95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   P     Lower Bound   Upper Bound
West         Midwest      5.14                    3.20         .38   -3.38         13.66
             South        -9.57*                  2.95         .01   -17.43        -1.70
             Northeast    9.79*                   3.45         .03   .59           18.99
Midwest      West         -5.14                   3.20         .38   -13.66        3.38
             South        -14.71*                 2.95         .00   -22.58        -6.84
             Northeast    4.65                    3.45         .54   -4.55         13.85
South        West         9.57*                   2.95         .01   1.70          17.43
             Midwest      14.71*                  2.95         .00   6.84          22.58
             Northeast    19.36*                  3.23         .00   10.76         27.96
Northeast    West         -9.79*                  3.45         .03   -18.99        -.59
             Midwest      -4.65                   3.45         .54   -13.85        4.55
             South        -19.36*                 3.23         .00   -27.96        -10.76

*. The mean difference is significant at the .05 level.

When looking at verbal SAT scores, in Table 8 we can see the mean percentage of all eligible

students taking the SAT in 2006-07, which ranges from 12.67 percent in the Midwest to 81.44 percent in

the Northeast, with a total overall mean of 39.33.

Table 9 shows that average verbal SAT scores vary across the regions: the West has an average verbal SAT score of 528.69, the Midwest 576.50, the South 526.76, and the Northeast 504.00. The Midwest therefore has the highest mean verbal SAT score of the four regions.

One possible explanation is selection bias. Because only 12.67 percent of eligible students in the Midwest take the SAT, the Midwest scores come from a much smaller share of its students than the Northeast scores, where 81.44 percent of eligible students take the test. In the Northeast, since most students take the SAT, the test takers likely include all types of students, not all of whom may be focused on earning high scores, whereas in the Midwest only higher-performing or highly motivated students may have taken the test. It would have been helpful to know students' GPAs as a predictor of their SAT scores, and/or how many students are in each grade in the different regions.

For the verbal SAT scores, Table 10 reports a p-value of .00 (rounded to two decimals), which is smaller than a significance level of .05 or .01. This supports the conclusion that a significant difference in the group means exists. Table 10 also shows a Partial Eta Squared value of .43, an effect size indicating that 43 percent of the variance in average verbal SAT score is associated with differences between regions. While these results show that an effect is present and statistically significant, they do not tell us where the effect lies.
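The ANOVA results for verbal SAT score can be approximately rebuilt from group summary statistics alone. The sketch below takes the regional means, standard deviations, and group sizes reported for average verbal SAT score and computes the between-groups and within-groups sums of squares, the F ratio, and eta squared. Because the inputs are rounded to two decimals, the results are expected to agree with Table 10 only approximately; the variable names are illustrative.

```python
# (mean, standard deviation, n) per region, from the regional descriptive
# statistics for average verbal SAT score.
groups = {
    "West": (528.69, 24.87, 13),
    "Midwest": (576.50, 31.05, 12),
    "South": (526.76, 36.69, 17),
    "Northeast": (504.00, 10.48, 9),
}

n_total = sum(n for _, _, n in groups.values())
grand_mean = sum(mean * n for mean, _, n in groups.values()) / n_total

# Between-groups SS: weighted squared distance of each group mean
# from the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups.values())

# Within-groups SS: each group's variance times its degrees of freedom.
ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)
eta_sq = ss_between / (ss_between + ss_within)

print(f"Grand mean = {grand_mean:.2f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, eta squared = {eta_sq:.2f}")
```

This kind of check is useful when only a descriptive table is available, since it confirms that the reported means, standard deviations, and ANOVA output describe the same data.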

A post hoc test (Table 11, Dunnett C) allows us to compare the regions and determine where the effect lies. Table 11 shows that when the Midwest is compared to each of the other three regions, the mean difference in average verbal SAT score is significant at the .05 level. This supports the conclusion that the significant difference lies between the Midwest and the other regions. When the three other regions (West, Northeast, and South) are compared to each other, a mean difference significant at the .05 level exists only between the West and the Northeast. Table 9 indicates that the average verbal SAT score for the West is higher than that of the Northeast. Although this difference is significant, it is smaller than the differences between the Midwest and the other three regions (24.69 points versus 47.81 to 72.50 points) and seems to have less practical significance. The biggest effect appears to lie between the Midwest and the other three regions.

(Table 8: Descriptive Statistics: % of Eligible Students Taking the Verbal SAT)


(Table 9: Descriptive Statistics: Average Verbal SAT Score)

Dependent Variable: average verbal SAT score 2005-06

region      Mean     Std. Deviation   N
West        528.69   24.87            13
Midwest     576.50   31.05            12
South       526.76   36.69            17
Northeast   504.00   10.48            9
Total       534.94   37.80            51

(Table 10: Tests of Between Subject Effects)


Dependent Variable: average verbal SAT score 2005-06

Source            Type III Sum of Squares   df   Mean Square   F       P     Partial Eta Squared
region            30986.00                  3    10328.67      12.00   .00   .43
Error             40450.83                  47   860.66
Total             14665702.00               51
Corrected Total   71436.82                  50

a. R Squared = .434 (Adjusted R Squared = .398)

(Table 11: Multiple Comparisons of Average Verbal SAT Score)



average verbal SAT score 2005-06
Dunnett C

                                                             95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   Lower Bound   Upper Bound
West         Midwest      -47.81*                 11.31        -81.68        -13.94
             South        1.93                    11.26        -30.74        34.60
             Northeast    24.69*                  7.73         1.37          48.02
Midwest      West         47.81*                  11.31        13.94         81.68
             South        49.74*                  12.63        12.65         86.82
             Northeast    72.50*                  9.62         43.31         101.69
South        West         -1.93                   11.26        -34.60        30.74
             Midwest      -49.74*                 12.63        -86.82        -12.65
             Northeast    22.77                   9.56         -5.02         50.55
Northeast    West         -24.69*                 7.73         -48.02        -1.37
             Midwest      -72.50*                 9.62         -101.69       -43.31
             South        -22.77                  9.56         -50.55        5.02

*. The mean difference is significant at the .05 level.

Part 3

In the third part of this project we looked at scatter plots and regression equations for the

following pairs of variables: expenditure per student and verbal SAT scores, salary and verbal SAT scores,

and socioeconomic status and verbal SAT scores. Looking at these scatter plots provides us insight into

whether or not a relationship exists between variables, as well as the strength and direction of the

relationship.


Examining Table 13, the data suggest that a moderate negative relationship exists between expenditure per pupil and verbal SAT score, with a Pearson r value of -0.42. Furthermore, the p-value is reported as 0.00, which suggests it is a significant relationship. The scatter plot for expenditure per pupil and verbal SAT score (Figure 7) also suggests that a negative relationship exists, because as the expenditure per pupil increases, the average verbal SAT score goes down. Table 14 shows how practically significant the relationship is between expenditure per pupil and verbal SAT score. The R Square for expenditure per pupil and verbal SAT score is .17, meaning that 17 percent of the variance in verbal SAT score among the states can be attributed to expenditure per pupil. One may conclude that this R Square is not very practically significant, since it accounts for only 17 percent of the variance; 83 percent of the variance in verbal SAT score can be attributed to other variables.

Table 15 indicates that the slope of the regression line for expenditure per pupil and verbal SAT score is -.01 (rounded to two decimals); this suggests that as the expenditure per pupil increases by one unit, the predicted verbal SAT score decreases by about .01.
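Because the slope in Table 15 is rounded to two decimals, it can help to recover a more precise value. A least-squares regression line always passes through the point formed by the two means, so the slope can be estimated from the intercept and the means. The sketch below does this under one stated assumption: it uses the math SAT mean from Table 12, because the footnote to Table 15 names average math SAT score as the dependent variable for the printed coefficients. The variable names are hypothetical.

```python
# Intercept (Constant) copied from Table 15.
intercept = 601.42

# Means copied from Table 12. The math SAT mean is used because the
# footnote to Table 15 lists math SAT score as the dependent variable
# (an assumption about which model the printed coefficients belong to).
mean_expenditure = 10327.78
mean_math_sat = 540.59

# A least-squares line passes through (mean_x, mean_y), so
# mean_y = intercept + slope * mean_x.
slope = (mean_math_sat - intercept) / mean_expenditure

print(f"Estimated slope per unit of expenditure: {slope:.5f}")
print(f"Estimated change per 1,000 units: {slope * 1000:.2f} points")
```

The recovered slope rounds to the -.01 printed in Table 15, which shows how much detail a two-decimal coefficient can hide when the predictor is measured in small units.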

(Table 12: Descriptive Statistics for Expenditure Per Pupil)


                              Mean        Std. Deviation
Expenditure/ pupil (1000$)    10,327.78   2,502.20
SAT score 2005-06 (verbal)    534.94      37.80
SAT score 2005-06 (math)      540.59      37.46
SAT score 2005-06 (writing)   525.37      37.63


(Table 13: Correlation Matrix for Expenditure Per Pupil and Verbal SAT Score)

                                         verbal SAT score   math SAT score   writing SAT score
                                         2005-06            2005-06          2005-06
Expenditure/ pupil 2005-06   Pearson r   -0.42**            -0.39**          -0.40**
                             p           0.00               0.00             0.00
                             n           51                 51               51

(Figure 7: Current Expenditure Per Pupil Histogram)

(Table 14: Model Summary Table for Expenditure Per Pupil and Verbal SAT Score)

Model   R      R Square   Std. Error of the Estimate
1       .39a   .16        34.79


(Table 15: Coefficients Table for Expenditure Per Pupil and Average Verbal SAT Score)

Coefficients (a)

                                Unstandardized Coefficients   Standardized Coefficients
Model                           B        Std. Error           Beta                        t       Sig.
1  (Constant)                   601.42   20.88                                            28.80   .00
   Expenditure/ pupil 2005-06   -.01     .00                  -.39                        -3.00   .00

a. Dependent Variable: average math SAT score 2005-06

Salary and SAT Scores: Examining Table 17, the data suggest that a moderate negative relationship exists between estimated average salary and verbal SAT score, with a Pearson r value of -0.48. Furthermore, the p-value is reported as .00, which suggests it is a significant relationship. However, it should be noted that the scatter plot for estimated average salary and verbal SAT score (Figure 8) does not clearly show the pattern that the Pearson r value suggests. Table 18 provides insight into how practically significant the relationship is between salary and verbal SAT score. The R Square for estimated average salary and verbal SAT score is .23, meaning that 23 percent of the variance in verbal SAT score among the states can be attributed to salary. One may conclude that this R Square has limited practical significance, since the remaining 77 percent of the variance in verbal SAT score can be attributed to other variables.

Table 19 indicates that the slope of the regression line for estimated average salary and verbal SAT score is negative and is reported as -.00; this suggests that as the estimated average salary increases by one unit (one dollar), the predicted verbal SAT score decreases by very little. A slope this close to zero may seem to contradict the Pearson r value of -0.48, which describes a moderate negative relationship. However, the unstandardized slope depends on the units of measurement, and salary is measured in dollars, so a small slope per dollar is expected. The standardized coefficient (Beta = -.48) in Table 19 matches the Pearson r value.
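The relationship between the slope and the Pearson r value can be made concrete. In a regression with a single predictor, the unstandardized slope equals r multiplied by the ratio of the two standard deviations. The sketch below applies that identity to the values reported in Tables 16 and 17; the variable names are illustrative, and the results are approximate because the inputs are rounded.

```python
# Pearson r for salary and verbal SAT score, copied from Table 17.
r = -0.48

# Standard deviations copied from Table 16.
sd_salary = 6942.01      # dollars
sd_verbal_sat = 37.80    # SAT points

# With one predictor: slope = r * (SD of outcome / SD of predictor).
slope_per_dollar = r * sd_verbal_sat / sd_salary
slope_per_thousand = slope_per_dollar * 1000

# R Square in a one-predictor regression is r squared.
r_squared = r ** 2

print(f"Slope per dollar: {slope_per_dollar:.5f}")
print(f"Slope per $1,000: {slope_per_thousand:.2f} points")
print(f"R Square: {r_squared:.2f}")
```

Expressed per $1,000 of salary rather than per dollar, the slope is no longer close to zero, while r and R Square are unchanged by the choice of units.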

(Table 16: Descriptive Statistics for Average Salary and Verbal SAT Scores)



                            Mean        Std. Deviation
ave salary 2005-2006        47,679.08   6,942.01
writing SAT score 2005-06   525.37      37.63
verbal SAT score 2005-06    534.94      37.80
math SAT score 2005-06      540.59      37.46

(Table 17: Correlation Matrix for Average Salary and Verbal SAT Score)

                                             average writing SAT   average verbal SAT   average math SAT
                                             score 2005-06         score 2005-06        score 2005-06
estimated ave salary 2005-2006   Pearson r   -.45**                -.48**               -.41**
                                 p           .00                   .00                  .00
                                 n           51                    51                   51

(Figure 8: Scatterplot for Average salary and Average Verbal SAT Score)


(Table 18: Model Summary Table for Average Salary & Verbal SAT Score)

Model   R      R Square   Std. Error of the Estimate
1       .48a   .23        33.61

(Table 19: Coefficient Table for Average Salary & Average Verbal SAT Score)

Coefficients (a)

                          Unstandardized Coefficients   Standardized Coefficients
Model                     B        Std. Error           Beta                        t       Sig.
1  (Constant)             658.20   32.98                                            19.96   .00
   ave salary 2005-2006   -.00     .00                  -.48                        -3.78   .00

a. Dependent Variable: average verbal SAT score 2005-06

SES and SAT Scores: Examining Table 21, the data suggest that a very small positive relationship exists between the percentage of students eligible for free or reduced-price lunch and verbal SAT score, with a Pearson r value of .02. Furthermore, the p-value is .87, which indicates that the relationship is not significant. The scatter plot for percentage of students eligible for free or reduced-price lunch and verbal SAT score (Figure 9) also suggests a very weak relationship or pattern between the two variables. Table 22 provides insight into how practically significant the relationship is between SES and verbal SAT score. The R Square for percentage of students eligible for free or reduced-price lunch and verbal SAT score is .00, meaning that essentially none of the variance in verbal SAT score among the states can be explained by the percentage of students eligible for free or reduced-price lunch. This R Square aligns with the very slight positive Pearson r value and is not practically significant. This means that nearly 100 percent of the variance in verbal SAT score may be explained by other variables.


Table 23 indicates that the slope of the regression line for percent of students eligible for free or reduced-price lunch and verbal SAT score is .09, which is not significantly different from zero (p = .87); this suggests that there is no reliable relationship between free and reduced lunch eligibility and verbal SAT score.

As mentioned earlier, selection bias may have occurred because fewer students in the Midwest participate in the SAT, and their scores may reflect only the highest-performing students. The Midwest's scores are then compared to those of regions like the Northeast, which has a lower average SAT score but a higher percentage of students participating in the SAT. As a result, the correlation and regression results may not be fully valid; they may reflect this bias or other error in the data rather than true relationships.

(Table 20: Descriptive Statistics of % Eligible for Free/Reduced Lunch & Verbal SAT Score)


                                                        Mean     Std. Deviation
% of students eligible for free/reduced lunch 2006-07   39.82    10.64
writing SAT score 2005-06                               525.37   37.63
verbal SAT score 2005-06                                534.94   37.80
math SAT score 2005-06                                  540.59   37.46

(Table 21: Correlation Matrix for % of Students Eligible for Free/Reduced Lunch & Verbal SAT Score)

                                                 writing SAT score   verbal SAT score   math SAT score
                                                 2005-06             2005-06            2005-06
% of students eligible for           Pearson r   .08                 .02                -.08
free/reduced lunch 2006-07           p           .58                 .87                .57
                                     n           50                  50                 50


(Figure 9: Scatterplot for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)

(Table 22: Model Summary Table for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)

Model   R      R Square   Std. Error of the Estimate
1       .02a   .00        38.19

(Table 23: Coefficient Table for % of Students Eligible for Free/Reduced Lunch & Average Verbal SAT Score)


Coefficients (a)

                                                           Unstandardized Coefficients   Standardized Coefficients
Model                                                      B        Std. Error           Beta                        t       Sig.
1  (Constant)                                              532.29   21.12                                            25.21   .00
   % of students eligible for free/reduced lunch 2006-07   .09      .51                  .02                         .17     .87

a. Dependent Variable: average verbal SAT score 2005-06
