Posted on 21-Dec-2015
Analyzing Data using SPSS
Testing for difference
Parametric Test
t-test
- Used in a variety of situations involving interval and ratio variables.
- Independent samples
- Dependent (paired) samples
Independent-Samples T-Test
• What it does: The Independent Samples T Test compares the mean scores of two groups on a given variable.
• Where to find it: Under the Analyze menu, choose Compare Means, the Independent Samples T Test. Move your dependent variable into the box marked "Test Variable." Move your independent variable into the box marked "Grouping Variable." Click on the box marked "Define Groups" and specify the value labels of the two groups you wish to compare.
• Assumptions:
- The dependent variable is normally distributed. You can check for normal distribution with a Q-Q plot.
- The two groups have approximately equal variance on the dependent variable. You can check this by looking at Levene's Test (see below).
- The two groups are independent of one another.
• Hypotheses:
Null: The means of the two groups are not significantly different.
Alternate: The means of the two groups are significantly different.
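SPSS is the tool used in this tutorial, but the same test can be sketched in Python with scipy.stats as a cross-check. The blood-pressure values below are invented for illustration (they are not the SPSS example data):

```python
from scipy import stats

# Hypothetical blood-pressure readings for two independent groups
new_drug = [142, 150, 138, 155, 148, 160]
placebo  = [130, 128, 135, 126, 132, 129]

# Levene's test for equality of variances (mirrors SPSS's output table)
lev_stat, lev_p = stats.levene(new_drug, placebo)

# If Levene's p > .05, assume equal variances ("top line" in SPSS);
# otherwise use Welch's correction ("bottom line").
t_stat, p_value = stats.ttest_ind(new_drug, placebo, equal_var=lev_p > 0.05)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Reading Levene's result first and then choosing the appropriate t-test line matches the decision rule described above.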
SPSS Output
• Following is a sample output of an independent samples T test. We compared the mean blood pressure of patients who received a new drug treatment vs. those who received a placebo (a sugar pill).
• First, we see the descriptive statistics for the two groups. We see that the mean for the "New Drug" group is higher than that of the "Placebo" group. That is, people who received the new drug have, on average, higher blood pressure than those who took the placebo.
• Finally, we see the results of the Independent Samples T Test. Read the TOP line if the variances are approximately equal. Read the BOTTOM line if the variances are not equal. Based on the results of our Levene's test, we know that we have approximately equal variance, so we will read the top line.
• Our t value is 3.796.
• We have 10 degrees of freedom.
• There is a significant difference between the two groups (the significance is less than .05).
• Therefore, we can say that there is a significant difference between the New Drug and Placebo groups. People who took the new drug had significantly higher blood pressure than those who took the placebo.
Example: Independent-samples t-test
• A study to determine the effectiveness of an integrated statistics/experimental methods course as opposed to the traditional method of taking the two courses separately was conducted.
• It was hypothesized that the students taking the integrated course would conduct better quality research projects than students in the traditional courses as a result of their integrated training.
• Ho: There is no difference in students' performance as a result of the integrated versus traditional courses.
• H1: Students taking the integrated course would conduct better quality research projects than students in the traditional courses.
Output SPSS

Group Statistics (Score)

Condition            N    Mean    Std. Deviation   Std. Error Mean
integrated method    20   85.65   8.242            1.843
traditional method   20   79.45   10.782           2.411
Independent Samples Test (Score)

Levene's Test for Equality of Variances: F = 3.880, Sig. = .056

t-test for Equality of Means:
                              t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed       2.043  38       .048              6.200             3.035                   (.057, 12.343)
Equal variances not assumed   2.043  35.551   .049              6.200             3.035                   (.043, 12.357)
Since the Sig. (2-tailed) value of .048 is less than .05, we reject Ho: students taking the integrated course conducted better quality research projects than students in the traditional courses.
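The t value in the SPSS output can be reproduced from the group summary statistics alone; a sketch in Python using scipy.stats (a cross-check, not part of the SPSS workflow):

```python
from scipy import stats

# Group summary statistics from the SPSS Group Statistics table
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=85.65, std1=8.242, nobs1=20,    # integrated method
    mean2=79.45, std2=10.782, nobs2=20,   # traditional method
    equal_var=True)                       # Levene's test was non-significant
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

This reproduces the "equal variances assumed" row: t = 2.043 with p near .048.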
Exercise 1
• The following data were obtained in an experiment designed to check whether there is a systematic difference in the weights (in grams) obtained with two different scales.
Rock specimen   Scale I   Scale II
1               12.13     12.17
2               17.56     17.61
3               9.33      9.35
4               11.40     11.42
5               28.62     28.61
6               10.25     10.27
7               23.37     23.42
8               16.27     13.26
9               12.40     12.45
10              24.78     24.75
• Use the 0.01 level of significance to test whether the difference between the means of the weights obtained with the two scales is significant
• Ho : there is no significant difference between the means of the weight obtained with the two scales.
• H1 : there is significant difference between the means of the weight obtained with the two scales.
Exercise 2
• The following are the scores for random samples of size ten which are taken from large group of trainees instructed by the two methods.
• Method 1: teaching machine as well as some personal attention by an instructor
• Method 2: straight teaching-machine instruction
Method 1 81 71 79 83 76 75 84 90 83 78
Method 2 69 75 72 69 67 74 70 66 76 72
What can we conclude about the claim that the personal attention of an instructor will improve the average trainee's score? Use α = 5%.
Paired samples t-test
• What it does: The Paired Samples T Test compares the means of two variables. It computes the difference between the two variables for each case, and tests to see if the average difference is significantly different from zero.
• Where to find it: Under the Analyze menu, choose Compare Means, then choose Paired Samples T Test. Click on both variables you wish to compare, then move the pair of selected variables into the Paired Variables box.
• Assumption:
- Both variables should be normally distributed. You can check for normal distribution with a Q-Q plot.
• Hypotheses:
Null: There is no significant difference between the means of the two variables.
Alternate: There is a significant difference between the means of the two variables.
SPSS Output
• Following is a sample output of a paired samples T test. We compared the mean test scores before (pre-test) and after (post-test) the subjects completed a test preparation course. We want to see if our test preparation course improved people's scores on the test.
First, we see the descriptive statistics for both variables.
• The post-test mean scores are higher than pre-test scores
Next, we see the correlation between the two variables
• There is a strong positive correlation. People who did well on the pre-test also did well on the post-test.
• Finally, we see the results of the Paired Samples T Test. Remember, this test is based on the difference between the two variables. Under "Paired Differences" we see the descriptive statistics for the difference between the two variables
To the right of the Paired Differences, we see the t, degrees of freedom, and significance.
The t value = -2.171. We have 11 degrees of freedom. Our significance is .053.
If the significance value is less than .05, there is a significant difference. If the significance value is greater than .05, there is no significant difference.
Here, we see that the significance value is approaching significance, but it is not a significant difference. There is no difference between pre- and post-test scores. Our test preparation course did not help!
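A paired-samples t test can be sketched in Python with scipy.stats.ttest_rel; the pre/post scores below are invented for illustration (they are not the SPSS example data):

```python
from scipy import stats

# Hypothetical pre- and post-test scores for the same six subjects
pre  = [55, 60, 48, 72, 65, 58]
post = [58, 66, 50, 75, 70, 61]

# ttest_rel tests whether the mean of the paired differences is zero,
# exactly as the Paired Samples T Test does in SPSS
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because every post score here was generated to exceed its pre score, this toy data yields a significant difference.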
Example
• Twenty first-grade children and their parents were selected for a study to determine whether a seminar instructing parents in inductive parenting techniques improves social competency in children. The parents attended the seminar for one month. The children were tested for social competency before the course began and were retested six months after the completion of the course.
Hypothesis
• Ho: There is no significant difference between the means of pre- and post-seminar social competency scores.
• In other words, the parenting seminar has no effect on child social competency scores.
Paired Samples Statistics

Pair 1        Mean    N    Std. Deviation   Std. Error Mean
Post-Score    34.20   20   6.066            1.356
Pre-Score     30.45   20   4.019            .899

Paired Samples Correlations

Pair 1                    N    Correlation   Sig.
Post-Score & Pre-Score    20   .771          .000

Paired Samples Test (Pair 1: Post-Score - Pre-Score)

Paired Differences: Mean = 3.750, Std. Deviation = 3.919, Std. Error Mean = .876,
95% Confidence Interval of the Difference = (1.916, 5.584)
t = 4.280, df = 19, Sig. (2-tailed) = .000
• There is a strong positive correlation. Children who did well on the pre-test also did well on the post-test.
• There is a significant difference between pre- and post-test scores. The parenting seminar has an effect on child social competency scores!
Exercise 3
• The table below shows the number of words per minute read by 20 students before and after following a particular method intended to improve reading.
Student Pre Post
1 48 57
2 89 102
3 78 81
4 50 61
5 70 74
6 98 100
7 78 83
8 98 86
9 58 67
10 61 71
Student Pre Post
11 50 64
12 56 62
13 75 87
14 49 62
15 66 62
16 86 90
17 90 84
18 58 62
19 41 40
20 82 77
• Using a 0.05 level of significance, test the claim that the method is effective in improving reading.
Exercise 4
• The table below shows the weight of seven subjects before and after following a particular diet for two months
Subject   A     B     C     D     E     F     G
After     156   165   196   198   167   199   164
Before    149   156   194   203   153   201   152
• Using a 0.01 level of significance, test the claim that the diet is effective in reducing weight.
One-Way ANOVA
• Similar to a t-test, in that it is concerned with differences in means, but the test can be applied to two or more means.
• The test is usually applied to interval and ratio data types. For example differences between two factors (1 and 2).
• The test can be undertaken using the Analyze - Compare Means - One-Way ANOVA menu items, then select for appropriate variables.
• You will observe the One-Way ANOVA for factor 1 and factor 2
Procedure
• 1. You will need one column of group codes labelling which group your data belongs to. The codes need to be numerical, but can be labelled with text.
• 2. You will also need a column containing the data points or scores you wish to analyze.
• 3. Select One-way ANOVA from the Analyze and Compare Means menus.
• 4. Click on your dependent variable (data column) and click on the top arrow so that the selected column appears in the dependent list box.
• 5. Click on your code column (your condition labels) and click on the bottom arrow so that the selected column appears in the factor box.
• 6. Click on Post Hoc if you wish to perform post-hoc tests (optional).
• 7. Choose the type of post-hoc test(s) you wish to perform by clicking in the small box next to your choice until a tick appears. Tukey's and Scheffe's tests are commonly used.
• 8. Click on Dunnett to perform a Dunnett's test, which allows you to compare experimental groups with a control group. Choose whether your control category is the first or last code entered in your code column.
• The main output table is labelled ANOVA. The F-ratio of the ANOVA, the degrees of freedom and the significance are all displayed. The top value of the df column is the df of the factor, the bottom value is the df of the error term.
• Tukey's test will also try to find combinations of similar groups or conditions.
• In the Score table there will be one column for each pair of conditions that are shown to be 'similar'. The mean of each condition within the pair are given in the appropriate column. The p-value for the difference between the means of each pair of groups is given at the bottom of the appropriate column.
Example – one-way ANOVA
• We would like to determine whether the scores on a test of aggression are different across 4 groups of children (each with 5 subjects).
• Each child group has been exposed to differing amounts of time watching cartoons depicting 'toon violence'.
At the 0.05 significance level, test the claim that the four groups have the same mean if the following sample results have been obtained.
Output SPSS

ANOVA (score)

Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   28.950           3    9.650         4.825   .014
Within Groups    32.000           16   2.000
Total            60.950           19
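The F ratio and p-value in the ANOVA output follow directly from the sums of squares; a quick cross-check in Python (using scipy only for the F distribution):

```python
from scipy import stats

# Values from the ANOVA output table
ss_between, df_between = 28.950, 3
ss_within,  df_within  = 32.000, 16

ms_between = ss_between / df_between      # mean square between = 9.650
ms_within  = ss_within / df_within        # mean square within  = 2.000
F = ms_between / ms_within                # F ratio = 4.825
p = stats.f.sf(F, df_between, df_within)  # upper-tail probability of F(3, 16)
print(f"F = {F:.3f}, p = {p:.3f}")
```

Since p is below .05, the claim of equal group means is rejected, as in the tutorial's conclusion.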
Exercise 5
• At the same time each day, a researcher records the temperature in each of three greenhouses. The table shows the temperatures in degree Fahrenheit recorded for one week.
Greenhouse #1   Greenhouse #2   Greenhouse #3
73              71              61
72              69              63
73              72              62
66              72              61
68              65              60
71              73              62
72              71              59
Use a 0.05 significance level to test the claim that the average temperature is the same in each greenhouse.
Nonparametric Test
Sign Test
• A sign test compares the number of positive and negative differences between related conditions
Procedure
• 1. You should have data in two or more columns - one for each condition tested.
• 2. Select 2 Related Samples from the Analyze - Nonparametric Tests menu.
• 3. Click on the first variable in the pair and the second variable in the pair. The names of the variables appear in the current selections section of the dialogue box.
• 4. Click on the central selection arrow when you are happy with the variable pair selection. The chosen pair appears in the Test Pair(s) List.
• 5. Make sure the Sign box is ticked and remove the tick from the Wilcoxon box.
Example
• The data in the table on the next slide are matched pairs of heights obtained from a random sample of 12 male statistics students. Each student reported his height, then his height was measured. Use a 0.05 significance level to test the claim that there is no difference between reported height and measured height.
Reported and measured heights of male statistics students

Reported height:   68     74     82.25   66.5   69     68     71     70     70     67     68     70
Measured height:   66.8   73.9   74.3    66.1   67.2   67.9   69.4   69.9   68.6   67.9   67.6   68.8
Ho: There is no significant difference between reported heights and measured heights.
H1: There is a difference.
Output

Test Statistics(b): measured height - reported height
Exact Sig. (2-tailed) = .006(a)
a. Binomial distribution used.
b. Sign Test
Reject Ho. There is sufficient evidence to reject the claim that there is no significant difference between the reported and measured heights.
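The sign test reduces to a two-sided binomial test on the counts of positive and negative differences; a sketch in Python using the example data and scipy.stats.binomtest (available in SciPy 1.7+):

```python
from scipy.stats import binomtest

reported = [68, 74, 82.25, 66.5, 69, 68, 71, 70, 70, 67, 68, 70]
measured = [66.8, 73.9, 74.3, 66.1, 67.2, 67.9, 69.4, 69.9, 68.6, 67.9, 67.6, 68.8]

diffs = [r - m for r, m in zip(reported, measured)]
pos = sum(d > 0 for d in diffs)   # 11 positive differences
neg = sum(d < 0 for d in diffs)   # 1 negative difference
n = pos + neg                     # zero differences would be discarded

# Under Ho the signs are equally likely: Binomial(n, 0.5)
p_value = binomtest(min(pos, neg), n, 0.5).pvalue
print(round(p_value, 3))  # .006, matching the SPSS output
```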
Exercise 6
• Listed here are the right- and left-hand reaction times collected from 14 right-handed subjects. Use a 0.05 significance level to test the claim of no difference between the right-hand and left-hand reaction times.
Right/left reaction times
Right 191 97 116 165 116 129 171 155 112 102 188 158 121 133
Left 224 171 191 207 196 165 171 165 140 188 155 219 177 174
Wilcoxon
• The Wilcoxon test is used with two columns of non-parametric related (linked) data.
• Either one person has taken part in two conditions or paired participants (e.g. brother and sister) have taken part in the same condition.
• This is the non-parametric equivalent of the paired-samples t-test.
Procedure
• 1. Put your data in two or more columns, one for each condition tested.
• 2. Select 2 Related Samples from the Analyze - Nonparametric Tests menu.
• 3. Click on the first variable in the pair.
• 4. Click on the second variable in the pair.
• 5. Make sure the Wilcoxon box is ticked.
• The Ranks table produced in the output window summarises the ranking process.
• In the Test Statistics table the Z statistic is the result of the Wilcoxon test.
• The p-value for this statistic is shown below it. This is the two-tailed significance.
Example
• Use the previous data to test the claim that there is no difference between reported heights and measured heights using Wilcoxon test at 0.05 significance level.
Output
Test Statistics(b): reported height - measured height
Z = -2.595(a)
Asymp. Sig. (2-tailed) = .009
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
Reject Ho. There is sufficient evidence to reject the claim that there is no difference between reported and measured heights.
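The same result can be sketched in Python with scipy.stats.wilcoxon; requesting the normal approximation mirrors the Asymp. Sig. that SPSS reports (the method parameter assumes SciPy 1.9+):

```python
from scipy import stats

reported = [68, 74, 82.25, 66.5, 69, 68, 71, 70, 70, 67, 68, 70]
measured = [66.8, 73.9, 74.3, 66.1, 67.2, 67.9, 69.4, 69.9, 68.6, 67.9, 67.6, 68.8]

# Normal approximation without continuity correction, as in the SPSS table
res = stats.wilcoxon(reported, measured, correction=False, method="approx")
print(round(res.pvalue, 3))
```

The p-value comes out close to the .009 in the table, and well below .05, so Ho is rejected.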
Mann-Whitney
• The Mann-Whitney test is used with two columns of independent (unrelated) non-parametric data. This is the non-parametric equivalent of the independent-samples t-test.
Procedure
• 1. Put all of your measured data into one column.
• 2. Make a second column that contains codes to indicate the group from which each value was obtained.
• 3. Select 2 Independent Samples from the Analyze - Nonparametric Tests menu.
• 4. Select the column containing the data you want to analyse and click the top arrow.
• 5. Select the Grouping Variable - the column which contains your group codes - and click the bottom arrow.
• 6. Make sure the Mann-Whitney U option is selected.
• The output is produced in the output window.
• The top table summarises the ranking process.
• The result of the Mann-Whitney test is given at the top of the Test Statistics table.
• The two-tailed significance of the result is given in the same table.
Example
• One study used x-ray computed tomography (CT) to collect data on brain volumes for a group of patients with obsessive-compulsive disorders and a control group of healthy persons. The following data show sample results (in mm) for volumes of the right cordate.
Volumes of the right cordate
Obsessive-compulsive patients
0.308 0.210 0.304 0.344 0.407 0.455 0.287 0.288 0.463 0.334 0.340 0.305
Control group 0.519 0.476 0.413 0.429 0.501 0.402 0.349 0.594 0.334 0.483 0.460 0.445
Output

Ranks (volumes of the right cordate)

Group                N    Mean Rank   Sum of Ranks
experimental group   10   7.35        73.50
control group        12   14.96       179.50
Total                22

Test Statistics(b): volumes of the right cordate
Mann-Whitney U = 18.500
Wilcoxon W = 73.500
Z = -2.737
Asymp. Sig. (2-tailed) = .006
Exact Sig. [2*(1-tailed Sig.)] = .004(a)
a. Not corrected for ties.
b. Grouping Variable: group
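A Mann-Whitney test on the listed volumes can be sketched in Python with scipy.stats.mannwhitneyu (note the SPSS Ranks table reports N = 10 for the patient group, so p-values computed from all 12 listed patient values may differ slightly):

```python
from scipy import stats

patients = [0.308, 0.210, 0.304, 0.344, 0.407, 0.455,
            0.287, 0.288, 0.463, 0.334, 0.340, 0.305]
controls = [0.519, 0.476, 0.413, 0.429, 0.501, 0.402,
            0.349, 0.594, 0.334, 0.483, 0.460, 0.445]

# Two-sided Mann-Whitney U test on the two independent samples
res = stats.mannwhitneyu(patients, controls, alternative="two-sided")
print(f"U = {res.statistic}, p = {res.pvalue:.4f}")
```

The patient volumes are significantly smaller than the control volumes, matching the conclusion of the SPSS output.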
Kruskal-Wallis
• Examines differences between 3 or more independent groups or conditions.
Procedure
• 1. Put all your measured data into one column.
• 2. Make a second column that contains codes to indicate the group from which each value was obtained.
• 3. Select K Independent Samples from the Analyze - Non-parametric Tests menu.
• 4. Select the grouping variable, the column that contains your group codes, then click on the bottom arrow.
• 5. Make sure the Kruskal-Wallis box is checked.
• In the output window the chi-square statistic is shown in the test statistic section, as is the P-value.
Example
• We would like to determine whether the scores on a test of Spanish are different across three different methods of learning.
• Method 1: classroom instruction and language laboratory
• Method 2: only classroom instruction
• Method 3: only self-study in language laboratory
The following are the final examination scores of samples of students from the three groups:
Method 1: 94 88 91 74 86 97
Method 2: 85 82 79 84 61 72 80
Method 3: 89 67 72 76 69
At the 0.05 level of significance, test the null hypothesis that the populations sampled are identical.
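This can be sketched in Python with scipy.stats.kruskal, which applies the same tie correction that SPSS uses:

```python
from scipy import stats

method1 = [94, 88, 91, 74, 86, 97]
method2 = [85, 82, 79, 84, 61, 72, 80]
method3 = [89, 67, 72, 76, 69]

# Kruskal-Wallis H test across the three independent samples
h_stat, p_value = stats.kruskal(method1, method2, method3)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

This reproduces the chi-square statistic of 6.673 with p = .036 shown in the SPSS output below.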
Output SPSS
Test Statistics(a,b): SCORE
Chi-Square = 6.673
df = 2
Asymp. Sig. = .036
a. Kruskal Wallis Test
b. Grouping Variable: METHOD
Exercise 7
• The following are the miles per gallon which a test driver got in random samples of six tankfuls of each of three kinds of gasoline:
• Gasoline 1: 30 15 32 27 24 29
• Gasoline 2: 17 28 20 33 32 22
• Gasoline 3: 19 23 32 22 18 25
• Test the claim that there is no difference in the true average mileage yield of the three kinds of gasoline. (Use a 0.05 level of significance.)
Testing for Relationships
Pearson's Correlation
Pearson's correlation is a parametric test for the strength of the relationship between pairs of variables.
• What it does: The Pearson R correlation tells you the magnitude and direction of the association between two variables that are on an interval or ratio scale.
• Where to find it: Under the Analyze menu, choose Correlations. Move the variables you wish to correlate into the "Variables" box. Under the "Correlation Coefficients," be sure that the "Pearson" box is checked off.
• Assumption: -Both variables are normally distributed. You can check for normal distribution with a Q-Q plot.
• Hypotheses:Null: There is no association between the two variables.Alternate: There is an association between the two variables.
• SPSS Output
• Following is a sample output of a Pearson R correlation between the Rosenberg Self-Esteem Scale and the Assessing Anxiety Scale.
SPSS creates a correlation matrix of the two variables. All the information we need is in the cell that represents the intersection of the two variables
SPSS gives us three pieces of information:
- the correlation coefficient
- the significance
- the number of cases (N)
• The correlation coefficient is a number between +1 and -1. This number tells us about the magnitude and direction of the association between two variables.
• The MAGNITUDE is the strength of the correlation. The closer the correlation is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close to zero, there is no association between the two variables. Here, we have a moderate correlation (r = -.378).
• The DIRECTION of the correlation tells us how the two variables are related. If the correlation is positive, the two variables have a positive relationship (as one increases, the other also increases). If the correlation is negative, the two variables have a negative relationship (as one increases, the other decreases). Here, we have a negative correlation (r = -.378). As self-esteem increases, anxiety decreases
Example
• The following data were obtained in a study of the relationship between the resistance (ohms) and the failure time (minutes) of certain overloaded resistors.
• Resistance 48 28 33 40 36 39 46 40 30 42 44 48 39 34 47
• Failure time 45 25 39 45 36 35 36 45 34 39 51 41 38 32 45
• Test the null hypothesis that there is no correlation between resistance and failure time.
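The correlation matrix below can be reproduced in Python with scipy.stats.pearsonr; a sketch using the example data:

```python
from scipy import stats

resistance   = [48, 28, 33, 40, 36, 39, 46, 40, 30, 42, 44, 48, 39, 34, 47]
failure_time = [45, 25, 39, 45, 36, 35, 36, 45, 34, 39, 51, 41, 38, 32, 45]

# Pearson correlation coefficient and two-tailed p-value
r, p_value = stats.pearsonr(resistance, failure_time)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

This matches the r = .704 and Sig. = .003 in the SPSS matrix.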
Output SPSS
Correlations

                                  RESIST    FAILTIME
RESIST     Pearson Correlation    1.000     .704**
           Sig. (2-tailed)        .         .003
           N                      15        15
FAILTIME   Pearson Correlation    .704**    1.000
           Sig. (2-tailed)        .003      .
           N                      15        15

**. Correlation is significant at the 0.01 level (2-tailed).
There is significant positive correlation between resistance and failure time, indicating that failure time increases as resistance increases.
Exercise 8
• An aerobics instructor believes that regular aerobic exercise is related to greater mental acuity, stress reduction, high self-esteem, and greater overall life satisfaction.
• She asked a random sample of 30 adults to fill out a series of questionnaires.
• The results are as follows. Test whether there is a significant correlation between aerobic exercise and high self-esteem.
Subject   Exercise   Self-esteem   Satisfaction   Stress
1         10         25            45             20
2         33         37            40             10
3         9          12            30             13
4         14         32            39             15
5         3          22            27             29
6         12         31            44             22
7         7          30            39             13
8         15         30            40             20
9         3          15            46             25
10        21         34            50             10
11        2          18            29             33
12        20         37            47             5
13        4          19            31             23
14        8          33            38             21
15        0          10            25             30
16        17         35            42             13
17        25         39            40             10
18        2          13            30             27
19        18         35            47             9
20        3          15            28             25
21        27         35            39             7
22        4          17            32             34
23        8          20            34             20
24        10         22            41             15
25        0          14            27             35
26        12         35            35             20
27        5          20            30             23
28        7          29            30             12
29        30         40            48             14
30        14         30            45             15
The Spearman Rho correlation
• What it does: The Spearman Rho correlation tells you the magnitude and direction of the association between two variables that are on an ordinal scale, or on an interval or ratio scale but not normally distributed.
• Where to find it: Under the Analyze menu, choose Correlations. Move the variables you wish to correlate into the "Variables" box. Under the "Correlation Coefficients," be sure that the "Spearman" box is checked off.
• Assumption:
- Both variables are NOT normally distributed. You can check for normal distribution with a Q-Q plot. If the variables are normally distributed, use a Pearson R correlation.
• Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.
SPSS Output
• Following is a sample output of a Spearman Rho correlation between the Rosenberg Self-Esteem Scale and the Assessing Anxiety Scale.
• SPSS creates a correlation matrix of the two variables. All the information we need is in the cell that represents the intersection of the two variables.
• SPSS gives us three pieces of information: -the correlation coefficient-the significance-the number of cases (N)
• The correlation coefficient is a number between +1 and -1. This number tells us about the magnitude and direction of the association between two variables.
• The MAGNITUDE is the strength of the correlation. The closer the correlation is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close to 0, there is no association between the two variables. Here, we have a moderate correlation (r = -.392).
• The DIRECTION of the correlation tells us how the two variables are related. If the correlation is positive, the two variables have a positive relationship (as one increases, the other also increases). If the correlation is negative, the two variables have a negative relationship (as one increases, the other decreases). Here, we have a negative correlation (r = -.392). As self-esteem increases, anxiety decreases.
Example
• The following are the numbers of hours which ten students studied for an examination and the grades which they received:
Number of hours studied   Grade in examination
9                         56
5                         44
11                        79
13                        72
10                        70
5                         54
18                        94
15                        85
2                         33
8                         65
Is there any relationship between the number of hours studied and the grade in the examination?
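A sketch of the equivalent computation in Python with scipy.stats.spearmanr:

```python
from scipy import stats

hours  = [9, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

# Spearman rank-order correlation and two-tailed p-value
rho, p_value = stats.spearmanr(hours, grades)
print(f"rho = {rho:.3f}, p = {p_value:.6f}")
```

This reproduces the rho = .973 in the SPSS matrix below, with a p-value well under .01.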
Output SPSS
Correlations (Spearman's rho)

                                   GRADE    HOUR
GRADE    Correlation Coefficient   1.000    .973**
         Sig. (2-tailed)           .        .000
         N                         10       10
HOUR     Correlation Coefficient   .973**   1.000
         Sig. (2-tailed)           .000     .
         N                         10       10

**. Correlation is significant at the .01 level (2-tailed).
Exercise 9
• The following table shows the twelve weeks’ sales of a downtown department store, x, and its suburban branch, y
• X: 71 64 67 58 80 63 69 59 76 60 66 55
• Y: 49 31 45 24 68 30 40 37 62 22 35 19
• Is there any significant relationship between x and y?
Two way chi-square from frequencies
• A chi-square test is a non-parametric test for nominal (frequency) data.
• The test will calculate expected values for each combination of category codes based on the null hypothesis that there is no association between the two variables.
Procedure
• 1. You will need two columns of codes. Each value in each column provides a code to a group or criteria category within the appropriate variable. You should have one row for each combination of category code.
• 2. You will also need a column giving the frequency that each combination of codes is observed.
• Before carrying out your chi-square test you first need to tell SPSS that the numbers in your frequency column are indeed frequencies. You do this using weight cases...
• 3. Select Weight Cases from the Data menu.
• 4. Click the Weight cases by button.
• 5. Select the column containing your frequencies and click on the across arrow.
• 7. Click Crosstabs from the Analyze - Descriptive Statistics menu.
• 8. Select the first variable and click on the top arrow to move it into the Rows box.
• 9. Select the second variable and click on the middle arrow to move it into the Columns box.
• 10. Click on Statistics to choose to perform a chi-square test on your data.
• 11. Select the chi-square option from the Crosstabs: Statistics dialogue box.
• 12. Click on Continue when ready.
• 13. Click on Cells to choose to output the chi-square expected values.
• 14. Select the top left boxes to display both the Observed and the Expected values.
Two way chi-square from raw data
• 1. You will need two columns of codes. Each value in each column provides a code to a group or criteria category within the appropriate variable.
• 2. Click Crosstabs from the Analyze - Descriptive Statistics menu.
• 3. Select the first variable and click on the top arrow to move it into the Rows box.
• 4. Select the second variable and click on the middle arrow to move it into the Columns box.
• 5. Click on Statistics to choose to perform a chi-square test on your data.
• 6. Select the chi-square option from the Crosstabs: Statistics dialogue box.
Example
• Suppose we want to investigate whether there is a relationship between the intelligence of employees who have gone through a certain job training program and their subsequent performance on the job.
• A random sample of 50 cases from the files yielded the following results:
                           Performance
IQ               Poor    Fair    Good
Below average    8       8       3
Average          5       10      7
Above average    1       3       5
Test at the 0.01 level of significance whether the on-the-job performance of persons who have gone through the training program is independent of their IQ.
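The test statistic can be sketched in Python with scipy.stats.chi2_contingency (with a 3x3 table no Yates continuity correction is applied):

```python
from scipy import stats

# Observed frequencies: rows = IQ level, columns = performance (poor/fair/good)
observed = [[8, 8, 3],
            [5, 10, 7],
            [1, 3, 5]]

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected frequencies under independence
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
```

Since the p-value is well above .01, we fail to reject the hypothesis that performance is independent of IQ at the 0.01 level.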
Exercise 10
• Suppose that a store carries two different brands, A and B, of a certain type of breakfast cereal. During a one-week period, 44 packages were purchased, with the results shown below:

         Brand A   Brand B
Men      9         6
Women    13        16
Test the hypothesis that the brand purchased and the sex of the purchaser are independent.