Posted on 21-Dec-2015
Analyzing Data using SPSS
Testing for difference
Parametric Test
t-test
- Used in a variety of situations involving interval and ratio variables.
- Independent samples
- Dependent (paired) samples
Independent-Samples T-Test
• What it does: The Independent Samples T Test compares the mean scores of two groups on a given variable.
• Where to find it: Under the Analyze menu, choose Compare Means, the Independent Samples T Test. Move your dependent variable into the box marked "Test Variable." Move your independent variable into the box marked "Grouping Variable." Click on the box marked "Define Groups" and specify the value labels of the two groups you wish to compare.
• Assumptions:
- The dependent variable is normally distributed. You can check for normal distribution with a Q-Q plot.
- The two groups have approximately equal variance on the dependent variable. You can check this by looking at Levene's Test (see below).
- The two groups are independent of one another.
• Hypotheses:
Null: The means of the two groups are not significantly different.
Alternate: The means of the two groups are significantly different.
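SPSS is the tool used in this tutorial, but the same test can be sketched in Python with scipy.stats as a cross-check. The blood-pressure values below are invented for illustration (they are not the SPSS example data):

```python
from scipy import stats

# Hypothetical blood-pressure readings for two independent groups
new_drug = [142, 150, 138, 155, 148, 160]
placebo  = [130, 128, 135, 126, 132, 129]

# Levene's test for equality of variances (mirrors SPSS's output table)
lev_stat, lev_p = stats.levene(new_drug, placebo)

# If Levene's p > .05, assume equal variances ("top line" in SPSS);
# otherwise use Welch's correction ("bottom line").
t_stat, p_value = stats.ttest_ind(new_drug, placebo, equal_var=lev_p > 0.05)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Reading Levene's result first and then choosing the appropriate t-test line matches the decision rule described above.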
SPSS Output
• Following is a sample output of an independent samples T test. We compared the mean blood pressure of patients who received a new drug treatment vs. those who received a placebo (a sugar pill).
• First, we see the descriptive statistics for the two groups. We see that the mean for the "New Drug" group is higher than that of the "Placebo" group. That is, people who received the new drug have, on average, higher blood pressure than those who took the placebo.
• Finally, we see the results of the Independent Samples T Test. Read the TOP line if the variances are approximately equal. Read the BOTTOM line if the variances are not equal. Based on the results of our Levene's test, we know that we have approximately equal variance, so we will read the top line.
• Our t value is 3.796.
• We have 10 degrees of freedom.
• There is a significant difference between the two groups (the significance is less than .05).
• Therefore, we can say that there is a significant difference between the New Drug and Placebo groups. People who took the new drug had significantly higher blood pressure than those who took the placebo.
Example: Independent-samples t-test
• A study to determine the effectiveness of an integrated statistics/experimental methods course as opposed to the traditional method of taking the two courses separately was conducted.
• It was hypothesized that the students taking the integrated course would conduct better quality research projects than students in the traditional courses as a result of their integrated training.
• Ho: There is no difference in students' performance as a result of the integrated versus traditional courses.
• H1: Students taking the integrated course would conduct better quality research projects than students in the traditional courses.
Output SPSS

Group Statistics (Score)

Condition            N    Mean    Std. Deviation   Std. Error Mean
integrated method    20   85.65   8.242            1.843
traditional method   20   79.45   10.782           2.411
Independent Samples Test (Score)

Levene's Test for Equality of Variances: F = 3.880, Sig. = .056

t-test for Equality of Means:
                              t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed       2.043  38       .048              6.200             3.035                   (.057, 12.343)
Equal variances not assumed   2.043  35.551   .049              6.200             3.035                   (.043, 12.357)
Since the Sig. (2-tailed) value of .048 is less than .05, we reject Ho: students taking the integrated course conducted better quality research projects than students in the traditional courses.
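The t value in the SPSS output can be reproduced from the group summary statistics alone; a sketch in Python using scipy.stats (a cross-check, not part of the SPSS workflow):

```python
from scipy import stats

# Group summary statistics from the SPSS Group Statistics table
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=85.65, std1=8.242, nobs1=20,    # integrated method
    mean2=79.45, std2=10.782, nobs2=20,   # traditional method
    equal_var=True)                       # Levene's test was non-significant
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

This reproduces the "equal variances assumed" row: t = 2.043 with p near .048.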
Exercise 1
• The following data were obtained in an experiment designed to check whether there is a systematic difference in the weights (in grams) obtained with two different scales.
Rock specimen   Scale I   Scale II
1               12.13     12.17
2               17.56     17.61
3               9.33      9.35
4               11.40     11.42
5               28.62     28.61
6               10.25     10.27
7               23.37     23.42
8               16.27     13.26
9               12.40     12.45
10              24.78     24.75
• Use the 0.01 level of significance to test whether the difference between the means of the weights obtained with the two scales is significant
• Ho : there is no significant difference between the means of the weight obtained with the two scales.
• H1 : there is significant difference between the means of the weight obtained with the two scales.
Exercise 2
• The following are the scores for random samples of size ten which are taken from large group of trainees instructed by the two methods.
• Method 1: teaching machine as well as some personal attention by an instructor
• Method 2: straight teaching-machine instruction
Method 1 81 71 79 83 76 75 84 90 83 78
Method 2 69 75 72 69 67 74 70 66 76 72
What can we conclude about the claim that the personal attention of an instructor will improve the average trainee's score? Use α = 5%.
Paired samples t-test
• What it does: The Paired Samples T Test compares the means of two variables. It computes the difference between the two variables for each case, and tests to see if the average difference is significantly different from zero.
• Where to find it: Under the Analyze menu, choose Compare Means, then choose Paired Samples T Test. Click on both variables you wish to compare, then move the pair of selected variables into the Paired Variables box.
• Assumption:
- Both variables should be normally distributed. You can check for normal distribution with a Q-Q plot.
• Hypotheses:
Null: There is no significant difference between the means of the two variables.
Alternate: There is a significant difference between the means of the two variables.
SPSS Output
• Following is a sample output of a paired samples T test. We compared the mean test scores before (pre-test) and after (post-test) the subjects completed a test preparation course. We want to see if our test preparation course improved people's scores on the test.
First, we see the descriptive statistics for both variables.
• The post-test mean scores are higher than pre-test scores
Next, we see the correlation between the two variables
• There is a strong positive correlation. People who did well on the pre-test also did well on the post-test.
• Finally, we see the results of the Paired Samples T Test. Remember, this test is based on the difference between the two variables. Under "Paired Differences" we see the descriptive statistics for the difference between the two variables
To the right of the Paired Differences, we see the t, degrees of freedom, and significance.
The t value = -2.171. We have 11 degrees of freedom. Our significance is .053.
If the significance value is less than .05, there is a significant difference. If the significance value is greater than .05, there is no significant difference.
Here, we see that the significance value is approaching significance, but it is not a significant difference. There is no difference between pre- and post-test scores. Our test preparation course did not help!
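A paired-samples t test can be sketched in Python with scipy.stats.ttest_rel; the pre/post scores below are invented for illustration (they are not the SPSS example data):

```python
from scipy import stats

# Hypothetical pre- and post-test scores for the same six subjects
pre  = [55, 60, 48, 72, 65, 58]
post = [58, 66, 50, 75, 70, 61]

# ttest_rel tests whether the mean of the paired differences is zero,
# exactly as the Paired Samples T Test does in SPSS
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because every post score here was generated to exceed its pre score, this toy data yields a significant difference.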
Example
• Twenty first-grade children and their parents were selected for a study to determine whether a seminar instructing parents in inductive parenting techniques improves social competency in children. The parents attended the seminar for one month. The children were tested for social competency before the course began and were retested six months after the completion of the course.
Hypothesis
• Ho: There is no significant difference between the means of pre- and post-seminar social competency scores.
• In other words, the parenting seminar has no effect on child social competency scores.
Paired Samples Statistics

Pair 1        Mean    N    Std. Deviation   Std. Error Mean
Post-Score    34.20   20   6.066            1.356
Pre-Score     30.45   20   4.019            .899

Paired Samples Correlations

Pair 1                    N    Correlation   Sig.
Post-Score & Pre-Score    20   .771          .000

Paired Samples Test (Pair 1: Post-Score - Pre-Score)

Paired Differences: Mean = 3.750, Std. Deviation = 3.919, Std. Error Mean = .876,
95% Confidence Interval of the Difference = (1.916, 5.584)
t = 4.280, df = 19, Sig. (2-tailed) = .000
• There is a strong positive correlation. Children who did well on the pre-test also did well on the post-test.
• There is a significant difference between pre- and post-test scores. The parenting seminar has an effect on child social competency scores!
Exercise 3
• The table below shows the number of words per minute read by 20 students before and after following a particular method intended to improve reading.
Student Pre Post
1 48 57
2 89 102
3 78 81
4 50 61
5 70 74
6 98 100
7 78 83
8 98 86
9 58 67
10 61 71
Student Pre Post
11 50 64
12 56 62
13 75 87
14 49 62
15 66 62
16 86 90
17 90 84
18 58 62
19 41 40
20 82 77
• Using a 0.05 level of significance, test the claim that the method is effective in improving reading.
Exercise 4
• The table below shows the weight of seven subjects before and after following a particular diet for two months
Subject   A     B     C     D     E     F     G
After     156   165   196   198   167   199   164
Before    149   156   194   203   153   201   152
• Using a 0.01 level of significance, test the claim that the diet is effective in reducing weight.
One-Way ANOVA
• Similar to a t-test, in that it is concerned with differences in means, but the test can be applied to two or more means.
• The test is usually applied to interval and ratio data types. For example differences between two factors (1 and 2).
• The test can be undertaken using the Analyze - Compare Means - One-Way ANOVA menu items, then select for appropriate variables.
• You will observe the One-Way ANOVA for factor 1 and factor 2
Procedure
• 1. You will need one column of group codes labelling which group your data belongs to. The codes need to be numerical, but can be labelled with text.
• 2. You will also need a column containing the data points or scores you wish to analyze.
• 3. Select One-way ANOVA from the Analyze and Compare Means menus.
• 4. Click on your dependent variable (data column) and click on the top arrow so that the selected column appears in the dependent list box.
• 5. Click on your code column (your condition labels) and click on the bottom arrow so that the selected column appears in the factor box.
• 6. Click on Post Hoc if you wish to perform post-hoc tests (optional).
• 7. Choose the type of post-hoc test(s) you wish to perform by clicking in the small box next to your choice until a tick appears. Tukey's and Scheffe's tests are commonly used.
• 8. Click on Dunnett to perform a Dunnett's test, which allows you to compare experimental groups with a control group. Choose whether your control category is the first or last code entered in your code column.
• The main output table is labelled ANOVA. The F-ratio of the ANOVA, the degrees of freedom and the significance are all displayed. The top value of the df column is the df of the factor, the bottom value is the df of the error term.
• Tukey's test will also try to find combinations of similar groups or conditions.
• In the Score table there will be one column for each pair of conditions that are shown to be 'similar'. The mean of each condition within the pair are given in the appropriate column. The p-value for the difference between the means of each pair of groups is given at the bottom of the appropriate column.
Example – one-way ANOVA
• We would like to determine whether the scores on a test of aggression are different across 4 groups of children (each with 5 subjects).
• Each child group has been exposed to differing amounts of time watching cartoons depicting 'toon violence'.
At the 0.05 significance level, test the claim that the four groups have the same mean if the following sample results have been obtained.
Output SPSS

ANOVA (score)

Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   28.950           3    9.650         4.825   .014
Within Groups    32.000           16   2.000
Total            60.950           19
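The F ratio and p-value in the ANOVA output follow directly from the sums of squares; a quick cross-check in Python (using scipy only for the F distribution):

```python
from scipy import stats

# Values from the ANOVA output table
ss_between, df_between = 28.950, 3
ss_within,  df_within  = 32.000, 16

ms_between = ss_between / df_between      # mean square between = 9.650
ms_within  = ss_within / df_within        # mean square within  = 2.000
F = ms_between / ms_within                # F ratio = 4.825
p = stats.f.sf(F, df_between, df_within)  # upper-tail probability of F(3, 16)
print(f"F = {F:.3f}, p = {p:.3f}")
```

Since p is below .05, the claim of equal group means is rejected, as in the tutorial's conclusion.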
Exercise 5
• At the same time each day, a researcher records the temperature in each of three greenhouses. The table shows the temperatures in degree Fahrenheit recorded for one week.
Greenhouse #1   Greenhouse #2   Greenhouse #3
73              71              61
72              69              63
73              72              62
66              72              61
68              65              60
71              73              62
72              71              59
Use a 0.05 significance level to test the claim that the average temperature is the same in each greenhouse.
Nonparametric Test
Sign Test
• A sign test compares the number of positive and negative differences between related conditions
Procedure
• 1. You should have data in two or more columns - one for each condition tested.
• 2. Select 2 Related Samples from the Analyze - Nonparametric Tests menu.
• 3. Click on the first variable in the pair and the second variable in the pair. The names of the variables appear in the current selections section of the dialogue box.
• 4. Click on the central selection arrow when you are happy with the variable pair selection. The chosen pair appears in the Test Pair(s) List.
• 5. Make sure the Sign box is ticked and remove the tick from the Wilcoxon box.
Example
• The data in the table on the next slide are matched pairs of heights obtained from a random sample of 12 male statistics students. Each student reported his height, then his height was measured. Use a 0.05 significance level to test the claim that there is no difference between reported height and measured height.
Reported and measured heights of male statistics students

Reported height:   68     74     82.25   66.5   69     68     71     70     70     67     68     70
Measured height:   66.8   73.9   74.3    66.1   67.2   67.9   69.4   69.9   68.6   67.9   67.6   68.8
Ho: There is no significant difference between reported heights and measured heights.
H1: There is a difference.
Output

Test Statistics(b): measured height - reported height
Exact Sig. (2-tailed) = .006(a)
a. Binomial distribution used.
b. Sign Test
Reject Ho. There is sufficient evidence to reject the claim that there is no significant difference between the reported and measured heights.
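The sign test reduces to a two-sided binomial test on the counts of positive and negative differences; a sketch in Python using the example data and scipy.stats.binomtest (available in SciPy 1.7+):

```python
from scipy.stats import binomtest

reported = [68, 74, 82.25, 66.5, 69, 68, 71, 70, 70, 67, 68, 70]
measured = [66.8, 73.9, 74.3, 66.1, 67.2, 67.9, 69.4, 69.9, 68.6, 67.9, 67.6, 68.8]

diffs = [r - m for r, m in zip(reported, measured)]
pos = sum(d > 0 for d in diffs)   # 11 positive differences
neg = sum(d < 0 for d in diffs)   # 1 negative difference
n = pos + neg                     # zero differences would be discarded

# Under Ho the signs are equally likely: Binomial(n, 0.5)
p_value = binomtest(min(pos, neg), n, 0.5).pvalue
print(round(p_value, 3))  # .006, matching the SPSS output
```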
Exercise 6
• Listed here are the right- and left-hand reaction times collected from 14 right-handed subjects. Use a 0.05 significance level to test the claim of no difference between the right-hand and left-hand reaction times.
Right/left reaction times
Right 191 97 116 165 116 129 171 155 112 102 188 158 121 133
Left 224 171 191 207 196 165 171 165 140 188 155 219 177 174
Wilcoxon
• The Wilcoxon test is used with two columns of non-parametric related (linked) data.
• Either one person has taken part in two conditions or paired participants (e.g. brother and sister) have taken part in the same condition.
• This is the non-parametric equivalent of the paired-samples t-test.
Procedure
• 1. Put your data in two or more columns, one for each condition tested.
• 2. Select 2 Related Samples from the Analyze - Nonparametric Tests menu.
• 3. Click on the first variable in the pair.
• 4. Click on the second variable in the pair.
• 5. Make sure the Wilcoxon box is ticked.
• The Ranks table produced in the output window summarises the ranking process.
• In the Test Statistics table the Z statistic is the result of the Wilcoxon test.
• The p-value for this statistic is shown below it. This is the two-tailed significance.
Example
• Use the previous data to test the claim that there is no difference between reported heights and measured heights using Wilcoxon test at 0.05 significance level.
Output
Test Statistics(b): reported height - measured height
Z = -2.595(a)
Asymp. Sig. (2-tailed) = .009
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
Reject Ho. There is sufficient evidence to reject the claim that there is no difference between reported and measured heights.
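The same result can be sketched in Python with scipy.stats.wilcoxon; requesting the normal approximation mirrors the Asymp. Sig. that SPSS reports (the method parameter assumes SciPy 1.9+):

```python
from scipy import stats

reported = [68, 74, 82.25, 66.5, 69, 68, 71, 70, 70, 67, 68, 70]
measured = [66.8, 73.9, 74.3, 66.1, 67.2, 67.9, 69.4, 69.9, 68.6, 67.9, 67.6, 68.8]

# Normal approximation without continuity correction, as in the SPSS table
res = stats.wilcoxon(reported, measured, correction=False, method="approx")
print(round(res.pvalue, 3))
```

The p-value comes out close to the .009 in the table, and well below .05, so Ho is rejected.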
Mann-Whitney
• The Mann-Whitney test is used with two columns of independent (unrelated) non-parametric data. This is the non-parametric equivalent of the independent-samples t-test.
Procedure
• 1. Put all of your measured data into one column.
• 2. Make a second column that contains codes to indicate the group from which each value was obtained.
• 3. Select 2 Independent Samples from the Analyze - Nonparametric Tests menu.
• 4. Select the column containing the data you want to analyse and click the top arrow.
• 5. Select the Grouping Variable - the column which contains your group codes - and click the bottom arrow.
• 6. Make sure the Mann-Whitney U option is selected.
• The output is produced in the output window.
• The top table summarises the ranking process.
• The result of the Mann-Whitney test is given at the top of the Test Statistics table.
• The two-tailed significance of the result is given in the same table.
Example
• One study used x-ray computed tomography (CT) to collect data on brain volumes for a group of patients with obsessive-compulsive disorders and a control group of healthy persons. The following data show sample results (in mm) for volumes of the right cordate.
Volumes of the right cordate
Obsessive-compulsive patients
0.308 0.210 0.304 0.344 0.407 0.455 0.287 0.288 0.463 0.334 0.340 0.305
Control group 0.519 0.476 0.413 0.429 0.501 0.402 0.349 0.594 0.334 0.483 0.460 0.445
Output

Ranks (volumes of the right cordate)

Group                N    Mean Rank   Sum of Ranks
experimental group   10   7.35        73.50
control group        12   14.96       179.50
Total                22

Test Statistics(b): volumes of the right cordate
Mann-Whitney U = 18.500
Wilcoxon W = 73.500
Z = -2.737
Asymp. Sig. (2-tailed) = .006
Exact Sig. [2*(1-tailed Sig.)] = .004(a)
a. Not corrected for ties.
b. Grouping Variable: group
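A Mann-Whitney test on the listed volumes can be sketched in Python with scipy.stats.mannwhitneyu (note the SPSS Ranks table reports N = 10 for the patient group, so p-values computed from all 12 listed patient values may differ slightly):

```python
from scipy import stats

patients = [0.308, 0.210, 0.304, 0.344, 0.407, 0.455,
            0.287, 0.288, 0.463, 0.334, 0.340, 0.305]
controls = [0.519, 0.476, 0.413, 0.429, 0.501, 0.402,
            0.349, 0.594, 0.334, 0.483, 0.460, 0.445]

# Two-sided Mann-Whitney U test on the two independent samples
res = stats.mannwhitneyu(patients, controls, alternative="two-sided")
print(f"U = {res.statistic}, p = {res.pvalue:.4f}")
```

The patient volumes are significantly smaller than the control volumes, matching the conclusion of the SPSS output.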
Kruskal-Wallis
• Examines differences between 3 or more independent groups or conditions.
Procedure
• 1. Put all your measured data into one column.
• 2. Make a second column that contains codes to indicate the group from which each value was obtained.
• 3. Select K Independent Samples from the Analyze - Non-parametric Tests menu.
• 4. Select the grouping variable, the column that contains your group codes, then click on the bottom arrow.
• 5. Make sure the Kruskal-Wallis box is checked.
• In the output window the chi-square statistic is shown in the test statistic section, as is the P-value.
Example
• We would like to determine whether the scores on a test of Spanish are different across three different methods of learning.
• Method 1: classroom instruction and language laboratory
• Method 2: only classroom instruction
• Method 3: only self-study in language laboratory
The following are the final examination scores of samples of students from the three groups:
Method 1: 94 88 91 74 86 97
Method 2: 85 82 79 84 61 72 80
Method 3: 89 67 72 76 69
At the 0.05 level of significance, test the null hypothesis that the populations sampled are identical.
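This can be sketched in Python with scipy.stats.kruskal, which applies the same tie correction that SPSS uses:

```python
from scipy import stats

method1 = [94, 88, 91, 74, 86, 97]
method2 = [85, 82, 79, 84, 61, 72, 80]
method3 = [89, 67, 72, 76, 69]

# Kruskal-Wallis H test across the three independent samples
h_stat, p_value = stats.kruskal(method1, method2, method3)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

This reproduces the chi-square statistic of 6.673 with p = .036 shown in the SPSS output below.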
Output SPSS
Test Statistics(a,b): SCORE
Chi-Square = 6.673
df = 2
Asymp. Sig. = .036
a. Kruskal Wallis Test
b. Grouping Variable: METHOD
Exercise 7
• The following are the miles per gallon which a test driver got in random samples of six tankfuls of each of three kinds of gasoline:
• Gasoline 1: 30 15 32 27 24 29
• Gasoline 2: 17 28 20 33 32 22
• Gasoline 3: 19 23 32 22 18 25
• Test the claim that there is no difference in the true average mileage yield of the three kinds of gasoline. (Use a 0.05 level of significance.)
Testing for Relationships
Pearson's Correlation
Pearson's correlation is a parametric test for the strength of the relationship between pairs of variables.
• What it does: The Pearson R correlation tells you the magnitude and direction of the association between two variables that are on an interval or ratio scale.
• Where to find it: Under the Analyze menu, choose Correlations. Move the variables you wish to correlate into the "Variables" box. Under the "Correlation Coefficients," be sure that the "Pearson" box is checked off.
• Assumption: -Both variables are normally distributed. You can check for normal distribution with a Q-Q plot.
• Hypotheses:Null: There is no association between the two variables.Alternate: There is an association between the two variables.
• SPSS Output
• Following is a sample output of a Pearson R correlation between the Rosenberg Self-Esteem Scale and the Assessing Anxiety Scale.
SPSS creates a correlation matrix of the two variables. All the information we need is in the cell that represents the intersection of the two variables
SPSS gives us three pieces of information:
- the correlation coefficient
- the significance
- the number of cases (N)
• The correlation coefficient is a number between +1 and -1. This number tells us about the magnitude and direction of the association between two variables.
• The MAGNITUDE is the strength of the correlation. The closer the correlation is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close to zero, there is no association between the two variables. Here, we have a moderate correlation (r = -.378).
• The DIRECTION of the correlation tells us how the two variables are related. If the correlation is positive, the two variables have a positive relationship (as one increases, the other also increases). If the correlation is negative, the two variables have a negative relationship (as one increases, the other decreases). Here, we have a negative correlation (r = -.378). As self-esteem increases, anxiety decreases
Example
• The following data were obtained in a study of the relationship between the resistance (ohms) and the failure time (minutes) of certain overloaded resistors.
• Resistance 48 28 33 40 36 39 46 40 30 42 44 48 39 34 47
• Failure time 45 25 39 45 36 35 36 45 34 39 51 41 38 32 45
• Test the null hypothesis that there is no correlation between resistance and failure time.
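The correlation matrix below can be reproduced in Python with scipy.stats.pearsonr; a sketch using the example data:

```python
from scipy import stats

resistance   = [48, 28, 33, 40, 36, 39, 46, 40, 30, 42, 44, 48, 39, 34, 47]
failure_time = [45, 25, 39, 45, 36, 35, 36, 45, 34, 39, 51, 41, 38, 32, 45]

# Pearson correlation coefficient and two-tailed p-value
r, p_value = stats.pearsonr(resistance, failure_time)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

This matches the r = .704 and Sig. = .003 in the SPSS matrix.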
Output SPSS
Correlations

                                  RESIST    FAILTIME
RESIST     Pearson Correlation    1.000     .704**
           Sig. (2-tailed)        .         .003
           N                      15        15
FAILTIME   Pearson Correlation    .704**    1.000
           Sig. (2-tailed)        .003      .
           N                      15        15

**. Correlation is significant at the 0.01 level (2-tailed).
There is significant positive correlation between resistance and failure time, indicating that failure time increases as resistance increases.
Exercise 8
• An aerobics instructor believes that regular aerobic exercise is related to greater mental acuity, stress reduction, high self-esteem, and greater overall life satisfaction.
• She asked a random sample of 30 adults to fill out a series of questionnaires.
• The results are as follows. Test whether there is a significant correlation between aerobic exercise and high self-esteem.
Subject   Exercise   Self-esteem   Satisfaction   Stress
1         10         25            45             20
2         33         37            40             10
3         9          12            30             13
4         14         32            39             15
5         3          22            27             29
6         12         31            44             22
7         7          30            39             13
8         15         30            40             20
9         3          15            46             25
10        21         34            50             10
11        2          18            29             33
12        20         37            47             5
13        4          19            31             23
14        8          33            38             21
15        0          10            25             30
16        17         35            42             13
17        25         39            40             10
18        2          13            30             27
19        18         35            47             9
20        3          15            28             25
21        27         35            39             7
22        4          17            32             34
23        8          20            34             20
24        10         22            41             15
25        0          14            27             35
26        12         35            35             20
27        5          20            30             23
28        7          29            30             12
29        30         40            48             14
30        14         30            45             15
The Spearman Rho correlation
• What it does: The Spearman Rho correlation tells you the magnitude and direction of the association between two variables that are on an ordinal scale, or on an interval or ratio scale but not normally distributed.
• Where to find it: Under the Analyze menu, choose Correlations. Move the variables you wish to correlate into the "Variables" box. Under the "Correlation Coefficients," be sure that the "Spearman" box is checked off.
• Assumption:
- Both variables are NOT normally distributed. You can check for normal distribution with a Q-Q plot. If the variables are normally distributed, use a Pearson R correlation.
• Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.
SPSS Output
• Following is a sample output of a Spearman Rho correlation between the Rosenberg Self-Esteem Scale and the Assessing Anxiety Scale.
• SPSS creates a correlation matrix of the two variables. All the information we need is in the cell that represents the intersection of the two variables.
• SPSS gives us three pieces of information: -the correlation coefficient-the significance-the number of cases (N)
• The correlation coefficient is a number between +1 and -1. This number tells us about the magnitude and direction of the association between two variables.
• The MAGNITUDE is the strength of the correlation. The closer the correlation is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close to 0, there is no association between the two variables. Here, we have a moderate correlation (r = -.392).
• The DIRECTION of the correlation tells us how the two variables are related. If the correlation is positive, the two variables have a positive relationship (as one increases, the other also increases). If the correlation is negative, the two variables have a negative relationship (as one increases, the other decreases). Here, we have a negative correlation (r = -.392). As self-esteem increases, anxiety decreases.
Example
• The following are the numbers of hours which ten students studied for an examination and the grades which they received:
Number of hours studied   Grade in examination
9                         56
5                         44
11                        79
13                        72
10                        70
5                         54
18                        94
15                        85
2                         33
8                         65
Is there any relationship between the number of hours studied and the grade in the examination?
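A sketch of the equivalent computation in Python with scipy.stats.spearmanr:

```python
from scipy import stats

hours  = [9, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

# Spearman rank-order correlation and two-tailed p-value
rho, p_value = stats.spearmanr(hours, grades)
print(f"rho = {rho:.3f}, p = {p_value:.6f}")
```

This reproduces the rho = .973 in the SPSS matrix below, with a p-value well under .01.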
Output SPSS
Correlations (Spearman's rho)

                                   GRADE    HOUR
GRADE    Correlation Coefficient   1.000    .973**
         Sig. (2-tailed)           .        .000
         N                         10       10
HOUR     Correlation Coefficient   .973**   1.000
         Sig. (2-tailed)           .000     .
         N                         10       10

**. Correlation is significant at the .01 level (2-tailed).
Exercise 9
• The following table shows the twelve weeks’ sales of a downtown department store, x, and its suburban branch, y
• X: 71 64 67 58 80 63 69 59 76 60 66 55
• Y: 49 31 45 24 68 30 40 37 62 22 35 19
• Is there any significant relationship between x and y?
Two way chi-square from frequencies
• A chi-square test is a non-parametric test for nominal (frequency) data.
• The test will calculate expected values for each combination of category codes based on the null hypothesis that there is no association between the two variables.
Procedure
• 1. You will need two columns of codes. Each value in each column provides a code to a group or criteria category within the appropriate variable. You should have one row for each combination of category code.
• 2. You will also need a column giving the frequency that each combination of codes is observed.
• Before carrying out your chi-square test you first need to tell SPSS that the numbers in your frequency column are indeed frequencies. You do this using weight cases...
• 3. Select Weight Cases from the Data menu.
• 4. Click the Weight cases by button.
• 5. Select the column containing your frequencies and click on the across arrow.
• 7. Click Crosstabs from the Analyze - Descriptive Statistics menu.
• 8. Select the first variable and click on the top arrow to move it into the Rows box.
• 9. Select the second variable and click on the middle arrow to move it into the Columns box.
• 10. Click on Statistics to choose to perform a chi-square test on your data.
• 11. Select the chi-square option from the Crosstabs: Statistics dialogue box.
• 12. Click on Continue when ready.
• 13. Click on Cells to choose to output the chi-square expected values.
• 14. Select the top left boxes to display both the Observed and the Expected values.
Two way chi-square from raw data
• 1. You will need two columns of codes. Each value in each column provides a code to a group or criteria category within the appropriate variable.
• 2. Click Crosstabs from the Analyze - Descriptive Statistics menu.
• 3. Select the first variable and click on the top arrow to move it into the Rows box.
• 4. Select the second variable and click on the middle arrow to move it into the Columns box.
• 5. Click on Statistics to choose to perform a chi-square test on your data.
• 6. Select the chi-square option from the Crosstabs: Statistics dialogue box.
Example
• Suppose we want to investigate whether there is a relationship between the intelligence of employees who have gone through a certain job training program and their subsequent performance on the job.
• A random sample of 50 cases from the files yielded the following results:
                           Performance
IQ               Poor    Fair    Good
Below average    8       8       3
Average          5       10      7
Above average    1       3       5
Test at the 0.01 level of significance whether the on-the-job performance of persons who have gone through the training program is independent of their IQ.
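The test statistic can be sketched in Python with scipy.stats.chi2_contingency (with a 3x3 table no Yates continuity correction is applied):

```python
from scipy import stats

# Observed frequencies: rows = IQ level, columns = performance (poor/fair/good)
observed = [[8, 8, 3],
            [5, 10, 7],
            [1, 3, 5]]

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected frequencies under independence
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
```

Since the p-value is well above .01, we fail to reject the hypothesis that performance is independent of IQ at the 0.01 level.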
Exercise 10
• Suppose that a store carries two different brands, A and B, of a certain type of breakfast cereal. During a one-week period, 44 packages were purchased, with the results shown below:

         Brand A   Brand B
Men      9         6
Women    13        16
Test the hypothesis that the brand purchased and the sex of the purchaser are independent.