inferentialstats_spss

8/9/2019 InferentialStats_SPSS


12/9/20

Inferential Statistics using SPSS

Chafik Bouhaddioui

Department of Statistics

Outline

• Hypothesis testing using SPSS

• Analysis of Variance: one-way and two-way ANOVA using SPSS

• Regression analysis using SPSS

• Time Series using SPSS*.

One mean

• Is there evidence that the mean salary of employees is greater than $32,000.00?

• Hypotheses:

• If we denote by $\mu$ the mean salary of employees, then:

$H_0: \mu = 32000$

$H_1: \mu > 32000$

This test is called a one-sample t-test.
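As a rough illustration of what the one-sample t-test computes, the sketch below builds the test statistic by hand using only the Python standard library. The function name `one_sample_t` and the salary figures are assumptions made up for this example; they are not taken from the course data file.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 with a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)               # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical salaries, invented for illustration only.
salaries = [31000, 35500, 33200, 29800, 36400, 34100, 32750, 38000]
t_stat, df = one_sample_t(salaries, 32000)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A large positive t supports H1 (mean greater than 32000); how large is large enough is settled by the p-value.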



One mean t-test: SPSS

P-value method

– The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.

– The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

– Let us demonstrate the concept on an example. SPSS or any other statistical software will report the p-value.
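To make the definition concrete, here is a minimal sketch of a one-sided p-value computed directly as the upper-tail area of Student's t distribution. It uses numerical integration (Simpson's rule) so that it needs nothing beyond the standard library; the function names `t_pdf` and `upper_tail_p` are hypothetical, and statistical software reaches the same number by more refined methods.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=20000):
    """P(T >= t_stat) when H0 is true; steps must be even for Simpson's rule."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3              # P(0 <= T <= |t_stat|)
    tail = 0.5 - central                 # P(T >= |t_stat|), by symmetry
    return tail if t_stat >= 0 else 1 - tail

# Decision rule at the 5% level for a right-tailed test.
p_value = upper_tail_p(2.4, 7)           # t = 2.4 and df = 7 are illustrative inputs
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

For a two-sided alternative the corresponding p-value would be twice the tail area beyond |t|.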





• Interpret?

• Decision:

• Conclusion:

Two independent means

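• Is there any evidence to conclude that the company is discriminating between males and females in salaries?

• If we define:

– $\mu_1$ = mean salary of male employees

– $\mu_2$ = mean salary of female employees

Hypotheses:

$H_0: \mu_1 = \mu_2$

$H_1: \mu_1 > \mu_2$

This test is called a two-sample t-test.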
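The statistic behind the comparison of two independent means can be sketched as follows. SPSS reports one row assuming equal variances and one row not assuming them; the function below (a hypothetical helper named `two_sample_t`, written with the standard library only) covers both cases. The sample values are invented for illustration.

```python
import math
import statistics

def two_sample_t(x, y, equal_var=True):
    """Return (t, df) for H0: the two population means are equal.

    equal_var=True  -> pooled-variance t-test
    equal_var=False -> Welch's t-test with Satterthwaite degrees of freedom
    """
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    if equal_var:
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (m1 - m2) / se, df

# Hypothetical salaries (in thousands), invented for illustration only.
group_a = [41.0, 38.5, 45.2, 39.9, 43.1]
group_b = [36.0, 34.8, 39.5, 33.2, 37.7, 35.1]
print(two_sample_t(group_a, group_b, equal_var=True))
print(two_sample_t(group_a, group_b, equal_var=False))
```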

Two independent means: SPSS




How to extract age from date?
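Outside SPSS, the same idea (age in completed years from a date of birth) can be sketched in a few lines of Python. The helper name `age_in_years` and the example dates are assumptions chosen for illustration; the reference date should be whatever date the analysis is anchored to.

```python
from datetime import date

def age_in_years(birth_date, on_date):
    """Completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Illustrative dates only.
print(age_in_years(date(1952, 2, 3), date(2019, 8, 9)))
```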

Exercise

• Compare the mean age of male and female employees.



Analysis of Variance:

• Example:

• A pharmaceutical manufacturer would like to be able to claim that its new headache relief medication is better than those of rivals. Also, it has two methods for formulating its product, and it would like to compare these as well.

• File: Headache.sav

• The data are the result of an experiment. The column drug codes the treatment: 1 for active compound #1, 2 for active compound #2, 3 for the rival product and 4 for the control group (aspirin). We measured a pain relief score with a range from 0 (no relief) to 50 (complete relief). The study was carried out “double-blind”.

• From the small experiment, what claims can the marketers offer?

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_1$: At least one of the means differs.

To perform the analysis of variance we need to build an “F” statistic. To follow the process more easily, we use the following notation:
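A standard way to write the one-way ANOVA quantities is sketched below. The symbols are conventional choices and an assumption on our part (k groups, $n_j$ observations in group j, n observations in total, group means $\bar{x}_j$, grand mean $\bar{\bar{x}}$); the names SSTreat, SSE, MSTreat and MSE are the labels used to annotate the SPSS ANOVA output.

```latex
\begin{aligned}
\mathrm{SSTreat} &= \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{\bar{x}})^2, &
\mathrm{MSTreat} &= \frac{\mathrm{SSTreat}}{k-1},\\
\mathrm{SSE} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, &
\mathrm{MSE} &= \frac{\mathrm{SSE}}{n-k},\\
F &= \frac{\mathrm{MSTreat}}{\mathrm{MSE}}.
\end{aligned}
```

When the null hypothesis is true (and the usual assumptions of normality and equal variances hold), F follows an F distribution with k - 1 and n - k degrees of freedom.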



Descriptives: PainRelief

| | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Active1 | 10 | 13.370 | 5.9183 | 1.8715 | 9.136 | 17.604 | 1.3 | 22.3 |
| Active2 | 11 | 22.255 | 6.2943 | 1.8978 | 18.026 | 26.483 | 10.6 | 31.9 |
| Control | 29 | 11.462 | 7.6760 | 1.4254 | 8.542 | 14.382 | .5 | 25.1 |
| Rival | 14 | 14.250 | 6.6110 | 1.7669 | 10.433 | 18.067 | 3.3 | 25.2 |
| Total | 64 | 14.225 | 7.8349 | .9794 | 12.268 | 16.182 | .5 | 31.9 |

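ANOVA: PainRelief

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 937.908 | 3 | 312.636 | 6.403 | .001 |
| Within Groups | 2929.412 | 60 | 48.824 | | |
| Total | 3867.320 | 63 | | | |

Annotations: SSTreat is the Between Groups sum of squares (937.908), SSE is the Within Groups sum of squares (2929.412), MSTreat = 312.636, MSE = 48.824, and the Sig. column (.001) is the p-value.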
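As a cross-check, the ANOVA table can be approximately rebuilt from the group sizes, means and standard deviations reported in the Descriptives table. The sketch below is only a consistency check under that assumption; small discrepancies against the SPSS figures come from the rounding of the reported means and standard deviations.

```python
# Group summaries copied from the Descriptives table: (n, mean, standard deviation).
groups = {
    "Active1": (10, 13.370, 5.9183),
    "Active2": (11, 22.255, 6.2943),
    "Control": (29, 11.462, 7.6760),
    "Rival":   (14, 14.250, 6.6110),
}

n_total = sum(n for n, _, _ in groups.values())
k = len(groups)
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-groups and within-groups sums of squares.
ss_treat = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
ss_error = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

ms_treat = ss_treat / (k - 1)            # df = k - 1 = 3
ms_error = ss_error / (n_total - k)      # df = n - k = 60
f_stat = ms_treat / ms_error

print(f"SSTreat={ss_treat:.1f}  SSE={ss_error:.1f}  F={f_stat:.2f}")
```

The result should land close to the reported F of 6.403.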



We can also use the General Linear Model (General Linear Model / Univariate). This way we do not need to do any recoding.

Tests of Between-Subjects Effects

Dependent Variable: PainRelief

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 937.908 (a) | 3 | 312.636 | 6.403 | .001 |
| Intercept | 12674.937 | 1 | 12674.937 | 259.607 | .000 |
| Drug | 937.908 | 3 | 312.636 | 6.403 | .001 |
| Error | 2929.412 | 60 | 48.824 | | |
| Total | 16817.760 | 64 | | | |
| Corrected Total | 3867.320 | 63 | | | |

a. R Squared = .243 (Adjusted R Squared = .205)

The Sig. column gives the p-value.
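The R Squared footnote can be recovered from the sums of squares in the table. The short sketch below assumes the usual definitions (model sum of squares over corrected total, and the degrees-of-freedom adjustment with n = 64 observations and 4 drug groups).

```python
ss_model = 937.908             # Corrected Model (equals the Drug effect here)
ss_corrected_total = 3867.320
n, k = 64, 4                   # observations and drug groups

r_squared = ss_model / ss_corrected_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

print(round(r_squared, 3), round(adj_r_squared, 3))
```

In other words, the drug factor accounts for roughly a quarter of the variation in the pain relief score.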



Interpretations

• Decision:

• Conclusion:

Multiple comparisons

• When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and in what ranking order.

• Three statistical inference procedures, geared at doing this, are discussed:

– Fisher’s least significant difference (LSD) method

– Bonferroni adjustment

– Tukey’s multiple comparison method

Multiple comparisons

• If you just need to verify 2 or 3 pairwise comparisons, use the Bonferroni method.

• If you plan to do all possible comparisons, use Tukey.

• Fisher might be used if you want to identify areas that require further analysis.
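To see why the number of comparisons matters, the sketch below counts the pairwise comparisons among the four drug groups and applies the Bonferroni adjustment, which divides the family-wise significance level by the number of comparisons. The 5% level is an assumption chosen for illustration.

```python
from itertools import combinations

groups = ["Active1", "Active2", "Control", "Rival"]
alpha = 0.05                                  # family-wise significance level (assumed)

pairs = list(combinations(groups, 2))         # all pairwise comparisons
per_test_alpha = alpha / len(pairs)           # Bonferroni-adjusted level per comparison

print(len(pairs), round(per_test_alpha, 4))
```

With only 2 or 3 planned comparisons the divisor is small, which is why Bonferroni is attractive in that case.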



• Multiple Comparisons

Dependent Variable: PainRelief

Tukey HSD

| (I) drug_code | (J) drug_code | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Active1 | Active2 | -8.8845* | 3.0530 | .025 | -16.952 | -.817 |
| Active1 | Control | 1.9079 | 2.5624 | .879 | -4.863 | 8.679 |
| Active1 | Rival | -.8800 | 2.8931 | .990 | -8.525 | 6.765 |
| Active2 | Active1 | 8.8845* | 3.0530 | .025 | .817 | 16.952 |
| Active2 | Control | 10.7925* | 2.4743 | .000 | 4.254 | 17.331 |
| Active2 | Rival | 8.0045* | 2.8153 | .030 | .565 | 15.444 |
| Control | Active1 | -1.9079 | 2.5624 | .879 | -8.679 | 4.863 |
| Control | Active2 | -10.7925* | 2.4743 | .000 | -17.331 | -4.254 |
| Control | Rival | -2.7879 | 2.2740 | .613 | -8.797 | 3.221 |
| Rival | Active1 | .8800 | 2.8931 | .990 | -6.765 | 8.525 |
| Rival | Active2 | -8.0045* | 2.8153 | .030 | -15.444 | -.565 |
| Rival | Control | 2.7879 | 2.2740 | .613 | -3.221 | 8.797 |

*. The mean difference is significant at the .05 level.
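The Mean Difference and Std. Error columns can be reproduced from numbers already reported: the group means and sizes from the Descriptives table and the error mean square (48.824) from the ANOVA table. The sketch below assumes the usual pooled standard error for a difference of two group means; the Sig. values and confidence limits additionally require the studentized range distribution and are not reproduced here.

```python
import math

mse = 48.824                                   # Within Groups mean square from the ANOVA table
n = {"Active1": 10, "Active2": 11, "Control": 29, "Rival": 14}
mean = {"Active1": 13.370, "Active2": 22.255, "Control": 11.462, "Rival": 14.250}

def pair_summary(i, j):
    """Mean difference (I - J) and its standard error under a common error variance."""
    diff = mean[i] - mean[j]
    se = math.sqrt(mse * (1 / n[i] + 1 / n[j]))
    return diff, se

diff, se = pair_summary("Active1", "Active2")
print(f"difference = {diff:.3f}, std. error = {se:.4f}")
```

Pairs involving the small groups have larger standard errors, which is why the same size of difference can be significant for one pair and not for another.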

Other fixed effects Analysis of Variance Models

• We are interested in studying the effect of several factors on some dependent variable.

• Each characteristic investigated is called a factor.

• Each factor has several levels.

[Figure: four panels plotting mean response against the levels of factor A (1, 2, 3), with one line for each level of factor B (level 1 and level 2). Panel titles:

– Difference among the levels of factor A; no difference among the levels of factor B

– Difference among the levels of factor A, and difference among the levels of factor B; no interaction

– No difference among the levels of factor A; difference among the levels of factor B

– Interaction]



Example: Evaluating Employee Time Schedules

• Should the clerical employees of a large insurance company be

switched to a four-day week, allowed to use flextime schedules, or

kept to the usual 9-to-5 workday?

• File: Flextime.sav

| Code | Department | Code | Condition |
|---|---|---|---|
| 1 | Claims | 1 | Flextime |
| 2 | Data processing | 2 | Four-day week |
| 3 | Investments | 3 | Regular hours |

The data measure the percentage efficiency gains over a four-week trial.

[Figure: Estimated Marginal Means of Improve. Profile plot of the estimated marginal means (vertical axis, from -5.00 to 15.00) by Department (Claims, DP, Invest), with one line per Condition (Flex, FourDay, Regular).]
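A profile plot like this one is read by asking whether the lines are parallel. The sketch below expresses that check numerically for a balanced two-factor layout: each interaction effect is the cell mean minus its row mean minus its column mean plus the grand mean, and all of them are zero when the lines are parallel. The cell means are hypothetical numbers invented for illustration (they are NOT the values in Flextime.sav), and `interaction_effects` is a helper name of our own.

```python
def interaction_effects(cell_means):
    """Interaction effect per cell, assuming equal cell sizes (balanced design)."""
    rows = sorted({r for r, _ in cell_means})
    cols = sorted({c for _, c in cell_means})
    grand = sum(cell_means.values()) / len(cell_means)
    row_mean = {r: sum(cell_means[r, c] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cell_means[r, c] for r in rows) / len(rows) for c in cols}
    return {
        (r, c): cell_means[r, c] - row_mean[r] - col_mean[c] + grand
        for r in rows
        for c in cols
    }

# Hypothetical cell means (percentage efficiency gain), invented for illustration only.
cell_means = {
    ("Claims", "Flextime"): 12.0, ("Claims", "Four-day week"): 3.0,
    ("Claims", "Regular hours"): 0.0,
    ("Data processing", "Flextime"): 6.0, ("Data processing", "Four-day week"): 9.0,
    ("Data processing", "Regular hours"): 0.0,
    ("Investments", "Flextime"): 3.0, ("Investments", "Four-day week"): 0.0,
    ("Investments", "Regular hours"): 0.0,
}

effects = interaction_effects(cell_means)
largest = max(effects, key=lambda cell: abs(effects[cell]))
print(largest, round(effects[largest], 2))
```

Whether non-zero interaction effects are statistically significant is what the interaction row of the two-way ANOVA table tests.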



Box-Plot


Exercise

• Study the effect of gender and job category on salaries.



Regression Analysis Using SPSS

• We employ regression analysis to examine the relationship among quantitative variables.

• The technique is used to predict the value of one variable (the dependent variable, y) based on the values of other variables (the independent variables $x_1, x_2, \ldots, x_k$).


Example 1: DataCom

• The human resources manager at DataCom, Inc.

wants to predict the annual salary of given

employees using the following explanatory

variables: The number of years of prior relevant

experience, the number of years of employment at

DataCom, the number of years of education beyond

high school, the employee's gender, the employee's

department, and the number of individuals

supervised by the given employee. These data are

collected for a sample of employees and are provided in the file DataCom.sav.


The Model

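• The first-order linear model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

y = dependent variable

x = independent variable

$\beta_0$ = y-intercept

$\beta_1$ = slope of the line

$\varepsilon$ = error variable

[Figure: the line plotted with y against x; $\beta_0$ is the intercept on the y-axis and $\beta_1$ = Rise/Run.]

$\beta_0$ and $\beta_1$ are unknown and are therefore estimated from the data.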
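How the intercept and slope are estimated from data can be sketched with the ordinary least-squares formulas. The helper name `fit_line` and the data points are assumptions made up for this illustration; SPSS reports the same two estimates in its Coefficients table.

```python
import statistics

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope: change in y per unit change in x
    b0 = y_bar - b1 * x_bar        # intercept: the fitted line passes through the means
    return b0, b1

# Hypothetical data: years of experience and salary in thousands.
years = [1, 3, 5, 7, 9]
salary = [32, 37, 41, 48, 52]
b0, b1 = fit_line(years, salary)
print(f"estimated salary = {b0:.2f} + {b1:.2f} * years")
```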



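If we have more predictors

• We allow for k independent variables to potentially be related to the dependent variable:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Here y is the dependent variable, $x_1, \ldots, x_k$ are the independent variables, $\beta_0, \ldots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.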
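With several predictors the coefficients are still chosen by least squares. One way to sketch this without any external library is to form the normal equations and solve them by Gaussian elimination, as below. The helper names `solve` and `fit_multiple` and the toy data are our own assumptions, and the sketch presumes the predictors are not perfectly collinear; statistical packages use numerically more robust methods.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                         # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]             # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Toy data generated without noise from y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (0, 0)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print([round(b, 6) for b in fit_multiple(rows, y)])
```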

Example 1

• DataCom example mentioned in the

first part of this unit.

• Use dummy variables to evaluate the

effect of the department on salaries.
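Dummy coding turns a categorical variable with k levels into k - 1 indicator (0/1) columns, with the omitted level acting as the reference category. A minimal sketch follows; the function name `dummy_code` and the department labels are hypothetical, since the actual coding in DataCom.sav is not shown here.

```python
def dummy_code(values, reference):
    """Map a categorical column to k - 1 indicator columns, omitting the reference level."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    return levels, [[1 if v == lv else 0 for lv in levels] for v in values]

# Hypothetical department labels, invented for illustration only.
departments = ["Sales", "IT", "HR", "IT", "Sales"]
levels, dummies = dummy_code(departments, reference="HR")

print(levels)
for dept, row in zip(departments, dummies):
    print(dept, row)
```

Each dummy coefficient in the fitted regression is then read as the difference in mean salary between that department and the reference department, holding the other predictors fixed.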

Example 1 using SPSS

