inferentialstats_spss
TRANSCRIPT
-
8/9/2019 InferentialStats_SPSS
1/14
12/9/20
Inferential Statistics using
SPSS
Chafik Bouhaddioui
Department of Statistics
Outline
• Hypothesis testing using SPSS
• Analysis of Variance: One and two way
ANOVA using SPSS
• Regression analysis using SPSS
• Time Series using SPSS*.
One mean
• Is there evidence that the mean salary of employees is greater than $32,000.00?
• Hypotheses:
• If we denote by μ the mean salary of employees, then:
H0: μ = 32000
H1: μ > 32000
This test is called a one-sample t-test.
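The test statistic behind this test can be sketched in a few lines of Python. This is only an illustration: the salary values are invented, not the course data, and the helper name `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical salaries (illustration only, not the course data set).
salaries = [31000, 35500, 29800, 40200, 33750, 36900, 32400, 38100]
t_stat, df = one_sample_t(salaries, 32000)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A large positive t supports H1 (mean greater than 32000); how large is large enough is decided through the p-value.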
One mean t-test: SPSS
P-value method
– The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.
– The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
– Let us demonstrate the concept on an example.
– SPSS or any statistical software will give you the p-value.
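To make the definition concrete, here is a rough Python sketch that computes a one-sided p-value by numerically integrating the Student t density. The function names and the test statistic of 2.0 on 7 degrees of freedom are assumptions chosen for illustration; statistical software uses more refined routines, and `math.gamma` limits this sketch to moderate degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=10_000):
    """P(T >= t_stat) under H0, using Simpson's rule on [0, |t_stat|]."""
    a = abs(t_stat)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t_stat|)
    tail = 0.5 - area                  # the density is symmetric around 0
    return tail if t_stat >= 0 else 1.0 - tail

# Hypothetical test statistic for a one-sided alternative (H1: mean > 32000).
p = upper_tail_p(2.0, df=7)
print(f"one-sided p-value = {p:.4f}")
```

The smaller this probability, the stronger the evidence for the alternative hypothesis; it is compared with the chosen significance level to reach a decision.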
One mean t-test: SPSS
• Interpret?
• Decision:
• Conclusion:
Two independent means
• Is there any evidence to conclude that the company is discriminating between males and females in salaries?
• If we denote by μ1 and μ2 the mean salaries of the two gender groups, then:
Hypotheses:
H0: μ1 = μ2
H1: μ1 > μ2
This test is called a two-sample t-test.
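A minimal Python sketch of the two-sample statistic follows. It uses the unequal-variance (Welch) form as an assumption; the slides do not say which variant is intended, and the helper name `welch_t` and the data are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t statistic, approximate df) for H0: mean_a == mean_b."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na   # squared standard error of each mean
    vb = statistics.variance(group_b) / nb
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    t_stat = diff / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, df

# Hypothetical salaries for two groups (illustration only).
group_1 = [41000, 38500, 45200, 39900, 43100]
group_2 = [36500, 37200, 40100, 35800, 38900]
t_stat, df = welch_t(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```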
Two independent means: SPSS
How to extract age from date?
Exercise
• Compare the mean age of male and female employees.
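Comparing ages requires first deriving an age from a date of birth. Outside SPSS, the idea can be sketched in Python as below; the function name `age_in_years` and the dates are hypothetical.

```python
from datetime import date

def age_in_years(birth_date, on_date):
    """Completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Hypothetical date of birth and reference date.
print(age_in_years(date(1952, 2, 3), date(2019, 8, 9)))
```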
Analysis of Variance:
• Example:
• A pharmaceutical manufacturer would like to be able to claim that its new headache relief medication is better than those of rivals. Also, it has two methods for formulating its product, and it would like to compare these as well.
• File: Headache.sav
• The data are the result of an experiment. The column drug identifies the treatment: 1 for active compound #1, 2 for active compound #2, 3 for the rival product, and 4 for the control group (aspirin). We measured a pain relief score ranging from 0 (no relief) to 50 (complete relief). The study was carried out “double-blind”.
• From the small experiment, what claims can the marketers offer?
H0: μ1 = μ2 = μ3 = μ4
H1: At least one of the means differs
To perform the analysis of variance we need to build an “F” statistic.
To more easily follow the process we use the following notation:
Descriptives
PainRelief
                                                      95% Confidence Interval for Mean
           N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Active1    10   13.370   5.9183           1.8715        9.136        17.604         1.3      22.3
Active2    11   22.255   6.2943           1.8978       18.026        26.483        10.6      31.9
Control    29   11.462   7.6760           1.4254        8.542        14.382          .5      25.1
Rival      14   14.250   6.6110           1.7669       10.433        18.067         3.3      25.2
Total      64   14.225   7.8349            .9794       12.268        16.182          .5      31.9
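ANOVA
PainRelief
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups    937.908          3   312.636       6.403   .001
Within Groups    2929.412         60    48.824
Total            3867.320         63
Annotations: the Between Groups sum of squares is SSTreat and its mean square is MSTreat; the Within Groups sum of squares is SSE and its mean square is MSE; Sig. is the p-value.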
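The F statistic can be rebuilt from the sums of squares and degrees of freedom in the ANOVA output. The short Python sketch below uses only the values printed in that output; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA output.
ss_treat, df_treat = 937.908, 3      # Between Groups (SSTreat)
ss_error, df_error = 2929.412, 60    # Within Groups (SSE)

ms_treat = ss_treat / df_treat       # MSTreat
ms_error = ss_error / df_error       # MSE
f_stat = ms_treat / ms_error         # F = MSTreat / MSE

print(f"MSTreat = {ms_treat:.3f}, MSE = {ms_error:.3f}, F = {f_stat:.3f}")
```

Rounded to three decimals, these agree with the Mean Square and F columns of the output.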
We can also use General Linear Model / Univariate. This way we do not need to do any recoding.
Tests of Between-Subjects Effects
Dependent Variable: PainRelief
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model     937.908(a)               3     312.636       6.403   .001
Intercept         12674.937                  1   12674.937     259.607   .000
Drug                937.908                  3     312.636       6.403   .001
Error              2929.412                 60      48.824
Total             16817.760                 64
Corrected Total    3867.320                 63
a. R Squared = .243 (Adjusted R Squared = .205)
Annotation: Sig. is the p-value.
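The R Squared and Adjusted R Squared figures in the footnote follow from the sums of squares in the same output. A small Python check, using the standard definitions and only the printed values (variable names are illustrative):

```python
# Values copied from the Tests of Between-Subjects Effects output.
ss_model = 937.908                    # Corrected Model
ss_error, df_error = 2929.412, 60     # Error
ss_total, df_total = 3867.320, 63     # Corrected Total

r_squared = ss_model / ss_total
# Adjusted R squared compares mean squares rather than sums of squares.
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)

print(f"R Squared = {r_squared:.3f}, Adjusted R Squared = {adj_r_squared:.3f}")
```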
Interpretations
• Decision:
• Conclusion:
Multiple comparisons
• When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and in what rank order.
• Three statistical inference procedures geared at doing this are discussed:
– Fisher’s least significant difference (LSD) method
– Bonferroni adjustment
– Tukey’s multiple comparison method
Multiple comparisons
• If you just need to verify 2 or 3 pairwise comparisons, use the Bonferroni method.
• If you plan to do all possible comparisons, use Tukey.
• Fisher might be used if you want to identify areas that require further analysis.
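As a rough illustration of the Bonferroni adjustment, the Python sketch below divides the overall significance level by the number of comparisons planned. The function name `bonferroni_alpha` and the 0.05 level are assumptions for the example.

```python
from math import comb

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / n_comparisons

k = 4                                  # number of groups being compared
all_pairs = comb(k, 2)                 # number of possible pairwise comparisons
print(f"{all_pairs} possible pairs among {k} groups")
print(f"alpha per test for 3 planned comparisons: {bonferroni_alpha(0.05, 3):.4f}")
print(f"alpha per test for all pairs: {bonferroni_alpha(0.05, all_pairs):.4f}")
```

The per-test level shrinks quickly as comparisons are added, which is one reason Tukey's method is preferred when all pairs are compared.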
Multiple Comparisons
Dependent Variable: PainRelief
Tukey HSD
                                Mean Difference                       95% Confidence Interval
(I) drug_code   (J) drug_code   (I-J)             Std. Error   Sig.   Lower Bound   Upper Bound
Active1         Active2          -8.8845*         3.0530       .025   -16.952        -.817
                Control           1.9079          2.5624       .879    -4.863        8.679
                Rival             -.8800          2.8931       .990    -8.525        6.765
Active2         Active1           8.8845*         3.0530       .025      .817       16.952
                Control          10.7925*         2.4743       .000     4.254       17.331
                Rival             8.0045*         2.8153       .030      .565       15.444
Control         Active1          -1.9079          2.5624       .879    -8.679        4.863
                Active2         -10.7925*         2.4743       .000   -17.331       -4.254
                Rival            -2.7879          2.2740       .613    -8.797        3.221
Rival           Active1            .8800          2.8931       .990    -6.765        8.525
                Active2          -8.0045*         2.8153       .030   -15.444        -.565
                Control           2.7879          2.2740       .613    -3.221        8.797
*. The mean difference is significant at the .05 level.
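The Mean Difference and Std. Error columns can be reproduced from the group sizes and means in the Descriptives output and the MSE from the ANOVA output. The Python sketch below does only that; it does not compute the Tukey significance values or confidence limits, which need the studentized range distribution. The function name is hypothetical.

```python
import math

# (N, mean) per group from the Descriptives output; MSE from the ANOVA output.
groups = {
    "Active1": (10, 13.370),
    "Active2": (11, 22.255),
    "Control": (29, 11.462),
    "Rival": (14, 14.250),
}
mse = 48.824

def mean_difference(i, j):
    """Return (mean_i - mean_j, standard error of the difference)."""
    (n_i, m_i), (n_j, m_j) = groups[i], groups[j]
    se = math.sqrt(mse * (1 / n_i + 1 / n_j))
    return m_i - m_j, se

for i, j in [("Active1", "Active2"), ("Active2", "Control"), ("Control", "Rival")]:
    diff, se = mean_difference(i, j)
    print(f"{i} - {j}: difference = {diff:.3f}, std. error = {se:.4f}")
```

The differences match the SPSS output up to rounding of the printed means.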
Other fixed effects Analysis of Variance Models
• We are interested in studying the effect of several factors on some dependent variable.
• Each characteristic investigated is called a factor.
• Each factor has several levels.
[Figure: four panels plotting mean response against the levels of factor A (1, 2, 3), with one line for level 1 and one line for level 2 of factor B. Panel titles: (1) Difference among the levels of factor A; no difference among the levels of factor B. (2) Difference among the levels of factor A, and difference among the levels of factor B; no interaction. (3) No difference among the levels of factor A; difference among the levels of factor B. (4) Interaction.]
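One way to see what "no interaction" means numerically is to remove the row and column effects from a table of cell means and look at what is left. The Python sketch below assumes a balanced design (equal cell sizes) and uses invented cell means; the function name is hypothetical.

```python
def interaction_effects(cell_means):
    """cell_means[b][a]: mean response at level b of factor B and level a of factor A.

    Returns cell mean - row mean - column mean + grand mean for every cell.
    All zeros means the lines in the profile plot are parallel (no interaction).
    """
    n_b, n_a = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_a * n_b)
    row_means = [sum(row) / n_a for row in cell_means]
    col_means = [sum(cell_means[b][a] for b in range(n_b)) / n_b for a in range(n_a)]
    return [
        [cell_means[b][a] - row_means[b] - col_means[a] + grand for a in range(n_a)]
        for b in range(n_b)
    ]

# Hypothetical cell means: two levels of factor B (rows), three levels of factor A.
parallel = [[10, 12, 14], [13, 15, 17]]   # both factors matter, no interaction
crossing = [[10, 12, 14], [14, 12, 10]]   # lines cross: interaction
print(interaction_effects(parallel))
print(interaction_effects(crossing))
```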
Example: Evaluating Employee Time Schedules
• Should the clerical employees of a large insurance company be
switched to a four-day week, allowed to use flextime schedules, or
kept to the usual 9-to-5 workday?
• File: Flextime.sav
Department              Condition
1  Claims               1  Flextime
2  Data processing      2  Four-day week
3  Investments          3  Regular hours
The data measure the percentage efficiency gains over a four-week trial.
[Figure: Estimated Marginal Means of Improve. Horizontal axis: Department (Claims, DP, Invest); vertical axis: Estimated Marginal Means, from -5.00 to 15.00; one line per Condition (Flex, FourDay, Regular).]
Box-Plot
Exercise
• Study the effect of gender and the job
categories on salaries.
Regression Analysis Using SPSS
• We employ Regression Analysis to examine the relationship among quantitative variables.
• The technique is used to predict the value of one variable (the dependent variable, y) based on the value of other variables (independent variables x1, x2, …, xk).
Example 1: DataCom
• The human resources manager at DataCom, Inc.
wants to predict the annual salary of given
employees using the following explanatory
variables: The number of years of prior relevant
experience, the number of years of employment at
DataCom, the number of years of education beyond
high school, the employee's gender, the employee's
department, and the number of individuals
supervised by the given employee. These data are
collected for a sample of employees and are provided in the file DataCom.sav.
The Model
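• The first order linear model
y = β0 + β1x + ε
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope of the line
ε = error variable
[Figure: a straight line plotted with x on the horizontal axis and y on the vertical axis; the line meets the y-axis at β0 and has slope β1 = Rise/Run.]
β0 and β1 are unknown and are therefore estimated from the data.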
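How the intercept and slope are estimated from data can be sketched with the usual least-squares formulas. The Python below is a minimal illustration; the function name `fit_line` and the experience and salary figures are invented and are not taken from DataCom.sav.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of experience and annual salary.
years = [1, 3, 4, 6, 8, 10]
salary = [30500, 34200, 35900, 41000, 44300, 49800]
b0, b1 = fit_line(years, salary)
print(f"estimated intercept = {b0:.1f}, estimated slope = {b1:.1f}")
```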
If we have more predictors
• We allow for k independent variables to potentially be related to the dependent variable
y = β0 + β1x1 + β2x2 + … + βkxk + ε
Here y is the dependent variable, x1, …, xk are the independent variables, β0, …, βk are the coefficients, and ε is the random error variable.
Example 1
• DataCom example mentioned in the
first part of this unit.
• Use dummy variables to evaluate the
effect of the department on salaries.
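Dummy coding turns a categorical variable with m categories into m - 1 indicator (0/1) variables, with one category left out as the reference. A small Python sketch follows; the department labels are hypothetical placeholders, since the actual categories in DataCom.sav are not listed here, and the function name is an assumption.

```python
def dummy_code(values, reference, prefix="dept"):
    """Return one dict of 0/1 indicators per observation, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"{prefix}_{lvl}": int(v == lvl) for lvl in levels} for v in values]

# Hypothetical department labels (placeholders, not the DataCom categories).
departments = ["Sales", "Engineering", "Purchasing", "Sales"]
for dept, row in zip(departments, dummy_code(departments, reference="Sales")):
    print(dept, row)
```

Each dummy coefficient in the fitted regression is then read as the salary difference between that department and the reference department, holding the other predictors fixed.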
Example 1 using SPSS