inferentialstats_spss

Upload: md-didarul-alam

Post on 01-Jun-2018


  • 8/9/2019 InferentialStats_SPSS

    1/14

    12/9/20

    Inferential Statistics using SPSS

    Chafik Bouhaddioui

    Department of Statistics

    Outline

    • Hypothesis testing using SPSS

    • Analysis of Variance: one-way and two-way ANOVA using SPSS

    • Regression analysis using SPSS

    • Time Series using SPSS*.

    One mean

    • Is there evidence that the mean salary of employees is greater than $32,000.00?

    • Hypotheses:

    • If we denote by \( \mu \) the mean salary of employees, then:

    \( H_0: \mu = 32000 \)

    \( H_1: \mu > 32000 \)

     This test is called a one-sample t-test.
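The one-sample t statistic measures how far the sample mean lies from the hypothesized value \( \mu_0 \) in units of its estimated standard error, \( t = (\bar{x} - \mu_0)/(s/\sqrt{n}) \), with n - 1 degrees of freedom. The sketch below computes it with Python's standard library only. The function name `one_sample_t` is our own, and the salary figures are hypothetical placeholders, not the course data file.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 with a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (divisor n - 1)
    se = s / math.sqrt(n)             # estimated standard error of the mean
    return (mean - mu0) / se, n - 1


# Hypothetical salaries in dollars (illustrative only, not the course data)
salaries = [31000, 35500, 29800, 40200, 33900, 36100, 30500, 38700]
t, df = one_sample_t(salaries, 32000)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A large positive t is evidence for the alternative \( \mu > 32000 \); turning t into a p-value is the next step.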


    One mean t-test: SPSS

    P-value method

     – The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.

     – The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

     – Let us demonstrate the concept on an example. SPSS, like any statistical software, will give you the p-value.
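For a one-sided alternative such as \( H_1: \mu > \mu_0 \), the p-value is the upper-tail area \( P(T \ge t) \) of a t distribution with n - 1 degrees of freedom. The sketch below approximates that area by integrating the t density with Simpson's rule, using only Python's standard library; this numerical integration is our own stand-in for the distribution functions SPSS uses internally, and the t and df values in the example are hypothetical. SPSS's t-test tables report a two-sided p-value (labelled Sig. (2-tailed) in many versions), so for a one-sided alternative the one-sided p-value is half of it when t has the sign that H1 predicts.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) as 0.5 minus the integral of the density from 0 to t.

    The integral uses composite Simpson's rule (steps must be even). For negative
    t the step h is negative, so the integral is negative and the tail area comes
    out above 0.5, as it should.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


alpha = 0.05
t_stat, df = 2.10, 7            # hypothetical test statistic and degrees of freedom
p_one_sided = t_upper_tail(t_stat, df)
print(f"one-sided p-value = {p_one_sided:.4f}")
print("reject H0" if p_one_sided < alpha else "do not reject H0")
```

The decision rule is the usual one: reject \( H_0 \) when the p-value is smaller than the chosen significance level.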

    One mean t-test: SPSS



    • Interpret?

    • Decision:

    • Conclusion:

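     Two independent means

    • Is there any evidence to conclude that the company is discriminating between males and females in salaries?

    • If we define:

    – \( \mu_1 \) = mean salary of male employees

    – \( \mu_2 \) = mean salary of female employees

    Hypotheses:

    \( H_0: \mu_1 = \mu_2 \)    \( H_1: \mu_1 > \mu_2 \)

     This test is called a two-sample t-test.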
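The two-sample t statistic divides the difference of the sample means by its estimated standard error. SPSS's Independent-Samples T Test reports two versions, one with equal variances assumed (pooled variance) and one without (Welch's statistic); the sketch below computes both with Python's standard library. The function name `two_sample_t` and the salary lists are hypothetical illustrations, not the course data.

```python
import math
import statistics


def two_sample_t(a, b):
    """Return {'pooled': (t, df), 'welch': (t, df)} for H0: mu_a = mu_b."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)   # sample variances

    # Equal variances assumed: pooled variance with n1 + n2 - 2 df
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch statistic with Welch-Satterthwaite df
    q1, q2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, n1 + n2 - 2), "welch": (t_welch, df_welch)}


# Hypothetical salaries (illustrative only); males are listed first, so a
# positive t points toward "males earn more"
male = [34000, 41000, 38500, 45200, 36800]
female = [30500, 33800, 29900, 35100, 31700, 32600]
for name, (t, df) in two_sample_t(male, female).items():
    print(f"{name:6s}: t = {t:.3f}, df = {df:.2f}")
```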

     Two independent means: SPSS



    How to extract age from date?
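Extracting age from a date of birth amounts to counting completed years between the birth date and a reference date. In SPSS one option is the Date and Time Wizard under the Transform menu; the Python sketch below only illustrates the underlying logic. The function name `age_in_years` and the example dates are hypothetical.

```python
from datetime import date


def age_in_years(birth, on):
    """Completed years of age on the date `on` for someone born on `birth`."""
    years = on.year - birth.year
    # One year less if this year's birthday has not been reached yet
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


print(age_in_years(date(1952, 2, 3), date(2018, 6, 1)))   # prints 66
```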

    Exercise

    • Compare the mean age of male and female employees.


     Analysis of Variance:

    • Example: A pharmaceutical manufacturer would like to be able to claim that its new headache relief medication is better than those of its rivals. It also has two methods for formulating its product and would like to compare these as well.

    • File: Headache.sav

    • The data are the result of an experiment in which the column drug is coded 1 for active compound #1, 2 for active compound #2, 3 for the rival product, and 4 for the control group (aspirin). A pain relief score was measured on a scale from 0 (no relief) to 50 (complete relief). The study was carried out “double-blind”.

    • From this small experiment, what claims can the marketers offer?

    \( H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \)

    \( H_1 \): at least one of the means differs

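     To perform the analysis of variance, we need to build an F statistic. To follow the process more easily, we use the following notation: SSTreat and MSTreat are the between-groups (treatment) sum of squares and mean square, and SSE and MSE are the within-groups (error) sum of squares and mean square. With k groups and n observations in total:

    \( F = \dfrac{\text{MSTreat}}{\text{MSE}} = \dfrac{\text{SSTreat}/(k-1)}{\text{SSE}/(n-k)} \)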


    Descriptives

    PainRelief

    | Group | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
    | Active1 | 10 | 13.370 | 5.9183 | 1.8715 | 9.136 | 17.604 | 1.3 | 22.3 |
    | Active2 | 11 | 22.255 | 6.2943 | 1.8978 | 18.026 | 26.483 | 10.6 | 31.9 |
    | Control | 29 | 11.462 | 7.6760 | 1.4254 | 8.542 | 14.382 | .5 | 25.1 |
    | Rival | 14 | 14.250 | 6.6110 | 1.7669 | 10.433 | 18.067 | 3.3 | 25.2 |
    | Total | 64 | 14.225 | 7.8349 | .9794 | 12.268 | 16.182 | .5 | 31.9 |

    ANOVA

    PainRelief

    | Source | Sum of Squares | df | Mean Square | F | Sig. |
    | Between Groups | 937.908 (SSTreat) | 3 | 312.636 (MSTreat) | 6.403 | .001 (p-value) |
    | Within Groups | 2929.412 (SSE) | 60 | 48.824 (MSE) | | |
    | Total | 3867.320 | 63 | | | |
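The sums of squares in this table can be rebuilt from the Descriptives output: \( \text{SSTreat} = \sum_i n_i(\bar{x}_i - \bar{x})^2 \) and \( \text{SSE} = \sum_i (n_i - 1)s_i^2 \). The Python sketch below does this with the group sizes, means, and standard deviations copied from the Descriptives table. Because those inputs are rounded, the results agree closely but not exactly with SPSS (within about 0.1 for the sums of squares). The function name `one_way_anova` is our own.

```python
# (n, mean, standard deviation) per group, copied from the Descriptives table
groups = {
    "Active1": (10, 13.370, 5.9183),
    "Active2": (11, 22.255, 6.2943),
    "Control": (29, 11.462, 7.6760),
    "Rival": (14, 14.250, 6.6110),
}


def one_way_anova(groups):
    """Return (SSTreat, SSE, MSTreat, MSE, F) from per-group summary statistics."""
    k = len(groups)
    n = sum(size for size, _, _ in groups.values())
    grand_mean = sum(size * mean for size, mean, _ in groups.values()) / n
    ss_treat = sum(size * (mean - grand_mean) ** 2 for size, mean, _ in groups.values())
    sse = sum((size - 1) * sd ** 2 for size, _, sd in groups.values())
    ms_treat = ss_treat / (k - 1)      # between-groups mean square, df = k - 1
    mse = sse / (n - k)                # within-groups mean square, df = n - k
    return ss_treat, sse, ms_treat, mse, ms_treat / mse


ss_treat, sse, ms_treat, mse, f_stat = one_way_anova(groups)
print(f"SSTreat = {ss_treat:.1f}, SSE = {sse:.1f}, F = {f_stat:.3f}")
```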


     We can also use the General Linear Model (Analyze > General Linear Model > Univariate). This way we do not need to do any recoding.

    Tests of Between-Subjects Effects

    Dependent Variable: PainRelief

    | Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
    | Corrected Model | 937.908a | 3 | 312.636 | 6.403 | .001 |
    | Intercept | 12674.937 | 1 | 12674.937 | 259.607 | .000 |
    | Drug | 937.908 | 3 | 312.636 | 6.403 | .001 (p-value) |
    | Error | 2929.412 | 60 | 48.824 | | |
    | Total | 16817.760 | 64 | | | |
    | Corrected Total | 3867.320 | 63 | | | |

    a. R Squared = .243 (Adjusted R Squared = .205)
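The footnote values follow from the table itself: \( R^2 = SS_{\text{Corrected Model}} / SS_{\text{Corrected Total}} \) and \( R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1) \), with n = 64 observations and p = 3 model degrees of freedom. The Python sketch below (the function name `r_squared` is our own) reproduces the .243 and .205 reported by SPSS.

```python
def r_squared(ss_model, ss_total, n, p):
    """R squared and adjusted R squared from sums of squares.

    ss_model: corrected-model sum of squares; ss_total: corrected-total sum of
    squares; n: number of observations; p: model degrees of freedom.
    """
    r2 = ss_model / ss_total
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


r2, adj = r_squared(937.908, 3867.320, n=64, p=3)
print(f"R Squared = {r2:.3f} (Adjusted R Squared = {adj:.3f})")
```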


    Interpretations

    • Decision:

    • Conclusion:

    Multiple comparisons

    • When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and in what ranking order.

    • Three statistical inference procedures geared at doing this are discussed:

      – Fisher’s least significant difference (LSD) method

      – Bonferroni adjustment

      – Tukey’s multiple comparison method

    Multiple comparisons

    • If you just need to verify 2 or 3 pairwise comparisons, use the Bonferroni method.

    • If you plan to do all possible comparisons, use Tukey.

    • Fisher’s LSD might be used if you want to identify areas that require further analysis.
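The Bonferroni adjustment controls the familywise error rate by testing each of m comparisons at level \( \alpha/m \), or equivalently by multiplying each p-value by m (capped at 1). A minimal Python sketch; the function name and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p-values and reject decisions for m comparisons."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p < alpha / m for p in p_values]      # equivalent to adjusted p < alpha
    return adjusted, reject


# Hypothetical unadjusted p-values for three planned pairwise comparisons
adjusted, reject = bonferroni([0.010, 0.040, 0.200])
print(adjusted, reject)
```

With only a few planned comparisons the divisor m stays small, which is why Bonferroni is attractive for 2 or 3 pairwise tests and becomes conservative when all pairs are compared.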


    Multiple Comparisons

    Dependent Variable: PainRelief

    Tukey HSD

    | (I) drug_code | (J) drug_code | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
    | Active1 | Active2 | -8.8845* | 3.0530 | .025 | -16.952 | -.817 |
    | Active1 | Control | 1.9079 | 2.5624 | .879 | -4.863 | 8.679 |
    | Active1 | Rival | -.8800 | 2.8931 | .990 | -8.525 | 6.765 |
    | Active2 | Active1 | 8.8845* | 3.0530 | .025 | .817 | 16.952 |
    | Active2 | Control | 10.7925* | 2.4743 | .000 | 4.254 | 17.331 |
    | Active2 | Rival | 8.0045* | 2.8153 | .030 | .565 | 15.444 |
    | Control | Active1 | -1.9079 | 2.5624 | .879 | -8.679 | 4.863 |
    | Control | Active2 | -10.7925* | 2.4743 | .000 | -17.331 | -4.254 |
    | Control | Rival | -2.7879 | 2.2740 | .613 | -8.797 | 3.221 |
    | Rival | Active1 | .8800 | 2.8931 | .990 | -6.765 | 8.525 |
    | Rival | Active2 | -8.0045* | 2.8153 | .030 | -15.444 | -.565 |
    | Rival | Control | 2.7879 | 2.2740 | .613 | -3.221 | 8.797 |

    *. The mean difference is significant at the .05 level.
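The standard errors and intervals in this table are consistent with the Tukey-Kramer form for unequal group sizes: \( SE = \sqrt{\text{MSE}(1/n_i + 1/n_j)} \) and interval \( (\bar{x}_i - \bar{x}_j) \pm (q/\sqrt{2})\,SE \), where q is the studentized range critical value for 4 groups and 60 error degrees of freedom. The Python sketch below plugs in the MSE and the group summaries from the earlier output. The value q = 3.737 is an assumption taken from a printed studentized range table (SPSS does not display it), and the helper name `tukey_kramer` is our own.

```python
import math
from itertools import permutations

# Group means and sizes from the Descriptives table; MSE from the ANOVA table
means = {"Active1": 13.370, "Active2": 22.255, "Control": 11.462, "Rival": 14.250}
sizes = {"Active1": 10, "Active2": 11, "Control": 29, "Rival": 14}
MSE = 48.824
Q_CRIT = 3.737   # assumed: studentized range q(0.95; 4 groups, 60 df) from a table


def tukey_kramer(i, j):
    """Mean difference, standard error and 95% interval for group i minus group j."""
    diff = means[i] - means[j]
    se = math.sqrt(MSE * (1 / sizes[i] + 1 / sizes[j]))
    half_width = Q_CRIT / math.sqrt(2) * se
    return diff, se, diff - half_width, diff + half_width


for i, j in permutations(means, 2):
    diff, se, lo, hi = tukey_kramer(i, j)
    flag = "*" if lo > 0 or hi < 0 else " "   # interval excludes 0
    print(f"{i:8s} - {j:8s} {diff:9.4f}{flag} SE {se:.4f}  [{lo:8.3f}, {hi:8.3f}]")
```

A pair is flagged exactly when its interval excludes 0, which matches the asterisks in the SPSS table: only the comparisons involving Active2 (against Active1, Control, and Rival) are significant.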

    Other Fixed-Effects Analysis of Variance Models

    • We are interested in studying the effect of several factors on some dependent variable.

    • Each characteristic investigated is called a factor.

    • Each factor has several levels.

    [Figure: four plots of mean response against the levels (1, 2, 3) of factor A, one line per level of factor B. Panels: difference among the levels of factor A, no difference among the levels of factor B (the lines for levels 1 and 2 of B coincide); no difference among the levels of factor A, difference among the levels of factor B; difference among the levels of both factors, no interaction; interaction.]
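In these plots, "no interaction" means parallel lines: each cell mean equals the grand mean plus an effect of the level of A plus an effect of the level of B. The interaction effect of a cell, \( \bar{y}_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot} \), measures the departure from that pattern. A short Python sketch, assuming a balanced design (so unweighted averages of the cell means give the marginal means); the function name and cell means are hypothetical.

```python
def interaction_effects(cell_means):
    """cell_means[i][j]: mean response at level i of factor A and level j of factor B.

    Returns the interaction effect of every cell; all zeros means parallel
    profiles (no interaction). Assumes a balanced design.
    """
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]                                 # A means
    col = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]  # B means
    grand = sum(row) / a
    return [[cell_means[i][j] - row[i] - col[j] + grand for j in range(b)]
            for i in range(a)]


# Hypothetical means: 3 levels of factor A (rows) x 2 levels of factor B (columns)
parallel = [[10, 14], [12, 16], [15, 19]]   # B adds 4 at every level of A
crossing = [[10, 16], [13, 13], [16, 10]]   # B's effect reverses across A
print(interaction_effects(parallel))
print(interaction_effects(crossing))
```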


    Example: Evaluating Employee Time Schedules

    • Should the clerical employees of a large insurance company be switched to a four-day week, allowed to use flextime schedules, or kept to the usual 9-to-5 workday?

    • File: Flextime.sav

    | Code | Department | Code | Condition |
    | 1 | Claims | 1 | Flextime |
    | 2 | Data processing | 2 | Four-day week |
    | 3 | Investments | 3 | Regular hours |

     The data measure the percentage efficiency gains over a four-week trial.

    [Figure: Estimated Marginal Means of Improve, plotted against Department (Claims, DP, Invest), one line per Condition (Flex, FourDay, Regular).]
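For a two-factor design such as Department × Condition, the analysis splits the total variation into the two main effects, their interaction, and error. When every cell has the same number of observations r, these sums of squares have the closed forms used in the Python sketch below. The balanced-design assumption, the function name `two_way_anova`, and the tiny demonstration data set are ours, not taken from Flextime.sav.

```python
def two_way_anova(data):
    """Balanced two-way ANOVA.

    data[i][j] holds the r replicate observations for level i of factor A and
    level j of factor B (r must be the same in every cell, r >= 2).
    Returns dicts of sums of squares, degrees of freedom and F ratios.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / r for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    obs = [(i, j, y) for i in range(a) for j in range(b) for y in data[i][j]]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in row),
        "B": a * r * sum((m - grand) ** 2 for m in col),
        "AB": r * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((y - cell[i][j]) ** 2 for i, j, y in obs),
        "Total": sum((y - grand) ** 2 for _, _, y in obs),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    mse = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / mse for k in ("A", "B", "AB")}
    return ss, df, f


# Tiny hypothetical data set: 2 levels of A x 2 levels of B, 2 replicates per cell
demo = [[[1, 3], [5, 7]],
        [[2, 4], [6, 8]]]
ss, df, f = two_way_anova(demo)
print(ss)
print(f)
```

In a balanced design the four components add up exactly to the total sum of squares, which is a useful check on any hand calculation.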


    Box-Plot


    Exercise

    • Study the effect of gender and the job categories on salaries.


    Regression Analysis Using SPSS

    • We employ regression analysis to examine the relationship among quantitative variables.

    • The technique is used to predict the value of one variable (the dependent variable, y) based on the values of other variables (the independent variables x1, x2, …, xk).


    Example 1: DataCom

    • The human resources manager at DataCom, Inc. wants to predict the annual salary of given employees using the following explanatory variables: the number of years of prior relevant experience, the number of years of employment at DataCom, the number of years of education beyond high school, the employee's gender, the employee's department, and the number of individuals supervised by the given employee. These data are collected for a sample of employees and are provided in the file DataCom.sav.


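     The Model

    • The first-order linear model:

    \( y = \beta_0 + \beta_1 x + \varepsilon \)

    y = dependent variable

    x = independent variable

    \( \beta_0 \) = y-intercept

    \( \beta_1 \) = slope of the line

    \( \varepsilon \) = error variable

    [Figure: the line \( y = \beta_0 + \beta_1 x \), crossing the y-axis at \( \beta_0 \), with slope \( \beta_1 \) = Rise/Run.]

    \( \beta_0 \) and \( \beta_1 \) are unknown and are therefore estimated from the data.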
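The intercept and slope are estimated by least squares: \( b_1 = S_{xy}/S_{xx} \) and \( b_0 = \bar{y} - b_1\bar{x} \), where \( S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \) and \( S_{xx} = \sum (x_i - \bar{x})^2 \). These are the unstandardized coefficients that SPSS's Linear Regression procedure reports for a single predictor. A minimal Python sketch; the function name and the data are hypothetical.

```python
import statistics


def least_squares_line(x, y):
    """Least squares estimates (b0, b1) of the intercept and slope."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope = Rise / Run of the fitted line
    b0 = y_bar - b1 * x_bar    # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data: years of experience vs. salary in thousands of dollars
years = [1, 3, 4, 6, 8, 10]
salary = [31.0, 34.5, 35.0, 39.5, 42.0, 46.5]
b0, b1 = least_squares_line(years, salary)
print(f"salary = {b0:.2f} + {b1:.2f} * years")
```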


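    If we have more predictors

    • We allow for k independent variables to potentially be related to the dependent variable:

    \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \)

    Here y is the dependent variable, \( x_1, \dots, x_k \) are the independent variables, \( \beta_0, \dots, \beta_k \) are the coefficients, and \( \varepsilon \) is the random error variable.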
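With k predictors, the least squares estimates of the coefficients solve the normal equations \( (X^\top X)\,b = X^\top y \), where X has a leading column of ones for the intercept. The Python sketch below forms and solves them by Gaussian elimination with partial pivoting. It is for illustration only (statistical software typically uses more numerically stable factorizations), and the function name `ols` and the data are hypothetical.

```python
def ols(X, y):
    """Least squares coefficients [b0, b1, ..., bk] for y on the columns of X.

    X is a list of rows of predictor values (no intercept column).
    """
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Forward elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for k in range(c + 1, p):
            factor = aug[k][c] / aug[c][c]
            for m in range(c, p + 1):
                aug[k][m] -= factor * aug[c][m]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (aug[i][p] - sum(aug[i][j] * b[j] for j in range(i + 1, p))) / aug[i][i]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print(ols(X, y))   # approximately [1.0, 2.0, 3.0]
```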

    Example 1

    • DataCom example mentioned in the first part of this unit.

    • Use dummy variables to evaluate the effect of the department on salaries.
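A categorical variable such as department enters the regression as indicator (dummy) variables: one 0/1 column for every department except a reference department, whose effect is absorbed by the intercept. Including a column for every department as well as the intercept would make the columns perfectly collinear. A Python sketch; the function name and the department labels are hypothetical, since the slides do not list DataCom's departments.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical department labels; "Admin" is chosen as the reference category
departments = ["Sales", "Admin", "IT", "Sales", "IT"]
levels, rows = dummy_code(departments, reference="Admin")
print(levels)   # ['IT', 'Sales']
print(rows)     # [[0, 1], [0, 0], [1, 0], [0, 1], [1, 0]]
```

Each dummy's coefficient then estimates the salary difference between that department and the reference department, holding the other predictors fixed.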

    Example 1 using SPSS
