simple and multiple regression analysis

Upload: umair-khan-niazi

Post on 07-Aug-2018


  • 8/20/2019 Simple and Multiple Regression Analysis

    1/48

     Regression Analysis

    $y = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_kx_k + \varepsilon$

    [Diagram: predictors X1, X2, X3 combined into the estimate ŷ]

    2/48

    COMMON TYPES OF ANALYSIS?

    1. Compare Groups
       a. Compare Proportions (e.g., Chi-Square Test): $H_0: P_1 = P_2 = P_3 = \dots = P_k$
       b. Compare Means (e.g., Analysis of Variance): $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
    2. Examine Strength and Direction of Relationships
       a. Bivariate (e.g., Pearson Correlation, r): between one variable and another: $Y = a + b_1X_1$
       b. Multivariate (e.g., Multiple Regression Analysis): between one dependent variable and each of several independent variables, while holding all other independent variables constant: $Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k$

    STATISTICAL DATA ANALYSIS

    3/48

    Simple and Multiple Regression Analysis

    What does regression analysis do?

    It examines whether changes/differences in values of one variable (the dependent variable, Y) are linked to changes/differences in values of one or more other variables (the independent variables X1, X2, etc.), while controlling for the changes in values of all other Xs.

    E.g., the relationship between salary and gender for people who have the same levels of education, work experience, position level, seniority, etc.

    The DV (Y) must be metric.

    The IVs (Xs) must be either metric or dummy variables.

    Central Questions Addressed:

    Is Y a function of X1, X2, etc.? How? Is there a relationship between Y and X1, X2, etc. (in each case, after controlling for the effects of all other Xs)? In what way?

    What is the relative impact of each X on Y, holding all other Xs constant (that is, all other Xs being equal)?

    4/48

    More specifically:

    Do values of Y tend to increase/decrease as values of X1, X2, etc. increase/decrease? If so, by how much?

    And:

    How strong is the connection/relationship between the Xs and Y? What % of differences/variations in Y values (e.g., income among study subjects) can be explained by (or attributed to) differences in X values (e.g., years of education, years of experience, etc.)?

    [Diagram: X1, X2, X3 → ŷ]

    5/48

    NOTE: Once we can determine how values of Y change as a function of values of X1, X2, etc., we will also be able to predict/estimate the value of Y from specific values of X1, X2, etc.

    $Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k + \varepsilon$

    Therefore, regression analysis, in a sense, is about ESTIMATING values of Y, using information about values of Xs.

    Estimation, by definition, involves error.

    The objective: to minimize error in estimation; or, to compute estimates that are as close to the true/actual values as possible.

    6/48

    7/48

    Estimating Number of Credit Cards*

    Family (i)   Actual # of Credit Cards (yi)
    1            4
    2            6
    3            6
    4            7
    5            8
    6            7
    7            8
    8            10
                 Σyi = 56

    Estimate? $\hat{y} = \bar{y} = \frac{\sum y_i}{n} = \frac{56}{8} = 7$

    * This example was adapted from Hair, Black, Babin, Anderson, & Tatham (2006). Multivariate Data Analysis, 6th ed., Prentice Hall.

    8/48

    Estimating Number of Credit Cards

    Family (i)   Actual # of Credit Cards (yi)   Estimate (ŷ = ȳ)   Error in Estimation (yi − ȳ)
    1            4                               7                  ?
    2            6                               7                  ?
    3            6                               7                  ?
    4            7                               7                  ?
    5            8                               7                  ?
    6            7                               7                  ?
    7            8                               7                  ?
    8            10                              7                  ?
                 Σyi = 56

    $\hat{y} = \bar{y} = \frac{56}{8} = 7$

    9/48

    Estimating Number of Credit Cards

    Family (i)   Actual # of Credit Cards (yi)   Estimate (ŷ = ȳ)   Error in Estimation (yi − ȳ)
    1            4                               7                  −3
    2            6                               7                  −1
    3            6                               7                  −1
    4            7                               7                   0
    5            8                               7                  +1
    6            7                               7                   0
    7            8                               7                  +1
    8            10                              7                  +3
                 Σyi = 56

    $\hat{y} = \bar{y} = \frac{56}{8} = 7$

    Let's now see all this graphically.

    10/48

    [Figure: actual # of credit cards (vertical axis, 1 to 10) for families F1 to F8, shown against the estimate ŷ = ȳ = 7]

    Let's spread the dots away from each other to see things more clearly!

    11/48

    [Figure (Graphic Representation): actual # of credit cards for families F1 to F8, each plotted against the estimate ŷ = ȳ = 7; the vertical gap between a family's actual value and the estimate is its estimation error]

    Can we determine the total estimation error for all 8 families?

    12/48

    Family (i)   Actual # of Credit Cards (yi)   Estimate (ŷ = ȳ)   Error in Estimation (yi − ȳ)
    1            4                               7                  −3
    2            6                               7                  −1
    3            6                               7                  −1
    4            7                               7                   0
    5            8                               7                  +1
    6            7                               7                   0
    7            8                               7                  +1
    8            10                              7                  +3
                 Σyi = 56

    $\hat{y} = \bar{y} = \frac{56}{8} = 7$

    What would be the total estimation error for all 8 families combined?

    $\sum (y_i - \bar{y}) = 0$ (the positive and negative errors cancel out)

    Solution?

    13/48

    Estimating Number of Credit Cards

    Family (i)   Actual # (yi)   Estimate (ŷ = ȳ)   Error (yi − ȳ)   Errors Squared (yi − ȳ)²
    1            4               7                  −3               9
    2            6               7                  −1               1
    3            6               7                  −1               1
    4            7               7                   0               0
    5            8               7                  +1               1
    6            7               7                   0               0
    7            8               7                  +1               1
    8            10              7                  +3               9
                 Σyi = 56                           Σ(yi − ȳ) = 0    Σ(yi − ȳ)² = 22

    $\hat{y} = \bar{y} = \frac{56}{8} = 7$

    SST = Sum of Squares Total

    14/48

    $\sum (y_i - \bar{y})^2 = 22$ = SST = index for the total (combined) amount of estimation error for all families (observations) in the sample when using the mean as the estimate.

    SST is also the sum of squared deviations from the mean.

      o Remember the formula for computing variance: SST is its numerator.

    • Objective in estimation: minimize error, maximize precision.

    • Can we cut down the amount of estimation error (SST)? How?

    Yes, we can, by using information about other variables suspected to be strong predictors of (strongly related to) the # of credit cards possessed by families (e.g., family size, family income, etc.).
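The arithmetic behind SST can be checked with a short Python sketch (an illustration, not part of the original course materials). It uses the eight card counts from the credit-card table, confirms that the raw errors around the mean sum to zero while the squared errors sum to SST = 22, and, as an extra check not shown on the slides, that replacing the mean with another single-number estimate (6.5 and 7.5, chosen arbitrarily) only increases the total squared error. The n − 1 divisor for the variance is the usual sample-variance convention, assumed here.

```python
# Credit-card example: actual number of credit cards for families 1-8
cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(cards)
mean_y = sum(cards) / n                  # baseline estimate: 56 / 8 = 7.0
errors = [y - mean_y for y in cards]     # estimation error for each family
sst = sum(e ** 2 for e in errors)        # SST: sum of squared deviations from the mean
sample_variance = sst / (n - 1)          # SST is the numerator of the sample variance


def total_squared_error(estimate):
    """Combined squared error when every family gets the same single-number estimate."""
    return sum((y - estimate) ** 2 for y in cards)


print(mean_y, sum(errors), sst)  # 7.0 0.0 22.0
# Moving the single estimate away from the mean only increases the total squared error
print(total_squared_error(6.5), total_squared_error(7.5))  # 24.0 24.0
```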

    15/48

    Family (i)   Actual # of Credit Cards (y)   Family Size (x)
    1            4                              2
    2            6                              2
    3            6                              4
    4            7                              4
    5            8                              5
    6            7                              5
    7            8                              6
    8            10                             6

    We now can attempt to estimate # of credit cards from the information on family size, rather than from its own mean.

    Let's first see this graphically!

    16/48

    [Figure: actual # of credit cards (Y, 1 to 10) plotted against family size (X, 1 to 7) for families F1 to F8, e.g., family 1 at (x, y) = (2, 4); the original (baseline) estimate ŷ = ȳ = 7 is drawn as a horizontal line]

    Plot actual numbers of CCs against family size.

    QUESTION: Does the mean appear to represent the closest estimate of the actual c.c. numbers for our sample families? That is, is the green line the best line to represent the location of estimates of # of CCs for these families?

    17/48

    [Figure: the same scatter plot of # of credit cards against family size, showing the original (baseline) estimate and the regression line]

    Generic equation for any straight line: $Y = a + bX$

    $\hat{y}_1 = a + bx_1$,  $\hat{y}_2 = a + bx_2$,  $\hat{y}_3 = a + bx_3$, ...

    The original (baseline) estimate is itself a straight line with slope zero: $\hat{y} = a + 0x = \bar{y}$

    Regression Line (Line of Best Fit): the new, improved location for CC estimates (see next slide).

    18/48

    [Figure: scatter plot of # of credit cards against family size, with the original (baseline) estimate and the regression line (line of best fit), the new, improved location for CC estimates; the vertical distance from each point to the line is its estimation error, $y - \hat{y}$]

    The regression line, $\hat{y} = a + bx$, will minimize the total estimation error, $\sum (y - \hat{y})^2$.

    But how do we know the values of a and b in the regression line $\hat{y} = a + bx$?

    19/48

    EQUATION FOR REGRESSION LINE (LINE OF BEST FIT): $\hat{y} = a + bx$

    Values of a and b for the regression line:

    $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

    $a = \bar{y} - b\bar{x}$

    Let's use the above formulas to compute the values of "a" and "b" for the regression line in our example.

    We will need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$

    20/48

    (y = actual # of credit cards, x = family size)

    Family (i)   y    x   x − x̄   y − ȳ   (x − x̄)(y − ȳ)   (x − x̄)²
    1            4    2   ?       ?       ?                 ?
    2            6    2   ?       ?       ?                 ?
    3            6    4   ?       ?       ?                 ?
    4            7    4   ?       ?       ?                 ?
    5            8    5   ?       ?       ?                 ?
    6            7    5   ?       ?       ?                 ?
    7            8    6   ?       ?       ?                 ?
    8            10   6   ?       ?       ?                 ?
                                          Σ = ?             Σ = ?

    $\bar{y} = \frac{56}{8} = 7$    $\bar{x} = \frac{34}{8} = 4.25$

    We need: $\sum (x - \bar{x})(y - \bar{y})$, $\sum (x - \bar{x})^2$, and $\bar{y}$, $\bar{x}$.

    21/48

    (y = actual # of credit cards, x = family size)

    Family (i)   y    x   x − x̄    y − ȳ   (x − x̄)(y − ȳ)   (x − x̄)²
    1            4    2   −2.25    −3      6.75              5.0625
    2            6    2   −2.25    −1      2.25              5.0625
    3            6    4    −.25    −1       .25               .0625
    4            7    4    −.25     0       0                 .0625
    5            8    5     .75     1       .75               .5625
    6            7    5     .75     0       0                 .5625
    7            8    6    1.75     1      1.75              3.0625
    8            10   6    1.75     3      5.25              3.0625
                                           Σ = 17            Σ = 17.5

    $\bar{y} = \frac{56}{8} = 7$    $\bar{x} = \frac{34}{8} = 4.25$

    We need: $\sum (x - \bar{x})(y - \bar{y}) = 17$, $\sum (x - \bar{x})^2 = 17.5$, and $\bar{y}$, $\bar{x}$.

    22/48

    REGRESSION LINE (LINE OF BEST FIT):

    $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{17}{17.5} = .971$

    $a = \bar{y} - b\bar{x} = 7 - .971(4.25) = 2.87$

    $\hat{y} = a + bx = 2.87 + .97x$

    a = 2.87 (Y-intercept)        b = .97 (regression coefficient)
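The two worksheet totals and the resulting slope and intercept can be reproduced in a few lines of Python. This is a minimal sketch that applies the formulas $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$ directly to the family-size data; the variable names are illustrative.

```python
# Credit-card example: X = family size, Y = actual number of credit cards
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(cards)
x_bar = sum(family_size) / n   # 34 / 8 = 4.25
y_bar = sum(cards) / n         # 56 / 8 = 7.0

# The two column totals from the worksheet table
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, cards))  # sum of (x - x_bar)(y - y_bar)
sxx = sum((x - x_bar) ** 2 for x in family_size)                          # sum of (x - x_bar)^2

b = sxy / sxx           # regression coefficient (slope)
a = y_bar - b * x_bar   # Y-intercept

print(sxy, sxx)                  # 17.0 17.5
print(round(a, 2), round(b, 2))  # 2.87 0.97
```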

    23/48

    [Figure: scatter plot of # of credit cards against family size, showing the original (baseline) estimate and the new, improved estimates on the regression line $\hat{y} = 2.87 + .97x$]

    Can we tell how much estimation error we have committed by using the new regression line?

    Yes: examine the differences between our households' actual # of CCs and their new regression estimates.

    24/48

    Family (i)   Actual # of CCs (y)   Family Size (x)   Regression Estimate (ŷ)   Error/Residual (y − ŷ)   Errors Squared (y − ŷ)²
    1            4                     2                 ?                         ?                        ?
    2            6                     2                 ?                         ?                        ?
    3            6                     4                 ?                         ?                        ?
    4            7                     4                 ?                         ?                        ?
    5            8                     5                 ?                         ?                        ?
    6            7                     5                 ?                         ?                        ?
    7            8                     6                 ?                         ?                        ?
    8            10                    6                 ?                         ?                        ?

    $\hat{y} = 2.87 + .97x$

    $\sum (y - \hat{y})^2 = ?$

    25/48

    Family (i)   Actual # of CCs (y)   Family Size (x)   Regression Estimate (ŷ)   Error/Residual (y − ŷ)   Errors Squared (y − ŷ)²
    1            4                     2                 4.81                      −.81                      .66
    2            6                     2                 4.81                      1.19                     1.42
    3            6                     4                 6.76                      −.76                      .58
    4            7                     4                 6.76                       .24                      .06
    5            8                     5                 7.73                       .27                      .07
    6            7                     5                 7.73                      −.73                      .53
    7            8                     6                 8.70                      −.70                      .49
    8            10                    6                 8.70                      1.30                     1.69

    $\hat{y} = 2.87 + .97x$; e.g., family 1: $\hat{y} = 2.87 + .97(2) = 4.81$

    $\sum (y - \hat{y})^2 = 5.486$ (computed from unrounded estimates; the rounded column sums to 5.50)

    SSE = Sum of Squares Error (SS Residual)
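A short Python sketch of the residual table: it recomputes a and b at full precision from the family-size data, then derives each family's estimate, residual, and squared residual. Because it does not round the estimates first, it returns the 5.486 shown on the slide, whereas summing the rounded squared errors in the table gives about 5.50.

```python
family_size = [2, 2, 4, 4, 5, 5, 6, 6]   # X
cards = [4, 6, 6, 7, 8, 7, 8, 10]        # Y

# Least-squares line for these data at full precision (about 2.87 + .97x)
n = len(cards)
x_bar, y_bar = sum(family_size) / n, sum(cards) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, cards))
     / sum((x - x_bar) ** 2 for x in family_size))
a = y_bar - b * x_bar

y_hat = [a + b * x for x in family_size]              # regression estimates
residuals = [y - yh for y, yh in zip(cards, y_hat)]   # errors (residuals)
sse = sum(r ** 2 for r in residuals)                  # SSE (SS Residual)

print([round(v, 2) for v in y_hat])  # [4.81, 4.81, 6.76, 6.76, 7.73, 7.73, 8.7, 8.7]
print(round(sse, 3))                 # 5.486
```

A useful by-product: the least-squares residuals sum to (numerically) zero, just as the errors around the mean did.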

    26/48

    Total baseline error using the mean (SS Total) = 22

    New or remaining error (SS Error or SS Residual) = 5.486

    22 − 5.486 = 16.514 (SS Regression or SS Explained)

    QUESTION: How much of the original estimation error have we explained away (eliminated) by using the regression model (instead of the mean)? That is, what % of the estimation error have we eliminated?

    16.514 / 22 = .751, or 75%. What is this called?

    75% = the % of differences in # of CCs among households that is explained by differences in their family size.

    What does the remaining 25% represent?

    The percent of variation (differences) in number of credit cards owned by families that can be accounted for by: (a) all other potential predictors not included in the model, beyond family size, and (b) unexplainable random/chance variations.

    [Diagram: total variation in Y = 22; X1 (family size) explains 16.5 of it and 5.5 remains unexplained; R² = 16.5/22]

    27/48

    R² is a measure of our success regarding the accuracy of our estimation effort.

    R² = % of estimation error that we have been able to explain away by using the regression model, instead of using the mean.

    R² indicates how much better we can predict Y from information about Xs, rather than from using its own mean.

    R² = % of differences (variations) in Y values that is explained by (attributable to) differences in X values.

    $R^2 = \frac{SS_{Regression}}{SS_{Total}} = \frac{16.5}{22} = 75\%$

    Note: When dealing with only two variables (a single X and Y):

    $r = \sqrt{R^2} = \sqrt{\frac{16.514}{22}} = .866$ (taking the sign of b, here positive) = Pearson correlation of Y with X1 (NOT controlling for any other variable)

    Let's now examine all this graphically!
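The same quantities yield R² and, for a single predictor, Pearson's r. The Python sketch below (illustrative, not from the course) uses the identity SS Regression = [Σ(x − x̄)(y − ȳ)]² / Σ(x − x̄)² for a least-squares line, and computes r independently from its definition so that one can check that r² equals R² when there is only one X.

```python
import math

family_size = [2, 2, 4, 4, 5, 5, 6, 6]   # X1
cards = [4, 6, 6, 7, 8, 7, 8, 10]        # Y

n = len(cards)
x_bar, y_bar = sum(family_size) / n, sum(cards) / n
sxx = sum((x - x_bar) ** 2 for x in family_size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, cards))
sst = sum((y - y_bar) ** 2 for y in cards)    # SS Total = 22

ssr = sxy ** 2 / sxx    # SS Regression (explained), equal to b * sxy
sse = sst - ssr         # SS Error (residual)
r_squared = ssr / sst   # share of SST explained by family size

# Pearson correlation computed directly from its definition
r = sxy / math.sqrt(sxx * sst)

print(round(ssr, 3), round(sse, 3))       # 16.514 5.486
print(round(r_squared, 3), round(r, 3))   # 0.751 0.866
```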

    28/48

    [Figure: scatter plot of # of credit cards against family size with the original (baseline) estimate ȳ and the regression line $\hat{y} = 2.87 + .97x$, contrasting, for one family, the original baseline error ($y - \bar{y}$) with the new error ($y - \hat{y}$)]

    29/48

    5.486 (≈ 5.5) = SSE = the amount of estimation error for the 8 sample families when using simple regression (i.e., a regression model that includes only information about family size).

    Can we reduce the amount of estimation error (SSE) to an even lower level and, thus, improve the estimation process? How?

    Yes, by adding information on a second variable suspected to be strongly related to # of credit cards (e.g., family income, X2).

    30/48

    Family (i)   Actual # of Credit Cards (yi)   Family Size (x1)   Family Income (x2)
    1            4                               2                  14
    2            6                               2                  16
    3            6                               4                  14
    4            7                               4                  17
    5            8                               5                  18
    6            7                               5                  21
    7            8                               6                  17
    8            10                              6                  25

    Generic equation for a linear plane: $\hat{y} = a + b_1x_1 + b_2x_2$

    We now can attempt to estimate # of CCs from our information on family size and family income! Our regression model will now be a linear plane, rather than a straight line!

    Let's examine the regression plane for our example graphically.

    31/48

    MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:

    $\hat{y} = .482 + .63x_1 + .216x_2$

    [Figure: 3-D plot of Y = # of Credit Cards against X1 = Family Size and X2 = Family Income, showing each family's actual value and its regression estimate on the fitted plane]

    Formulas are available for computing values of a, b1, and b2.

    Let's now see how much error in estimation we are committing by using this multiple regression model.

    32/48

    Family (i)   Actual # of CCs (y)   Family Size (x1)   Family Income (x2)   Regression Estimate (ŷ)   Error/Residual (y − ŷ)   Errors Squared (y − ŷ)²
    1            4                     2                  14                   ?                         ?                        ?
    2            6                     2                  16                   ?                         ?                        ?
    3            6                     4                  14                   ?                         ?                        ?
    4            7                     4                  17                   ?                         ?                        ?
    5            8                     5                  18                   ?                         ?                        ?
    6            7                     5                  21                   ?                         ?                        ?
    7            8                     6                  17                   ?                         ?                        ?
    8            10                    6                  25                   ?                         ?                        ?

    $\hat{y} = .482 + .63x_1 + .216x_2$

    $\sum (y - \hat{y})^2 = ?$

    33/48

    Family (i)   Actual # of CCs (y)   Family Size (x1)   Family Income (x2)   Regression Estimate (ŷ)   Error/Residual (y − ŷ)   Errors Squared (y − ŷ)²
    1            4                     2                  14                   4.77                       −.77                     .59
    2            6                     2                  16                   5.20                        .80                     .64
    3            6                     4                  14                   6.03                       −.03                     .00
    4            7                     4                  17                   6.68                        .32                     .10
    5            8                     5                  18                   7.53                        .47                     .22
    6            7                     5                  21                   8.18                      −1.18                    1.39
    7            8                     6                  17                   7.95                        .05                     .00
    8            10                    6                  25                   9.67                        .33                     .11
                                                                                                                                 Σ = 3.05

    $\hat{y} = .482 + .63x_1 + .216x_2$; e.g., family 1: $\hat{y} = .482 + .63(2) + .216(14) = 4.77$

    $\sum (y - \hat{y})^2 = 3.05$: SSE = Sum of Squares Error (Residual)

    Unique (additional) contribution of X2 (family income) beyond X1: SSE falls from 5.486 to 3.05, a reduction of about 2.44.
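The slides note that formulas exist for a, b1, and b2 without showing them. As a stand-in, the sketch below hands the same data to NumPy's general least-squares solver, `numpy.linalg.lstsq`, rather than reproducing any particular textbook formula; NumPy is an assumed dependency. It should recover the slide's coefficients (b1 = .632 before rounding to .63) and SSE up to rounding.

```python
import numpy as np

# Credit-card example data from the table (families 1-8)
cards = np.array([4, 6, 6, 7, 8, 7, 8, 10], dtype=float)                 # Y
family_size = np.array([2, 2, 4, 4, 5, 5, 6, 6], dtype=float)            # X1
family_income = np.array([14, 16, 14, 17, 18, 21, 17, 25], dtype=float)  # X2

# Design matrix: a column of ones for the intercept a, then X1 and X2
X = np.column_stack([np.ones_like(cards), family_size, family_income])

# Ordinary least squares: choose a, b1, b2 that minimize the sum of squared errors
coef, *_ = np.linalg.lstsq(X, cards, rcond=None)
a, b1, b2 = (float(c) for c in coef)

y_hat = X @ coef                               # regression estimates
sse = float(np.sum((cards - y_hat) ** 2))      # SSE (SS Residual)

print(round(a, 3), round(b1, 3), round(b2, 3))  # 0.482 0.632 0.216
print(round(sse, 2))                            # 3.05
```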

    34/48

    THE MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE: $\hat{y} = .482 + .63x_1 + .216x_2$

    Y-intercept, "a": NOTE: Only when all Xs can meaningfully take on a value of zero will the intercept have a meaningful/direct/practical interpretation. Otherwise, it is simply an aid in increasing the accuracy of estimation.

    b1 and b2 = regression coefficients

    b1 = .63: Among families of the same income, an increase in family size by one person would, on average, result in .63 more credit cards.
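The "holding income constant" reading of b1 can be made concrete with the rounded equation from the slide. In this Python sketch the function name and the family profiles are hypothetical: comparing two families with the same income but sizes 4 and 5 isolates b1, and (as an analogous check the slide does not spell out) comparing two families of the same size whose incomes differ by one unit isolates b2.

```python
def predicted_cards(family_size, family_income):
    """Multiple regression estimate from the slide: y_hat = .482 + .63*x1 + .216*x2."""
    return 0.482 + 0.63 * family_size + 0.216 * family_income


# Hypothetical pair of families with the same income (17) but sizes 4 and 5
size_effect = predicted_cards(5, 17) - predicted_cards(4, 17)
# Hypothetical pair of families of the same size (4) with incomes 17 and 18
income_effect = predicted_cards(4, 18) - predicted_cards(4, 17)

print(round(size_effect, 2), round(income_effect, 3))  # 0.63 0.216
```

Because the model is linear, the difference does not depend on which common income (or size) is chosen for the comparison.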

    35/48

    THE MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE: $\hat{y} = .482 + .63x_1 + .216x_2$

    SST = 22    SSE = 3.05

    What is our new R²?

    SS Regression = 22 − 3.05 = 18.95

    $R^2 = \frac{18.95}{22} = .861$, or 86%: percent of differences in households' number of CCs that is explained by differences in family size and family income.

    The remaining 14% (3.05 / 22 = .14): percent of variation in number of credit cards that can be accounted for by (a) all other relevant factors not included in the model, beyond family size and income, and (b) unexplainable random/chance variations.

    36/48

    [Venn diagram: circle Y = # of CCs overlapping circles X1 = Family Size and X2 = Family Income; a = part of Y overlapped only by X1, b = part overlapped only by X2, c = part overlapped by both, d = part of Y overlapped by neither]

    Total Variation/Error in Y = SS Total = a + b + c + d = 22

    Pearson simple correlation of Y with X1 (not controlling for X2):
    $\hat{y} = 2.87 + .97X_1$
    $r^2 = R^2 = \frac{a + c}{a + b + c + d} = \frac{16.51}{22} = .751$, so $r_{yx_1} = \sqrt{.751} = .866$
    SSR = a + c = 16.51

    Pearson simple correlation of Y with X2 (not controlling for X1):
    $\hat{y} = -.063 + .398X_2$
    $r^2 = \frac{b + c}{a + b + c + d} = \frac{15.12}{22} = .687$, so $r_{yx_2} = \sqrt{.687} = .829$

    37/48

    R² graphically:

    SSR = a + b + c = 18.95

    SST = a + b + c + d = 22

    $R^2 = \frac{SSR}{SST} = \frac{a + b + c}{a + b + c + d} = \frac{18.95}{22} = 86\%$

    SSE = d = 22 − 18.95 = 3.05

    $\hat{y} = .482 + .63x_1 + .216x_2$    (X1 = Family Size, X2 = Family Income)

    NOTE: c is explained by both X1 and X2.
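The Venn-diagram areas can be estimated by fitting three regressions, one per predictor set, and differencing their R² values. This Python sketch assumes NumPy and uses a small helper, `r_squared`, defined here for illustration. Areas a and b are the unique contributions of X1 and X2, and c is the part of Y's variation both predictors explain; in other data sets with suppression effects, this "shared" quantity can even come out negative.

```python
import numpy as np

cards = np.array([4, 6, 6, 7, 8, 7, 8, 10], dtype=float)                 # Y
family_size = np.array([2, 2, 4, 4, 5, 5, 6, 6], dtype=float)            # X1
family_income = np.array([14, 16, 14, 17, 18, 21, 17, 25], dtype=float)  # X2


def r_squared(y, *predictors):
    """R^2 = SS Regression / SS Total for an OLS fit that includes an intercept."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(1 - sse / sst)


r2_x1 = r_squared(cards, family_size)                   # (a + c) / total
r2_x2 = r_squared(cards, family_income)                 # (b + c) / total
r2_both = r_squared(cards, family_size, family_income)  # (a + b + c) / total

unique_x1 = r2_both - r2_x2          # area a: what X1 adds beyond X2
unique_x2 = r2_both - r2_x1          # area b: what X2 adds beyond X1
shared = r2_x1 + r2_x2 - r2_both     # area c: explained by both X1 and X2

print(round(r2_x1, 3), round(r2_x2, 3), round(r2_both, 3))  # 0.751 0.687 0.861
print(round(unique_x1, 2), round(unique_x2, 2))             # 0.17 0.11
```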

    38/48

    Family (i)   Actual # of CCs (y)   Family Size (x1)   Family Income (x2)   Regression Estimate (ŷ)   Error/Residual (y − ŷ)   Errors Squared (y − ŷ)²
    1            4                     2                  14                   4.77                       −.77                     .59
    2            6                     2                  16                   5.20                        .80                     .64
    3            6                     4                  14                   6.03                       −.03                     .00
    4            7                     4                  17                   6.68                        .32                     .10
    5            8                     5                  18                   7.53                        .47                     .22
    6            7                     5                  21                   8.18                      −1.18                    1.39
    7            8                     6                  17                   7.95                        .05                     .00
    8            10                    6                  25                   9.67                        .33                     .11
                                                                                                                                 Σ = 3.05

    $\hat{y} = .482 + .63x_1 + .216x_2$; e.g., family 1: $\hat{y} = .482 + .63(2) + .216(14) = 4.77$

    $\sum (y - \hat{y})^2 = 3.05$: SSE = Sum of Squares Error (Residual)

    Unique (additional) contribution of X2 (family income) beyond X1 = 86% − 75% = 11%

    39/48

    Exercise 1: Redo the credit card analysis with SPSS.

    First: Correlations and Simple Regression

    Next: Multiple Regression (also ask for part and partial correlations.)

    SPSS CREDIT CARD FILE

    http://smb//datastore01/website/mqm497/MQM%20497%20SPSS%20Data%20Files/MQM497%20Credit%20Card%20Regression%20Model.sav

    40/48

    EXERCISE 2:

    41/48

    No. 1: Is the overall F significant?
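The slide's question refers to the overall F test that SPSS reports in its ANOVA table. As a rough sketch, assuming the standard overall F statistic F = (SSR/k) / (SSE/(n − k − 1)), the credit-card multiple regression figures from the earlier slides (SST = 22, SSE = 3.05, n = 8 families, k = 2 predictors) give F ≈ 15.5 on (2, 5) degrees of freedom. A standard F table puts the .05 critical value for (2, 5) df at about 5.79, so this F would be significant at that level; SPSS reports the exact p-value instead.

```python
# Values reported on the slides for the credit-card multiple regression
sst = 22.0      # SS Total
sse = 3.05      # SS Error (residual), rounded as on the slide
n, k = 8, 2     # 8 families, 2 predictors (family size, family income)

ssr = sst - sse                 # SS Regression = 18.95
msr = ssr / k                   # mean square regression, df = k
mse = sse / (n - k - 1)         # mean square error, df = n - k - 1
f_stat = msr / mse              # overall F statistic with (k, n - k - 1) df

print(round(f_stat, 2))  # 15.53
```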

    42/48

    Regression Analysis

    43/48

    EXAMPLE 1: Income = 24 + 14 Gender (income in $1,000s)
    Coded: Female = 0, Male = 1
    Meaning: Average income of females is $24,000. Males, on average, make $14,000 more than females.

    Income = 12 + 1 Education Years + 8 Gender (income in $1,000s)
    Coded: Female = 0, Male = 1
    Meaning:
    Average income of females with no education is $12,000.
    Among people of the same gender, every additional year of education results in an average additional income of $1,000.
    Males make, on average, $8,000 more in comparison with females who have the same number of years of education.
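The two dummy-variable models can be evaluated directly. This minimal Python sketch uses hypothetical function names and shows how the male/female gap shrinks from 14 to 8 once education is held constant; the 12 years of education used for the comparison is an arbitrary illustrative value, and results are in the units of the slide's equations (which the slide's interpretations read as thousands of dollars).

```python
def income_gender_only(gender):
    """First model on the slide: Income = 24 + 14*Gender (Female = 0, Male = 1)."""
    return 24 + 14 * gender


def income_with_education(education_years, gender):
    """Second model: Income = 12 + 1*Education Years + 8*Gender (Female = 0, Male = 1)."""
    return 12 + 1 * education_years + 8 * gender


# Raw gender gap, with no other variable controlled: male minus female
print(income_gender_only(1) - income_gender_only(0))                 # 14
# Gap between a man and a woman with the same (hypothetical) 12 years of education
print(income_with_education(12, 1) - income_with_education(12, 0))   # 8
# Effect of one more year of education, gender held constant
print(income_with_education(13, 0) - income_with_education(12, 0))   # 1
```

The intercept of the second model (12) is the predicted income of a female with zero years of education, which is why the slide interprets it that way.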

    44/48

    Exercise: Suppose we are interested in knowing what role, if any, demographic characteristics (i.e., age, sex_Dummy, educ, sibs, agewed, incomdol), as well as job satisfaction (satjob-2) and marriage satisfaction (hapmar-2), play in determining one's overall happiness in life (happy-2).

    Use the gss.2 data.

    45/48

    Exercise 3: Suppose we are interested in knowing what role, if any, the following demographic characteristics play in determining one's income (rincmdol):

    age, sex_Dummy (0 = Male, 1 = Female),
    age first married (agewed),
    years of education completed (educ), and
    political party affiliation: republic (0 = Democrat, 1 = Republican).

    Use the gss.2 data.

    46/48

    Assignment 5

    Data file Salary.sav contains information about 474 employees hired by a Midwestern bank between 1969 and 1971. (NOTE: Due to SPSS site license restrictions, this hyperlink will not work if you are off campus.) Of the 474 employees, 258 were men, 216 women, 370 white, and 104 non-white. The bank was subsequently involved in EEOC litigation; it was accused of gender and race discrimination in its hiring and compensation practices. The two issues of particular interest in the litigation were alleged gender and racial inequalities not only in the bank's beginning salaries (variable salbeg), but also in its later salaries (variable salnow).

    1. Print, examine, and interpret correlation coefficients between beginning salary (salbeg) and age in years (age), education in years (edlevel), employment category or job classification level, rated from 1 = lowest to 8 = highest (jobcat), and work experience in months (work).

    2. Conduct the appropriate analysis to see: (a) What role did each of the variables age, education (edlevel), employment category (jobcat), and work experience (work) play, holding all other variables constant, in determining the bank's beginning salaries? For example, what was the differential pay for one additional year of education among new hires who otherwise had the same age, employment category, and work experience? (b) Which of the above demographic characteristics had the strongest influence on beginning pay? How can you tell? (c) What percent of the differences in employees' beginning salaries can be explained by/attributed to differences in all of the above characteristics?

    http://www.cob.ilstu.edu/udrive/MQM/MQM%20497/Hemmasi/MQM497_Data_Files/SALARY.sav

    47/48

    Assignment 5

    3. Now conduct the appropriate analysis to indicate, holding all other variables constant, what role gender (sex; male = 0, female = 1) played in determining beginning salaries at the bank. That is, what was the differential beginning pay between male and female employees who otherwise had the same age, education, employment category, and work experience? Does this evidence support the charges of gender discrimination in the bank's practices regarding initial compensation?

    4. During litigation, it was charged that the bank's unfair compensation practices had continued beyond its initial salary decisions. That is, the prosecution claimed that, with time, the beginning salary disparities between men and women not only did not shrink but further widened. Conduct the appropriate analysis to indicate: (a) Everything else being equal, what role did gender play in determining employees' later salaries at the bank (salnow)? That is, what was the average differential pay between male and female employees who otherwise had the same age, education, employment category, work experience, and job seniority (variable time represents seniority in terms of number of months employed at the bank)? (b) Compare the later pay disparities you have just identified with the beginning pay disparities you found in question 3 above, to explain whether the evidence supports the prosecution's charges of continued gender discrimination beyond initial salary decisions, resulting in widening disparities in later pay.

    NOTE: For each question, provide thorough explanations on the corresponding pages and parts of your printout.

    48/48