15103903 multiple regression (1)

Upload: muhammad-haris

Post on 05-Apr-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 15103903 Multiple Regression (1)

    1/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-1

    Multiple Regression Analysisand Model Building

  • 7/31/2019 15103903 Multiple Regression (1)

    2/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-2

    Chapter Goals

    After completing this chapter, you should be ableto:

    explain model building using multiple regression

    analysis apply multiple regression analysis to business

    decision-making situations

    analyze and interpret the computer output for amultiple regression model

    test the significance of the independent variables ina multiple regression model

  • 7/31/2019 15103903 Multiple Regression (1)

    3/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-3

    Chapter Goals

    After completing this chapter, you should beable to:

    recognize potential problems in multipleregression analysis and take steps to correct theproblems

    incorporate qualitative variables into the

    regression model by using dummy variables use variable transformations to model nonlinear

    relationships

    (continued)

  • 7/31/2019 15103903 Multiple Regression (1)

    4/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-4

    The Multiple Regression Model

    Idea: Examine the linear relationship between1 dependent (y) & 2 or more independent variables (xi)

    xxxy kk22110 +++++=

    kk22110 xbxbxbby ++++=

    Population model:

    Y-intercept Population slopes Random Error

    Estimated(or predicted)value of y

    Estimated slope coefficients

    Estimated multiple regression model:

    Estimatedintercept

  • 7/31/2019 15103903 Multiple Regression (1)

    5/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-5

    Multiple Regression Model

    Two variable model

    y

    x1

    x2

    22110 xbxbby ++=

    Slopef

    orvariabl

    ex1

    Slopeforvar

    iablex2

  • 7/31/2019 15103903 Multiple Regression (1)

    6/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-6

    Multiple Regression Model

    Two variable model

    y

    x1

    x2

    22110 xbxbby ++=yi

    yi

  • 7/31/2019 15103903 Multiple Regression (1)

    7/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-7

    Multiple Regression Assumptions

    The model errors are independent andrandom

    The errors are normally distributed

    The mean of the errors is zero Errors have a constant variance

    e = (y y)

  • 7/31/2019 15103903 Multiple Regression (1)

    8/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-8

    Model Specification

    Decide what you want to do and select thedependent variable

    Determine the potential independent variables foryour model

    Gather sample data (observations) for all variables

  • 7/31/2019 15103903 Multiple Regression (1)

    9/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-9

    The Correlation Matrix

    Correlation between the dependent variable andselected independent variables can be found usingExcel: Formula Tab: Data Analysis / Correlation

    Can check for statistical significance of correlationwith a t test

  • 7/31/2019 15103903 Multiple Regression (1)

    10/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-10

    Example

    A distributor of frozen desert pies wants toevaluate factors thought to influence demand

    Dependent variable: Pie sales (units per week)

    Independent variables: Price (in $)

    Advertising ($100s)

    Data are collected for 15 weeks

  • 7/31/2019 15103903 Multiple Regression (1)

    11/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-11

    Pie Sales Model

    Sales = b0 + b1 (Price)

    + b2 (Advertising)

    Week Pie Sales Price($) Advertising($100s)

    1 350 5.50 3.3

    2 460 7.50 3.3

    3 350 8.00 3.0

    4 430 8.00 4.5

    5 350 6.80 3.06 380 7.50 4.0

    7 430 4.50 3.0

    8 470 6.40 3.7

    9 450 7.00 3.5

    10 490 5.00 4.0

    11 340 7.20 3.5

    12 300 7.90 3.2

    13 440 5.90 4.0

    14 450 5.00 3.5

    15 300 7.00 2.7

    Pie Sales Price Advertising

    Pie Sales 1

    Price -0.44327 1

    Advertising 0.55632 0.03044 1

    Correlation matrix:

    Multiple regression model:

  • 7/31/2019 15103903 Multiple Regression (1)

    12/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-12

    Interpretation of EstimatedCoefficients

    Slope (bi)

    Estimates that the average value of y changes by bi

    units for each 1 unit increase in Xi holding all other

    variables constant Example: if b1 = -20, then sales (y) is expected to

    decrease by an estimated 20 pies per week for each $1increase in selling price (x1), net of the effects of

    changes due to advertising (x2) y-intercept (b0)

    The estimated average value of y when all xi = 0

    (assuming all xi = 0 is within the range of observed

    values)

  • 7/31/2019 15103903 Multiple Regression (1)

    13/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-13

    Pie Sales Correlation Matrix

    Price vs. Sales : r = -0.44327 There is a negative association between

    price and sales

    Advertising vs. Sales : r = 0.55632 There is a positive association between

    advertising and sales

    Pie Sales Price Advertising

    Pie Sales 1

    Price -0.44327 1

    Advertising 0.55632 0.03044 1

  • 7/31/2019 15103903 Multiple Regression (1)

    14/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-14

    Scatter Diagrams

    Sales vs. Price

    0

    100

    200

    300

    400

    500

    600

    0 2 4 6 8 10

    Sales vs. Advertising

    0

    100

    200

    300

    400

    500

    600

    0 1 2 3 4 5

    Sales

    Sales

    Price

    Advertising

  • 7/31/2019 15103903 Multiple Regression (1)

    15/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-15

    Estimating a Multiple LinearRegression Equation

    Computer software is generally used to generatethe coefficients and measures of goodness of fitfor multiple regression

    Excel: Data / Data Analysis / Regression

    PHStat: Add-Ins / PHStat / Regression / Multiple Regression

  • 7/31/2019 15103903 Multiple Regression (1)

    16/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-16

    Estimating a Multiple LinearRegression Equation

    Excel: Data / Data Analysis / Regression

  • 7/31/2019 15103903 Multiple Regression (1)

    17/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-17

    Estimating a Multiple LinearRegression Equation

    PHStat: Add-Ins / PHStat / Regression / Multiple Regression

  • 7/31/2019 15103903 Multiple Regression (1)

    18/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-18

    Multiple Regression Output

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    ertising)74.131(Advce)24.975(Pri-306.526Sales +=

  • 7/31/2019 15103903 Multiple Regression (1)

    19/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-19

    The Multiple Regression Equation

    ertising)74.131(Advce)24.975(Pri-306.526Sales +=

    b1 = -24.975: sales

    will decrease, onaverage, by 24.975

    pies per week foreach $1 increase inselling price, net ofthe effects of changesdue to advertising

    b2 = 74.131: sales will

    increase, on average,by 74.131 pies per

    week for each $100increase inadvertising, net of theeffects of changesdue to price

    whereSales is in number of pies per week

    Price is in $Advertising is in $100s.

  • 7/31/2019 15103903 Multiple Regression (1)

    20/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-20

    Using The Model to MakePredictions

    Predict sales for a week in which the sellingprice is $5.50 and advertising is $350:

    Predicted salesis 428.62 pies

    428.62

    (3.5)74.131(5.50)24.975-306.526

    ertising)74.131(Advce)24.975(Pri-306.526Sales

    =

    +=

    +=

    Note that Advertising isin $100s, so $350means that x2 = 3.5

  • 7/31/2019 15103903 Multiple Regression (1)

    21/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-21

    Predictions in PHStat

    PHStat | regression | multiple regression

    Check theconfidence andprediction intervalestimates box

  • 7/31/2019 15103903 Multiple Regression (1)

    22/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-22

    Input values

    Predictions in PHStat(continued)

    Predicted y value

  • 7/31/2019 15103903 Multiple Regression (1)

    23/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-23

    Multiple Coefficient ofDetermination (R2)

    Reports the proportion of total variation in yexplained by all x variables taken together

    squaresofsumTotal

    regressionsquaresofSum

    SST

    SSRR2 ==

  • 7/31/2019 15103903 Multiple Regression (1)

    24/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-24

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    .5214856493.3

    29460.0

    SST

    SSRR2 ===

    52.1% of the variation in pie sales

    is explained by the variation inprice and advertising

    Multiple Coefficient ofDetermination

    (continued)

  • 7/31/2019 15103903 Multiple Regression (1)

    25/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-25

    Adjusted R2

    R2 never decreases when a new x variable isadded to the model This can be a disadvantage when comparing

    models What is the net effect of adding a new variable?

    We lose a degree of freedom when a new xvariable is added

    Did the new x variable add enough explanatorypower to offset the loss of one degree offreedom?

  • 7/31/2019 15103903 Multiple Regression (1)

    26/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-26

    Shows the proportion of variation in y explained byall x variables adjusted for the number of xvariablesused

    (where n = sample size, k = number of independent variables)

    Penalize excessive use of unimportant independentvariables

    Smaller than R2

    Useful in comparing among models

    Adjusted R2

    (continued)

    =

    1kn1n)R1(1R 22A

  • 7/31/2019 15103903 Multiple Regression (1)

    27/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-27

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    .44172R2A =44.2% of the variation in pie sales isexplained by the variation in price and

    advertising, taking into account the samplesize and number of independent variables

    Multiple Coefficient ofDetermination

    (continued)

  • 7/31/2019 15103903 Multiple Regression (1)

    28/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-28

    Is the Model Significant?

    F-Test for Overall Significance of the Model

    Shows if there is a linear relationship between all of

    the x variables considered together and y

    Use F test statistic

    Hypotheses:

    H0:

    1=

    2= =

    k= 0 (no linear relationship)

    HA: at least one i 0 (at least one independentvariable affects y)

  • 7/31/2019 15103903 Multiple Regression (1)

    29/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-29

    F-Test for Overall Significance

    Test statistic:

    where F has (numerator) D1 = k and

    (denominator) D2 = (n k 1)

    degrees of freedom

    (continued)

    MSE

    MSR

    1kn

    SSEk

    SSR

    F =

    =

  • 7/31/2019 15103903 Multiple Regression (1)

    30/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-30

    6.53862252.8

    14730.0

    MSE

    MSRF ===

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    (continued)

    F-Test for Overall Significance

    With 2 and 12 degreesof freedom P-value forthe F-Test

  • 7/31/2019 15103903 Multiple Regression (1)

    31/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-31

    H0: 1 = 2 = 0

    HA: 1 and 2 not both zero

    = .05

    df1= 2 df2 = 12

    Test Statistic:

    Decision:

    Conclusion:

    Reject H0 at = 0.05

    The regression model does explaina significant portion of thevariation in pie sales

    (There is evidence that at least oneindependent variable affects y )

    0 = .05

    F.05

    = 3.885Reject H0Do not

    reject H0

    6.5386MSE

    MSRF ==

    CriticalValue:

    F

    = 3.885

    F-Test for Overall Significance(continued)

    F

    A I di id l V i bl

  • 7/31/2019 15103903 Multiple Regression (1)

    32/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-32

    Are Individual VariablesSignificant?

    Use t-tests of individual variable slopes

    Shows if there is a linear relationship between the

    variable xi and y Hypotheses:

    H0: i = 0 (no linear relationship)

    HA: i 0 (linear relationship does existbetween xi and y)

    A I di id l V i bl

  • 7/31/2019 15103903 Multiple Regression (1)

    33/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-33

    Are Individual VariablesSignificant?

    H0: i = 0 (no linear relationship)

    HA: i 0 (linear relationship does exist

    between xi and y )

    Test Statistic:

    (df = n k 1)

    ib

    i

    s0bt =

    (continued)

    A I di id l V i bl

  • 7/31/2019 15103903 Multiple Regression (1)

    34/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-34

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    t-value for Price is t = -2.306, withp-value .0398

    t-value for Advertising is t = 2.855,

    with p-value .0145

    (continued)

    Are Individual VariablesSignificant?

    I f b t th Sl

  • 7/31/2019 15103903 Multiple Regression (1)

    35/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-35

    d.f. = 15-2-1 = 12

    = .05

    t /2 = 2.1788

    Inferences about the Slope:tTest Example

    H0: i = 0

    HA: i 0

    The test statistic for each variable falls inthe rejection region (p-values < .05)

    There is evidence that bothPrice and Advertising affectpie sales at = .05

    From Excel output:

    Reject H0 for each variable

    Coefficients Standard Error t Stat P-value

    Price -24.97509 10.83213 -2.30565 0.03979

    Advertising 74.13096 25.96732 2.85478 0.01449

    Decision:

    Conclusion:Reject H0Reject H0

    /2=.025

    -t/2Do not reject H0

    0t/2

    /2=.025

    -2.1788 2.1788

    C fid I t l E ti t

  • 7/31/2019 15103903 Multiple Regression (1)

    36/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-36

    Confidence Interval Estimatefor the Slope

    Confidence interval for the population slope 1(the effect of changes in price on pie sales):

    Example: Weekly sales are estimated to be reduced bybetween 1.37 to 48.58 pies for each increase of $1 in theselling price

    ib2/istb

    Coefficients Standard Error Lower 95% Upper 95%

    Intercept 306.52619 114.25389 57.58835 555.46404

    Price -24.97509 10.83213 -48.57626 -1.37392

    Advertising 74.13096 25.96732 17.55303 130.70888

    where t has(n k 1) d.f.

    St d d D i ti f th

  • 7/31/2019 15103903 Multiple Regression (1)

    37/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-37

    Standard Deviation of theRegression Model

    The estimate of the standard deviation of theregression model is:

    MSEkn

    SSEs =

    =

    1

    Is this value large or small? Must compare to themean size of y for comparison

    St d d D i ti f th

  • 7/31/2019 15103903 Multiple Regression (1)

    38/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-38

    Regression Statistics

    Multiple R 0.72213

    R Square 0.52148

    Adjusted R Square 0.44172

    Standard Error 47.46341

    Observations 15

    ANOVA df SS MS F Significance F

    Regression 2 29460.027 14730.013 6.53861 0.01201

    Residual 12 27033.306 2252.776

    Total 14 56493.333

    The standard deviation of theregression model is 47.46

    (continued)

    Standard Deviation of theRegression Model

    St d d D i ti f th

  • 7/31/2019 15103903 Multiple Regression (1)

    39/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-39

    The standard deviation of the regression model is47.46

    A rough prediction range for pie sales in a given

    week is Pie sales in the sample were in the 300 to 500 per

    week range, so this range is probably too large to

    be acceptable. The analyst may want to look foradditional variables that can explain more of thevariation in weekly sales

    (continued)

    Standard Deviation of theRegression Model

    94.22(47.46) =

  • 7/31/2019 15103903 Multiple Regression (1)

    40/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-40

    Multicollinearity

    Multicollinearity: High correlation exists

    between two independent variables

    This means the two variables contributeredundant information to the multiple regression

    model

  • 7/31/2019 15103903 Multiple Regression (1)

    41/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-41

    Multicollinearity

    Including two highly correlated independent

    variables can adversely affect the regression

    results

    No new information provided

    Can lead to unstable coefficients (large

    standard error and low t-values)

    Coefficient signs may not match prior

    expectations

    (continued)

    S I di ti f S

  • 7/31/2019 15103903 Multiple Regression (1)

    42/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-42

    Some Indications of SevereMulticollinearity

    Incorrect signs on the coefficients Large change in the value of a previous

    coefficient when a new variable is added to the

    model A previously significant variable becomes

    insignificant when a new independent variableis added

    The estimate of the standard deviation of themodel increases when a variable is added tothe model

  • 7/31/2019 15103903 Multiple Regression (1)

    43/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-43

    Qualitative (Dummy) Variables

    Categorical explanatory variable (dummyvariable) with two or more levels: yes or no, on or off, male or female

    coded as 0 or 1 Regression intercepts are different if the

    variable is significant Assumes equal slopes for other variables The number of dummy variables needed is

    (number of levels 1)

    Dummy Variable Model Example

  • 7/31/2019 15103903 Multiple Regression (1)

    44/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-44

    Dummy-Variable Model Example(with 2 Levels)

    Let:

    y = pie sales

    x1 = price

    x2 = holiday (X2 = 1 if a holiday occurred during the week)(X2 = 0 if there was no holiday that week)

    210 xbxbby 21 ++=

    Dummy Variable Model Example

  • 7/31/2019 15103903 Multiple Regression (1)

    45/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-45

    Sameslope

    Dummy-Variable Model Example(with 2 Levels)

    (continued)

    x1 (Price)

    y (sales)

    b0 + b2

    b0

    1010

    12010

    xbb(0)bxbby

    xb)b(b(1)bxbby

    121

    121

    +=++=

    ++=++= Holiday

    No Holiday

    Differentintercept

    Holiday

    NoHoliday

    If H0: 2 = 0 is

    rejected, then

    Holiday has asignificant effecton pie sales

    Interpreting the Dummy Variable

  • 7/31/2019 15103903 Multiple Regression (1)

    46/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-46

    Sales: number of pies sold per week

    Price: pie price in $

    Holiday:

    Interpreting the Dummy VariableCoefficient (with 2 Levels)

    Example:

    1 If a holiday occurred during the week

    0 If no holiday occurred

    b2 = 15: on average, sales were 15 pies greater inweeks with a holiday than in weeks without aholiday, given the same price

    )15(Holiday30(Price)-300Sales +=

    Dummy Variable Models

  • 7/31/2019 15103903 Multiple Regression (1)

    47/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-47

    Dummy-Variable Models(more than 2 Levels)

    The number of dummy variables is one less thanthe number of levels

    Example:

    y = house price ; x1 = square feet

    The style of the house is also thought to matter:

    Style = ranch, split level, condo

    Three levels, so two dummyvariables are needed

    Dummy Variable Models

  • 7/31/2019 15103903 Multiple Regression (1)

    48/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-48

    Dummy-Variable Models(more than 2 Levels)

    =

    =

    notif0

    levelsplitif1x

    notif0

    ranchif1x 32

    3210 xbxbxbby 321 +++=

    b2 shows the impact on price if the house is aranch style, compared to a condo

    b3 shows the impact on price if the house is a split

    level style, compared to a condo

    (continued)

    Let the default category be condo

    Interpreting the Dummy Variable

  • 7/31/2019 15103903 Multiple Regression (1)

    49/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-49

    Interpreting the Dummy VariableCoefficients (with 3 Levels)

    With the same square feet, aranch will have an estimatedaverage price of 23.53thousand dollars more than acondo

    With the same square feet, aranch will have an estimatedaverage price of 18.84thousand dollars more than acondo.

    Suppose the estimated equation is

    321 18.84x23.53x0.045x20.43y +++=

    18.840.045x20.43y 1 ++=

    23.530.045x20.43y1

    ++=

    10.045x20.43y +=For a condo: x2 = x3 = 0

    For a ranch: x3 = 0

    For a split level: x2 = 0

  • 7/31/2019 15103903 Multiple Regression (1)

    50/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-50

    Model Building

    Goal is to develop a model with the best set ofindependent variables Easier to interpret if unimportant variables are removed Lower probability of collinearity

    Stepwise regression procedure Provide evaluation of alternative models as variables

    are added

    Best-subset approach Try all combinations and select the best using the

    highest adjusted R2 and lowest s

  • 7/31/2019 15103903 Multiple Regression (1)

    51/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-51

    Idea: develop the least squaresregression equation in steps, either

    through forward selection, backwardelimination, or through standard stepwiseregression

    Stepwise Regression

  • 7/31/2019 15103903 Multiple Regression (1)

    52/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-52

    Best Subsets Regression

    Idea: estimate all possible regression equationsusing all possible combinations of independentvariables

    Choose the best fit by looking for the highestadjusted R2 and lowest standard error s

    Stepwise regression and best subsetsregression can be performed using PHStat,

    Minitab, or other statistical software packages

  • 7/31/2019 15103903 Multiple Regression (1)

    53/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-53

    Aptness of the Model

    Diagnostic checks on the model includeverifying the assumptions of multipleregression:

    Errors are independent and random Error are normally distributed Errors have constant variance

    Each xi is linearly related to y

    )yy(ei =Errors (or Residuals) are given by

  • 7/31/2019 15103903 Multiple Regression (1)

    54/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-54

    Residual Analysis

    Non-constant variance Constant variance

    x x

    residuals

    residuals

    Not Independent Independent

    x

    residua

    ls

    x

    residua

    ls

  • 7/31/2019 15103903 Multiple Regression (1)

    55/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-55

    The Normality Assumption

    Errors are assumed to be normally distributed

    Standardized residuals can be calculated bycomputer

    Examine a histogram or a normal probability plot ofthe standardized residuals to check for normality

  • 7/31/2019 15103903 Multiple Regression (1)

    56/57

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 15-56

    Chapter Summary

    Developed the multiple regression model Tested the significance of the multiple

    regression model Developed adjusted R2

    Tested individual regression coefficients Used dummy variables

  • 7/31/2019 15103903 Multiple Regression (1)

    57/57

    Chapter Summary

    Described multicollinearity Discussed model building

    Stepwise regression Best subsets regression

    Examined residual plots to check modelassumptions

    (continued)