A first order model with one binary and one quantitative predictor variable
![Page 1: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/1.jpg)
A first order model with one binary and one quantitative
predictor variable
![Page 2: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/2.jpg)
Examples of binary predictor variables
• Gender (male, female)
• Smoking status (smoker, nonsmoker)
• Treatment (yes, no)
• Health status (diseased, healthy)
![Page 3: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/3.jpg)
On average, do smoking mothers have babies with lower birth weight?
• Random sample of n = 32 births.
• y = birth weight of baby (in grams)
• x1 = length of gestation (in weeks)
• x2 = smoking status of mother (yes, no)
![Page 4: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/4.jpg)
Coding the binary (two-group qualitative) predictor
• Using a (0,1) indicator variable:
– $x_{i2} = 1$, if mother smokes
– $x_{i2} = 0$, if mother does not smoke
• Other terms used:
– dummy variable
– binary variable
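The (0,1) coding above can be sketched in a few lines of Python. The label strings `"yes"`/`"no"` and the function name `encode_smoker` are assumptions made for illustration; they are not taken from the original data set.

```python
def encode_smoker(status):
    """Return 1 if the mother smokes, 0 otherwise.

    Assumes hypothetical text labels 'yes' / 'no' (case-insensitive).
    """
    normalized = status.strip().lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"unexpected smoking status: {status!r}")
    return 1 if normalized == "yes" else 0


# Hypothetical labels, not the actual birth-weight data.
statuses = ["yes", "no", "No", "YES"]
x2 = [encode_smoker(s) for s in statuses]
print(x2)
```

Rejecting unknown labels, rather than silently coding them as 0, keeps a misspelled category from being counted as a nonsmoker.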
![Page 5: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/5.jpg)
On average, do smoking mothers have babies with lower birth weight?
[Figure: scatter plot of Weight (grams) versus Gestation (weeks), with points marked by smoking status (0, 1).]
![Page 6: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/6.jpg)
A first order model with one binary and one quantitative predictor
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
where …
• $y_i$ is birth weight of baby i
• $x_{i1}$ is length of gestation of baby i
• $x_{i2} = 1$, if mother smokes and $x_{i2} = 0$, if not
and … the independent error terms $\varepsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
![Page 7: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/7.jpg)
An indicator variable for 2 groups yields 2 response functions
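$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
If mother is a smoker ($x_{i2} = 1$):
$$\mu_{Y|x_{i2}=1} = (\beta_0 + \beta_2) + \beta_1 x_{i1}$$
If mother is a nonsmoker ($x_{i2} = 0$):
$$\mu_{Y|x_{i2}=0} = \beta_0 + \beta_1 x_{i1}$$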
![Page 8: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/8.jpg)
Interpretation of the regression coefficients
$\beta_1$ represents the change in the mean response $\mu_Y$ for each additional unit increase in the quantitative predictor $x_1$ … for both groups.
$\beta_2$ represents how much higher (or lower) the mean response function for the second group (coded 1) is than the one for the first group (coded 0) … for any value of $x_1$.
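The interpretation of the indicator's coefficient follows from subtracting the two response functions at the same gestation length. A short worked step, using the same symbols as the model:

```latex
\begin{aligned}
\mu_{Y|x_{i2}=1} - \mu_{Y|x_{i2}=0}
  &= \bigl[(\beta_0 + \beta_2) + \beta_1 x_{i1}\bigr] - \bigl[\beta_0 + \beta_1 x_{i1}\bigr] \\
  &= \beta_2
\end{aligned}
```

The gestation term cancels, so the vertical gap between the two parallel lines is the same at every gestation length.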
![Page 9: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/9.jpg)
The estimated regression function
[Figure: scatter plot of Weight (grams) versus Gestation (weeks), points marked by smoking status (0, 1), with the estimated regression function for each group.]
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking
Nonsmokers (Smoking = 0): $\hat{y} = -2390 + 143x$
Smokers (Smoking = 1): $\hat{y} = -2635 + 143x$
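As a quick check, the two group-specific lines can be recovered from the single fitted equation by plugging in Smoking = 0 and Smoking = 1. The sketch below uses the rounded coefficients from this slide; the function name `fitted_line` is an assumption for illustration.

```python
# Rounded coefficients from the fitted equation on this slide.
b0, b_gest, b_smoke = -2390, 143, -245


def fitted_line(smoking):
    """Return (intercept, slope) of the fitted line for smoking = 0 or 1."""
    return b0 + b_smoke * smoking, b_gest


print("nonsmokers:", fitted_line(0))
print("smokers:   ", fitted_line(1))
```

Both lines share the slope 143; only the intercept shifts, by the Smoking coefficient.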
![Page 10: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/10.jpg)
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2389.6 | 349.2 | -6.84 | 0.000 |
| Gest | 143.100 | 9.128 | 15.68 | 0.000 |
| Smoking | -244.54 | 41.98 | -5.83 | 0.000 |

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
A significant difference in mean birth weights for the two groups?
$$\mu_{Y|x_{i2}=1} = (\beta_0 + \beta_2) + \beta_1 x_{i1}$$
$$\mu_{Y|x_{i2}=0} = \beta_0 + \beta_1 x_{i1}$$
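Because the two response functions differ only by $\beta_2$, the question reduces to testing whether $\beta_2 = 0$. The sketch below recomputes the T value in the Smoking row from its coefficient and standard error. The critical value 2.045 is an assumption taken from a standard t table (two-sided 5% level, 29 degrees of freedom); it does not appear in the output.

```python
coef, se = -244.54, 41.98   # Smoking row of the regression output
n, n_coef = 32, 3           # 32 births; intercept, Gest, Smoking

df = n - n_coef             # error degrees of freedom
t_stat = coef / se

# Assumed table value: two-sided 5% critical value of t with 29 df.
T_CRIT = 2.045

print("t =", round(t_stat, 2), "on", df, "df")
print("reject H0: beta_2 = 0 at the 5% level:", abs(t_stat) > T_CRIT)
```

The statistic is far beyond the critical value, which agrees with the reported P of 0.000.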
![Page 11: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/11.jpg)
Why not instead fit two separate regression functions?
One for the smokers and one for the nonsmokers?
![Page 12: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/12.jpg)
Using indicator variable, fitting one function to 32 data points
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2389.6 | 349.2 | -6.84 | 0.000 |
| Gest | 143.100 | 9.128 | 15.68 | 0.000 |
| Smoking | -244.54 | 41.98 | -5.83 | 0.000 |

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
![Page 13: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/13.jpg)
Using indicator variable, fitting one function to 32 data points
Predicted Values for New Observations

| New Obs | Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|---|
| 1 | 2803.7 | 30.8 | (2740.6, 2866.8) | (2559.1, 3048.3) |
| 2 | 3048.2 | 28.9 | (2989.1, 3107.4) | (2804.7, 3291.8) |

Values of Predictors for New Observations

| New Obs | Gest | Smoking |
|---|---|---|
| 1 | 38.0 | 1.00 |
| 2 | 38.0 | 0.00 |
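The two Fit values can be reproduced by plugging the new observations into the pooled equation. A minimal sketch using the unrounded coefficients from the regression output; the function name `predict_weight` is an assumption for illustration.

```python
# Coefficients from the pooled fit (Constant, Gest, Smoking).
b0, b_gest, b_smoke = -2389.6, 143.100, -244.54


def predict_weight(gest_weeks, smoker):
    """Fitted mean birth weight in grams; smoker is 1 or 0."""
    return b0 + b_gest * gest_weeks + b_smoke * smoker


fit_smoker = predict_weight(38.0, 1)      # New Obs 1
fit_nonsmoker = predict_weight(38.0, 0)   # New Obs 2
print(round(fit_smoker, 1), round(fit_nonsmoker, 1))
```

At the same gestation length the two fits differ by exactly the Smoking coefficient.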
![Page 14: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/14.jpg)
Fitting function to 16 nonsmokers
The regression equation is
Weight = - 2546 + 147 Gest

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2546.1 | 457.3 | -5.57 | 0.000 |
| Gest | 147.21 | 11.97 | 12.29 | 0.000 |

S = 106.9   R-Sq = 91.5%   R-Sq(adj) = 90.9%
![Page 15: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/15.jpg)
Fitting function to 16 nonsmokers
Predicted Values for New Observations

| New Obs | Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|---|
| 1 | 3047.7 | 26.8 | (2990.3, 3105.2) | (2811.3, 3284.2) |

Values of Predictors for New Observations

| New Obs | Gest |
|---|---|
| 1 | 38.0 |
![Page 16: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/16.jpg)
Fitting function to 16 smokers
The regression equation is
Weight = - 2475 + 139 Gest

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2474.6 | 554.0 | -4.47 | 0.001 |
| Gest | 139.03 | 14.11 | 9.85 | 0.000 |

S = 126.6   R-Sq = 87.4%   R-Sq(adj) = 86.5%
![Page 17: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/17.jpg)
Fitting function to 16 smokers
Predicted Values for New Observations

| New Obs | Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|---|
| 1 | 2808.5 | 35.8 | (2731.7, 2885.3) | (2526.4, 3090.7) |

Values of Predictors for New Observations

| New Obs | Gest |
|---|---|
| 1 | 38.0 |
![Page 18: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/18.jpg)
Summary table

| Model estimated using… | SE(Gest) | Length of CI for μY |
|---|---|---|
| 32 data points | 9.128 | 118.3 (NS), 126.2 (S) |
| 16 nonsmokers | 11.97 | 114.9 |
| 16 smokers | 14.11 | 153.6 |

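The interval lengths in the summary table are the upper limit minus the lower limit of each 95% CI reported at Gest = 38. A short sketch that recomputes them from the intervals in the earlier output; the dictionary keys are labels chosen here for illustration.

```python
# 95% confidence intervals for the mean response at Gest = 38,
# copied from the "Predicted Values for New Observations" output.
intervals = {
    "32 data points (NS)": (2989.1, 3107.4),
    "32 data points (S)": (2740.6, 2866.8),
    "16 nonsmokers": (2990.3, 3105.2),
    "16 smokers": (2731.7, 2885.3),
}

lengths = {name: round(upper - lower, 1)
           for name, (lower, upper) in intervals.items()}

for name, length in lengths.items():
    print(f"{name:<20} {length}")
```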
![Page 19: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/19.jpg)
Reasons to “pool” the data and to fit one regression function
• Model assumes equal slopes for the groups and equal variances for all error terms.
• It makes sense to use all of the data to estimate these quantities.
• More degrees of freedom associated with MSE, so confidence intervals that are a function of MSE tend to be narrower.
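The degrees-of-freedom point can be made concrete by counting error degrees of freedom (observations minus estimated coefficients) for each fit. The critical values below are assumptions taken from a standard t table (two-sided 95%); they are not part of the regression output.

```python
def error_df(n_obs, n_coef):
    """Degrees of freedom associated with MSE."""
    return n_obs - n_coef


pooled_df = error_df(32, 3)     # intercept, Gest, Smoking
separate_df = error_df(16, 2)   # intercept, Gest (one group only)

# Assumed table values: 97.5th percentile of t for these df.
T_CRIT = {14: 2.145, 29: 2.045}

print("pooled:  ", pooled_df, "df, multiplier", T_CRIT[pooled_df])
print("separate:", separate_df, "df, multiplier", T_CRIT[separate_df])
```

A smaller multiplier narrows any interval of the form estimate ± t × standard error, provided the standard error itself is not larger.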
![Page 20: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/20.jpg)
How to answer the research question using one regression function?
The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2389.6 | 349.2 | -6.84 | 0.000 |
| Gest | 143.100 | 9.128 | 15.68 | 0.000 |
| Smoking | -244.54 | 41.98 | -5.83 | 0.000 |

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
![Page 21: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/21.jpg)
How to answer the research question using two regression functions?
Nonsmokers
The regression equation is
Weight = - 2546 + 147 Gest

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2546.1 | 457.3 | -5.57 | 0.000 |
| Gest | 147.21 | 11.97 | 12.29 | 0.000 |

Smokers
The regression equation is
Weight = - 2475 + 139 Gest

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2474.6 | 554.0 | -4.47 | 0.001 |
| Gest | 139.03 | 14.11 | 9.85 | 0.000 |

![Page 22: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/22.jpg)
Reasons to “pool” the data and to fit one regression function
• It allows you to easily answer research questions concerning the binary predictor variable.
![Page 23: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/23.jpg)
What if we instead tried to use two indicator variables?
One variable for smokers and one variable for nonsmokers?
![Page 24: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/24.jpg)
Definition of two indicator variables – one for each group
• Using a (0,1) indicator variable for smokers:
– $x_{i2} = 1$, if mother smokes
– $x_{i2} = 0$, if mother does not smoke
• Using a (0,1) indicator variable for nonsmokers:
– $x_{i3} = 1$, if mother does not smoke
– $x_{i3} = 0$, if mother smokes
![Page 25: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/25.jpg)
The modified regression function with two binary predictors
$$\mu_Y = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}$$
where …
• $\mu_Y$ is mean birth weight for given predictors
• $x_{i1}$ is length of gestation of baby i
• $x_{i2} = 1$, if mother smokes and $x_{i2} = 0$, if not
• $x_{i3} = 1$, if mother does not smoke and $x_{i3} = 0$, if she smokes
![Page 26: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/26.jpg)
Implication on data analysis
Regression Analysis: Weight versus Gest, x2*, x3*
* x3* is highly correlated with other X variables
* x3* has been removed from the equation
The regression equation is
Weight = - 2390 + 143 Gest - 245 x2*

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2389.6 | 349.2 | -6.84 | 0.000 |
| Gest | 143.100 | 9.128 | 15.68 | 0.000 |
| x2* | -244.54 | 41.98 | -5.83 | 0.000 |

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
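The software drops x3* because the two indicators always sum to 1, which duplicates the column of ones that goes with the intercept. A small sketch of that redundancy, using hypothetical indicator values rather than the actual data:

```python
# Hypothetical smoking indicators for six mothers (not the real data).
x2 = [1, 0, 0, 1, 1, 0]            # 1 if mother smokes
x3 = [1 - value for value in x2]   # 1 if mother does not smoke
intercept = [1] * len(x2)          # column of ones that goes with beta_0

# Every row satisfies x2 + x3 = 1, so the intercept column is an exact
# linear combination of the two indicators (perfect collinearity).
redundant = all(a + b == c for a, b, c in zip(x2, x3, intercept))
print(redundant)
```

With an exact linear dependence among the columns, the least squares estimates are not unique, so one of the columns has to go.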
![Page 27: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/27.jpg)
To prevent problems with the data analysis
• A qualitative variable with c groups should be represented by c-1 indicator variables, each taking on values 0 and 1.
– 2 groups, 1 indicator variable
– 3 groups, 2 indicator variables
– 4 groups, 3 indicator variables
– and so on…
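The c-1 rule can be sketched as a small helper that treats the first listed group as the reference (all indicators 0) and builds one (0,1) column for each remaining group. The function name `indicator_columns` and the three-group example are hypothetical and not part of the birth-weight study.

```python
def indicator_columns(values, levels):
    """Code a qualitative variable with c levels as c-1 (0,1) indicators.

    The first entry of `levels` is the reference group: it gets 0 in
    every indicator column.
    """
    _reference, *others = levels
    rows = []
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level: {value!r}")
        rows.append([1 if value == level else 0 for level in others])
    return rows


# Hypothetical 3-group variable, so 2 indicator columns per observation.
levels = ["never", "former", "current"]
coded = indicator_columns(["never", "former", "current", "former"], levels)
print(coded)
```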
![Page 28: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/28.jpg)
What is the impact of using a different coding scheme?
… such as (1, -1) coding?
![Page 29: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/29.jpg)
The regression model defined using (1, -1) coding scheme
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
where …
• $y_i$ is birth weight of baby i
• $x_{i1}$ is length of gestation of baby i
• $x_{i2} = 1$, if mother smokes and $x_{i2} = -1$, if not
and … the independent error terms $\varepsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
![Page 30: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/30.jpg)
The regression model yields 2 different response functions
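$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
If mother is a smoker ($x_{i2} = 1$):
$$\mu_Y = (\beta_0 + \beta_2) + \beta_1 x_{i1}$$
If mother is a nonsmoker ($x_{i2} = -1$):
$$\mu_Y = (\beta_0 - \beta_2) + \beta_1 x_{i1}$$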
![Page 31: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/31.jpg)
Interpretation of the regression coefficients
$\beta_1$ represents the change in the mean response $\mu_Y$ for each additional unit increase in the quantitative predictor $x_1$ … for both groups.
$\beta_0$ represents the “average” intercept.
$\beta_2$ represents how far each group is “offset” from the “average”.
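The "average" and "offset" readings follow from the two intercepts under (1, -1) coding, which are $\beta_0 + \beta_2$ for smokers and $\beta_0 - \beta_2$ for nonsmokers. A short worked step:

```latex
\begin{aligned}
\frac{(\beta_0 + \beta_2) + (\beta_0 - \beta_2)}{2} &= \beta_0
  && \text{(average of the two intercepts)} \\
(\beta_0 + \beta_2) - (\beta_0 - \beta_2) &= 2\beta_2
  && \text{(gap between the two groups)}
\end{aligned}
```

So under this coding the difference between smokers and nonsmokers is $2\beta_2$, not $\beta_2$ as it is under (0,1) coding.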
![Page 32: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/32.jpg)
The estimated regression function
[Figure: scatter plot of Weight (grams) versus Gestation (weeks), points marked by smoking status (-1, 1), with the estimated regression function for each group.]
The regression equation is
Weight = - 2512 + 143 Gest - 122 Smoking2
Nonsmokers (Smoking2 = -1): $\hat{y} = -2390 + 143x$
Smokers (Smoking2 = 1): $\hat{y} = -2635 + 143x$
![Page 33: A first order model with one binary and one quantitative predictor variable](https://reader033.vdocuments.us/reader033/viewer/2022061616/5697c0151a28abf838ccdedf/html5/thumbnails/33.jpg)
What is the impact of using a different coding scheme?
• Interpretation of regression coefficients changes.
• When reporting your results, make sure you explain what coding scheme was used!
• When interpreting others’ results, make sure you know what coding scheme was used!