![Page 1: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/1.jpg)
PubH 7405: REGRESSION ANALYSIS
SLR: INFERENCES, Part II
![Page 2: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/2.jpg)
We cover the topic of inference in two sessions. The first session focused on inferences concerning the slope and the intercept; this session continues with estimating the mean response, and more. Applications concerning the slope and the intercept are based on the following four (4) theorems:
![Page 3: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/3.jpg)
SAMPLING DISTRIBUTION OF SLOPE
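Theorem 1A: Under the "Normal Error Regression Model",

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

the sampling distribution of the estimated slope $b_1$ is Normal with Mean and Variance:

$$E(b_1) = \beta_1, \qquad \sigma^2(b_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$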
![Page 4: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/4.jpg)
IMPLICATION
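$$\frac{b_1 - \beta_1}{s(b_1)} = \frac{b_1 - \beta_1}{\sigma(b_1)} \div \frac{s(b_1)}{\sigma(b_1)}$$

The first ratio is distributed as $N(0,1)$; the square of the second, $s^2(b_1)/\sigma^2(b_1)$, is distributed as $\chi^2_{df=n-2}/(n-2)$.

Theorem 1B: $\dfrac{b_1 - \beta_1}{s(b_1)}$ is distributed as "t" with $(n-2)$ degrees of freedom.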
![Page 5: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/5.jpg)
CONFIDENCE INTERVALS
Theorem 1B: $\dfrac{b_1 - \beta_1}{s(b_1)}$ is distributed as "t" with $(n-2)$ degrees of freedom.

The $100(1-\alpha)\%$ Confidence Interval for $\beta_1$ is:

$$b_1 \pm t(1-\alpha/2;\ n-2)\, s(b_1)$$

where $t(1-\alpha/2;\ n-2)$ is the $100(1-\alpha/2)$ percentile of the "t" distribution with $(n-2)$ degrees of freedom.
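As a minimal numerical sketch of this interval (not part of the original slides), the Python below computes $b_1$, $s(b_1)$ and the 95% confidence interval for $\beta_1$ from the Age/SBP data listed in Example #2 later in these notes. The function name `slope_ci` is illustrative, and the critical value t(0.975; 13) = 2.160 is taken from that example rather than computed.

```python
import math

def slope_ci(x, y, t_crit):
    """Return (b1, s(b1), (lower, upper)) for the slope of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # estimate of sigma^2
    s_b1 = math.sqrt(mse / sxx)         # estimated standard error of b1
    return b1, s_b1, (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# Age (x) and SBP (y) from Example #2.
age = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
sbp = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

# t_crit = t(0.975; 13) = 2.160 is assumed from the slides, not computed here.
b1, s_b1, (lower, upper) = slope_ci(age, sbp, t_crit=2.160)
print(f"b1 = {b1:.3f}, s(b1) = {s_b1:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The same pattern applies to the intercept by replacing $s(b_1)$ with $s(b_0)$.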
![Page 6: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/6.jpg)
SAMPLING DISTRIBUTION OF INTERCEPT
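Theorem 2A: Under the "Normal Error Regression Model",

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

the sampling distribution of the estimated intercept $b_0$ is Normal with Mean and Variance:

$$E(b_0) = \beta_0, \qquad \sigma^2(b_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}\right]$$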
![Page 7: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/7.jpg)
IMPLICATION
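$$\frac{b_0 - \beta_0}{s(b_0)} = \frac{b_0 - \beta_0}{\sigma(b_0)} \div \frac{s(b_0)}{\sigma(b_0)}$$

The first ratio is distributed as $N(0,1)$; the square of the second, $s^2(b_0)/\sigma^2(b_0)$, is distributed as $\chi^2_{df=n-2}/(n-2)$.

Theorem 2B: $\dfrac{b_0 - \beta_0}{s(b_0)}$ is distributed as "t" with $(n-2)$ degrees of freedom.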
![Page 8: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/8.jpg)
CONFIDENCE INTERVALS
Theorem 2B: $\dfrac{b_0 - \beta_0}{s(b_0)}$ is distributed as "t" with $(n-2)$ degrees of freedom.

The $100(1-\alpha)\%$ Confidence Interval for $\beta_0$ is:

$$b_0 \pm t(1-\alpha/2;\ n-2)\, s(b_0)$$

where $t(1-\alpha/2;\ n-2)$ is the $100(1-\alpha/2)$ percentile of the "t" distribution with $(n-2)$ degrees of freedom.
![Page 9: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/9.jpg)
The Mean Response: $E(Y|X=x) = \beta_0 + \beta_1 x$

A common objective in regression analysis is to estimate the mean response. For example: (1) we may want to know the average blood pressure for women at a certain age, and how to estimate it using the relationship between SBP and Age; and (2) in a study of the relationship between level of pay (salary, X) and worker productivity (Y), the mean productivity at high, medium, and low levels of pay may be of particular interest to a company.
![Page 10: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/10.jpg)
POINT ESTIMATE
The Mean Response: $E(Y|X=x) = \beta_0 + \beta_1 x$

Let $X = x_h$ denote the level of X for which we wish to estimate the mean response, i.e. $E(Y|X=x_h)$; this $x_h$ may be a value which occurred in the sample, or it may be some other value of the predictor variable within the scope of the model. The point estimate of the mean response is:

$$\text{Point Estimate:}\quad \hat{Y}_h = b_0 + b_1 x_h, \quad\text{which estimates}\quad E(Y|X=x_h)$$
![Page 11: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/11.jpg)
SAMPLING DISTRIBUTION
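Theorem #3A: Under the "Normal Error Regression Model",

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

the sampling distribution of the estimated Mean Response $\hat{Y}_h$ is Normal with Mean and Variance:

$$E(\hat{Y}_h) = E(Y|X=x_h) = \beta_0 + \beta_1 x_h$$

$$\sigma^2(\hat{Y}_h) = \sigma^2\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$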
![Page 12: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/12.jpg)
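$$b_1 = \sum k_i y_i$$

$$b_0 = \sum \left(\frac{1}{n} - \bar{x}\,k_i\right) y_i$$

$$\hat{Y}_h = b_0 + b_1 x_h = \sum \left[\frac{1}{n} + (x_h - \bar{x})\,k_i\right] y_i$$

where $k_i = (x_i - \bar{x}) / \sum (x_i - \bar{x})^2$ are the weights of the least-squares slope.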
![Page 13: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/13.jpg)
The sampling distribution of Ŷh is “normal” because the estimated mean response, like the intercept and the slope, is a linear combination of the observations yi, and each observation is normally distributed under the “normal error regression model”.
![Page 14: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/14.jpg)
The estimated mean response is unbiased because the estimated intercept and estimated slope are both unbiased:
$$\hat{Y}_h = b_0 + b_1 x_h$$

$$E(\hat{Y}_h) = E(b_0) + x_h E(b_1) = \beta_0 + \beta_1 x_h = E(Y|X=x_h)$$
![Page 15: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/15.jpg)
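$$\hat{Y}_h = \sum \left[\frac{1}{n} + (x_h - \bar{x})\,k_i\right] y_i$$

$$
\begin{aligned}
Var(\hat{Y}_h) &= \sigma^2 \sum \left[\frac{1}{n} + (x_h - \bar{x})\,k_i\right]^2 \\
&= \sigma^2 \sum \left[\frac{1}{n^2} + \frac{2}{n}(x_h - \bar{x})\,k_i + (x_h - \bar{x})^2 k_i^2\right] \\
&= \sigma^2 \left[\frac{1}{n} + \frac{2}{n}(x_h - \bar{x}) \sum k_i + (x_h - \bar{x})^2 \sum k_i^2\right] \\
&= \sigma^2 \left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]
\end{aligned}
$$

The last step uses $\sum k_i = 0$ and $\sum k_i^2 = 1/\sum (x_i - \bar{x})^2$.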
![Page 16: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/16.jpg)
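$$Var(\hat{Y}_h) = \sigma^2\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

$$s^2(\hat{Y}_h) = MSE\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

Taking the square root gives the Standard Error.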
![Page 17: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/17.jpg)
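$$s^2(\hat{Y}_h) = MSE\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

$$SE(\hat{Y}_h) = \sqrt{MSE\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]}$$

Implication: Our estimates are less precise toward the ends of the range of X, because the term $(x_h - \bar{x})^2$ grows as $x_h$ moves away from $\bar{x}$.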
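To make the implication concrete, here is a small sketch (an illustration added to these notes, not taken from the slides) that evaluates the standard error formula at several values of $x_h$. It uses the summary statistics quoted for the Toluca Company data in Example #3 (n = 25, mean of X = 70, SS of X = 19,800, MSE = 2,384); the function name `se_mean_response` and the chosen grid of $x_h$ values are assumptions made for illustration.

```python
import math

def se_mean_response(x_h, n, x_bar, sxx, mse):
    """Standard error of the estimated mean response at x_h."""
    return math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))

# Summary statistics quoted for the Toluca Company data (Example #3).
n, x_bar, sxx, mse = 25, 70.0, 19800.0, 2384.0

# The standard error is smallest at x_h = x_bar and grows symmetrically away from it.
for x_h in (20, 45, 70, 95, 120):
    print(x_h, round(se_mean_response(x_h, n, x_bar, sxx, mse), 2))
```

At $x_h = \bar{x}$ the second term vanishes and the standard error reduces to $\sqrt{MSE/n}$.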
![Page 18: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/18.jpg)
MORE ON SAMPLING DISTRIBUTION
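$$\frac{\hat{Y}_h - E(\hat{Y}_h)}{s(\hat{Y}_h)} = \frac{\hat{Y}_h - E(\hat{Y}_h)}{\sigma(\hat{Y}_h)} \div \frac{s(\hat{Y}_h)}{\sigma(\hat{Y}_h)}$$

The first ratio is distributed as $N(0,1)$; the square of the second, $s^2(\hat{Y}_h)/\sigma^2(\hat{Y}_h)$, is distributed as $\chi^2_{df=n-2}/(n-2)$.

Theorem #3B: $\dfrac{\hat{Y}_h - E(\hat{Y}_h)}{s(\hat{Y}_h)}$ is distributed as "t" with $(n-2)$ degrees of freedom.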
![Page 19: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/19.jpg)
CONFIDENCE INTERVALS
Theorem #3B: $\dfrac{\hat{Y}_h - E(\hat{Y}_h)}{s(\hat{Y}_h)}$ is distributed as "t" with $(n-2)$ degrees of freedom.

The $100(1-\alpha)\%$ Confidence Interval for the mean response $E(Y|X=x_h)$ is:

$$\hat{Y}_h \pm t(1-\alpha/2;\ n-2)\, s(\hat{Y}_h)$$

where $t(1-\alpha/2;\ n-2)$ is the $100(1-\alpha/2)$ percentile of the "t" distribution with $(n-2)$ degrees of freedom.
![Page 20: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/20.jpg)
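EXAMPLE #1: Birth weight data

| x (oz) | y (%) |
|---|---|
| 112 | 63 |
| 111 | 66 |
| 107 | 72 |
| 119 | 52 |
| 92 | 75 |
| 80 | 118 |
| 81 | 120 |
| 84 | 114 |
| 118 | 42 |
| 106 | 72 |
| 103 | 90 |
| 94 | 91 |

Intercept = 255.972, Slope = -1.737, MSE = 75.982, Mean of X = 100.58, SS of X = 2,156.913. For children with a birth weight of $x_h = 95$ ounces, the point estimate and 95% Confidence Interval for the mean growth between 70 and 100 days, as a % of birth weight, are:

$$\hat{Y}_h = 255.972 - (1.737)(95) = 90.957\%$$

$$s^2(\hat{Y}_h) = (75.982)\left[\frac{1}{12} + \frac{(95 - 100.58)^2}{2{,}156.913}\right] = 7.429$$

$$90.957 \pm (2.228)\sqrt{7.429} = 90.957 \pm 6.073 = (84.88\%,\ 97.03\%)$$

Here 2.228 is $t(0.975;\ 10)$.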
![Page 21: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/21.jpg)
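EXAMPLE #2: Age and SBP

| Age (x) | SBP (y) |
|---|---|
| 42 | 130 |
| 46 | 115 |
| 42 | 148 |
| 71 | 100 |
| 80 | 156 |
| 74 | 162 |
| 70 | 151 |
| 80 | 156 |
| 85 | 162 |
| 72 | 158 |
| 64 | 155 |
| 81 | 160 |
| 41 | 125 |
| 61 | 150 |
| 75 | 165 |

Intercept = 99.958, Slope = .705, MSE = 278.554, Mean of X = 65.6, SS of X = 3403.6. For women aged $x_h = 60$ years, the point estimate and 95% Confidence Interval for the mean SBP are:

$$\hat{Y}_h = 99.958 + (.705)(60) = 142.26$$

$$s^2(\hat{Y}_h) = (278.554)\left[\frac{1}{15} + \frac{(60 - 65.6)^2}{3403.6}\right] = 21.137$$

$$142.3 \pm (2.160)\sqrt{21.137} = (132.4,\ 152.2)$$

Here 2.160 is $t(0.975;\ 13)$.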
![Page 22: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/22.jpg)
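EXAMPLE #3: Toluca Company Data

| LotSize | WorkHours |
|---|---|
| 80 | 399 |
| 30 | 121 |
| 50 | 221 |
| 90 | 376 |
| 70 | 361 |
| 60 | 224 |
| 120 | 546 |
| 80 | 352 |
| 100 | 353 |
| 50 | 157 |
| 40 | 160 |
| 70 | 252 |
| 90 | 389 |
| 20 | 113 |
| 110 | 435 |
| 100 | 420 |
| 30 | 212 |
| 50 | 268 |
| 90 | 377 |
| 110 | 421 |
| 30 | 273 |
| 90 | 468 |
| 40 | 244 |
| 80 | 342 |
| 70 | 323 |

Intercept = 62.366, Slope = 3.570, MSE = 2,384, Mean of X = 70.0, SS of X = 19,800. For lots of size $x_h = 65$ units, the point estimate and 90% Confidence Interval for the mean Work Hours are:

$$\hat{Y}_h = 62.37 + (3.57)(65) = 294.4$$

$$s^2(\hat{Y}_h) = (2{,}384)\left[\frac{1}{25} + \frac{(65 - 70.0)^2}{19{,}800}\right] = 98.37$$

$$294.4 \pm (1.714)\sqrt{98.37} = (277.4,\ 311.4)$$

Here 1.714 is $t(0.95;\ 23)$.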
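The arithmetic of Example #3 can be retraced with a few lines of Python. This is a sketch added for illustration: the function name `mean_response_ci` is an assumption, the inputs are the summary statistics quoted in the example, and the critical value t(0.95; 23) = 1.714 is taken from the slides rather than computed.

```python
import math

def mean_response_ci(b0, b1, x_h, n, x_bar, sxx, mse, t_crit):
    """Point estimate, standard error and confidence interval for E(Y | X = x_h)."""
    y_hat = b0 + b1 * x_h
    se = math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, se, (y_hat - t_crit * se, y_hat + t_crit * se)

# Toluca Company summary statistics as quoted in Example #3.
y_hat, se, (lower, upper) = mean_response_ci(
    b0=62.366, b1=3.570, x_h=65, n=25, x_bar=70.0, sxx=19800.0, mse=2384.0,
    t_crit=1.714,  # t(0.95; 23), assumed from the slides
)
print(f"Y_hat = {y_hat:.1f}, 90% CI = ({lower:.1f}, {upper:.1f})")
```

Changing `t_crit` to another percentile gives an interval at a different confidence level from the same standard error.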
![Page 23: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/23.jpg)
In regression analysis, besides estimating the mean response, one may sometimes want to estimate a new individual response. For example: (1) in addition to estimating the average blood pressure for women at a certain age using the relationship between SBP and Age, we may be interested in estimating the SBP of a particular woman/patient at that age; and (2) in a study of the relationship between pay (salary, X) and worker productivity (Y), the interest may focus on the productivity of one particular worker.
![Page 24: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/24.jpg)
POINT ESTIMATE

Let $X = x_h$ denote the level of X under investigation, at which the mean response is $E(Y|X=x_h)$. Let $Y_{h(new)}$ be the value of the new individual response of interest. This new observation of Y to be predicted is often viewed as the result of a new trial, independent of the trials on which the regression line is formed. The point estimate is still the same as that of the mean response:

$$E(Y|X=x_h) = \beta_0 + \beta_1 x_h$$

$$\hat{Y}_h = b_0 + b_1 x_h$$

$$\hat{Y}_{h(new)} = \hat{Y}_h \quad \text{(same as the mean)}$$
![Page 25: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/25.jpg)
VARIANCE

The point estimates of the mean response and of an individual response are the same, but the variances are different. In estimating an individual response, there are two layers of variation: (a) variation in the “position of the distribution” (that is, of the mean response), and (b) the variation within that distribution (that is, from the individual response to the mean response).
![Page 26: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/26.jpg)
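Theorem #4A: Under the "Normal Error Regression Model", the sampling distribution of $\hat{Y}_{h(new)}$ is normal.

$$
\begin{aligned}
Var(\hat{Y}_{h(new)}) &= Var(Y_{h(new)}) + Var(\hat{Y}_h) \\
&= \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] \\
&= \sigma^2\left[1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]
\end{aligned}
$$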
![Page 27: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/27.jpg)
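$$Var(\hat{Y}_{h(new)}) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

$$s^2(\hat{Y}_{h(new)}) = MSE\left[1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

Taking the square root gives the Standard Error.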
![Page 28: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/28.jpg)
MORE ON SAMPLING DISTRIBUTION
Inferences on a new individual response are based on the following result:

Theorem #4B: $\dfrac{Y_{h(new)} - \hat{Y}_h}{s(\hat{Y}_{h(new)})}$ is distributed as "t" with $(n-2)$ degrees of freedom.
![Page 29: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/29.jpg)
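$$s^2(\hat{Y}_{h(new)}) = MSE\left[1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$$

$$SE(\hat{Y}_{h(new)}) = \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]}$$

Again: Our estimates are less precise toward the ends of the range of X, because the term $(x_h - \bar{x})^2$ grows as $x_h$ moves away from $\bar{x}$.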
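The contrast between the two standard errors can be sketched numerically. The Python below (an illustration added to these notes) uses the Age/SBP summary statistics from Example #2 at $x_h = 60$ and builds both the confidence interval for the mean response and the interval for a new individual response. The function name `intervals_at` and the variable `h_term` are hypothetical names chosen here, and t(0.975; 13) = 2.160 is taken from the slides.

```python
import math

def intervals_at(x_h, b0, b1, n, x_bar, sxx, mse, t_crit):
    """Return the point estimate, both standard errors, and both intervals at x_h."""
    y_hat = b0 + b1 * x_h
    h_term = 1.0 / n + (x_h - x_bar) ** 2 / sxx      # bracket shared by both variances
    se_mean = math.sqrt(mse * h_term)                # s(Y_hat_h): mean response
    se_new = math.sqrt(mse * (1.0 + h_term))         # s(Y_hat_h(new)): new individual
    ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi_new = (y_hat - t_crit * se_new, y_hat + t_crit * se_new)
    return y_hat, se_mean, se_new, ci_mean, pi_new

# Age/SBP summary statistics as quoted in Example #2.
y_hat, se_mean, se_new, ci_mean, pi_new = intervals_at(
    x_h=60, b0=99.958, b1=0.705, n=15, x_bar=65.6, sxx=3403.6, mse=278.554,
    t_crit=2.160,  # t(0.975; 13), assumed from the slides
)
print("mean response:", tuple(round(v, 1) for v in ci_mean))
print("new individual:", tuple(round(v, 1) for v in pi_new))
```

The two squared standard errors differ by exactly MSE, which is the estimated within-distribution variation, so the interval for an individual is always the wider of the two.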
![Page 30: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/30.jpg)
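Normal Error Regression Model:

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

The residuals $\{e_i\}$ form a sample with mean zero, and $\hat{\sigma}^2 = MSE$.

Theorem #5: $SSE/\sigma^2$ is distributed as $\chi^2$ with $df = n-2$, so that $E(MSE) = \sigma^2$.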
![Page 31: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/31.jpg)
THE TEST FOR INDEPENDENCE
The Mean Response: $E(Y|X=x) = \beta_0 + \beta_1 x$

$H_0: \beta_1 = 0$; "t" test at $(n-2)$ degrees of freedom:

$$t = \frac{b_1}{s(b_1)}$$

which is identical to the test using "r":

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

The method we use most often is this “Test for Independence”, which we now approach in a different way: ANOVA.
![Page 32: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/32.jpg)
COMPONENTS OF VARIATION

• The variation in Y is conventionally measured in terms of the deviations $(Y_i - \bar{Y})$; the total variation, denoted by SST, is the sum of squared deviations: $SST = \sum (Y_i - \bar{Y})^2$. For example, SST = 0 when all observations are the same. SST is the numerator of the sample variance of Y; the greater SST, the greater the variation among Y-values.
• In regression analysis, each deviation is decomposed into two components: $(Y_i - \bar{Y}) = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$
![Page 33: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/33.jpg)
DECOMPOSITION OF SST

• In the decomposition: $(Y_i - \bar{Y}) = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$
• The first term (RHS) reflects the variation around the regression line, the part that cannot be explained by the regression itself; its sum of squares is the error sum of squares $SSE = \sum (Y_i - \hat{Y}_i)^2$.
• The difference between the above two sums of squares, $SSR = SST - SSE = \sum (\hat{Y}_i - \bar{Y})^2$, is called the regression sum of squares; SSR may be considered as a measure of the variation in Y associated with or explained by the regression line.
![Page 34: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/34.jpg)
Regression helps to “improve” the estimate of Y from $\bar{Y}$ (without any information) to $\hat{Y}$ (with information provided by knowing X).
![Page 35: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/35.jpg)
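$$
\begin{aligned}
SST &= \sum (Y_i - \bar{Y})^2 \\
&= \sum \left[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\right]^2 \\
&= \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2\sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) \\
&= SSE + SSR + 2\left[\sum e_i \hat{Y}_i - \bar{Y}\sum e_i\right] \\
&= SSE + SSR
\end{aligned}
$$

The cross-product term is 0 because $\sum e_i \hat{Y}_i = 0$ and $\sum e_i = 0$.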
![Page 36: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/36.jpg)
ANALYSIS OF VARIANCE

• SST measures the “total variation” in the sample (of values of the dependent variable) with (n-1) degrees of freedom, where n is the sample size. It is decomposed into: SST = SSE + SSR
• (1) SSE measures the variation that cannot be explained by the regression, with (n-2) degrees of freedom, and
• (2) SSR measures the variation in Y associated with or explained by the regression line, with 1 degree of freedom (representing the slope).
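The decomposition can be checked numerically. The following sketch (added for illustration; the function name `anova_slr` is an assumption) fits the least-squares line to the birth weight data of Example #1, computes the three sums of squares directly from their definitions, and confirms that SST = SSE + SSR up to rounding.

```python
def anova_slr(x, y):
    """Sums of squares for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total, df = n - 1
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error, df = n - 2
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression, df = 1
    return {"SST": sst, "SSE": sse, "SSR": ssr, "MSE": sse / (n - 2)}

# Birth weight data from Example #1: x in ounces, y = growth as % of birth weight.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

table = anova_slr(x, y)
for name, value in table.items():
    print(f"{name} = {value:.3f}")
```

SSR is computed here from the fitted values rather than as SST - SSE, so the agreement of the two sides is a genuine check and not an identity built into the code.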
![Page 37: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/37.jpg)
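Since $b_0 = \bar{y} - b_1\bar{x}$:

$$
\begin{aligned}
SSR &= \sum (b_0 + b_1 x_i - \bar{y})^2 \\
&= \sum (\bar{y} - b_1\bar{x} + b_1 x_i - \bar{y})^2 \\
&= b_1^2 \sum (x_i - \bar{x})^2
\end{aligned}
$$

Using $Var(X) = E(X^2) - \{E(X)\}^2$, that is $E(X^2) = Var(X) + \{E(X)\}^2$, applied to $b_1$:

$$
\begin{aligned}
E(MSR) = E(SSR) &= E(b_1^2) \sum (x_i - \bar{x})^2 \\
&= \left[\sigma^2(b_1) + \beta_1^2\right] \sum (x_i - \bar{x})^2 \\
&= \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2
\end{aligned}
$$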
![Page 38: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/38.jpg)
“ANOVA” TABLE

• The breakdowns of the total sum of squares and its associated degrees of freedom are displayed in the form of an “analysis of variance table” (ANOVA table) for regression analysis as follows:

| Source of Variation | SS | df | MS | F Statistic | p-value |
|---|---|---|---|---|---|
| Regression | SSR | 1 | MSR | MSR/MSE | |
| Error | SSE | n-2 | MSE | | |
| Total | SST | n-1 | | | |

• Recall: MSE, the “error mean square”, serves as an estimate of the constant variance $\sigma^2$ as stipulated by the regression model.
![Page 39: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/39.jpg)
$$E(MSE) = \sigma^2$$

$$E(MSR) = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$$
Under the Null Hypothesis H0: β1 = 0, E(MSE) = E(MSR) so that F=MSR/MSE is expected to be near 1.0
Theorem 6: F is distributed, under H0, as F(1,n-2) following a theorem by Cochran.
![Page 40: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/40.jpg)
THE F-TEST
The test statistic F for the above analysis of variance approach compares MSR and MSE; a value near 1 supports the null hypothesis of independence. In fact, we have F = t², where t is the test statistic for testing whether or not β1 = 0; the F-test is equivalent to the two-sided t-test when referred to the F-table in Appendix B (Table B.4) with (1, n-2) degrees of freedom.
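The identity F = t² can be verified on data. The sketch below (an illustration added here; the function name `t_and_f` is an assumption) computes the t statistic for the slope, the F statistic from the ANOVA decomposition, and the equivalent t statistic based on the correlation coefficient r, using the birth weight data of Example #1.

```python
import math

def t_and_f(x, y):
    """Return (t from slope, F from ANOVA, t from r) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    ssr = b1 ** 2 * sxx                       # regression sum of squares, df = 1
    t_slope = b1 / math.sqrt(mse / sxx)       # t = b1 / s(b1)
    f_stat = ssr / mse                        # F = MSR / MSE with MSR = SSR / 1
    r = sxy / math.sqrt(sxx * syy)            # sample correlation coefficient
    t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, f_stat, t_from_r

# Birth weight data from Example #1.
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

t_slope, f_stat, t_from_r = t_and_f(x, y)
print(f"t = {t_slope:.3f}, t^2 = {t_slope ** 2:.2f}, F = {f_stat:.2f}, t from r = {t_from_r:.3f}")
```

The sign of t carries the direction of the association (negative for these data), which the F statistic discards; this is why the F-test corresponds to the two-sided t-test.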
![Page 41: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/41.jpg)
THE TEST FOR INDEPENDENCE
The Null Hypothesis: $H_0: \beta_1 = 0$

Two identical choices:

(1) "t" test at $(n-2)$ degrees of freedom:

$$t = \frac{b_1}{s(b_1)}$$

which is identical to the test using "r":

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

(2) "F" test at $(1, n-2)$ degrees of freedom:

$$F = \frac{MSR}{MSE}$$
![Page 42: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/42.jpg)
COEFFICIENT OF DETERMINATION

• We can express the coefficient of determination (the square of the coefficient of correlation r) as:

$$r^2 = \frac{SSR}{SST}$$

• That is the portion of total variation attributable to regression. Regression helps to “improve” the estimate of Y from $\bar{Y}$ (without any information) to $\hat{Y}$ (with information provided by knowing X), reducing the total variation by $(100)(r^2)\%$.
![Page 43: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/43.jpg)
EXAMPLE #1: Birth Weight Data
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| R Square | 0.89546 |
| Observations | 12 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 6508 | 6508 | 85.66 | 3.21622E-06 |
| Residual | 10 | 759.8 | 75.98 | | |
| Total | 11 | 7268 | | | |

Data: x (oz) and y (%) for the 12 children, as listed in Example #1 above.
![Page 44: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/44.jpg)
EXAMPLE #2: AGE & SBP

Data: Age (x) and SBP (y) for the 15 women, as listed in Example #2 above.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| R Square | 0.3183 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 1691 | 1691 | 6.071 | 0.028453563 |
| Residual | 13 | 3621 | 278.6 | | |
| Total | 14 | 5312 | | | |
![Page 45: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/45.jpg)
EXAMPLE #3: Toluca Company Data

Data: LotSize (x) and WorkHours (y) for the 25 lots, as listed in Example #3 above.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| R Square | 0.8215 |
| Observations | 25 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 252,378 | 252,378 | 105.88 | below 0.0001 |
| Residual | 23 | 54,825 | 2,384 | | |
| Total | 24 | 307,203 | | | |
![Page 46: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/46.jpg)
Normal Error Regression Model:

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

The normal regression model assumes that the X values are known constants. We do not impose any kind of distribution on the x-values.
![Page 47: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/47.jpg)
In many cases, this is not true. For example, if we study the relationship between “height of a person” and “weight of a person”, a sample of persons is taken and both measurements are random. Rather than a regression model, one should consider a “correlation model”; the most widely used is the “Bivariate Normal Distribution”.
![Page 48: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/48.jpg)
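CORRELATION MODEL

“Correlation Data” are often cross-sectional or observational. Instead of a regression model, one should consider a “correlation model”; the most widely used is the “Bivariate Normal Distribution” with density:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}$$

$$\sigma_{xy} = Cov(X, Y) = E[(X-\mu_x)(Y-\mu_y)], \qquad \rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$$

$\sigma_{xy}$ is the Covariance and $\rho$ is the Coefficient of Correlation between the two random variables X and Y; $\rho$ is estimated by the (sample) Coefficient of Correlation r.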
![Page 49: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/49.jpg)
The Coefficient of Correlation ρ between the two random variables X and Y is estimated by the (sample) Coefficient of Correlation r, but the sampling distribution of r is far from being normal. Confidence intervals for ρ are obtained by first applying “Fisher’s z transformation”; the distribution of z is approximately normal if the sample size is not too small.
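The slides do not spell out the transformation, so the sketch below relies on the standard textbook form as an assumption: $z = \tfrac{1}{2}\ln\frac{1+r}{1-r}$, with approximate standard error $1/\sqrt{n-3}$, and the interval for ρ obtained by transforming the endpoints back. The function name `fisher_ci` is illustrative; the example values (r² = 0.3183 with a positive slope, n = 15) come from the Age/SBP output in Example #2.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))          # Fisher's z, equal to atanh(r)
    se = 1.0 / math.sqrt(n - 3)                    # standard approximation (assumed)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    return math.tanh(lo_z), math.tanh(hi_z)        # back-transform to the r scale

# Age/SBP example: r is the positive square root of R Square = 0.3183, n = 15.
r = math.sqrt(0.3183)
lower, upper = fisher_ci(r, n=15)
print(f"r = {r:.3f}, 95% CI for rho = ({lower:.3f}, {upper:.3f})")
```

Because the back-transformation is nonlinear, the resulting interval is not symmetric around r, and its endpoints always stay inside (-1, 1).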
![Page 50: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/50.jpg)
CONDITIONAL DISTRIBUTION
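$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}$$

Theorem: The conditional distribution of Y for any given X = x is normal with mean $\beta_0 + \beta_1 x$ and standard deviation $\sigma_{y|x}$:

$$\beta_0 = \mu_y - \mu_x\,\rho\,\frac{\sigma_y}{\sigma_x}$$

$$\beta_1 = \rho\,\frac{\sigma_y}{\sigma_x}$$

$$\sigma^2_{y|x} = (1-\rho^2)\,\sigma_y^2$$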
![Page 51: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/51.jpg)
Again, since Var(Y|X) = (1 - ρ²)Var(Y), ρ is both a measure of linear association and a measure of “variance reduction” (in Y associated with knowledge of X); that is why we call r², an estimate of ρ², the “coefficient of determination”.
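As a small sketch of the theorem on the conditional distribution (added for illustration), the function below maps the five bivariate normal parameters to the intercept, slope and conditional standard deviation of Y given X. The function name `conditional_params` and the parameter values in the example are hypothetical; they are not taken from any dataset in these notes.

```python
import math

def conditional_params(mu_x, mu_y, sd_x, sd_y, rho):
    """Intercept, slope and conditional SD of Y given X under a bivariate normal model."""
    beta1 = rho * sd_y / sd_x
    beta0 = mu_y - beta1 * mu_x
    sd_y_given_x = sd_y * math.sqrt(1 - rho ** 2)
    return beta0, beta1, sd_y_given_x

# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
beta0, beta1, sd_cond = conditional_params(mu_x=10, mu_y=50, sd_x=2, sd_y=5, rho=0.6)
print(f"E(Y | X = x) = {beta0:.1f} + {beta1:.1f} x, SD(Y | X) = {sd_cond:.1f}")

# Share of Var(Y) removed by knowing X equals rho^2.
reduction = 1 - sd_cond ** 2 / 5 ** 2
print(f"variance reduction = {reduction:.2f}")
```

With ρ = 0 the slope is zero and the conditional standard deviation equals that of Y, which matches the reading of ρ² as the proportion of variance removed by knowing X.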
![Page 52: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/52.jpg)
Readings & Exercises

• Readings: A thorough reading of the text’s sections 2.4-2.5 (pp. 52-61), 2.7 (pp. 63-71), and 2.11 (pp. 78-82) is highly recommended.
• Exercises: The following exercises are good for practice, all from chapter 2 of the text: 2.13, 2.23, 2.24, 2.28, and 2.29.
![Page 53: PubH 7405: BIOSTATISTICS: REGRESSIONchap/F09-InferencesPartII.pdf · PubH 7405: REGRESSION ANALYSIS SLR: INFERENCES, Part II . We cover the topic of inference in two sessions; the](https://reader036.vdocuments.us/reader036/viewer/2022062414/5fb39186758847233125ef86/html5/thumbnails/53.jpg)
Due As Homework

#9.1 Refer to dataset “Cigarettes”, Y = Cotinine & X = CPD:
a) Obtain the 95% confidence interval for the mean Cotinine level for subjects who consumed X = 30 cigarettes per day and give your interpretation.
b) Obtain the 95% confidence interval for the Cotinine level of a subject who consumed 30 cigarettes per day; why is the result different from (a)?
c) Plot the residuals against X. What would be your conclusion about their possible linear relationship? What would be the average residual?
d) Set up the ANOVA table and test whether or not a linear association exists between Cotinine and CPD.

#9.2 Answer the 4 questions of Exercise 9.1 using dataset “Vital Capacity” with X = Age and Y = (100)(Vital Capacity); use X = 35 years for questions (a) and (b).