topic 6: estimation and prediction of y h
DESCRIPTION
Topic 6: Estimation and Prediction of Y h. Outline. Estimation and inference of E(Y h ) Prediction of a new observation Construction of a confidence band for the entire regression line. Estimation of E(Y h ). - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/1.jpg)
Topic 6: Estimation and Prediction of Yh
![Page 2: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/2.jpg)
Outline
• Estimation and inference of E(Yh)
• Prediction of a new observation
• Construction of a confidence band for the entire regression line
![Page 3: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/3.jpg)
Estimation of E(Yh)
• E(Yh) = μh = β0 + β1Xh, the mean value of Y for the subpopulation with X=Xh
• We will estimate E(Yh) by
• KNNL use for this estimate, see equation (2.28) on pp 52
^
h 0 1ˆ hb b X hY
![Page 4: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/4.jpg)
Theory for Estimation of E(Yh)
• is Normal with mean μh and variance
• The Normality is a consequence of the fact that b0 + b1Xh is a linear combination of Yi’s
• See KNNL pp 52-54 for details
2
h2 22
i
X X1ˆ
n X Xh
h
![Page 5: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/5.jpg)
Application of the Theory• We estimate σ2( ) by
• It then follows that
• Details for confidence intervals and significance tests are consequences
h2
2 2 hh 2
i
X X1ˆ( )
n X X
( )( )
s s
h h
h
ˆ E(Y )~ t n 2
ˆ( )( )
s
![Page 6: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/6.jpg)
95% Confidence Interval for E(Yh)
• ± tcs( )
where tc = t(.975, n-2)
• NOTE: significance tests can be constructed but they are rarely used in practice
h h
![Page 7: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/7.jpg)
Toluca Company Example (pg 19)
• Manufactures refrigeration equipment
• One replacement part manufactured in lots of varying sizes
• Company wants to determine the optimum lot size
• To do this, company needs to first describe the relationship between work hours and lot size
![Page 8: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/8.jpg)
hours
100
200
300
400
500
600
lotsize
20 30 40 50 60 70 80 90 100 110 120
Scatterplot w/ regr line
![Page 9: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/9.jpg)
***Generating the data set***;data toluca; infile ‘../data/CH01TA01.txt'; input lotsize hours;data other; size=65; output; size=100; output;data toluca1; set toluca other;proc print data=toluca1; run;
SAS CODE
![Page 10: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/10.jpg)
***Generating the confidence intervals for all values of X in the data set***;
proc reg data=toluca1; model hours=size/clm; id lotsize;run;
SAS CODE
clm option generates confidence intervals for the
mean
![Page 11: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/11.jpg)
Parameter Estimates
Variable DFParameter
EstimateStandard
Error t Value Pr > |t|Intercept 1 62.36586 26.17743 2.38 0.0259
lotsize 1 3.57020 0.34697 10.29 <.0001
Output Statistics
Obs lotsizeDependent
VariablePredicted
ValueStd Error
Mean Predict 95% CL Mean1 80 399.0000 347.9820 10.3628 326.5449 369.4191
25 70 323.0000 312.2800 9.7647 292.0803 332.479726 65 . 294.4290 9.9176 273.9129 314.945127 100 . 419.3861 14.2723 389.8615 448.9106
![Page 12: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/12.jpg)
Notes
• Standard error affected by how far Xh is from (see Figure 2.6)
• Recall teeter-totter idea…a change in the slope has bigger impact on Y as you move away from
X
X
![Page 13: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/13.jpg)
Prediction of Yh(new)
• Want to predict value for a new observation at X=Xh
• Model: Yh(new) = β0 + β1Xh +
• Since E(e)=0 same value as for E(Yh)
• Prediction interval, however, relies heavily on assumption that e are Normally distributed
Note!!
![Page 14: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/14.jpg)
Prediction of Yh(new)
• Var(Yh(new))=Var( )+Var(ξ )
• Then follows that
2
h2 22
i
X X1pred 1
n X X( )s s
h
( ) hˆ( ) / (pred) ~ ( 2)h newY s t n
![Page 15: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/15.jpg)
Notes
• Procedure can be modified for the mean of m observations at X=Xh (see 2.39a and 239b on page 60)
• Standard error affected by how far Xh is from (see Figure 2.6)X
![Page 16: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/16.jpg)
***Generating the prediction intervals for all values of X in data set***;
proc reg data=toluca1; model hours=lotsize/cli; id lotsize;run;
SAS CODE
cli option generates prediction interval for a
new observation
![Page 17: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/17.jpg)
Output Statistics
Obs lotsizeDependent
VariablePredicted
ValueStd Error
Mean Predict 95% CL Predict1 80 399.0000 347.9820 10.3628 244.7333 451.2307
25 70 323.0000 312.2800 9.7647 209.2811 415.278926 65 . 294.4290 9.9176 191.3676 397.490427 100 . 419.3861 14.2723 314.1604 524.6117
These are wrong…same as before. Does not include variability about regression line
![Page 18: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/18.jpg)
Notes
• The standard error (Std Error Mean Predict)given in this output is the standard error of not s(pred)
• The prediction interval is correct and wider than the previous confidence interval
h
![Page 19: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/19.jpg)
Notes
• To get correct standard error need to add the variance about the regression line
2 ˆ( ) ( )hs pred s MSE
![Page 20: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/20.jpg)
Confidence band for regression line
• ± Ws( )
where W2=2F(1-α; 2, n-2) • This gives combined “confidence intervals”
for all Xh
• Boundary values of confidence bands define a hyperbola
• Will be wider at Xh than single CI
h h
![Page 21: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/21.jpg)
Confidence band for regression line
• Theory comes from the joint confidence region for (β0, β1 ) which is an ellipse (Stat 524)
• We can find an alpha for tc that gives the same results
• We find W2 and then find the alpha for tc
that will give W = tc
![Page 22: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/22.jpg)
data a1; n=25; alpha=.10; dfn=2; dfd=n-2; tsingle=tinv(1-alpha/2,dfd); w2=2*finv(1-alpha,dfn,dfd); w=sqrt(w2); alphat=2*(1-probt(w,dfd)); t_c=tinv(1-alphat/2,dfd); output;proc print data=a1;run;
SAS CODE
![Page 23: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/23.jpg)
n alpha dfn dfd tsingle w2 w alphat t_c25 0.1 2 23 1.71387 5.09858 2.25800 0.033740 2.25800
SAS OUTPUT
Used for single 90% CI
Used for 90%
confidence band
![Page 24: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/24.jpg)
symbol1 v=circle i=rlclm97;
proc gplot data=toluca; plot hours*lotsize; run;
SAS CODE
![Page 25: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/25.jpg)
hours
100
200
300
400
500
600
lotsize
20 30 40 50 60 70 80 90 100 110 120
![Page 26: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/26.jpg)
Estimation of E(Yh) and Prediction of Yh
• = b0 + b1Xh
•
•
h
2i
2h22
2i
2h2
h2
)()(
)()(
XX
XX
n
11pred)
XX
XX
n
1)(
(s
ˆs
![Page 27: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/27.jpg)
symbol1 v=circle i=rlclm95; proc gplot data=toluca; plot hours*lotsize;
symbol1 v=circle i=rlcli95; proc gplot data=toluca; plot hours*lotsize;
run;
SAS CODE
Confidence intervals
Prediction intervals
![Page 28: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/28.jpg)
hours
100
200
300
400
500
600
lotsize
20 30 40 50 60 70 80 90 100 110 120
Confidence band
![Page 29: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/29.jpg)
hours
100
200
300
400
500
600
lotsize
20 30 40 50 60 70 80 90 100 110 120
Confidence intervals
![Page 30: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/30.jpg)
hours
100
200
300
400
500
600
lotsize
20 30 40 50 60 70 80 90 100 110 120
Prediction intervals
![Page 31: Topic 6: Estimation and Prediction of Y h](https://reader033.vdocuments.us/reader033/viewer/2022051517/56815a89550346895dc7fced/html5/thumbnails/31.jpg)
Background Reading
• Program topic6.sas has the code for the various plots and calculations;
• Sections 2.7, 2.8, and 2.9