11 regression
Post on 04-Mar-2016
213 Views
Preview:
DESCRIPTION
TRANSCRIPT
-
171
Stat250GundersonLectureNotes11:RegressionAnalysis
Theinvalidassumptionthatcorrelationimpliescauseisprobablyamongthetwoorthreemostseriousandcommonerrorsofhumanreasoning.
StephenJayGould,TheMismeasureofMan Describingandassessingthesignificanceofrelationshipsbetweenvariablesisveryimportantinresearch.Wewillfirstlearnhowtodothisinthecasewhenthetwovariablesarequantitative.Quantitativevariableshavenumericalvaluesthatcanbeorderedaccordingtothosevalues.MainideaWewishtostudytherelationshipbetweentwoquantitativevariables.Generallyonevariableisthe_____RESPONSE______variable,denotedbyy.Thisvariablemeasurestheoutcomeofthestudy
andisalsocalledthe_____DEPENDENT_____variable.Theothervariableisthe_____EXPLANATORY______variable,denotedbyx.Itisthevariablethatisthoughttoexplainthechangesweseeintheresponsevariable.Theexplanatoryvariableisalsocalledthe__INDEPENDENT___variable. The first step inexamining the relationship is touseagraph a scatterplot todisplay therelationship.Wewill lookforanoverallpatternandsee ifthereareanydeparturesfromthisoverallpattern.Ifalinearrelationshipappearstobereasonablefromthescatterplot,wewilltakethenextstepoffindingamodel(anequationofaline)tosummarizetherelationship.Theresultingequationmaybeusedforpredictingtheresponseforvariousvaluesoftheexplanatoryvariable.Ifcertainassumptions hold,we can assess the significance of the linear relationship andmake someconfidenceintervalsforourestimationsandpredictions. Let'sbeginwithanexamplethatwewillcarrythroughoutourdiscussions.
-
172
GraphingtheRelationship:RestaurantBillvsTipHowwelldoesthesizeofarestaurantbillpredictthetiptheserverreceives?Belowarethebillsandtipsfromsixdifferentrestaurantvisitsindollars.
Bill 41 98 25 85 50 73Tip 8 17 4 12 5 14
Response(dependent)variabley=TipAmount.Explanatory(independent)variablex=AmountoftheBill.Step1:Examinethedatagraphicallywithascatterplot.Addthepointstothescatterplotbelow:
Interpretthescatterplotintermsof...
overallform(istheaveragepatternlooklikeastraightlineorisitcurved?) directionofassociation(positiveornegative) strengthofassociation(howmuchdothepointsvaryaroundtheaveragepattern?) anydeviationsfromtheoverallform?
-
173
DescribingaLinearRelationshipwithaRegressionLineRegression analysis is the area of statistics used to examine the relationship between aquantitative response variable andoneormoreexplanatory variables.A keyelement is theestimationofanequationthatdescribeshow,onaverage,theresponsevariable isrelatedtotheexplanatoryvariables.Aregressionequationcanalsobeusedtomakepredictions.Thesimplestkindofrelationshipbetweentwovariablesisastraightline,theanalysisinthiscaseiscalledlinearregression.RegressionLineforBillvs.TipRemembertheequationofaline?y=mx+bInstatisticswedenotetheregressionlineforasampleas:where:y yhat=thepredictedyorestimatedyvalue
0b yintercept=estimatedywhenx=0(notalwaysmeaningful)1b slope=howmuchofanincrease/decreaseweexpecttoseeinywhenxincreasesby1unit.
Goal:Tofindalinethatisclosetothedatapointsfindthebestfittingline.How?Whatdowemeanbybest?Onemeasureofhowgoodalinefitsistolookattheobservederrorsinprediction.
Observederrors=_____ yy _________arecalled____ residuals__________Sowewant tochoose the line forwhich thesumofsquaresof theobservederrors (the sumof squaredresiduals)istheleast.Thelinethatdoesthisiscalled:______ LeastSquaresRegressionLine _____
Apossibleline
Observederrorifweusedthislinetopredict=yyhat
-
174
Theequationsfortheestimatedslopeandinterceptaregivenby:
x
y
XX
XY
ss
rSS
xxyxx
xxyyxx
b
221
xbyb 10
Theleastsquaresregressionline(estimatedregressionfunction)is: xbbxy y 10)( Moreonthisdistinctionlaterwhentalkaboutpredictionintervalsvs.CIsforamean.
To find thisestimated regression line forourexamdatabyhand, it iseasier ifwe setup acalculationtable.Byfillinginthistableandcomputingthecolumntotals,wewillhaveallofthemainsummariesneededtoperformacomplete linearregressionanalysis. Notethatherewehaven=6observations.Thefirstfiverowshavebeencompletedforyou.Ingeneral,useRoracalculatortohelpwiththegraphingandnumericalcomputations!x=bill y=tip xx 2xx yxx yy 2yy 41 8 4162=21 (21)2=441 (21)(8)=168 810=2 (2)2=498 17 9862=36 (36)2=1296 (36)(17)=612 1710=7 (7)2=4925 4 2562=37 (37)2=1369 (37)(4)=148 410=6 (6)2=3685 12 8562=23 (23)2=529 (23)(12)=276 1210=2 (2)2=450 5 5062=12 (12)2=144 (12)(5)=60 510=5 (5)2=2573 14 7362=11 (11)2=121 (11)(14)=154 1410=4 (4)2=16372 60 0 3900 666 0 134
x 3726
606
SlopeEstimate: 1 2666 0.17077
3900x x y
bx x
yinterceptEstimate: 0 1 10 (0.17077)(62) 10 10.5877 0.5877b y b x EstimatedRegressionLine: 0 1 0.5877 0.17077( )y b b x x
Predictasingleyatgivenx
Estimatetheaverageyforallx
-
175
Predictthetipforadinnerguestwhohada$50bill. 0 1 0.5877 0.17077(50) 7.95y b b x
Note:The5thdinnerguestinsamplehadabillof$50andtheobservedtipwas$5.Findtheresidualforthe5thobservation.Notationforaresidual 555 yye 57.95=2.95TheresidualsYou found the residual for one observation. You could compute the residual for eachobservation.Thefollowingtableshowseachresidual. x=bill y=tip
predictedvalues 0.5877 0.17077
residualsyye
Squaredresiduals 22 )( yye
41 8 6.41 1.59 2.5298 17 16.15 0.85 0.7225 4 3.68 0.32 0.1085 12 13.93 1.93 3.7350 5 7.95 2.95 8.7073 14 11.88 2.12 4.49 0 ~20.27
SSE=sumofsquarederrors(orresiduals)20.27
-
176
MeasuringStrengthandDirectionofaLinearRelationshipwithCorrelation
Thecorrelationcoefficientrisameasureofstrengthofthelinearrelationshipbetweenyandx.
PropertiesabouttheCorrelationCoefficientr 1. r rangesfrom...1to+1(anditisunitless)2. Signof r indicates...directionoftheassociation3. Magnitudeof r indicates...strength
(r=0.8andr=+0.8indicateequallystronglinearassociations)
Astrongrisdisciplinespecific r=0.8mightbeanimportant(orstrong)correlationinengineering r=0.6mightbeastrongcorrelationinpsychologyormedicalresearch
4. r ONLYmeasuresthestrengthoftheLINEARrelationship. Somepictures: Theformulaforthecorrelation:(butwewillgetitfromcomputeroutputorfromr2)TipsExample:wewillsoonseethatr=____0.9213______Interpretation: Afairlystrongpositivelinearassociationbetweenamountofthebillandtheamountoftip.
y y y
x x x
r=+0.7 r=0.4 r0
-
177
Thesquareofthecorrelation 2r
The squared correlation coefficient 2r always has a value between __0 and 1___ and issometimespresentedasapercent.Itcanbeshownthatthesquareofthecorrelationisrelatedtothesumsofsquaresthatariseinregression.
Theresponses(theamountoftip)indatasetarenotallthesametheydovary.Wewouldmeasurethetotalvariationintheseresponsesas 2SSTO yy (lastcolumntotalincalculationtablesaidwewoulduselater).
Partofthereasonwhytheamountoftipvariesisbecausethereisalinearrelationshipbetweenamountoftipandamountofbill,andthestudyincludeddifferentamountsofbill.
Whenwefoundtheleastsquaresregressionline,therewasstillsomesmallvariationremainingoftheresponsesfromtheline.ThisamountofvariationthatisnotaccountedforbythelinearrelationshipiscalledtheSSE.Theamountofvariationthat isaccountedforbythe linearrelationship iscalledthesumofsquaresduetothemodel(orregression),denotedbySSM(orsometimesasSSR).Sowehave: SSTO=______SSM+SSE________Itcanbeshownthat 2r =
SSTOSSM
SSTOSSESSTO
= theproportionoftotalvariabilityintheresponsesthatcanbeexplainedbythelinearrelationshipwiththeexplanatoryvariable x .
Note: Thevalueof 2r andthesesumsofsquaresaresummarized inanANOVAtablethat isstandardoutputfromcomputerpackageswhendoingregression.
Totalvariationintheys
Variationnotaccountedfor
-
178
MeasuringStrengthandDirectionforExam2vsFinalFromourfirstcalculationtablewehave:SSTO=___134_____________Fromourresidualcalculationtablewehave:SSE=___20.27_______________Sothesquaredcorrelationcoefficientforourexamscoresregressionis:
SSTOSSESSTOr 2 =
134 20.27 113.73 0.84873134 134
Interpretation:
Weaccountedfor~84.9%ofthevariationin__AmountofTipsreceived_
bythelinearregressiononAmountoftheBill.
Thecorrelationcoefficientisr=2 0.84873 0.9213r
Afewmoregeneralnotes:
Nonlinearrelationships DetectingOutliersandtheirinfluenceonregressionresults. DangersofExtrapolation(predictingoutsidetherangeofyourdata) Dangersofcombininggroupsinappropriately(SimpsonsParadox) Correlationdoesnotprovecausation
-
179
RRegressionAnalysisforBillvsTipsLetslookattheRoutputforourBillandTipdata.Wewillseethatmuchofthecomputationsaredoneforus.
Call: lm(formula = Tip ~ Bill, data = Tips) Residuals: 1 2 3 4 5 6 1.5862 0.8523 0.3185 -1.9277 -2.9508 2.1215 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.58769 2.41633 -0.243 0.81980 Bill 0.17077 0.03604 4.738 0.00905 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.251 on 4 degrees of freedom Multiple R-squared: 0.8487, Adjusted R-squared: 0.8109 F-statistic: 22.45 on 1 and 4 DF, p-value: 0.009052
Correlation Matrix
Bill Tip Bill 1.0000000 0.9212755 Tip 0.9212755 1.0000000
ANOVA Table
Response: Tip Df Sum Sq Mean Sq F value Pr(>F) Bill 1 113.732 113.732 22.446 0.009052 ** Residuals 4 20.268 5.067 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-
180
InferenceinLinearRegressionAnalysis Thematerialcoveredsofarfocusesonusingthedataforasampletographanddescribetherelationship.Theslopeandinterceptvalueswehavecomputedarestatistics,theyareestimatesoftheunderlyingtruerelationshipforthelargerpopulation.Nextweturntomaking inferencesabouttherelationshipforthe largerpopulation. Here isanice summary to help us distinguish between the regression line for the sample and theregressionlineforthepopulation.
RegressionLinefortheSample
RegressionLineforthePopulation
Allimages
Aside:E(Y)=Y(x)=meanresponseatagivenx;sometimescalledtheregressionfunction.Itcantakeonmanyforms,wewillconsiderthesimplelinearregressionfunction:0+1x
-
181
Todoformalinference,wethinkofourb0andb1asestimatesoftheunknownparameters0and1. Below we have the somewhat statistical way of expressing the underlying model thatproducesourdata:LinearModel:theresponsey=[0+1(x)]+ =[Populationrelationship]+RandomnessThisstatisticalmodelforsimplelinearregressionassumesthatforeachvalueofxtheobservedvaluesoftheresponse(thepopulationofyvalues)isnormallydistributed,varyingaroundsometruemean(thatmaydependonx ina linearway)andastandarddeviationthatdoesnotdependonx.ThistruemeanissometimesexpressedasE(Y)=0+1(x).Andthecomponentsandassumptionsregardingthisstatisticalmodelareshowvisuallybelow.
Therepresentsthetrueerrorterm.Thesewouldbethedeviationsofaparticularvalueoftheresponseyfromthetrueregressionline.Asthesearethedeviationsfromthemean,thentheseerrortermsshouldhaveanormaldistributionwithmean0andconstantstandarddeviation.
Now,wecannotobservethese s.Howeverwewillbeabletousetheestimated(observable)errors,namelytheresiduals,tocomeupwithanestimateofthestandarddeviationandtochecktheconditionsaboutthetrueerrors.
Trueregressionline
x
-
182
Sowhathavewedone,andwherearewegoing?1. Estimatetheregressionlinebasedonsomedata.DONE!2. Measurethestrengthofthelinearrelationshipwiththecorrelation. DONE!3. Usetheestimatedequationforpredictions. DONE!4. Assessifthelinearrelationshipisstatisticallysignificant.5. Provideintervalestimates(confidenceintervals)forourpredictions.6. Understandandchecktheassumptionsofourmodel.Wehavealreadydiscussedthedescriptivegoalsof1,2,and3.Fortheinferentialgoalsof4and5,wewillneedanestimateoftheunknownstandarddeviationinregression
EstimatingtheStandardDeviationforRegressionThestandarddeviationforregressioncanbethoughtofasmeasuringtheaveragesizeoftheresiduals.Arelativelysmallstandarddeviationfromtheregressionlineindicatesthatindividualdatapointsgenerallyfallclosetotheline,sopredictionsbasedonthelinewillbeclosetotheactualvalues.
It seems reasonable thatour estimateof this average sizeof the residualsbebasedon theresidualsusingthesumofsquaredresidualsanddividingbyappropriatedegreesoffreedom.Ourestimateofisgivenby:s= MSE
nSSE
n 22
residuals squared of sum
where 22 yyeSSE i
Note: Whyn2? Inestimatingthemeanresponsewehadtoestimate2quantities,theyinterceptandtheslope;sowelose2df.EstimatingtheStandardDeviation:BillvsTipBelowaretheportionsoftheRregressionoutputthatwecouldusetoobtaintheestimateofforourregressionanalysis.
FromSummary:
Residual standard error: 2.251 on 4 degrees of freedom Multiple R-squared: 0.8487, Adjusted R-squared: 0.8109 F-statistic: 22.45 on 1 and 4 DF, p-value: 0.009052
OrfromANOVA:Response: Tip Df Sum Sq Mean Sq F value Pr(>F) Bill 1 113.732 113.732 22.446 0.009052 ** Residuals 4 20.268 5.067
-
183
SignificantLinearRelationship?
Considerthefollowinghypotheses: 0: 10 H versus 0: 1 aH Whathappensifthenullhypothesisistrue?If1=0thenE(Y)=0=>aconstantnomatterwhatthevalueofxis.i.e.knowingxdoesnothelptopredicttheresponse.Sothesehypothesesaretestingifthereisasignificantnonzerolinearrelationshipbetweenyandx. Thereareanumberofwaystotestthishypothesis.Onewayisthroughatteststatistic(thinkaboutwhyitisatandnotaztest).Thegeneralformforatteststatisticis:
statistic sample theoferror standard valuenull - statistic samplet
Wehaveoursampleestimatefor 1 ,itis 1b .Andwehavethenullvalueof0.Soweneedthestandarderrorfor 1b .Wecouldderiveit,usingtheideaofsamplingdistributions(thinkaboutthepopulationofallpossible 1b valuesifweweretorepeatthisprocedureoverandovermanytimes).Hereistheresult:ttestforthepopulationslope 1 Totest 0: 10 H wewoulduse )(s.e.
0
1
1b
bt where 21 )( xx
sbSE andthedegreesoffreedomforthetdistributionaren2.This tstatistic couldbemodified to testa varietyofhypothesesabout thepopulation slope(differentnullvaluesandvariousdirectionsofextreme).
TryIt!SignificantRelationshipbetweenBillandTip?Isthereasignificant(nonzero)linearrelationshipbetweenthetotalcostofarestaurantbillandthetipthatisleft?(isthebillausefullinearpredictorforthetip?)Thatis,test 0: 10 H versus 0: 1 aH usinga5%levelofsignificance.1. 1 2
2.251( ) 0.0363900
sSE bx x
2. 11
0 0.17077 0 4.74s.e.( ) 0.0326bt
b
3.Usingthettablewithdf=62=4,wehavepvalue
-
184
Thinkaboutit:Basedontheresultsofthepreviousttestconductedatthe5%significancelevel,doyouthinka95%confidenceintervalforthetrueslope 1 wouldcontainthevalueof0?
ConfidenceIntervalforthepopulationslope 1 11 * bSEtb wheredf=n2forthe *t valueComputetheintervalandcheckyouranswer.Couldyouinterpretthe95%confidencelevelhere?0.17077(2.78)(0.036)0.170770.10008(0.07069,0.27085)(t*=2.78fromdf=4and95%confidence)Ifthisexperimentwererepeatedmanytimes,wedexpect95%oftheresultingconfidenceintervalstocontainthepopulationslope1.InferenceaboutthePopulationSlopeusingRBelowaretheportionsoftheRregressionoutputthatwecouldusetoperformthettestandobtaintheconfidenceintervalforthepopulationslope 1 . Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.58769 2.41633 -0.243 0.81980 Bill 0.17077 0.03604 4.738 0.00905 **
Note:Thereisathirdwaytotest 0: 10 H versus 0: 1 aH .ItinvolvesanotherFtestfromanANOVAforregression.Response: Tip Df Sum Sq Mean Sq F value Pr(>F) Bill 1 113.732 113.732 22.446 0.009052 ** Residuals 4 20.268 5.067 *ThettestismoreflexiblethantheFtest;Fonlytwosidedwithnull=0
-
185
PredictingforIndividualsversusEstimatingtheMeanConsidertherelationshipbetweenthebillandtipLeastsquaresregressionline(orestimatedregressionfunction):
y 0.5877+0.17077(x) also YE =0.5877+0.17077(x)
Wealsohave: s 2.251HowwouldyoupredictthetipforBarbwhohada$50restaurantbill?y 0.5877+0.17077(50)=$7.95
Howwouldyouestimatethemeantipforallcustomerswhohada$50restaurantbill? YE =0.5877+0.17077(50)=$7.95
Soourestimateforpredictingafutureobservationandforestimatingthemeanresponsearefoundusingthesameleastsquaresregressionequation.Whatabouttheirstandarderrors?(Wewouldneedthestandarderrorstobeabletoproduceanintervalestimate.) Idea:Considerapopulationofindividualsandapopulationofmeans:Whatisthestandarddeviationforapopulationofindividuals?
Whatisthestandarddeviationforapopulationofmeans?nWhichstandarddeviationislarger?Soapredictionintervalforanindividualresponsewillbe
(widerornarrower) thanaconfidenceintervalforameanresponse.
Populationofindividuals
n
Populationofmeans
-
186
Herearethe(somewhatmessy)formulas:
TryIt!BillvsTipConstructa95%confidenceintervalforthemeantipgivenforallcustomerswhohada$50bill(x).Recall:n=6,x 62, 2 XXSxx 3900, y 0.5877+0.17077(x),ands=2.251.y 0.5877+0.17077(50)=$7.95 t*=2.78(withdf=4)
2 2
21 ( ) 1 (50 62)s.e.(fit) 2.251 1.0157
6 3900i
x xsn x x
* s.e.(fit) 7.95 (2.78)(1.0157) 7.95 2.83y t =>($5.12,$10.78)
Constructa95%predictionintervalforthetipfromanindividualcustomerwhohada$50bill(x).
2 22 2s.e.(pred) s.e.(fit) (2.251) 1.0157 2.47s * s.e.(pred) 7.95 2.78(2.47)
7.95 6.87y t =>($1.08,$14.82)Itiswider!Showpredictionintervalandconfidenceintervalbandsonthescatterplot
-
187
CheckingAssumptionsinRegressionLetsrecallthestatisticalwayofexpressingtheunderlyingmodelthatproducesourdata:LinearModel:theresponsey=[0+1(x)]+ =[Populationrelationship]+Randomness
wherethes,thetrueerrortermsshouldbenormallydistributedwithmean0andconstantstandarddeviation,andthisrandomnessisindependentfromonecasetoanother.
Thustherearefouressentialtechnicalassumptionsrequiredforinferenceinlinearregression:(1)Relationshipisinfactlinear.(2)TRUEErrorsshouldbenormallydistributed.(3)TRUEErrorsshouldhaveconstantvariance.(4)TRUEErrorsshouldnotdisplayobviouspatterns.
Now,wecannotobservethese s.Howeverwewillbeabletousetheestimated(observable)errors,namelytheresiduals,tocomeupwithanestimateofthestandarddeviationandtochecktheconditionsaboutthetrueerrors.Sohowcanwechecktheseassumptionswithourdataandestimatedmodel?(1) Relationshipisinfactlinear.examinethescatterplotofyversusx(2) TRUEErrorsshouldbenormallydistributed.Histogramorqqplotofresiduals
(3) TRUEErrorsshouldhaveconstantvariance. Ifwesee(4)TRUEErrorsshouldnotdisplayobviouspatterns.
Now,ifwesaw
ResidualvsFittedPlot y :ifrandomscatterwithnopattern
inhorizontalband=>ok
ResidualvsFittedPlot:showsevidencethattrueerrorsdonothaveconstantvariance
ResidualvsFittedPlot:showsevidencethattheunderlyingrelationshipmaynotbelinear
(maybequadratic)
-
188
Let'sturntoonelastfullregressionproblemthatincludescheckingassumptions.RelationshipbetweenheightandfootlengthforCollegeMenThe heights (in inches) and foot lengths (incentimeters) of 32 college men were used todevelop amodel for the relationship betweenheight and foot length. The scatterplot andRregressionoutputareprovided.
mean sd n foot 27.78125 1.549701 32 height 71.68750 3.057909 32
Call: lm(formula = foot ~ height, data = heightfoot) Residuals: Min 1Q Median 3Q Max -1.74925 -0.81825 0.07875 0.58075 2.25075 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.25313 4.33232 0.058 0.954 height 0.38400 0.06038 6.360 5.12e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.028 on 30 degrees of freedom Multiple R-squared: 0.5741, Adjusted R-squared: 0.5599 F-statistic: 40.45 on 1 and 30 DF, p-value: 5.124e-07 Correlation Matrix foot height foot 1.0000000 0.7577219 height 0.7577219 1.0000000 Analysis of Variance Table Response: foot Df Sum Sq Mean Sq F value Pr(>F) height 1 42.744 42.744 40.446 5.124e-07 *** Residuals 30 31.705 1.057 Alsonotethat:SXX= 2xx =289.87
-
189
a. Howmuchwould you expect foot length to increase foreach1inch increase inheight?Includetheunits.
Thisisaskingabouttheslope:0.384centimeters.
b. Whatisthecorrelationbetweenheightandfootlength?
r=0.7577(wouldyoubeabletointerpretthevalueofr2?
c. Givetheequationoftheleastsquaresregressionlineforpredictingfootlengthfromheight. predictedy=yhat=0.252+0.384(x)d. SupposeMaxis70inchestallandhasafootlengthof28.5centimeters.Basedontheleast
squaresregressionline,whatisthevalueofthepredicationerror(residual)forMax?Showallwork.
predictedy=yhat=0.252+0.384(70)=27.13cm observedypredictedy=28.527.13=1.37cm e. Use a 1% significance level to assess if there is a significant positive linear relationship
betweenheightandfootlength.Statethehypothesestobetested,theobservedvalueoftheteststatistic,thecorrespondingpvalue,andyourdecision.
Hypotheses:H0:_____1=0_____ Ha:_____1>0_______ TestStatisticValue:____6.36_______ pvalue:_0.0000005124/2=0.00000002562_
Decision:(circle) FailtorejectH0 RejectH0 Conclusion:Thusitappearsthereisasignificantpositivelinearrelationshipbetween
heightandfootlengthsforthepopulationofcollegemenrepresentedbythesample.
-
190
f. Calculatea95%confidenceintervalfortheaveragefootlengthforallcollegemenwhoare70inchestall.(Justclearlypluginallnumericalvalues.)
2
2* )(1
xx
xxn
styi
270 71.7127.132 (2.04) 1.028
32 289.87
27.1320.425(26.707,27.557)g. Considertheresidualsvsfittedplotshown.
Doesthisplotsupporttheconclusionthatthelinearregressionmodelisappropriate? Yes No
Explain:
Theplotshowsarandomscatterinahorizontalbandaround0withnopattern.
Note: onexam,studentswhosaidNO,becausethevariationappearstobechangingweremarkedasoktoo.
-
191
RegressionLinearRegressionModelPopulationVersion:
Mean: Individual:
where is SampleVersion:
Mean: Individual:
StandardErroroftheSampleSlope
ConfidenceIntervalfor
df=n2tTestfor Totest
df=n2or df=1,n2
ParameterEstimators
ConfidenceIntervalfortheMeanResponse df=n2 where
Residuals=observedypredictedy
PredictionIntervalforanIndividualResponse df=n2
where Correlationanditssquare
where
StandardErroroftheSampleIntercept ConfidenceIntervalfor df=n2
Estimateof where
tTestfor Totest df=n2
xYExY 10)( iii xy 10
i ),0( N
xbby 10 iii exbby 10
21 )(s.e. xxs
SsbXX
1)(s.e. 1
*1 btb
1 0: 10 H )(s.e.0
1
1
bb
t
MSEMSREGF
221 xx
yxx
xx
yyxx
SS
bXX
XY
xbyb 10
s.e.(fit) *ty
XXSxx
ns
2)(1)fit(s.e.
yye s.e.(pred) *ty 22 )fit(s.e.s.e.(pred) s
YYXX
XY
SSS
r
SSTOSSREG
SSTOSSESSTOr 2
2 yySSSTO YY
XXSx
nsb
2
01)(s.e.
0)(s.e. 0
*0 btb
2 nSSEMSEs
22 eyySSE
0 0: 00 H
)(s.e.0
0
0
bb
t
-
192
AdditionalNotesAplacetojotdownquestionsyoumayhaveandaskduringofficehours,takeafewextranotes,writeoutanextraproblemorsummarycompletedinlecture,createyourownsummaryabouttheseconcepts.
top related