3.2: least squares regressions

Post on 23-Feb-2022


Section 3.2 Least-Squares Regression

After this section, you should be able to…

✓ INTERPRET a regression line
✓ CALCULATE the equation of the least-squares regression line
✓ CALCULATE residuals
✓ CONSTRUCT and INTERPRET residual plots
✓ DETERMINE how well a line fits observed data
✓ INTERPRET computer regression output

Regression Lines

A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other.

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Regression Lines

Regression lines are used to conduct analysis.

• Colleges use students' SAT scores and GPAs to predict college success.
• Professional sports teams use players' vital stats (40-yard dash, height, weight) to predict success.
• Macy's uses shipping, sales, and inventory data to predict future sales.
• MDCPS uses student data to evaluate teachers using the VAM model.

Regression Line Equation

Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form:

ŷ = ax + b

In this equation,
• ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x.
• a is the slope, the amount by which y is predicted to change when x increases by one unit.
• b is the y-intercept, the predicted value of y when x = 0.

Regression Line Equation

ŷ = 0.0908x + 16.3

Format of Regression Lines

Format 1: ŷ = 0.0908x + 16.3
ŷ = predicted backpack weight
x = student's weight

Format 2: Predicted backpack weight = 16.3 + 0.0908(student's weight)

Interpreting Linear Regression

• Y-intercept: A student weighing zero pounds is predicted to have a backpack weight of 16.3 pounds (no practical interpretation).
• Slope: For each additional pound that the student weighs, their backpack is predicted to weigh an additional 0.0908 pounds, on average.

Interpreting Linear Regression

Interpret the y-intercept and slope values in context. Is there any practical interpretation?

ŷ = 37x + 270
x = hours studied for the SAT
ŷ = predicted SAT Math score

Slope: For each additional hour the student studies, his/her score is predicted to increase by 37 points, on average. This makes sense, OR this does not make sense: it is unreasonable for scores to increase by 37 points for JUST one hour of studying.

Y-intercept: If a student studies for zero hours, then the student's predicted SAT Math score is 270 points. This makes sense, OR this does not make sense because an SAT score of 270 is very low regardless of study.

Predicted Value

What is the predicted SAT Math score for a student who studies 12 hours?

ŷ = 37x + 270
x = hours studied for the SAT
ŷ = predicted SAT Math score

ŷ = 37(12) + 270 = 714
Predicted score: 714 points
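The substitution above can also be sketched in a few lines of Python. This is only an illustration: the function name `predict` and its parameter names are hypothetical, and the slope and intercept are the ones from the SAT example on the slide.

```python
def predict(x, slope, intercept):
    """Return the predicted response y-hat = slope * x + intercept."""
    return slope * x + intercept


# SAT example from the slide: y-hat = 37x + 270, x = hours studied
predicted_score = predict(12, slope=37, intercept=270)
print(predicted_score)  # 714
```

The same function works for any line of the form ŷ = ax + b, for example the backpack line with slope 0.0908 and intercept 16.3.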

Self Check Quiz!

Self Check Quiz: Calculate the Regression Equation

A crazy professor believes that a child with IQ 100 should have a reading test score of 50, and that reading score should increase by 1 point for every additional point of IQ. What is the equation of the professor's regression line for predicting reading score from IQ? Be sure to identify all variables used.

Answer: ŷ = 50 + x
ŷ = predicted reading score
x = number of IQ points above 100

Self Check Quiz: Interpreting Regression Lines & Predicted Value

Data on the IQ test scores and reading test scores for a group of fifth-grade children resulted in the following regression line:

predicted reading score = −33.4 + 0.882(IQ score)

(a) What's the slope of this line? Interpret this value in context.
(b) What's the y-intercept? Explain why the value of the intercept is not statistically meaningful.
(c) Find the predicted reading scores for two children with IQ scores of 90 and 130, respectively.

Answers:

(a) Slope = 0.882. For each 1-point increase in IQ score, the reading score is predicted to increase by 0.882 points, on average.
(b) Y-intercept = −33.4. If a student had an IQ of zero, which is essentially impossible (the student would not be able to hold a pencil to take the exam), the predicted reading score would be −33.4. This has no practical interpretation.
(c) Predicted values:
IQ 90: −33.4 + 0.882(90) = 45.98 points
IQ 130: −33.4 + 0.882(130) = 81.26 points

Least-Squares Regression Line

Different regression lines produce different residuals. The regression line we use in AP Stats is the least-squares regression line. The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

residual = actual y − predicted y (remember AP: Actual minus Predicted)

residual = y − ŷ

Positive residuals: points above the line
Negative residuals: points below the line
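To see what "as small as possible" means for the sum of squared residuals, here is a minimal Python sketch. The data set is made up for illustration and does not come from the slides, and the helper names are hypothetical. The fit uses the standard closed-form least-squares formulas, then compares its sum of squared residuals with that of a few slightly different lines.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (actual y - predicted y)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line gives a larger sum of squared residuals
for d_slope, d_intercept in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    other = sum_squared_residuals(xs, ys, slope + d_slope, intercept + d_intercept)
    print(other > best)  # True each time
```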

How to Calculate the Residual

1. Calculate the predicted value by plugging x into the LSRE (least-squares regression equation).
2. Determine the observed/actual value.
3. Subtract: observed − predicted.

Calculate the Residual

1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908(170) = 31.736
Observed: 35
Residual: 35 − 31.736 = 3.264 pounds
The student's backpack weighs 3.264 pounds more than predicted.

2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908(105) = 25.834
Observed: 24
Residual: 24 − 25.834 = −1.834 pounds
The student's backpack weighs 1.834 pounds less than predicted.
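The two residual calculations above follow the same three steps (predict, observe, subtract), which can be sketched in Python as follows. The function names are hypothetical; the equation and the two students are the ones from the slides.

```python
def predicted_pack_weight(body_weight):
    """Backpack line from the slides: y-hat = 16.3 + 0.0908 * (body weight)."""
    return 16.3 + 0.0908 * body_weight


def residual(observed, predicted):
    """Residual = actual y - predicted y."""
    return observed - predicted


# (student's weight, backpack weight) in pounds, from the two questions
students = [(170, 35), (105, 24)]
residuals = [residual(pack, predicted_pack_weight(body)) for body, pack in students]

for r in residuals:
    print(f"{r:.3f}")  # 3.264, then -1.834
```

A positive result means the backpack is heavier than predicted, and a negative result means it is lighter than predicted.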

Check Your Understanding

Some data were collected on the weight of a male white laboratory rat for the first 25 weeks after its birth. A scatterplot of y = weight (in grams) and x = time since birth (in weeks) shows a fairly strong, positive linear relationship. The regression equation ŷ = 100 + 40x models the data well.

A. Predict the rat's weight at 16 weeks old.

B. Calculate and interpret the residual if the rat weighed 700 grams at 16 weeks old.

C. Should you use this line to predict the rat's weight at 2 years old?
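One way to check your work on these three parts is the short Python sketch below. The function name is hypothetical, and treating 2 years as about 104 weeks is an assumption made here for the calculation.

```python
def predicted_rat_weight(weeks):
    """Predicted weight in grams from the line y-hat = 100 + 40x."""
    return 100 + 40 * weeks


# A. Predicted weight at 16 weeks
pred_16 = predicted_rat_weight(16)

# B. Residual = actual - predicted, for a rat that weighed 700 grams
resid_16 = 700 - pred_16

# C. Prediction at 2 years (assumed here to be about 104 weeks)
pred_2yr = predicted_rat_weight(104)

print(pred_16, resid_16, pred_2yr)  # 740 -40 4260
```

The negative residual in part B says the rat weighed 40 grams less than the line predicts. For part C, the line was fit to data from the first 25 weeks only, so a prediction at 104 weeks lies far outside the observed values of x and should not be trusted.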

Residual Plots

A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.

TI-Nspire: Residual Plots
1. Press MENU, 4: Analyze
2. Option 6: Residual, Option 2: Show Residual Plot

Interpreting Residual Plots

A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.

1) The residual plot should show no obvious patterns.
2) The residuals should be relatively small in size.

A valid residual plot should look like the “night sky,” with approximately equal amounts of positive and negative residuals.

[Figure: pattern in residuals; linear model not appropriate]

Should You Use the LSRL?

[Figures: two plots, numbered 1 and 2]

Interpreting Computer Regression Output

Be sure you can locate the slope and the y-intercept, and determine the equation of the LSRL.

ŷ = −0.0034415x + 3.5051
ŷ = predicted ....
x = explanatory variable

Determine the equation of the LSRL.

ŷ = 174.40x + 72.95
x = customers in line
ŷ = predicted seconds it takes to check out

r²: Coefficient of Determination

r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean of y for each value in the data set.

In this example, r² equals 60.6%.

60.6% of the variation in pack weight is explained by the linear relationship with body weight.

Template: (insert r²)% of the variation in y is explained by the linear relationship with x.

Interpret r²

Interpret in a sentence (how much variation is accounted for?)

1. r² = 0.875, x = hours studied, y = SAT score
2. r² = 0.523, x = hours slept, y = alertness score

Answers:
1. 87.5% of the variation in SAT score is explained by the linear relationship with the number of hours studied.
2. 52.3% of the variation in alertness score is explained by the linear relationship with the number of hours slept.

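S: Standard Deviation of the Residuals

If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by

$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$

s represents the size of a typical prediction ERROR (residual).

For an individual residual: positive = the line UNDERpredicts; negative = the line OVERpredicts.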
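The formula for s can be sketched in Python as shown below. The data set is hypothetical (it is not from the slides), the function name is an illustration only, and the slope 0.6 and intercept 2.2 are the least-squares values for that made-up data.

```python
import math


def residual_std_dev(xs, ys, slope, intercept):
    """s = sqrt( sum of squared residuals / (n - 2) )."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    n = len(residuals)
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))


# Hypothetical data; its least-squares line is y-hat = 0.6x + 2.2
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s = residual_std_dev(xs, ys, slope=0.6, intercept=2.2)
print(round(s, 3))  # 0.894
```

Because the residuals are squared before they are summed, s is never negative. It measures the typical size of a prediction error, not its direction.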

S: Standard Deviation of the Residuals

1. Identify and interpret the standard deviation of the residuals.

Answer: s = 0.740

Interpretation: When the least-squares regression line is used to predict fat gain, the predictions are typically off by about 0.740 kilograms.

Self Check Quiz!

The data are from a random sample of 10 trains, comparing the number of cars on the train and fuel consumption in pounds of coal.

• What is the regression equation? Be sure to define all variables.
• What is r² telling you?
• Define and interpret the slope in context. Does it have a practical interpretation?
• Define and interpret the y-intercept in context.
• What is s telling you?

Answers:

1. ŷ = 2.1495x + 10.667
ŷ = predicted fuel consumption in pounds of coal
x = number of rail cars
2. 96.7% of the variation in fuel consumption is explained by the linear relationship with the number of rail cars.
3. Slope = 2.1495. With each additional car, fuel consumption is predicted to increase by 2.1495 pounds of coal, on average. This makes practical sense.
4. Y-intercept = 10.667. When there are no cars attached to the train, the predicted fuel consumption is 10.667 pounds of coal. This has no practical interpretation because there is always at least one car, the engine.
5. s = 4.361. When the least-squares regression line is used to predict fuel consumption, the predictions are typically off by about 4.361 pounds of coal.

Extrapolation

We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. Exercise caution in making predictions outside the observed values of x.

Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.

Outliers and Influential Points

• An outlier is an observation that lies outside the overall pattern of the other observations.
• An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.
• Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
• Note: Not all influential points are outliers, nor are all outliers influential points.

The left graph is perfectly linear. In the right graph, the last value was changed from (5, 5) to (8, 5). This point is clearly influential, because it changed the graph significantly. However, its residual is very small.
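The effect of moving one point can be sketched numerically. The slides do not list the points in the two graphs, so the sketch below assumes the perfectly linear graph is (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); only the change from (5, 5) to (8, 5) comes from the slide. The helper name is hypothetical.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Assumed original data: a perfectly linear pattern
original = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
# Same data with the last point moved from (5, 5) to (8, 5)
changed = [(1, 1), (2, 2), (3, 3), (4, 4), (8, 5)]

slope_before, intercept_before = least_squares_line(original)
slope_after, intercept_after = least_squares_line(changed)

# Residual of the moved point relative to the new line
residual_moved = 5 - (slope_after * 8 + intercept_after)

print(round(slope_before, 3), round(slope_after, 3))  # 1.0 0.548
print(round(residual_moved, 3))                       # -0.411
```

Under these assumed points, moving a single observation in the x direction cuts the slope roughly in half, while the moved point's own residual stays under half a unit because the line is pulled toward it.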

Identify the Outlier…

Check Your Understanding

The scatterplot shows the payroll (in millions of dollars) and number of wins for Major League Baseball teams in 2016, along with the least-squares regression line. The points highlighted in red represent the Los Angeles Dodgers (far right) and the Cleveland Indians (upper left).

A. Describe what influence the point representing the Los Angeles Dodgers has on the equation of the least-squares regression line. Explain your reasoning.

B. Describe what influence the point representing the Cleveland Indians has on the standard deviation of the residuals and r². Explain your reasoning.

Correlation and Regression Limitations

The distinction between explanatory and response variables is important in regression.

Correlation and regression lines describe only linear relationships.

Correlation and least-squares regression lines are not resistant.

Correlation and Regression Wisdom

An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y.

Association Does Not Imply Causation

A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why?

FRQ 2018 #1

Additional Calculations & Proofs

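Least-Squares Regression Line

We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation.

Equation of the least-squares regression line

We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and standard deviations of the two variables and their correlation. The least-squares regression line is the line ŷ = a + bx with slope

$$ b = r \frac{s_y}{s_x} $$

and y-intercept

$$ a = \bar{y} - b\bar{x} $$

Note that in this form b is the slope and a is the y-intercept, the reverse of the lettering in the form ŷ = ax + b used earlier in this section.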

Calculate the Least-Squares Regression Line

Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year. We expect a positive correlation between x and y because the change during January contributes to the full year's change. Calculation from data for an 18-year period gives

Mean x = 1.75%   s_x = 5.36%   Mean y = 9.07%   s_y = 15.35%   r = 0.596

Find the equation of the least-squares line for predicting full-year change from January change. Show your work.
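A minimal Python sketch of this calculation, using the slope and intercept formulas from the previous slide and the summary statistics given in the problem. The function and variable names are hypothetical.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + bx, with b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Summary statistics from the 18-year stock market example (all in percent)
a, b = line_from_summary(x_bar=1.75, s_x=5.36, y_bar=9.07, s_y=15.35, r=0.596)

print(f"slope b = {b:.3f}")      # slope b = 1.707
print(f"intercept a = {a:.3f}")  # intercept a = 6.083
```

With these values the line is approximately ŷ = 6.083 + 1.707x, where x is the January percent change and ŷ is the predicted full-year change.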

The Role of r² in Regression

The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors.

The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula:

$$ r^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum \text{residual}^2, \qquad SST = \sum (y_i - \bar{y})^2 $$

In practice, just square the correlation r.

Accounted for Error

If we use the mean backpack weight as our prediction, the sum of the squared residuals is 83.87. SST = 83.87

If we use the LSRL to make our predictions, the sum of the squared residuals is 30.90. SSE = 30.90

r² = 1 − SSE/SST = 1 − 30.90/83.87 = 0.632

63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.

Unaccounted for Error

SSE/SST = 30.90/83.87 = 0.368

Therefore, 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.
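The split into accounted-for and unaccounted-for variation can be checked with a short Python sketch. The values SSE = 30.90 and SST = 83.87 are the ones from the backpack example; the variable names are hypothetical.

```python
# Sums of squared residuals from the backpack example
sse = 30.90  # using the LSRL as the prediction
sst = 83.87  # using the mean backpack weight as the prediction

unaccounted = sse / sst      # fraction of variation the line does not explain
r_squared = 1 - unaccounted  # fraction of variation the line does explain

print(f"r^2 = {r_squared:.3f}")            # r^2 = 0.632
print(f"unaccounted = {unaccounted:.3f}")  # unaccounted = 0.368
```

The two fractions always add to 1, which is why r² can be read as the share of the variation in y that the linear model accounts for.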

Interpreting a Regression Line

Consider the regression line from the example (pg. 164) “Does Fidgeting Keep You Slim?” Identify the slope and y-intercept and interpret each value in context.

fat gain = 3.505 − 0.00344(NEA change)

The y-intercept a = 3.505 kg is the fat gain estimated by this model if NEA does not change when a person overeats.

The slope b = −0.00344 tells us that the amount of fat gained is predicted to go down by 0.00344 kg for each added calorie of NEA.
