© Prof. Andy Field, 2016, www.discoveringstatistics.com
Linear Models: Looking for Bias

The following sections have been adapted from Field (2013) Chapter 8. These sections have been edited down considerably and I suggest (especially if you're confused) that you read the chapter in its entirety. You will also need to read the chapter to help you interpret the output. If you're having problems there is plenty of support available: you can (1) email or see your seminar tutor, (2) post a message on the course bulletin board, or (3) drop into my office hour.
More on Bias

Outliers

We have seen that outliers can bias a model: they bias estimates of the regression parameters. We know that an outlier, by its nature, is very different from all of the other scores. Therefore, if we were to work out the differences between the data values that were collected and the values predicted by the model, we could detect an outlier by looking for large differences. The differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample are called residuals. If a model is a poor fit to the sample data then the residuals will be large. Also, if any cases stand out as having a large residual, they could be outliers.
The normal or unstandardized residuals described above are measured in the same units as the outcome variable and so are difficult to interpret across different models. All we can do is look for residuals that stand out as being particularly large: we cannot define a universal cut-off point for what constitutes a large residual. To overcome this problem, we use standardized residuals, which are the residuals converted to z-scores, which means they are converted into standard deviation units (i.e., they are distributed around a mean of 0 with a standard deviation of 1). By converting residuals into z-scores (standardized residuals) we can compare residuals from different models and use what we know about the properties of z-scores to devise universal guidelines for what constitutes an acceptable (or unacceptable) value. For example, in a normally distributed sample, 95% of z-scores should lie between −1.96 and +1.96, 99% should lie between −2.58 and +2.58, and 99.9% (i.e., nearly all of them) should lie between −3.29 and +3.29. Some general rules for standardized residuals are derived from these facts: (1) standardized residuals with an absolute value greater than 3.29 (we can use 3 as an approximation) are cause for concern because in an average sample a value this high is unlikely to occur; (2) if more than 1% of our sample cases have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit to the sample data); and (3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (we can use 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.
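SPSS computes standardized residuals for you (the ZRE_1 variable described later in this handout), but the logic is easy to sketch. The Python snippet below is an illustration only, not part of the handout: it converts residuals to z-scores (a simplification of how SPSS standardizes them, which divides by the standard error of the estimate) and then applies the three rules of thumb above.

```python
def standardized_residuals(observed, predicted):
    """Convert raw residuals (observed - predicted) to z-scores."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((r - mean) ** 2 for r in residuals) / (n - 1)) ** 0.5
    return [(r - mean) / sd for r in residuals]

def flag_outliers(z_residuals):
    """Apply the three rules of thumb from the text."""
    n = len(z_residuals)
    return {
        "n_beyond_3.29": sum(abs(z) > 3.29 for z in z_residuals),
        "prop_beyond_2.58": sum(abs(z) > 2.58 for z in z_residuals) / n,
        "prop_beyond_1.96": sum(abs(z) > 1.96 for z in z_residuals) / n,
    }
```

A model is suspect if `n_beyond_3.29` is non-zero, `prop_beyond_2.58` exceeds 0.01, or `prop_beyond_1.96` exceeds 0.05.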
Influential Cases

As well as testing for outliers by looking at the error in the model, it is also possible to look at whether certain cases exert undue influence over the parameters of the model. So, if we were to delete a certain case, would we obtain different regression coefficients? This type of analysis can help to determine whether the regression model is stable across the sample, or whether it is biased by a few influential cases. There are numerous ways to look for influential cases, all described in scintillating detail in Field (2013). We'll just look at one of them, Cook's distance, which quantifies the effect of a single case on the model as a whole. Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
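Cook's distance has a closed form based on each case's residual and leverage. The sketch below (an illustration for the one-predictor case with hypothetical data, not SPSS's implementation) shows how a single extreme case produces a distance above the cut-off of 1:

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple (one-predictor) regression.

    D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2, where e_i is the raw
    residual, p the number of model parameters (2 here: intercept and
    slope) and h_i the case's leverage."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2
    mse = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
            for e, h in zip(resid, lev)]
```

With x = [1, 2, 3, 4] and y = [1, 2, 3, 8], the fourth case (which pulls the line upwards) produces by far the largest distance, well above 1, flagging it as influential.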
Generalization

Remember from your lecture on bias that linear models assume:
• Linearity and additivity: the relationship you're trying to model is, in fact, linear and, with several predictors, the predictors combine additively.
• Normality: for the b estimates to be optimal the residuals should be normally distributed. For p-values and confidence intervals to be accurate, the sampling distribution of the bs should be normal.
• Homoscedasticity: necessary for the b estimates to be optimal, and for significance tests and confidence intervals of the parameters to be accurate.
However, there are some other assumptions that are important if we want to generalize the model we fit beyond our sample. The most important is:
• Independent errors: for any two observations the residual terms should be uncorrelated (i.e., independent). This eventuality is sometimes described as a lack of autocorrelation. If we violate the assumption of independence then our confidence intervals and significance tests will be invalid. This assumption can be tested with the Durbin–Watson test (Durbin & Watson, 1951). The test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated. A value greater than 2 indicates a negative correlation between adjacent residuals, whereas a value below 2 indicates a positive correlation. The size of the Durbin–Watson statistic depends upon the number of predictors in the model and the number of observations. As a very conservative rule of thumb, values less than 1 or greater than 3 are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model.
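The Durbin–Watson statistic itself is simple to compute from the residuals: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. SPSS does this for you (via the Durbin-Watson option described later), but a minimal sketch makes the 0-to-4 range concrete:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    over the sum of squared residuals. Near 0 = strong positive
    autocorrelation, near 4 = strong negative, 2 = uncorrelated."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```

Identical successive residuals (perfect positive autocorrelation) give 0; residuals that flip sign at every step push the statistic towards 4.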
There are some other considerations that we have not yet discussed (see Berry, 1993):
• Predictors are uncorrelated with 'external variables': external variables are variables that haven't been included in the regression model but that influence the outcome variable.
• Variable types: all predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.
• No perfect multicollinearity: if your model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors. So, the predictor variables should not correlate too highly.
• Non-zero variance: the predictors should have some variation in value (i.e., they do not have variances of 0). This is self-evident, really.
Figure 1: Plots of standardized residuals against predicted (fitted) values
The four most important conditions are linearity and additivity, normality, homoscedasticity, and independent errors. These can be tested graphically using a plot of standardized residuals (zresid) against standardized predicted values (zpred). Figure 1 shows several examples of this plot. The top left panel shows a situation in which the assumptions of linearity, independent errors and homoscedasticity have been met. Independent errors are shown by a random pattern of dots. The top right panel shows a similar plot for a dataset that violates the assumption of homoscedasticity. Note that the points form a funnel: they become more spread out across the graph. This funnel shape is typical of heteroscedasticity and indicates increasing variance across the residuals. The bottom left panel shows a plot of some data in which there is a non-linear relationship between the outcome and the predictor: there is a clear curve in the residuals. Finally, the bottom right panel illustrates data that not only have a non-linear relationship, but also show heteroscedasticity. Note first the curved trend in the residuals, and then also note that at one end of the plot the points are very close together whereas at the other end they are widely dispersed. When these assumptions have been violated you will not see these exact patterns, but hopefully these plots will help you to understand the general anomalies you should look out for.
Methods of Regression

Last week we looked at a situation where we forced predictors into the model. However, there are other options. We can select predictors in several ways:
• In hierarchical regression predictors are selected based on past work, and the researcher decides in which order to enter the predictors into the model. As a general rule, known predictors (from other research) should be entered into the model first, in order of their importance in predicting the outcome. After known predictors have been entered, the experimenter can add any new predictors into the model. New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor suspected to be the most important is entered first).
• Forced entry (or Enter as it is known in SPSS) is a method in which all predictors are forced into the model simultaneously. Like hierarchical, this method relies on good theoretical reasons for including the chosen predictors, but unlike hierarchical the experimenter makes no decision about the order in which variables are entered.
• Stepwise methods are generally frowned upon by statisticians. In stepwise regressions decisions about the order in which predictors are entered into the model are based on a purely mathematical criterion. In the forward method, an initial model is defined that contains only the constant (b0). The computer then searches for the predictor (out of the ones available) that best predicts the outcome variable: it does this by selecting the predictor that has the highest simple correlation with the outcome. If this predictor significantly improves the ability of the model to predict the outcome, then it is retained in the model and the computer searches for a second predictor. The criterion used for selecting this second predictor is that it is the variable that has the largest semi-partial correlation with the outcome. In plain English, imagine that the first predictor can explain 40% of the variation in the outcome variable; then there is still 60% left unexplained. The computer searches for the predictor that can explain the biggest part of the remaining 60% (it is not interested in the 40% that is already explained). As such, the semi-partial correlation gives a measure of how much 'new variance' in the outcome can be explained by each remaining predictor. The predictor that accounts for the most new variance is added to the model and, if it makes a significant contribution to the predictive power of the model, it is retained and another predictor is considered.
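The forward method's selection rule (pick whichever remaining predictor adds the most new variance, i.e., has the largest squared semi-partial correlation with the outcome) can be sketched in code. This is an illustration of the logic only, not SPSS's algorithm: it uses a simplified stopping rule based on the gain in R² rather than SPSS's significance-test criterion.

```python
def ols_r2(x_cols, y):
    """R^2 of y regressed on the given predictor columns (plus an
    intercept), via the normal equations and Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         for r in range(k)]                      # X'X
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]  # X'y
    for col in range(k):                         # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c2 in range(col, k):
                A[r][c2] -= f * A[col][c2]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):               # back-substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, k))) / A[r][r]
    yhat = [sum(X[i][c] * beta[c] for c in range(k)) for i in range(n)]
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return 1 - ss_res / ss_tot

def forward_select(candidates, y, min_gain=0.01):
    """Greedy forward selection: repeatedly add the candidate predictor
    that explains the most *new* variance (largest gain in R^2)."""
    chosen, r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        gains = {name: ols_r2([c for _, c in chosen] + [col], y) - r2
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:   # simplified stopping rule
            break
        chosen.append((best, remaining.pop(best)))
        r2 += gains[best]
    return [name for name, _ in chosen]
```

If one candidate predicts the outcome perfectly, it is entered first and the second candidate then adds no new variance, so it is left out, mirroring the Excluded Variables table discussed later.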
Many writers argue that stepwise methods take the important methodological decisions out of the hands of the researcher. What's more, the models derived by stepwise methods often take advantage of random sampling variation, and so decisions about which variables should be included will be based upon slight differences in their semi-partial correlations. However, these slight statistical differences may contrast dramatically with the theoretical importance of a predictor to the model. There is also the danger of over-fitting (having too many variables in the model that essentially make little contribution to predicting the outcome) and under-fitting (leaving out important predictors) the model. However, when little theory exists stepwise methods might be the only practical option.
The Example

We'll look at data collected from several questionnaires relating to clinical psychology, and we will use these measures to predict social anxiety using multiple regression. Anxiety disorders take on different shapes and forms, and each disorder is believed to be distinct and have unique causes. We can summarise the disorders and some popular theories as follows:
• Social anxiety: social anxiety disorder is a marked and persistent fear of one or more social or performance situations in which the person is exposed to unfamiliar people or possible scrutiny by others. This anxiety leads to avoidance of these situations. People with social phobia are believed to feel elevated feelings of shame.
• Obsessive compulsive disorder (OCD): OCD is characterised by the everyday intrusion into conscious thinking of intense, repetitive, personally abhorrent, absurd and alien thoughts (obsessions), leading to the endless repetition of specific acts or to the rehearsal of bizarre and irrational mental and behavioural rituals (compulsions).
Social anxiety and obsessive compulsive disorder are seen as distinct disorders with different causes. However, there are some similarities.
• They both involve some kind of attentional bias: attention to bodily sensation in social anxiety and attention to things that could have negative consequences in OCD.
• They both involve repetitive thinking styles: social phobics ruminate about social encounters after the event (known as post-event processing), and people with OCD have recurring intrusive thoughts and images.
• They both involve safety behaviours (i.e., trying to avoid the thing that makes you anxious).
This might lead us to think that, rather than being different disorders, they are manifestations of the same core processes. One way to research this possibility would be to see whether social anxiety can be predicted from measures of other anxiety disorders. If social anxiety disorder and OCD are distinct we should expect that measures of OCD will not predict social anxiety. However, if there are core processes underlying all anxiety disorders, then measures of OCD should predict social anxiety.
Figure 2: Data layout for multiple regression
The data are in the file SocialAnxietyRegression.sav, which can be downloaded from Study Direct. This file contains four variables:
• The Social Phobia and Anxiety Inventory (SPAI), which measures levels of social anxiety.
• The Interpretation of Intrusions Inventory (III), which measures the degree to which a person experiences intrusive thoughts like those found in OCD.
• The Obsessive Beliefs Questionnaire (OBQ), which measures the degree to which people experience obsessive beliefs like those found in OCD.
• The Test of Self-Conscious Affect (TOSCA), which measures shame.
Each of 134 people was administered all four questionnaires. You should note that each questionnaire has its own column and each row represents a different person (see Figure 2).
What analysis will we do?

We are going to do a multiple regression analysis. Specifically, we're going to do a hierarchical multiple regression analysis. All this means is that we enter variables into the regression model in an order determined by past research and expectations. So, for your analysis, we will enter variables in so-called 'blocks':
• Block 1: the first block will contain any predictors that we expect to predict social anxiety. These variables should be entered using forced entry. In this example we have only one variable that we expect, theoretically, to predict social anxiety, and that is shame (measured by the TOSCA).
• Block 2: the second block will contain our exploratory predictor variables (the ones we don't necessarily expect to predict social anxiety). This block should contain the measures of OCD (OBQ and III) because these variables shouldn't predict social anxiety if social anxiety is indeed distinct from OCD. These variables should be entered using a stepwise method because we are 'exploring' them (think back to your lecture).
Doing Multiple Regression on SPSS
Specifying the First Block in Hierarchical Regression
Theory indicates that shame is a significant predictor of social phobia, and so this variable should be included in the model first. The exploratory variables (obq and iii) should, therefore, be entered into the model after shame. This method is called hierarchical (the researcher decides in which order to enter variables into the model based on past research). To do a hierarchical regression in SPSS we enter the variables in blocks (each block representing one step in the hierarchy). To get to the main regression dialog box select . The main dialog box is shown in Figure 3.
Figure 3: Main dialog box for block 1 of the multiple regression

The main dialog box is fairly self-explanatory in that there is a space to specify the dependent variable (outcome), and a space to place one or more independent variables (predictor variables). As usual, the variables in the data editor are listed on the left-hand side of the box. Highlight the outcome variable (SPAI scores) in this list by clicking on it and then transfer it to the box labelled Dependent by clicking on  or dragging it across. We also need to specify the predictor variable for the first block. We decided that shame should be entered into the model first (because theory indicates that it is an important predictor), so highlight this variable in the list and transfer it to the box labelled Independent(s) by clicking on  or dragging it across. Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. You can select a different method of variable entry for each block by clicking on , next to where it says Method. The default option is forced entry, and this is the option we want, but if you were carrying out more exploratory work, you might decide to use one of the stepwise methods (forward, backward, stepwise or remove).
Specifying the Second Block in Hierarchical Regression
Having specified the first block in the hierarchy, we move on to the second. To tell the computer that you want to specify a new block of predictors you must click on . This process clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). We decided that the second block would contain both of the new predictors, so you should click on obq and iii in the variables list and transfer them, one by one, to the Independent(s) box by clicking on . The dialog box should now look like Figure 4. To move between blocks use the  and  buttons (so, for example, to move back to block 1, click on ).
It is possible to select different methods of variable entry for different blocks in a hierarchy. So, although we specified forced entry for the first block, we could now specify a stepwise method for the second. Given that we have no previous research regarding the effects of obq and iii on SPAI scores, we might be justified in requesting a stepwise method for this block (see your lecture notes and my textbook). For this analysis select a stepwise method for this second block.
Figure 4: Main dialog box for block 2 of the multiple regression
Statistics
In the main regression dialog box click on  to open a dialog box for selecting various important options relating to the model (Figure 5). Most of these options relate to the parameters of the model; however, there are procedures available for checking the assumptions of no multicollinearity (Collinearity diagnostics) and independence of errors (Durbin-Watson). When you have selected the statistics you require (I recommend all but the covariance matrix as a general rule), click on  to return to the main dialog box.
® Estimates: this option is selected by default because it gives us the estimated coefficients of the regression model (i.e., the estimated b-values).
® Confidence intervals: this option produces confidence intervals for each of the unstandardized regression coefficients.
® Model fit: this option is vital and is selected by default. It provides not only a statistical test of the model's ability to predict the outcome variable (the F-test), but also the value of R (or multiple R), the corresponding R², and the adjusted R².
® R squared change: this option displays the change in R² resulting from the inclusion of a new predictor (or block of predictors). This measure is a useful way to assess the unique contribution of new predictors (or blocks) to explaining variance in the outcome.
® Descriptives: if selected, this option displays a table of the mean, standard deviation and number of observations for all of the variables included in the analysis. A correlation matrix is also displayed, showing the correlation between all of the variables and the one-tailed probability for each correlation coefficient. This correlation matrix can be used to establish whether there is multicollinearity.
® Part and partial correlations: this option produces the zero-order correlation (the Pearson correlation) between each predictor and the outcome variable. It also produces the partial correlation between each predictor and the outcome, controlling for all other predictors in the model.
® Collinearity diagnostics: this option is for obtaining collinearity statistics such as the VIF, tolerance, eigenvalues of the scaled, uncentred cross-products matrix, condition indexes and variance proportions (see Field, 2013, and your lecture notes).
® Durbin-Watson: this option produces the Durbin-Watson test statistic, which tests for correlations between errors.
® Casewise diagnostics: this option lists the observed value of the outcome, the predicted value of the outcome, the difference between these values (the residual) and this difference standardized. Furthermore, it will list these values either for all cases or just for cases for which the standardized residual is greater than 3 (when the ± sign is ignored). This criterion value of 3 can be changed, and I recommend changing it to 2 for reasons that will become apparent.
Figure 5: Statistics dialog box for regression analysis
Regression Plots
Once you are back in the main dialog box, click on  to activate the regression plots dialog box shown in Figure 6. This dialog box provides the means to specify a number of graphs, which can help to establish the validity of some regression assumptions. Most of these plots involve various residual values. On the left-hand side of the dialog box is a list of several variables:
• DEPENDNT (the outcome variable).
• *ZPRED (the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.
• *ZRESID (the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts.
• *DRESID (the deleted residuals).
• *ADJPRED (the adjusted predicted values).
• *SRESID (the Studentized residual).
• *SDRESID (the Studentized deleted residual). This value is the deleted residual divided by its standard error.
The variables listed in this dialog box all come under the general heading of residuals, and are discussed in detail in my book (sorry for all of the self-referencing, but I'm trying to condense a 60-page chapter into a manageable handout!). For a basic analysis it is worth plotting *ZRESID (Y-axis) against *ZPRED (X-axis), because this plot is useful to determine whether the assumptions of random errors and homoscedasticity have been met (see earlier). To create these plots select a variable from the list, and transfer it to the space labelled either X or Y (which refer to the axes) by clicking . When you have selected two variables for the first plot (as is the case in Figure 6) you can specify a new plot by clicking on . This process clears the spaces in which variables are specified. If you click on  and would like to return to the plot that you last specified, then simply click on .
You can also select the tick-box labelled Produce all partial plots, which will produce scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Any obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient. Also, non-linear relationships between a predictor and the outcome variable are much more detectable using these plots. Finally, they are a useful way of detecting collinearity. There are several options for plots of the standardized residuals. First, you can select a histogram of the standardized residuals (this is extremely useful for checking the assumption of normality of errors). Second, you can ask for a normal probability plot, which also provides information about whether the residuals in the model are normally distributed. When you have selected the options you require, click on  to take you back to the main regression dialog box.
Figure 6: Linear regression: plots dialog box
Saving Regression Diagnostics
In this week's lecture we met two types of regression diagnostics: those that help us assess how well our model fits our sample and those that help us detect cases that have a large influence on the model generated. In SPSS we can choose to save these diagnostic variables in the data editor (so SPSS will calculate them and then create new columns in the data editor in which the values are placed).
Click on  in the main regression dialog box to activate the save new variables dialog box (see Figure 7). Once this dialog box is active, it is a simple matter to tick the boxes next to the required statistics. Most of the available options are explained in Field (2013), and Figure 7 shows what I consider to be a bare minimum set of diagnostic statistics. Standardized versions of these diagnostics are generally easier to interpret, so I suggest selecting them in preference to the unstandardized versions. Once the regression has been run, SPSS creates a column in your data editor for each statistic requested, and it has a standard set of variable names to describe each one (zpr_1: standardized predicted value; zre_1: standardized residual; coo_1: Cook's distance). After the name there will be a number that refers to the analysis that has been run. So, for the first regression run on a data set the variable names will be followed by a 1; if you carry out a second regression it will create a new set of variables with names followed by a 2, and so on. When you have selected the diagnostics you require (by clicking in the appropriate boxes), click on  to return to the main regression dialog box.
Figure 7: Dialog box for regression diagnostics
Bootstrapping
We can get bootstrapped confidence intervals for the regression coefficients by clicking  (see last week's handout). However, this function doesn't work when we have used the  option to save residuals, so we can't use it now. However, once you have run the analysis and inspected the residuals and influential cases, you might want to re-run the analysis selecting the bootstrap option (and remembering to deselect all of the options for saving variables).
A Brief Guide to Interpretation
Model Summary
The model summary (Output 1) contains two models. Model 1 refers to the first stage in the hierarchy, when only TOSCA is used as a predictor. Model 2 refers to the final model (TOSCA, plus OBQ and III if they end up being included).
® In the column labelled R are the values of the multiple correlation coefficient between the predictors and the outcome. When only TOSCA is used as a predictor, this is the simple correlation between SPAI and TOSCA (0.34).
® The next column gives us a value of R², which is a measure of how much of the variability in the outcome is accounted for by the predictors. For the first model its value is 0.116, which means that TOSCA accounts for 11.6% of the variation in social anxiety. However, for the final model (model 2), this value increases to 0.157, or 15.7% of the variance in SPAI. Therefore, whatever variables enter the model in block 2 account for an extra (15.7 − 11.6 =) 4.1% of the variance in SPAI scores (this is also the value in the column labelled R Square Change, but expressed as a percentage).
® The adjusted R² gives us some idea of how well our model generalizes, and ideally we would like its value to be the same as, or very close to, the value of R². In this example the difference for the final model is 0.157 − 0.143 = 0.014 (or 1.4%). This shrinkage means that if the model were derived from the population rather than a sample it would account for approximately 1.4% less variance in the outcome.
® Finally, if you requested the Durbin-Watson statistic it will be found in the last column. This statistic informs us about whether the assumption of independent errors is tenable. The closer to 2 the value is, the better; for these data the value is 2.084, which is so close to 2 that the assumption has almost certainly been met.
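You can verify these model-summary numbers directly from the sums of squares in the ANOVA table (Output 2). A quick check in Python (an aside, not part of the handout; the total df of 127 in that table implies 128 cases entered this analysis, and the adjusted R² uses the standard Wherry formula):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n is the number of cases and k the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Sums of squares from the ANOVA table (Output 2); total df = 127, so n = 128
ss_total = 114796.0
n = 128

r2_1 = 13302.700 / ss_total      # model 1 (TOSCA only): 0.116
r2_2 = 17984.538 / ss_total      # model 2 (TOSCA + OBQ): 0.157
r2_change = r2_2 - r2_1          # extra variance from block 2: 0.041
adj_2 = adjusted_r2(r2_2, n, k=2)  # 0.143, a shrinkage of about 1.4%
```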
Output 1
ANOVA Table
Output 2 contains an analysis of variance (ANOVA) that tests whether the model is significantly better at predicting the outcome than using the mean as a 'best guess'. This table is again split into two sections: one for each model. If the improvement due to fitting the regression model is much greater than the inaccuracy within the model then the value of F will be greater than 1, and SPSS calculates the exact probability of obtaining a value of F at least this big if there were no effect. For the initial model the F-ratio is 16.52 (p < .001), and for the second model the value of F is 11.61, which is also highly significant (p < .001). We can interpret these results as meaning that the final model significantly improves our ability to predict the outcome variable.
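The F-ratios can be reproduced from the sums of squares and degrees of freedom in Output 2: each mean square is its sum of squares divided by its df, and F is the regression mean square over the residual mean square. A quick check (an aside, not part of the handout):

```python
# Values from the ANOVA table (Output 2): (sum of squares / df) ratios
f_model1 = (13302.700 / 1) / (101493.3 / 126)    # regression MS / residual MS
f_model2 = (17984.538 / 2) / (96811.431 / 125)
```

These reproduce the reported values of 16.515 and 11.611 to three decimal places.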
Output 2
Model Parameters
The next part of the output is concerned with the parameters of the model. The first step in our hierarchy included TOSCA and, although these parameters are interesting up to a point, we're more interested in the final model because it includes all predictors that make a significant contribution to predicting social anxiety. So, we'll look only at the lower half of the table (Model 2).
In multiple regression the model takes the form of an equation that contains a coefficient (b) for each predictor. The first part of the table gives us estimates for these b values, and these values indicate the individual contribution of each predictor to the model.
The b values tell us about the relationship between social anxiety and each predictor. If the value is positive we can tell that there is a positive relationship between the predictor and the outcome, whereas a negative coefficient represents a negative relationship. For these data both predictors have positive b values, indicating positive relationships. So, as shame (TOSCA) increases, social anxiety increases, and as obsessive beliefs increase so does social anxiety. The b values also tell us to what degree each predictor affects the outcome if the effects of all other predictors are held constant.
Output 1: Model summary

Model   R       R²      Adj. R²   Std. Error of the Estimate   R² Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .340a   .116    .109      28.38137                     .116        16.515     1     126   .000
2       .396b   .157    .143      27.82969                     .041        6.045      1     125   .015            2.084

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
Dependent Variable: Social Anxiety (SPAI)

Output 2: ANOVA

Model          Sum of Squares   df    Mean Square   F        Sig.
1  Regression  13302.700        1     13302.700     16.515   .000a
   Residual    101493.3         126   805.502
   Total       114796.0         127
2  Regression  17984.538        2     8992.269      11.611   .000b
   Residual    96811.431        125   774.491
   Total       114796.0         127

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
Dependent Variable: Social Anxiety (SPAI)

Each of these b values has an associated standard error indicating to what extent these values would vary across different samples, and these standard errors are used to determine whether or not the b value differs significantly from zero (using the t-statistic). Therefore, if the t-test associated with a b value is significant (if the value in the column labelled Sig. is less than 0.05) then that predictor is making a significant contribution to the model. For this model, shame (TOSCA), t(125) = 3.16, p = .002, and obsessive beliefs, t(125) = 2.46, p = .015, are significant predictors of social anxiety. From the magnitude of the t-statistics we can see that shame (TOSCA) had slightly more impact than obsessive beliefs. This conclusion is also borne out by the standardized beta values, which are measured in standard deviation units and so are directly comparable: the standardized beta value for shame (TOSCA) is 0.273, and for obsessive beliefs it is 0.213. This tells us that shame has slightly more impact in the model.
Output 3: Coefficients

Model            B         Std. Error   Beta   t        Sig.   95% CI for B         Zero-order   Partial   Part   Tolerance   VIF
1  (Constant)    −54.368   28.618              −1.900   .060   [−111.002, 2.267]
   Shame (TOSCA) 27.448    6.754        .340   4.064    .000   [14.081, 40.814]     .340         .340      .340   1.000       1.000
2  (Constant)    −51.493   28.086              −1.833   .069   [−107.079, 4.094]
   Shame (TOSCA) 22.047    6.978        .273   3.160    .002   [8.237, 35.856]      .340         .272      .260   .901        1.110
   OCD (OBQ)     6.920     2.815        .213   2.459    .015   [1.350, 12.491]      .299         .215      .202   .901        1.110

Dependent Variable: Social Anxiety (SPAI). OCD (OBQ) = OCD (Obsessive Beliefs Questionnaire).
Excluded Variables
At each stage of a regression analysis SPSS provides a summary of any variables that have not yet been entered into the model. In a hierarchical model, this summary has details of the variables that have been specified to be entered in subsequent steps, and in stepwise regression this table contains summaries of the variables that SPSS is considering entering into the model. The summary gives an estimate of each predictor's b value if it were entered into the equation at this point and calculates a t-test for this value. In a stepwise regression, SPSS should enter the predictor with the highest t-statistic and will continue entering predictors until there are none left with t-statistics that have significance values less than 0.05. Therefore, the final model might not include all of the variables you asked SPSS to enter.
In this case it tells us that if the Interpretation of Intrusions Inventory (III) were entered into the model it would not have a significant impact on the model's ability to predict social anxiety, t = −0.049, p = .961. In fact the significance value for this variable is almost 1, indicating that it would have virtually no impact whatsoever (note also that its beta value is extremely close to zero!).
Output 4: Excluded variables

Model                                             Beta In   t        Sig.   Partial Corr.   Tolerance   VIF     Min. Tolerance
1  OCD (Interpretation of Intrusions Inventory)   .132a     1.515    .132   .134            .917        1.091   .917
   OCD (Obsessive Beliefs Questionnaire)          .213a     2.459    .015   .215            .901        1.110   .901
2  OCD (Interpretation of Intrusions Inventory)   −.005b    −0.049   .961   −.004           .541        1.849   .531

a. Predictors in the Model: (Constant), Shame (TOSCA)
b. Predictors in the Model: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
Dependent Variable: Social Anxiety (SPAI)
Checking for Bias

SPSS produces a summary table of the residual statistics, and these should be examined for extreme cases. Output 5 shows any cases that have a standardized residual less than −2 or greater than 2 (remember that we changed the default criterion from 3 to 2). In an ordinary sample we would expect 95% of cases to have standardized residuals within about ±2. We have a sample of 134, so it is reasonable to expect about 7 cases (approximately 5%) to have standardized residuals outside these limits. From Output 5 we can see that we have 7 cases (5%) outside the limits: therefore, our sample is basically what we would expect. In addition, 99% of cases should lie within ±2.5, so we would expect only 1% of cases to lie outside those limits. From the cases listed here, it is clear that two cases (1.5%) lie outside the limits (cases 8 and 45). Therefore, our sample appears to conform roughly to what we would expect for a fairly accurate model. There are also no standardized residuals greater than 3, which is good news.
We should also scan the data editor to see if any cases have a Cook's distance (COO_1) greater than 1. [You could also use SPSS to find the maximum value of Cook's distance by using the descriptive statistics command.] You should find that all of the Cook's distances are below 1, which means that no cases are having an undue influence.
Output5
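The handout runs these checks through SPSS, but it can help to see where the numbers come from. The sketch below computes standardized residuals, leverage and Cook's distances by hand with Python/NumPy. The data are simulated (not the social anxiety data), so the variable names and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 134                                   # same sample size as the example
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([-50.0, 25.0, 7.0]) + rng.normal(scale=30.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                      # unstandardized residuals
k = X.shape[1]                            # parameters, including the intercept
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage values
mse = resid @ resid / (n - k)

zresid = resid / np.sqrt(mse)                    # standardized residuals
stud = resid / np.sqrt(mse * (1 - h))            # leverage-adjusted (studentized)
cooks_d = stud**2 * h / (k * (1 - h))            # Cook's distance

print(np.mean(np.abs(zresid) > 1.96))    # expect roughly .05
print(np.mean(np.abs(zresid) > 2.58))    # expect roughly .01
print(np.any(np.abs(zresid) > 3.29))     # any such case is cause for concern
print(cooks_d.max())                     # values above 1 suggest undue influence
```

With well-behaved data, roughly 5% of standardized residuals fall beyond ±1.96 and Cook's distances stay well below 1, which is exactly the rule of thumb applied to Output 5 above.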
Figure 8: P-P plot (top left), a plot of standardized residuals vs. standardized predicted values (top right), and partial plots of social anxiety against shame (bottom left) and OBQ (bottom right)
We can use histograms and P-P plots to look for normality of the residuals. Figure 8 (top left) shows the P-P plot for our model. The dots hover fairly close to the diagonal line, indicating normality in the residuals. We can look for heteroscedasticity and non-linearity using a plot of standardized residuals against standardized predicted values. If everything is OK then this graph should look like a random array of dots; if the graph funnels out then that is a sign of heteroscedasticity, and any curve suggests non-linearity (see earlier). Figure 8 (top right) shows the plot for our model. Note how the points are randomly and evenly dispersed throughout the plot. This pattern is indicative of a situation in which the assumptions of linearity and homoscedasticity have been met. Compare this with the examples in Figure 1.
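For the curious, the quantities behind these two plots are easy to compute by hand. The sketch below (Python/NumPy, on simulated data rather than our model) builds the ZRESID/ZPRED values and the P-P plot coordinates; the resulting arrays are what you would hand to any plotting package.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(7)
n = 134
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
resid = y - fitted

# ZRESID vs. ZPRED: standardize both axes; a random, even scatter of
# (zpred[i], zresid[i]) points suggests linearity and homoscedasticity.
zresid = (resid - resid.mean()) / resid.std(ddof=1)
zpred = (fitted - fitted.mean()) / fitted.std(ddof=1)

# Normal P-P plot: expected cumulative probability (i - 0.5)/n against the
# observed cumulative probability of each sorted standardized residual.
expected_p = (np.arange(1, n + 1) - 0.5) / n
observed_p = np.array([0.5 * (1 + erf(z / np.sqrt(2)))
                       for z in np.sort(zresid)])

# If the residuals are normal, the (expected_p, observed_p) dots hug the
# diagonal; the maximum gap is a crude summary of the departure.
print(np.max(np.abs(observed_p - expected_p)))
```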
Figure 8 also shows the partial plots, which are scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient; non-linear relationships and heteroscedasticity can also be detected using these plots. For shame (Figure 8, bottom left) the partial plot shows the positive relationship to social anxiety. There are no obvious outliers on this plot, but the cloud of dots is a bit funnel-shaped, possibly indicating some heteroscedasticity. For OBQ (Figure 8, bottom right) the plot again shows a positive relationship to social anxiety. There are no obvious outliers on this plot.
Finally, the VIF values are well below 10, which reassures us that multicollinearity is not a problem.
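To see where tolerance and VIF come from: the VIF for a predictor is 1/(1 − R²), where R² comes from regressing that predictor on all of the other predictors, and tolerance is the reciprocal of VIF. A minimal sketch with Python/NumPy on simulated predictors (not the shame/OCD data):

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        ss_res = np.sum((X[:, j] - others @ coef) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - ss_res / ss_tot              # R^2 from the other predictors
        out.append(1.0 / (1.0 - r2))          # VIF = 1 / tolerance
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.3 * x1 + rng.normal(size=200)          # mildly correlated with x1
vifs = vif(np.column_stack([x1, x2]))
print(vifs)           # both values near 1: no collinearity problem
print(1.0 / vifs)     # tolerance values
```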
Writing Up Multiple Regression Analysis
If your model has several predictors then you can't really beat a summary table as a concise way to report your model. As a bare minimum, report the betas, their confidence intervals, significance values and some general statistics about the model (such as the R²). The standardized beta values and the standard errors are also very useful. So, basically, you want to reproduce the table labelled Coefficients from the SPSS output and omit some of the non-essential information. For the example in this chapter we might produce a table like that in Table 1.
See if you can look back through the SPSS output in this chapter and work out where the values came from. Things to note are: (1) I've rounded off to 2 decimal places throughout because this is a reasonable level of precision given the variables measured; (2) for the standardized betas there is no zero before the decimal point (because these values shouldn't exceed 1) but for all other values less than 1 the zero is present; (3) often you'll see the significance of a variable denoted by an asterisk with a footnote to indicate the significance level being used, but it is better practice to report exact p-values; (4) the R² for the initial model and the change in R² (denoted as ΔR²) for each subsequent step of the model are reported below the table; and (5) in the title I have mentioned that the confidence intervals and standard errors in the table are based on bootstrapping: this information is important for readers to know.
Table 1: Linear model of predictors of social anxiety (SPAI). 95% confidence intervals reported in parentheses.

                        b                        SE b    β     p
Step 1
  Constant              −54.37 (−111.00, 2.27)   28.62         .06
  Shame (TOSCA)         27.45 (14.08, 40.81)     6.75    .34   < .001
Step 2
  Constant              −51.49 (−107.08, 4.09)   28.09         .069
  Shame (TOSCA)         22.05 (8.24, 35.86)      6.98    .27   .002
  OCD (OBQ)             6.92 (1.35, 12.49)       2.82    .21   .015

Note. R² = .12 for Step 1; ΔR² = .04 for Step 2 (ps < .05).
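The R² and ΔR² reported beneath the table come from comparing the nested Step 1 and Step 2 models. A minimal sketch of that comparison in Python/NumPy, using simulated stand-ins for the predictors (the numbers will not match Table 1):

```python
import numpy as np

def r_squared(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 134
shame = rng.normal(size=n)                 # stand-in for TOSCA shame scores
ocd = 0.3 * shame + rng.normal(size=n)     # stand-in for OBQ scores
anxiety = 25 * shame + 7 * ocd + rng.normal(scale=60, size=n)

step1 = np.column_stack([np.ones(n), shame])          # Step 1: shame only
step2 = np.column_stack([step1, ocd])                 # Step 2: add OCD (OBQ)
r2_1 = r_squared(step1, anxiety)
r2_2 = r_squared(step2, anxiety)
print(round(r2_1, 2))            # R^2 for Step 1
print(round(r2_2 - r2_1, 2))     # change in R^2 when the OCD predictor is added
```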
Tasks
Task 1
A fashion student was interested in factors that predicted the salaries of catwalk models. She collected data from 231 models. She asked each model their salary per day on days when they were working (salary), their age (age), and how many years they had worked as a model (years), and then got a panel of experts from modelling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty). The data are in the file Supermodel.sav on the course website. Conduct a multiple regression to see which factors predict a model's salary. (Answers to this task can be found at www.uk.sagepub.com/field4e/study/smartalex/chapter8.pdf.)
How much variance does the final model explain?

Your Answers:

Which variables significantly predict salary?

Your Answers:

Fill in the values for the following APA-format table of the results:

                    b     SE b    β     p
Constant
Age
Years as a Model
Attractiveness
Note. R² =

Write out the regression equation for the final model.

Your Answers:

Are the residuals as you would expect for a good model?

Your Answers:

Is there evidence of normality of errors, homoscedasticity and no multicollinearity?

Your Answers:
Task 2
Coldwell, Pike and Dunn (2006) investigated whether household chaos predicted children's problem behaviour over and above parenting. They collected data from 118 two-parent families. For each family they recorded the age and gender of both the older and younger sibling (age_child1, gender_child1, age_child2 and gender_child2 respectively). They then interviewed each child about their relationship with their parents using the Berkeley Puppet Interview (BPI). The interview measured each child's relationship with each parent along two dimensions: (1) warmth/enjoyment and (2) anger/hostility, with higher scores indicating more warmth/enjoyment and anger/hostility respectively. Each parent was then interviewed about their relationship with each of their children using the Parent-Child Relationship Scale. This resulted in scores for parent-child relationship positivity and parent-child relationship negativity. Overall, these measures result in a lot of variables:
Measures               Mum, Child 1        Mum, Child 2        Dad, Child 1        Dad, Child 2
Warmth/Enjoyment       mum_warmth_child1   mum_warmth_child2   dad_warmth_child1   dad_warmth_child2
Anger/Hostility        mum_anger_child1    mum_anger_child2    dad_anger_child1    dad_anger_child2
Positive Relationship  mum_pos_child1      mum_pos_child2      dad_pos_child1      dad_pos_child2
Negative Relationship  mum_neg_child1      mum_neg_child2      dad_neg_child1      dad_neg_child2
Household chaos (chaos) was assessed using the Confusion, Hubbub, And Order Scale (CHAOS). There were two outcome variables (one for each child) that measured children's adjustment (sdq_child1 and sdq_child2) using the Strengths and Difficulties Questionnaire: the higher the score, the more problem behaviour the child is reported to be displaying.
The data are in the file CHAOS.sav on the course website. To test whether household chaos was predictive of children's problem behaviour over and above parenting, conduct four hierarchical regressions:
(1) Maternal relationship with child 1
(2) Maternal relationship with child 2
(3) Paternal relationship with child 1
(4) Paternal relationship with child 2
Each hierarchical regression consists of three steps. First, enter child age and child gender as control variables. In the second step, add the variables measuring parent-child positivity, parent-child negativity, parent-child warmth and parent-child anger. Finally, in the third step, add chaos. The crucial test of the hypothesis lies in this final step: to confirm that household chaos is predictive of children's problem behaviour over and above parenting, the third step must result in a significant change in R².
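The "significant change in R²" at the third step is an incremental F-test comparing nested models. As a rough sketch of the logic in Python/NumPy (with simulated stand-in variables, not the CHAOS.sav data):

```python
import numpy as np

def r2(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1 - e @ e / np.sum((y - y.mean()) ** 2)

def f_change(X_small, X_big, y):
    """F statistic for the R^2 change when predictors are added to a model."""
    n = len(y)
    df1 = X_big.shape[1] - X_small.shape[1]   # number of added predictors
    df2 = n - X_big.shape[1]                  # residual df of the larger model
    r2_s, r2_b = r2(X_small, y), r2(X_big, y)
    return ((r2_b - r2_s) / df1) / ((1 - r2_b) / df2), df1, df2

rng = np.random.default_rng(3)
n = 118                                       # same n as Coldwell et al.
controls = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # age, gender
parenting = rng.normal(size=(n, 4))           # four parenting variables
chaos = rng.normal(size=n)
sdq = 0.8 * chaos + rng.normal(size=n)        # outcome driven by chaos here

step2 = np.column_stack([controls, parenting])
step3 = np.column_stack([step2, chaos])
F, df1, df2 = f_change(step2, step3, sdq)
print(F, df1, df2)   # compare F against the F(df1, df2) distribution for a p-value
```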
What conclusions can you draw from these analyses?

Your Answers:
Look at Coldwell, J., Pike, A., & Dunn, J. (2006). Household chaos - links with parenting and child behaviour. Journal of Child Psychology and Psychiatry, 47, 1116-1122 (on the course website). How do your results and interpretation compare to those reported? Reflect upon how you have used regression as a tool to answer an important psychological question.
Your Answers:
Fill in the values for the following APA-format table of the results:

                                     Mother-child relationship          Father-child relationship
                                     Older sibling    Younger sibling   Older sibling    Younger sibling
                                     SDQ              SDQ               SDQ              SDQ
                                     Total R² =       Total R² =        Total R² =       Total R² =
                                     β      ΔR²       β      ΔR²        β      ΔR²       β      ΔR²
Step 1
  Child age
  Child gender
Step 2
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity
Step 3
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity
  CHAOS
*p < .05, **p < .01, ***p < .001
Task 3
Complete the multiple-choice questions for Chapter 8 on the companion website to Field (2013): https://studysites.uk.sagepub.com/field4e/study/mcqs.htm. If you get any wrong, re-read this handout (or Field, 2013, Chapter 8) and do them again until you get them all correct.
Task 4
Go back to the output for last week's task (does listening to heavy metal predict suicide risk?). Is the model valid (i.e., are all of the assumptions met)?
References
Berry, W. D. (1993). Understanding regression assumptions. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-092. Newbury Park, CA: Sage.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38, 159-178.
Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: And sex and drugs and rock 'n' roll (4th ed.). London: Sage.
Terms of Use
This handout contains material from:
Field, A. P. (2013). Discovering statistics using SPSS: and sex and drugs and rock 'n' roll (4th edition). London: Sage.
This material is copyright Andy Field (2000-2016).
This document is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/); basically, you can use it for teaching and non-profit activities but not meddle with it without permission from the author.