principal components analysis; exploratory factor analysis ...€¦ · the logic of exploratory...
TRANSCRIPT
PrincipalComponentsAnalysis;ExploratoryFactorAnalysis;
ConductingExploratoryAnalyseswithCFA
(or…WhyIHateEFA)
EPSY905:MultivariateAnalysisSpring2016
Lecture#13– April27,2016
EPSY905:PCA,EFA,andCFA
Today’sClass
• Methodsforexploratoryfactoranalysis(EFA)Ø PrincipalComponents-based (TERRIBLE)Ø MaximumLikelihood-based Exploratory FactorAnalysis(BAD)Ø ExploratoryStructuralEquationModeling(ALSOBAD)
• ComparisonsofCFAandEFA
• HowtodoexploratoryanalyseswithCFAØ StructureofnoitemsknownØ Structureofsomeitemsknown
EPSY905:PCA,EFA,andCFA 2
TheLogicofExploratoryAnalyses• Exploratoryanalysesattempt todiscover hiddenstructure indatawithlittletono
user inputØ Asidefromtheselectionofanalysisandestimation
• Theresults fromexploratoryanalysescanbemisleadingØ IfdatadonotmeetassumptionsofmodelormethodselectedØ IfdatahavequirksthatareidiosyncratictothesampleselectedØ IfsomecasesareextremerelativetoothersØ Ifconstraintsmadebyanalysisareimplausible
• Sometimes,exploratory analysesareneededØ MustconstructananalysisthatcapitalizesontheknownfeaturesofdataØ Therearebetterwaystoconductsuchanalyses
• Often,exploratoryanalysesarenotneededØ Butareconductedanyway– seealotofreportsofscaledevelopmentthatstartwiththeideathat
aconstructhasacertainnumberofdimensions
EPSY905:PCA,EFA,andCFA 3
ADVANCEDMATRIXOPERATIONS
EPSY905:PCA,EFA,andCFA 4
AGuidingExample
• Todemonstratesomeadvancedmatrixalgebra,wewillmakeuseofdata
• IcollecteddataSATtestscoresforboththeMath(SATM)andVerbal(SATV)sectionsof1,000students
• Thedescriptivestatisticsofthisdatasetaregivenbelow:
EPSY905:PCA,EFA,andCFA 5
MatrixTrace
• Forasquarematrix𝚺 withp rows/columns,thetraceisthesumofthediagonalelements:
𝑡𝑟𝚺 =%𝑎''
(
')*• Forourdata,thetraceofthecorrelationmatrixis2
Ø Forallcorrelationmatrices, thetrace isequaltothenumberofvariablesbecausealldiagonalelements are1
• Thetracewillbeconsideredthetotalvarianceinprincipalcomponentsanalysis
Ø Usedasatargettorecoverwhenapplyingstatisticalmodels
EPSY905:PCA,EFA,andCFA 6
MatrixDeterminants• Asquarematrixcanbecharacterizedbyascalarvaluecalledadeterminant:
det𝚺 = 𝚺
• CalculationofthedeterminantbyhandistediousØ Ourdeterminant was0.3916Ø Computers canhavedifficulties withthiscalculation(unstable incases)
• Thedeterminantisusefulinstatistics:Ø ShowsupinmultivariatestatisticaldistributionsØ Isameasureof“generalized”varianceofmultiplevariables
• Ifthedeterminantispositive,thematrixiscalledpositivedefiniteØ Isinvertable
• Ifthedeterminantisnotpositive,thematrixiscallednon-positivedefinite
Ø NotinvertableEPSY905:PCA,EFA,andCFA 7
MatrixOrthogonality
• Asquarematrix𝚲 issaidtobeorthogonalif:𝚲𝚲/ = 𝚲/𝚲 = 𝐈
• Orthogonalmatricesarecharacterizedbytwoproperties:1. Thedotproductofallrowvectormultiplesisthezerovector
w Meaning vectorsareorthogonal (oruncorrelated)2. Foreach rowvector, thesumofallelements isone
w Meaning vectorsare“normalized”
• ThematrixaboveisalsocalledorthonormalØ Thediagonal isequal to1(eachvectorhasaunit length)
• Orthonormalmatricesareusedinprincipalcomponentsandexploratoryfactoranalysis
EPSY905:PCA,EFA,andCFA 8
EigenvaluesandEigenvectors
• Asquarematrix𝚺 canbedecomposedintoasetofeigenvalues𝛌 andasetofeigenvectors𝐞
𝚺𝐞 = λ𝐞
• EacheigenvaluehasacorrespondingeigenvectorØ Thenumberequaltothenumberofrows/columnsof𝚺Ø Theeigenvectors areallorthogonal
• Principalcomponentsanalysisuseseigenvaluesandeigenvectorstoreconfiguredata
EPSY905:PCA,EFA,andCFA 9
EigenvaluesandEigenvectorsExample
• InourSATexample,thetwoeigenvaluesobtainedwere:𝜆* = 1.78𝜆9 = 0.22
• Thetwoeigenvectorsobtainedwere:
𝐞* =0.710.71 ; 𝐞9 =
0.71−0.71
• Thesetermswillhavemuchgreatermeaningprincipalcomponentsanalysis
EPSY905:PCA,EFA,andCFA 10
SpectralDecomposition
• Usingtheeigenvaluesandeigenvectors,wecanreconstructtheoriginalmatrixusingaspectraldecomposition:
𝚺 =%𝜆'𝐞'𝐞'/(
')*• Forourexample,wecangetbacktoouroriginalmatrix:
𝐑* = 𝜆*𝐞*𝐞*/ = 1.78 .71.71 .71 .71 = .89 .89
.89 .89
𝐑9 = 𝐑* + 𝜆9𝐞9𝐞9/
= .89 .89.89 .89 + 0.22 .71
−.71 .71 −.71 = 1.00 0.780.78 1.00
EPSY905:PCA,EFA,andCFA 11
SpectralDecomposition inR
EPSY905:PCA,EFA,andCFA 12
AdditionalEigenvalueProperties
• Thematrixtraceisthesumoftheeigenvalues:
𝑡𝑟𝚺 =%𝜆'
(
')*Ø Inourexample, the𝑡𝑟𝐑 = 1.78 + .22 = 2
• Thematrixdeterminantcanbefoundbytheproductoftheeigenvalues
𝚺 =B𝜆'
(
')*Ø Inourexample 𝐑 = 1.78 ∗ .22 = .3916
EPSY905:PCA,EFA,andCFA 13
ANINTRODUCTIONTOPRINCIPALCOMPONENTSANALYSIS
EPSY905:PCA,EFA,andCFA 14
PCAOverview
• PrincipalComponentsAnalysis(PCA)isamethodforre-expressingthecovariance(oroftencorrelation)betweenasetofvariables
Ø There-expression comesfromcreatingasetofnewvariables(linearcombinations) oftheoriginalvariables
• PCAhastwoobjectives:1. Datareduction
w Movingfrommanyoriginalvariablesdowntoafew“components”
2. Interpretationw Determiningwhichoriginalvariablescontributemosttothenew“components”
EPSY905:PCA,EFA,andCFA 15
GoalsofPCA
• ThegoalofPCAistofindasetofkprincipalcomponents(compositevariables)that:
Ø Ismuchsmaller innumberthantheoriginalsetofV variablesØ Accountsfornearlyallofthetotalvariance
w Totalvariance=traceofcovariance/correlationmatrix
• Ifthesetwogoalscanbeaccomplished,thenthesetofk principalcomponentscontainsalmostasmuchinformationastheoriginalV variables
Ø Meaning – thecomponents cannowreplacetheoriginalvariables inanysubsequentanalyses
EPSY905:PCA,EFA,andCFA 16
QuestionswhenusingPCA
• PCAanalysesproceedbyseekingtheanswerstotwoquestions:
1. Howmanycomponents(newvariables)areneededto“adequately”representtheoriginaldata?Ø Thetermadequately isfuzzy(andwillbeintheanalysis)
2. (once#1hasbeenanswered):Whatdoeseachcomponentrepresent?Ø Theterm“represent” isalsofuzzy
EPSY905:PCA,EFA,andCFA 17
PCAFeatures
• PCAoftenrevealsrelationshipsbetweenvariablesthatwerenotpreviouslysuspected
Ø Newinterpretations ofdataandvariables oftenstemfromPCA
• PCAusuallyservesasmoreofameanstoanendratherthananendititself
Ø Components (thenewvariables) areoftenusedinotherstatistical techniques
w Multiple regression/ANOVAw Clusteranalysis
• Unfortunately,PCAisoftenintermixedwithExploratoryFactorAnalysis
Ø Don’t.Pleasedon’t.Pleasemakeitstop.
EPSY905:PCA,EFA,andCFA 18
PCADetails
• Notation:𝑍 areournewcomponentsand𝐘 isouroriginaldatamatrix(withN observationsandV variables)
Ø Wewillletp beourindexforasubject
• Thenewcomponentsarelinearcombinations:𝑍(* = 𝐞*/𝐘 = 𝑒**𝑌(* + 𝑒9*𝑌(9 + ⋯+ 𝑒K*𝑌(K𝑍(9 = 𝐞9/𝐘 = 𝑒*9𝑌(* + 𝑒99𝑌(9 + ⋯+ 𝑒K9𝑌(K
⋮𝑍(K = 𝐞K/𝐘 = 𝑒*K𝑌(* + 𝑒9K𝑌(9 + ⋯+ 𝑒KK𝑌(K
• Theweightsofthecomponents(𝑒NO) comefromtheeigenvectorsofthecovarianceorcorrelationmatrixforcomponent𝑘 andvariable𝑗
EPSY905:PCA,EFA,andCFA 19
DetailsAbouttheComponents
• Thecomponents(𝑍) areformedbytheweightsoftheeigenvectorsofthecovarianceorcorrelationmatrixoftheoriginaldata
Ø Thevariance ofacomponent isgivenbytheeigenvalue associatedwiththeeigenvector forthecomponent
• Usingtheeigenvalueandeigenvectorsmeans:Ø Eachsuccessive componenthaslowervariance
w Var(Z1)>Var(Z2)>…>Var(Zv)Ø Allcomponents areuncorrelatedØ Thesumofthevariances oftheprincipalcomponents isequaltothetotalvariance:
%𝑉𝑎𝑟 𝑍T = 𝑡𝑟𝚺 = % 𝜆T
K
T)*
K
T)*
EPSY905:PCA,EFA,andCFA 20
PCAonourExample
• WewillnowconductaPCAonthecorrelationmatrixofoursampledata
Ø Thisexample isgivenfordemonstration purposes – typicallywewillnotdoPCAonsmallnumbersofvariables
EPSY905:PCA,EFA,andCFA 21
PCAinR
• TheRfunctionthatdoesprincipalcomponentsiscalledprcomp()
EPSY905:PCA,EFA,andCFA 22
GraphicalRepresentation
• PlottingthecomponentsandtheoriginaldatasidebysiderevealsthenatureofPCA:
Ø ShownfromPCAofcovariancematrix
EPSY905:PCA,EFA,andCFA 23
TheGrowthofGamblingAccess
• Inpast25years:Ø Anexponential increase intheaccessibilityofgambling
Ø Anincreased rateofwithproblemorpathologicalgambling(Volberg,2002,Welte etal.,2009)
• Hence,thereisaneedtobetter:Ø Understand theunderlyingcausesofthedisorderØ ReliablyidentifypotentialpathologicalgamblersØ Provideeffective treatment interventions
EPSY905:PCA,EFA,andCFA 24
PathologicalGambling:DSMDefinition
• Tobediagnosedasapathologicalgambler,anindividualmustmeet5of10definedcriteria:
1. Ispreoccupiedwithgambling2. Needs togamblewithincreasing
amountsofmoneyinordertoachieve thedesiredexcitement
3. Hasrepeated unsuccessful efforts tocontrol,cutback,orstopgambling
4. Isrestlessorirritablewhenattemptingtocutdownorstopgambling
5. Gamblesasawayofescapingfromproblemsorrelievingadysphoricmood
6. After losingmoneygambling,oftenreturnsanotherdaytogeteven
7. Liestofamilymembers, therapist, orothers toconcealtheextentofinvolvementwithgambling
8. Hascommittedillegalactssuchasforgery, fraud,theft,orembezzlement tofinancegambling
9. Hasjeopardizedorlostasignificantrelationship, job,educational,orcareeropportunitybecauseofgambling
10. Reliesonothers toprovidemoneytorelieveadesperate financialsituationcausedbygambling
EPSY905:PCA,EFA,andCFA 25
ResearchonPathologicalGambling
• Inordertostudytheetiologyofpathologicalgambling,morevariabilityinresponseswasneeded
• TheGamblingResearchInstrument(Feasel,Henson,&Jones,2002)wascreatedwith41Likert-typeitems
Ø Itemsweredevelopedtomeasureeachcriterion
• Exampleitems(ratings:StronglyDisagreetoStronglyAgree):Ø IworrythatIamspendingtoomuchmoneyongambling (C3)Ø TherearefewthingsIwouldratherdothangamble (C1)
• TheinstrumentwasusedonasampleofexperiencedgamblersfromariverboatcasinoinaFlatMidwesternState
Ø Casinopatronsweresolicitedafterplayingroulette
EPSY905:PCA,EFA,andCFA 26
TheGRIItems
• TheGRIuseda6-pointLikert scaleØ 1:StronglyDisagreeØ 2:DisagreeØ 3:SlightlyDisagreeØ 4:SlightlyAgreeØ 5:AgreeØ 6:StronglyAgree
• Tomeettheassumptionsoffactoranalysis,wewilltreattheseresponsesasbeingcontinuous
Ø Thisistenuousatbest,butoftenisthecase infactoranalysisØ Categorical itemswouldbebetter….but you’dneedanothercourse forhowtodothat
w Hint: ItemResponse Models
EPSY905:PCA,EFA,andCFA 27
TheSample
• Datawerecollectedfromtwosources:Ø 112“experienced” gamblers
w ManyfromanactualcasinoØ 1192college students froma“rectangular”midwestern state
w Manynevergambledbefore
• Today,wewillcombinebothsamplesandtreatthemashomogenous– onesampleof1304subjects
EPSY905:PCA,EFA,andCFA 28
Final10ItemsontheScale
Item Criterion Question
GRI1 3 Iwouldliketocutbackonmygambling.
GRI3 6IfIlostalotofmoneygamblingoneday,Iwouldbemorelikelytowanttoplayagainthefollowingday.
GRI5 2Ifinditnecessarytogamblewithlargeramountsofmoney(thanwhenIfirstgambled)forgamblingtobeexciting.
GRI9 4 IfeelrestlesswhenItrytocutdownorstopgambling.
GRI10 1 ItbothersmewhenIhavenomoneytogamble.GRI13 3 Ifinditdifficulttostopgambling.
GRI14 2 IamdrawnmorebythethrillofgamblingthanbythemoneyIcouldwin.
GRI18 9 Myfamily,coworkers,orotherswhoareclosetomedisapproveofmygambling.GRI21 1 Itishardtogetmymindoffgambling.
GRI23 5 Igambletoimprovemymood.
EPSY905:PCA,EFA,andCFA 29
PCAwithGamblingItems
• ToshowhowPCAworkswithalargersetofitems,wewillexaminethe10GRIitems(theonesthatfitaone-factorCFAmodel)
• TODOTHISYOUMUSTIMAGINE:Ø THESEWERETHEONLY10ITEMSYOUHADØ YOUWANTEDTOREDUCETHE10ITEMSINTO1OR2COMPONENTVARIABLES
• CAPITALLETTERSAREUSEDASYOUSHOULDNEVERDOAPCAAFTERRUNNINGACFA– THEYAREFORDIFFERENTPURPOSES!
EPSY905:PCA,EFA,andCFA 30
Question#1:HowManyComponents?
• Toanswerthequestionofhowmanycomponents,twomethodsareused:
Ø Screeplotofeigenvalues (lookingforthe“elbow”)Ø Varianceaccounted for(shouldbe>70%)
• Wewillgowith4components:(varianceaccountedforVAC=75%)
• Varianceaccountedforisforthetotalsamplevariance
EPSY905:PCA,EFA,andCFA 31
PlotstoAnswerHowManyComponents
EPSY905:PCA,EFA,andCFA 32
Question#2:WhatDoesEachComponentRepresent?
• Toanswerquestion#2– welookattheweightsoftheeigenvectors(hereistheunrotated solution)
EPSY905:PCA,EFA,andCFA 33
FinalResult:FourPrincipalComponents
• Usingtheweightsoftheeigenvectors,wecancreatefournewvariables– thefourprincipalcomponents
• EachoftheseisuncorrelatedwitheachotherØ Thevariance ofeachisequaltothecorresponding eigenvalue
• WewouldthenusetheseinsubsequentanalysesEPSY905:PCA,EFA,andCFA 34
PCASummary
• PCAisadatareductiontechniquethatreliesonthemathematicalpropertiesofeigenvaluesandeigenvectors
Ø Usedtocreatenewvariables (smallnumber)outoftheolddata (lotsofvariables)
Ø Thenewvariables areprincipalcomponents (theyarenotfactorscores)
• PCAappearedfirstinthepsychometricliteratureØ Many“factoranalysis”methodsusedvariants ofPCAbefore likelihood-basedstatisticswereavailable
• Currently,PCA(orvariants)methodsarethedefaultoptioninSPSSandSAS(PROCFACTOR)
EPSY905:PCA,EFA,andCFA 35
PotentiallySolvableStatisticalIssuesinPCA• ThetypicalPCAanalysisalsohasafewstatisticalconcerns
Ø SomeofthesecanbesolvedifyouknowwhatyouaredoingØ Thetypicalanalysis(usingprogramdefaults)doesnotsolvethese
• Missingdataisomittedusinglistwise deletion– biasespossibleØ CoulduseMLtoestimatecovariancematrix,butthenwouldhavetoassumemultivariatenormality
Ø CoulduseMItoimputedata
• Thedistributionsofvariablescanbeanything…butvariableswithmuchlargervarianceswilllookliketheycontributemoretoeachcomponent
Ø Couldstandardizevariables– butsomecan’tbestandardizedeasily(thinkgender)
• Thelackofstandarderrorsmakesthecomponentweights(eigenvectorelements)hardtointerpret
Ø Canusearesampling/bootstrapanalysistogetSEs(butnoteasytodo)
EPSY905:PCA,EFA,andCFA 36
My(Unsolvable)IssueswithPCA• MyissueswithPCAinvolvethetwoquestionsinneedofanswersforanyuseofPCA:
1. ThenumberofcomponentsneededisnotbasedonastatisticalhypothesistestandhenceissubjectiveØ VarianceaccountedforisadescriptivemeasureØ Nostatisticaltestforwhetheranadditionalcomponentsignificantlyaccounts
formorevariance
2. TherelativemeaningofeachcomponentisquestionableatbestandhenceissubjectiveØ Typicalpackagesprovidenostandarderrorsforeacheigenvectorweight(can
beobtainedinbootstrapanalyses)Ø Nodefinitiveanswerforcomponentcomposition
• Insum,Ifeelitisveryeasytobemisled(orpurposefullymislead)withPCA
EPSY905:PCA,EFA,andCFA 37
EXPLORATORYFACTORANALYSIS
EPSY905:PCA,EFA,andCFA 38
PrimaryPurposeofEFA
• EFA:“Determinenatureandnumberoflatentvariablesthataccountforobservedvariationandcovariationamongsetofobservedindicators(≈itemsorvariables)”
Ø Inotherwords,whatcauses theseobserved responses?Ø Summarizepatterns ofcorrelation amongindicatorsØ Solutionisanend(i.e., isofinterest) inandofitself
• ComparedwithPCA: “Reducemultipleobservedvariablesintofewercomponentsthatsummarizetheirvariance”
Ø Inotherwords,howcanIabbreviate thissetofvariables?Ø Solutionisusuallyameanstoanend
EPSY905:PCA,EFA,andCFA 39
MethodsforEFA
• Youwillseemanydifferenttypesofmethodsfor“extraction”offactorsinEFA
Ø ManyarePCA-basedØ Mostweredeveloped beforecomputersbecame relevantorlikelihood theorywasdeveloped
• Youcanignoreallofthemandfocusonone:
OnlyUseMaximumLikelihoodforEFA
• ThemaximumlikelihoodmethodofEFAextraction:Ø Usesthesamelog-likelihood asconfirmatoryfactoranalyses/SEM
w Defaultassumption:multivariatenormaldistributionofdataØ Providesconsistent estimates withgoodstatistical properties(assuming youhavealargeenoughsample)
Ø Missing datausingall thedatathatwasobserved(MAR)Ø Isconsistent withmodernstatistical practices
EPSY905:PCA,EFA,andCFA 40
QuestionswhenusingEFA
• EFAsproceedbyseekingtheanswerstotwoquestions:(thesamequestionsposedinPCA;butwithdifferentterms)
1. Howmanylatentfactorsareneededto“adequately”representtheoriginaldata?Ø “Adequately”=doesagivenEFAmodelfitwell?
2. (once#1hasbeenanswered):Whatdoeseachfactorrepresent?Ø Theterm“represent” isfuzzy
EPSY905:PCA,EFA,andCFA 41
TheSyntaxofFactorAnalysis• Factoranalysisworksbyhypothesizingthatasetoflatentfactorshelpstodetermineaperson’sresponsetoasetofvariables
Ø Thiscanbeexplainedbyasystemof simultaneous linearmodelsØ HereY=observed data,p=person, v=variable,F=factorscore(Qfactors)
𝑌(* = 𝜇VW + 𝜆**𝐹(* + 𝜆*9𝐹(9 +⋯+ 𝜆*Y𝐹(Y + 𝑒(*𝑌(9 = 𝜇VZ + 𝜆9*𝐹(* + 𝜆99𝐹(9 +⋯+ 𝜆9Y𝐹(Y + 𝑒(9
⋮𝑌(K = 𝜇V[ + 𝜆K*𝐹(* + 𝜆K9𝐹(9 +⋯+ 𝜆KY𝐹(Y + 𝑒(K
• 𝜇V\ =meanforvariable𝑣• 𝜆T^ =factorloadingforvariablevontofactorf(regressionslope)
Ø FactorsareassumeddistributedMVNwithzeromeanand(forEFA)identitycovariancematrix(uncorrelated factors– tostart)
• 𝑒(T =residualforpersonpandvariablevØ ResidualsareassumeddistributedMVN(acrossitems)withazeromeanandadiagonalcovariancematrix𝚿 containing theunique variances
• Often,thisgetsshortenedintomatrixform:𝐘( = 𝝁a + 𝚲𝐅(/ + 𝐞𝐩
EPSY905:PCA,EFA,andCFA 42
HowMaximumLikelihoodEFAWorks
• MaximumlikelihoodEFAassumesthedatafollowamultivariatenormaldistribution
Ø Thebasisforthelog-likelihood function (samelog-likelihoodwehaveusedineveryanalysistothispoint)
• Thelog-likelihoodfunctiondependsontwosetsofparameters:themeanvectorandthecovariancematrix
Ø Meanvector issaturated (justusestheitemmeans foritemintercepts) – soitisoftennotthoughtofinanalysis
w Denotedas𝝁a = 𝝁d
Ø Covariancematrixiswhatgives“factorstructure”w EFAmodelsprovideastructureforthecovariancematrix
EPSY905:PCA,EFA,andCFA 43
TheEFAModelfortheCovarianceMatrix
• Thecovariancematrixismodeledbasedonhowitwouldlookifasetofhypothetical(latent)factorshadcausedthedata
• Forananalysismeasuring𝐹 factors,eachitemintheEFA:Ø Has1uniquevarianceparameterØ Has𝐹 factorloadings
• Theinitialestimationoffactorloadingsisconductedbasedontheassumptionofuncorrelatedfactors
Ø Assumption isdubiousatbest– yetisthecornerstone oftheanalysis
EPSY905:PCA,EFA,andCFA 44
ModelImpliedCovarianceMatrix
• Thefactormodelimpliedcovariancematrixis𝚺a =𝚲𝚽𝚲/ +𝚿
Ø Where:w 𝚺a =model implied covariancematrixoftheobserveddata(size𝐼 x𝐼)w 𝚲 =matrixoffactorloadings(size𝐼 x𝐹)
– InEFA:all termsin𝚲 areestimatedw 𝚽 =factorcovariancematrix(size𝐹 x𝐹)
– InEFA:𝚽 = 𝐈 (allfactorshavevariancesof1andcovariancesof0)– InCFA:thisisestimated
w 𝚿 =matrixofunique(residual)variances(size𝐼 x𝐼)– InEFA:𝚿 isdiagonalbydefault(noresidual covariances)
• Therefore,theEFAmodel-impliedcovariancematrixis:𝚺a = 𝚲𝚲/ +𝚿
EPSY905:PCA,EFA,andCFA 45
EFAModelIdentifiability
• UndertheMLmethodforEFA,thesamerulesofidentificationapplytoEFAastoPathAnalysis
Ø T-rule:TotalnumberofEFAmodelparameters mustnotexceeduniqueelements insaturated covariancematrixofdata
w Forananalysiswithanumberoffactors𝐹 andasetnumberofitems𝐼 thereare𝐹∗𝐼 + 𝐼 = 𝐼 𝐹 + 1 EFAmodelparameters
w Aswewillsee,theremustbeg gh*9
constraints forthemodel towork
w Therefore,𝐼 𝐹 + 1 − g gh*9
< d dj*9
Ø Local-identification: eachportionofthemodelmustbelocallyidentifiedw Withallfactorloadingsestimated localidentification fails
– Nowayofdifferentiating factorswithoutconstraints
EPSY905:PCA,EFA,andCFA 46
ConstraintstoMakeEFAinMLIdentified
• TheEFAmodelimposesthefollowingconstraint:𝚲/𝚿𝚲 = 𝚫
suchthat𝚫 isadiagonalmatrix
• Thisputsg gh*9
constraintsonthemodel(thatmanyfewerparameterstoestimate)
• Thisconstraintisnotwellknown– andhowitfunctionsishardtodescribe
Ø Fora1-factormodel,theresultsofEFAandCFAwillmatch
• Note:theothermethodsofEFA“extraction”avoidthisconstraintbynotbeingstatisticalmodelsinthefirstplace
Ø PCA-basedroutinesrelyonmatrixpropertiestoresolveidentificationEPSY905:PCA,EFA,andCFA 47
TheNatureoftheConstraintsinEFA
• TheEFAconstraintsprovidesomedetailedassumptionsaboutthenatureofthefactormodelandhowitpertainstothedata
• Forexample,takea2-factormodel(oneconstraint):
%𝜓T9B𝜆T^
Y)9
^)*
K
T)*
= 0
• Inshort,somecombinationsoffactorloadingsanduniquevariances(acrossandwithinitems)cannothappen
Ø Thisgoesagainstmostofourstatistical constraints – whichmustbejustifiableandunderstandable (therefore testable)
Ø Thisconstraint isnottestable inCFAEPSY905:PCA,EFA,andCFA 48
TheLog-LikelihoodFunction
• Giventhemodelparameters,theEFAmodelisestimatedbymaximizingthemultivariatenormallog-likelihood
Ø Forthedata
log 𝐿 = log = 2𝜋 hrK9 𝚺 hr9 exp %−𝒀( − 𝝁V
/𝚺h* 𝒀( − 𝝁V2
r
()*
=
−𝑁𝑉2 log 2𝜋 −
𝑁2 log 𝚺 − %
𝒀( − 𝝁V/𝚺h* 𝒀( − 𝝁V2
r
()*
• UnderEFA,thisbecomes:log 𝐿
= −𝑁𝑉2 log 2𝜋 −
𝑁2 log 𝚲𝚲/ +𝚿
−%𝒀( − 𝝁𝑰
/𝚲𝚲/ +𝚿 h* 𝒀𝒑 − 𝝁𝑰
2
r
()*EPSY905:PCA,EFA,andCFA 49
BenefitsandConsequencesofEFAwithML
• TheparametersoftheEFAmodelunderMLretainthesamebenefitsandconsequencesofanymodel(i.e.,CFA)
Ø Asymptotically(largeN)theyareconsistent, normal,andefficientØ Missingdataare“skipped”inthelikelihood,allowingforincompleteobservations tocontribute (assumedMAR)
• Furthermore,thesametypesofmodelfitindicesareavailableinEFAasareinCFA
• AswithCFA,though,anEFAmodelmustbeacloseapproximationtothesaturatedmodelcovariancematrixiftheparametersaretobebelieved
Ø Thisisamarkeddifference between EFAinMLandEFAwithothermethods–qualityoffitisstatisticallyrigorous
EPSY905:PCA,EFA,andCFA 50
MLEFAWITHBASERFUNCTIONFACTANAL(THEBADWAY)
EPSY905:PCA,EFA,andCFA 51
MLEFAUsingthefactanal()Function• ThebaseRprogramhasthefactanal()functionthatconductsML-basedEFA
Ø Butitisverylimited
• AlthoughthefunctionuseML,youstillcannothavemissingdataintheanalysis
Ø BADR!
• Wewillremovecaseswithanymissingdata(listwisedeletion)andproceed
• WewillalsonotusearotationmethodatfirstastoshowhowdefaultconstraintsinEFAwithMLareridiculous
EPSY905:PCA,EFA,andCFA 52
Step#1:DetermineNumberofFactors• TheEFAfactanal()functionprovidesarudimentarytestformodelfit
• Rememberthesaturatedmodelfrompathanalysis?
Ø Allcovariances estimated
• ThemodelfitteststhesolutionfromEFAvsthesaturatedmodel
Ø EFA1-factormodelshown
• Thegoalistofindamodelthatfitswell
EPSY905:PCA,EFA,andCFA 53
Step#1inR:ModelFitTests
• Onefactor:
• Twofactors:
• Threefactors:
• Fourfactors:
EPSY905:PCA,EFA,andCFA 54
SideNote:DefaultConstraintsinMLEFA
• TheEFAmodelimposesthefollowingconstraint:𝚲/𝚿𝚲 = 𝚫
suchthat𝚫 isadiagonalmatrix• Herearethe𝚫matricesfromeachanalysis:
EPSY905:PCA,EFA,andCFA 55
Step#2:InterpretingtheBestModel
• Asthefour-factorsolutionfitbest,wewillinterpretit• Unrotated solutionoffactorloadings:
What???
EPSY905:PCA,EFA,andCFA 56
FACTORLOADINGROTATIONSINEFA
EPSY905:PCA,EFA,andCFA 57
RotationsofFactorLoadingsinEFA
• Transformationsofthefactorloadingsarepossibleasthematrixoffactorloadingsisonlyuniqueuptoanorthogonaltransformation
Ø Don’tlikethesolution?Rotate!
• Historically,rotationsusethepropertiesofmatrixalgebratoadjustthefactorloadingstomoreinterpretablenumbers
• Modernversionsofrotations/transformationsrelyon“targetfunctions”thatspecifywhata“good”solutionshouldlooklike
Ø Thedetailsofthemodernapproacharelackinginmosttexts
EPSY905:PCA,EFA,andCFA 58
TypesofClassicalRotatedSolutions
• Multipletypesofrotationsexistbuttwobroadcategoriesseemtodominatehowtheyarediscussed:
• Orthogonalrotations:rotationsthatforcethefactorcorrelationtozero(orthogonalfactors).Thenameorthogonalrelatestotheanglebetweenaxesoffactorsolutionsbeing90degrees.Themostprevalentisthevarimax rotation.
• Obliquerotations:rotationsthatallowfornon-zerofactorcorrelations.Thenameorthogonalrelatestotheanglebetweenaxesoffactorsolutionsnotbeing90degrees.Themostprevalentisthepromax rotation.
Ø These rotationsprovide anestimateof“factorcorrelation”EPSY905:PCA,EFA,andCFA 59
HowClassicalOrthogonalRotationWorks
• Classicalorthogonalrotationalgorithmsworkbydefininganewrotatedsetoffactorloadings𝚲∗ asafunctionoftheoriginal(non-rotated)loadings𝚲 andanorthogonalrotationmatrix𝐓
𝚲∗ = 𝚲𝐓where: 𝐓𝐓/ = 𝐓/𝐓 = 𝐈
• Theserotationsdonotalterthefitofthemodelas𝚺a = 𝚲∗𝚲∗/ + 𝚿 = 𝚲𝐓 𝚲𝐓 / + 𝚿 = 𝚲𝐓𝐓/𝚲/ +𝚿= 𝚲𝚲/ +𝚿
EPSY905:PCA,EFA,andCFA 60
ModernVersionsofRotation
• MoststudiesusingEFAusetheclassicalrotationmechanisms,likelyduetoinsufficienttraining
• Modernmethodsforrotationsrelyontheuseofatargetfunctionforhowanoptimalloadingsolutionshouldlook
FromBrowne(2001)
EPSY905:PCA,EFA,andCFA 61
RotationAlgorithms
• Givenatargetfunction,rotationalgorithmsseektofindarotatedsolutionthatsimultaneously:
1. Minimizes thedistancebetween therotatedsolutionandtheoriginalfactor loadings
2. Fitsbest tothetarget function
• Rotationalgorithmsaretypicallyiterative– meaningtheycanfailtoconverge
• RotationsearchestypicallyhavemultipleoptimalvaluesØ Needmanyrestarts
EPSY905:PCA,EFA,andCFA 62
RotatedFactorLoadings:OrthogonalRotationviaVarimax
• TheVarimax rotationbroughtaboutthefollowingloadings
• Arethesebetterforinterpretation?
• Alsonote:nofactorcorrelation
EPSY905:PCA,EFA,andCFA 63
RotatedFactorLoadings:ObliqueRotationviaPromax
• ThePromax rotationbroughtaboutthefollowingloadings:
• Italsobroughtaboutthefollowingfactorcorrelations:
EPSY905:PCA,EFA,andCFA 64
EFAVIACFA
EPSY905:PCA,EFA,andCFA 65
CFAApproachestoEFA
• WecanconductexploratoryanalysisusingaCFAmodelØ Needtoset therightnumberofconstraints foridentificationØ Weset thevalueoffactorloadings forafewitemsonafewofthefactors
w Typicallytozero(myusualthought)w Sometimes toone(Brown,2002)
Ø Wekeep thefactorcovariancematrixasanidentityw Uncorrelated factors(asinEFA)withvariancesofone
• BenefitsofusingCFAforexploratoryanalyses:Ø CFAconstraints remove rotational indeterminacyoffactor loadings– norotatingisneeded (orpossible)
Ø Defines factors withpotentially lessambiguityw Constraints areeasytosee
Ø Forsomesoftware (SASandSPSS),wegetmuchmoremodelfitinformation
EPSY905:PCA,EFA,andCFA 66
EFAwithCFAConstraints
• TodoEFAwithCFA,youmust:Ø Fixfactor loadings(settoeither zeroorone)
w Use“rowechelon” form:w Oneitemhasonlyonefactorloadingestimatedw Oneitemhasonlytwofactorloadingsestimatedw Oneitemhasonlythreefactorloadings estimated
Ø Fixfactorcovariancesw Setall to0
Ø Fixfactorvariancesw Setall to1
EPSY905:PCA,EFA,andCFA 67
EFAViaCFAExample
• Wecanuselavaan todoCFA…hereisthesyntaxfortheonefactormodel
Ø The~=isthekeyà Factornametotheleft,itemsmeasuring ittotheright
EPSY905:PCA,EFA,andCFA 68
One-FactorResultsfromlavaan
EPSY905:PCA,EFA,andCFA 69
LookFamiliar?TheyareIdentical totheOne-Factor EFAfromfactanal()
EPSY905:PCA,EFA,andCFA 70
CFALogic…Applied toEFA
• Becauseourone-factormodelfitwell,wecanstop!
• CFAhasmoreindicesofmodelfit– whichcanmakefindinganappropriatesolutioneasier
• CFAalsogivesyouthestandarderrorsforeachfactorloading,leadingtoaWaldtesttoseeifitisnon-zero
Ø Noneedtousearbitrary .3cutoffØ Smallnote:AlthoughmostEFAroutines (likefactanal)don’tgiveSEstheyarecertainlyattainable underMLtheory
• Althoughweshouldstophere…We’llcontinuewiththetwo- andthree-factorversionstocomparewithEFA
EPSY905:PCA,EFA,andCFA 71
Two-FactorSyntax
EPSY905:PCA,EFA,andCFA 72
CFATwo-FactorResults
EPSY905:PCA,EFA,andCFA 73
CFAThree-FactorSyntax
EPSY905:PCA,EFA,andCFA 74
CONCLUDINGREMARKS
EPSY905:PCA,EFA,andCFA 75
WrappingUp
• Todaywediscussedtheworldofexploratoryfactoranalysisandfoundthefollowing:
Ø PCAiswhatpeople typicallyrunwhentheyareafterEFA
Ø MLEFAisabetteroptiontopick(likelihoodbased)w Constraints employed arehidden!w Rotations canbreakwithoutyourealizingtheydo
Ø MLEFAcanbeshowntobeequaltoCFAforcertainmodels
Ø Overall,CFAisstillyourbestbet
EPSY905:PCA,EFA,andCFA 76