exploratory factor analysis
MULTIVARIATE ANALYSIS
AD 601: EXPLORATORY FACTOR ANALYSIS
Tutku Seçkin Çelik
Yusuf Öç
Research Question
• HBAT company
  - Newsprint
  - Magazine
• Are there any differences in perceptions of HBAT between its customers in the magazine industry and those in the newsprint industry?
• We want to use exploratory factor analysis to reduce the dimensions of these perceptions.
What is EFA?
• An interdependence technique
• Its primary purpose is to define the underlying structure among the variables in the analysis.
• A tool for analyzing the structure of interrelationships (correlations) among a large number of variables by defining sets of highly interrelated variables, known as factors.
• Also used to reduce data for use in subsequent analyses.
HBAT Data
• 13 attributes about perceptions of HBAT were developed through focus groups, a pretest, and use in previous studies.
• The sample consisted of 200 purchasing managers of companies buying from HBAT.
• Respondents were asked to rate HBAT on the 13 attributes using a 0-10 graphic rating scale.
  - 0 indicates “poor” and 10 indicates “excellent”.
Evaluation of Data
• No missing data
• Some potential outliers appeared, but no observation had a standardized score (z) beyond ±2.5 on any variable, so we decided to keep them.
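The univariate screen above can be sketched as follows, assuming the 2.5 cutoff is applied to standardized (z) scores computed with the sample standard deviation. The function name and the ratings are hypothetical illustrations, not the HBAT data.

```python
from statistics import mean, stdev

def flag_univariate_outliers(values, threshold=2.5):
    """Return the indices of observations whose |z| exceeds the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [i for i, v in enumerate(values) if abs((v - m) / s) > threshold]

# Hypothetical 0-10 ratings for a single attribute
ratings = [6.5, 7.0, 7.2, 6.8, 7.1, 6.9, 7.3, 7.0, 6.7, 0.5]
print(flag_univariate_outliers(ratings))  # the 0.5 rating has z of about -2.83
```

Running this for every attribute and finding no flagged indices would correspond to the decision on the slide to keep all observations.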
Evaluation of Data
• We also screened for multivariate outliers with the Mahalanobis D².
• We ran a regression to save each case's Mahalanobis distance, then used the Compute Variable command with the CDF.CHISQ function to convert each D² value into a chi-square probability (1 - CDF.CHISQ(D², df), with df equal to the number of variables).
• We then checked whether any case had a Mahalanobis probability below 0.001. Since none did, we concluded that there were no outliers to delete from the dataset.
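A minimal Python sketch of the same Mahalanobis D² screen, assuming NumPy is available. Instead of saving distances from an SPSS regression, it computes D² directly from the inverse sample covariance matrix and converts each value to an upper-tail chi-square probability with df equal to the number of variables (the role CDF.CHISQ plays in SPSS). `chi2_sf` and `mahalanobis_outliers` are hypothetical helpers; `chi2_sf` handles integer df only.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return min(q, 1.0)

def mahalanobis_outliers(X, alpha=0.001):
    """Return D² per case, its chi-square probability, and the flagged case indices."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    p = np.array([chi2_sf(v, X.shape[1]) for v in d2])
    return d2, p, np.flatnonzero(p < alpha)
```

With the HBAT data, `X` would be the 200 × 13 matrix of ratings; an empty third element corresponds to the slide's finding that no case fell below 0.001.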
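Assumptions
• Normality, homoscedasticity, and linearity
• Normality is necessary only if statistical tests are applied to the significance of the factors.
• In the tests below, a significance value greater than .05 means the attribute can be treated as normally distributed. By both tests, only X8, X9, X14, and X15 meet this criterion.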
Table 2: Tests of Normality
Kolmogorov-Smirnov (Lilliefors) Shapiro-Wilk
Statistic df Sig. Statistic df Sig.
X6 - Product Quality .095 200 .000 .950 200 .000
X7 - E-Commerce .122 200 .000 .962 200 .000
X8 - Technical Support .046 200 .200* .989 200 .114
X9 - Complaint Resolution .045 200 .200* .996 200 .844
X10 - Advertising .078 200 .005 .984 200 .021
X11 - Product Line .063 200 .049 .984 200 .025
X12 - Salesforce Image .107 200 .000 .981 200 .007
X13 - Competitive Pricing .091 200 .000 .971 200 .000
X14 - Warranty & Claims .058 200 .093 .996 200 .824
X15 - New Products .036 200 .200* .996 200 .912
X16 - Order & Billing .105 200 .000 .984 200 .022
X17 - Price Flexibility .095 200 .000 .968 200 .000
X18 - Delivery Speed .086 200 .001 .984 200 .026
Assumptions
• Histograms
Assumptions
• Log, 1/x, square root, x², and x³ transformations did not help to normalize the other variables; some distributions became even more irregular.
• We also tried an arcsine transformation, but it did not work on these data.
• So we continued the analysis with the original variables, except X11, which we replaced with its transformed version (tr_x11), keeping in mind that most of the variables are not normally distributed.
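One way to compare the transformations listed above is to check how much each one reduces skewness. The sketch below assumes that criterion (the slides do not say which one was used). It shifts the 0-10 ratings by +1 so that the log and inverse transformations stay defined at 0, and it leaves out the arcsine because that function is only defined on [-1, 1]. It reports the moment coefficient of skewness; SPSS reports a bias-adjusted version, so the numbers will differ slightly.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment coefficient of skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Transformations named on the slide; +1 keeps log and 1/x defined for a rating of 0
TRANSFORMS = {
    "original": lambda x: x,
    "log": lambda x: math.log(x + 1),
    "inverse": lambda x: 1 / (x + 1),
    "sqrt": math.sqrt,
    "square": lambda x: x ** 2,
    "cube": lambda x: x ** 3,
}

def compare_transforms(values):
    """Skewness of the variable under each candidate transformation."""
    return {name: skewness([f(v) for v in values]) for name, f in TRANSFORMS.items()}

print(compare_transforms([0, 1, 1, 2, 2, 2, 3, 4, 6, 9]))
```

A transformation "helps" under this criterion only if it moves the skewness closer to 0 than the original variable.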
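Assumptions - Homogeneity
• According to the Levene statistics, only X13 Competitive Pricing violates homogeneity of variance between the two groups (Sig. = .025 < .05); all other variables have homogeneous variances.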
Table 3: Test of Homogeneity of Variances
Levene Statistic df1 df2 Sig.
X7 - E-Commerce ,533 1 198 ,466
X8 - Technical Support ,018 1 198 ,892
X9 - Complaint Resolution ,002 1 198 ,963
X10 - Advertising ,775 1 198 ,380
tr_x11 - transformed product line 1,228 1 198 ,269
X12 - Salesforce Image ,080 1 198 ,777
X13 - Competitive Pricing 5,116 1 198 ,025
X14 - Warranty & Claims 2,403 1 198 ,123
X15 - New Products ,917 1 198 ,339
X16 - Order & Billing ,451 1 198 ,503
X17 - Price Flexibility 2,717 1 198 ,101
X18 - Delivery Speed ,006 1 198 ,939
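Levene's statistic, as reported in Table 3, is a one-way ANOVA F computed on the absolute deviations of each observation from its group mean. The pure-Python sketch below assumes this mean-centred form. It returns only the statistic and its degrees of freedom; the p-value would come from an F(df1, df2) distribution (for example `scipy.stats.f.sf`), which is not used here.

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W: one-way ANOVA F on |x - group mean|, plus (df1, df2)."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within, (k - 1, n - k)
```

With two groups totalling 200 cases, the degrees of freedom come out as (1, 198), matching the df1 and df2 columns of Table 3.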
Assumptions - Linearity
Objectives of EFA
• Specify the unit of analysis: what is being grouped?
  - Cases or respondents (Q type)
  - Variables (R type)
• Achieving data summarization vs. data reduction
  - Data summarization: identifying underlying dimensions
  - Data reduction: using factor loadings as the basis for subsequent analysis
• Variable selection
  - Consider the conceptual underpinnings and intuition as to the appropriateness of variables
  - Comprehensive & parsimonious
Designing a Factor Analysis
• Correlations among variables or respondents
  - R type: correlation matrix among variables
  - Q type: correlation matrix among respondents
• Variable selection and measurement issues
  - Metric variables; dummy-code nonmetric ones if necessary
  - Reasonable number of variables
• Sample size
  - Preferably ≥ 100, and at least 50 observations
  - More observations than variables
  - At least 5 observations per variable (5:1)
EFA
• An R-type EFA was employed.
• The aim is data reduction.
• We examined the correlation matrix of the variables.
• The sample size is 200, which exceeds the recommended minimum of 100.
• There are more observations than variables, as suggested.
• The number of observations per variable is approximately 15:1 (200/13), well above the desired minimum of 5:1.
Assumptions in Factor Analysis
Conceptual issues
• Conceptually valid & appropriate patterns
• Homogeneous sample with respect to the underlying factor structure
Statistical issues
• Overall measures of intercorrelation
  - Correlations between variables ≥ 0.30
  - Small partial correlations (the correlation left unexplained when the effects of the other variables are taken into account)
  - Anti-image correlation matrix (partial correlations ≥ 0.70 are considered high)
  - Bartlett test of sphericity (significance < 0.05)
  - Measure of sampling adequacy (MSA) (MSA values > 0.50)
• Variable-specific measures of intercorrelation
  - MSA for each variable (MSA values > 0.50)
Assumptions in Factor Analysis
• Bartlett's test of sphericity is significant (Sig. = .000, i.e., p < .001), indicating that the correlation matrix has significant correlations among at least some of the variables.
• The overall MSA (KMO) value is .648, above the desired .50, indicating that factor analysis is appropriate.
Table 4: KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy. ,648
Bartlett's Test of Sphericity
Approx. Chi-Square 1875,571
df 78
Sig. ,000
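The KMO measure and Bartlett's test in Table 4 can be computed from raw data roughly as follows. This is a sketch assuming NumPy and the standard formulas: Bartlett's chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom (78 for 13 variables), and partial correlations taken from the inverse of the correlation matrix R. The per-variable values correspond to the diagonal MSAs of the anti-image matrix. `kmo_bartlett` is a hypothetical helper name.

```python
import numpy as np

def kmo_bartlett(X):
    """Return (overall KMO, per-variable MSA array, Bartlett chi-square, df)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Bartlett's test of sphericity (H0: R is an identity matrix)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    # Partial correlations from the inverse correlation matrix
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)            # ignore the diagonal
    r2 = np.where(off, R ** 2, 0.0)
    a2 = np.where(off, partial ** 2, 0.0)
    kmo = r2.sum() / (r2.sum() + a2.sum())
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    return kmo, msa, chi2, df
```

Re-running it after dropping a column is how the "MSA rose after deleting X17" check could be reproduced outside SPSS.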
Correlation Matrix
(lower triangle; columns follow the row order: X6, X7, X8, X9, X10, tr_x11, X12, X13, X14, X15, X16, X17, X18)
X6 - Product Quality 1
X7 - E-Commerce -0,034 1
X8 - Technical Support 0,087 0,041 1
X9 - Complaint Resolution 0,09 ,192** ,152* 1
X10 - Advertising -0,054 ,505** 0,028 ,234** 1
tr_x11 - transformed product line ,491** 0,069 ,166* ,576** ,145* 1
X12 - Salesforce Image -0,116 ,788** 0,086 ,256** ,627** 0,056 1
X13 - Competitive Pricing -,448** ,177* -0,092 -0,077 0,099 -,484** ,200** 1
X14 - Warranty & Claims 0,109 0,103 ,838** ,181* 0,035 ,232** ,163* -0,085 1
X15 - New Products 0,136 -0,041 -0,038 0,09 0,063 ,144* 0,009 -0,121 0,03 1
X16 - Order & Billing 0,083 ,217** 0,121 ,741** ,230** ,466** ,284** -0,06 ,204** 0,137 1
X17 - Price Flexibility -,487** ,186** -0,029 ,418** ,260** -,309** ,272** ,470** -0,041 0,047 ,419** 1
X18 - Delivery Speed 0,067 ,241** 0,132 ,878** ,323** ,629** ,299** -0,055 ,183** ,147* ,773** ,513** 1
** Correlation is significant at the 0.01 level; * significant at the 0.05 level.
Hair et al. suggest that correlations below 0.30 may indicate that factor analysis is inappropriate. Of the 78 correlations between variables, 20 had absolute values of 0.30 or higher.
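The count of correlations at or above 0.30 can be checked directly against the correlation matrix above. The values below are transcribed from that table with the significance stars dropped; absolute values are compared, so negative correlations such as -0,448 also count. `count_substantial` is a hypothetical helper; its `exclude` parameter lets the count be repeated with a variable removed.

```python
# Lower triangle of the correlation matrix (diagonal omitted), rows in LABELS order
LABELS = ["X6", "X7", "X8", "X9", "X10", "tr_x11", "X12",
          "X13", "X14", "X15", "X16", "X17", "X18"]
LOWER = [
    [],
    [-0.034],
    [0.087, 0.041],
    [0.090, 0.192, 0.152],
    [-0.054, 0.505, 0.028, 0.234],
    [0.491, 0.069, 0.166, 0.576, 0.145],
    [-0.116, 0.788, 0.086, 0.256, 0.627, 0.056],
    [-0.448, 0.177, -0.092, -0.077, 0.099, -0.484, 0.200],
    [0.109, 0.103, 0.838, 0.181, 0.035, 0.232, 0.163, -0.085],
    [0.136, -0.041, -0.038, 0.090, 0.063, 0.144, 0.009, -0.121, 0.030],
    [0.083, 0.217, 0.121, 0.741, 0.230, 0.466, 0.284, -0.060, 0.204, 0.137],
    [-0.487, 0.186, -0.029, 0.418, 0.260, -0.309, 0.272, 0.470, -0.041, 0.047, 0.419],
    [0.067, 0.241, 0.132, 0.878, 0.323, 0.629, 0.299, -0.055, 0.183, 0.147, 0.773, 0.513],
]

def count_substantial(lower, labels, cutoff=0.30, exclude=()):
    """Return (number of |r| >= cutoff, number of distinct variable pairs)."""
    pairs = substantial = 0
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            if labels[i] in exclude or labels[j] in exclude:
                continue
            pairs += 1
            substantial += abs(r) >= cutoff
    return substantial, pairs

print(count_substantial(LOWER, LABELS))
```

With 13 variables there are 13 × 12 / 2 = 78 distinct pairs.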
Partial Correlations
• Partial correlations should be small, unlike the correlations. In the anti-image correlation matrix, the off-diagonal values are the (negative) partial correlations, and the diagonal values are the individual MSAs.
• The highest partial correlations were between X8 and X14 (-0.839), X17 and X18 (-0.86), tr_x11 and X17 (0.904), and tr_x11 and X18 (-0.871).
• Variables with MSA values below 0.50 should be deleted one at a time, starting with the lowest and recalculating after each deletion.
• X17 Price Flexibility had the lowest individual MSA (0.442), so we deleted this variable, as Hair et al. strictly suggest.
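The one-at-a-time deletion rule can be sketched as a loop that drops the variable with the lowest MSA as long as it is below 0.50. `compute_msa` is a hypothetical caller-supplied function that must recompute the MSAs for the remaining variables (for example, from a fresh anti-image matrix). The MSA values below are the diagonal of the anti-image matrix shown in this section.

```python
# Diagonal of the anti-image correlation matrix: individual MSA values
MSA = {"X6": 0.859, "X7": 0.692, "X8": 0.518, "X9": 0.908, "X10": 0.746,
       "tr_x11": 0.499, "X12": 0.655, "X13": 0.923, "X14": 0.542,
       "X15": 0.567, "X16": 0.937, "X17": 0.442, "X18": 0.586}

def next_to_delete(msa, cutoff=0.50):
    """Return the variable with the lowest MSA if it is below the cutoff, else None."""
    name = min(msa, key=msa.get)
    return name if msa[name] < cutoff else None

def prune_by_msa(variables, compute_msa, cutoff=0.50):
    """Drop one variable at a time, recomputing MSA after every deletion."""
    variables = list(variables)
    while True:
        victim = next_to_delete(compute_msa(variables), cutoff)
        if victim is None:
            return variables
        variables.remove(victim)

print(next_to_delete(MSA))  # X17, the lowest MSA (0.442)
```

Because the MSAs change after each deletion, tr_x11 (0.499) should be judged on its recomputed value rather than the value in this table.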
Partial Correlations
(anti-image correlation matrix, lower triangle; columns follow the row order: X6, X7, X8, X9, X10, tr_x11, X12, X13, X14, X15, X16, X17, X18)
X6 - Product Quality ,859a
X7 - E-Commerce -0,096 ,692a
X8 - Technical Support -0,024 0,019 ,518a
X9 - Complaint Resolution -0,017 0,055 -0,092 ,908a
X10 - Advertising -0,038 -0,022 -0,084 0,099 ,746a
tr_x11 - transformed product line 0,026 0,025 0,029 0,013 -0,209 ,499a
X12 - Salesforce Image 0,113 -0,676 0,068 -0,071 -0,437 0,141 ,655a
X13 - Competitive Pricing 0,118 -0,08 0,071 0,003 0,021 0,072 -0,044 ,923a
X14 - Warranty & Claims 0,014 0,012 -0,839 0,078 0,129 -0,049 -0,144 -0,063 ,542a
X15 - New Products -0,128 0,089 0,114 0,085 -0,025 -0,086 -0,045 0,086 -0,092 ,567a
X16 - Order & Billing -0,089 0,008 0,11 -0,208 0,073 -0,012 -0,084 0,071 -0,137 -0,029 ,937a
X17 - Price Flexibility 0,228 0,041 -0,012 0,036 -0,208 0,904 0,121 -0,123 0,017 -0,115 -0,069 ,442a
X18 - Delivery Speed -0,088 -0,057 0,002 -0,352 0,095 -0,871 -0,088 0,011 0,001 0,039 -0,13 -0,86 ,586a
a. Measure of Sampling Adequacy (MSA)
Correlations revisited
• After deleting X17, the KMO and Bartlett's test results showed that the overall MSA rose from 0.648 to 0.695, and Bartlett's test was again significant.
• The new correlation matrix had 15 correlations above 0.30 (compared with 20 when X17 was included), although most of the correlations were significant.
Deriving Factors and Assessing Overall Fit
• Selecting the factor extraction method: common factor analysis vs. component factor analysis
  - Total variance = common variance + specific variance + error variance
  - Component analysis considers the total variance; it is most appropriate when:
    - Data reduction is the primary concern (the minimum number of factors to account for the maximum portion of total variance)
    - Prior knowledge suggests that specific and error variance are small
  - Common factor analysis considers only the common variance; it is used when:
    - Data summarization is the primary objective
    - Little is known about the specific and error variance
Stage 4: Deriving Factors and Assessing Overall Fit
• Criteria for the number of factors to extract
  - Latent root criterion (component analysis): eigenvalues ≥ 1; most reliable when the number of variables is between 20 and 50
  - A priori criterion: set a predetermined number of factors
  - Percentage of variance criterion: extract factors until a specified cumulative percentage of total variance is reached (60% in the social sciences)
  - Scree test criterion: factors before the inflection point
  - More factors if the respondents are heterogeneous
Deriving Factors and Assessing Overall Fit
• Principal component factor analysis
• Factor extraction method: the latent root criterion, which retains only factors with eigenvalues greater than 1
• The table below shows that, before rotation, the first factor accounts for 31% of the total variance, the second 19%, the third 14%, and the fourth 11%; after rotation the shares are 24%, 19%, 16%, and 16%.
• A total of 75% of the total variance was explained by the four-factor solution.
Initial Eigenvalues | Rotation Sums of Squared Loadings
Component Total % of Variance Cumulative % | Total % of Variance Cumulative %
1 3,723 31,024 31,024 | 2,888 24,066 24,066
2 2,320 19,331 50,355 | 2,330 19,416 43,482
3 1,689 14,071 64,427 | 1,910 15,914 59,396
4 1,267 10,559 74,986 | 1,871 15,590 74,986
5 ,946 7,879 82,865
6 ,574 4,787 87,652
7 ,489 4,078 91,730
8 ,342 2,852 94,582
9 ,228 1,902 96,484
10 ,187 1,561 98,044
11 ,136 1,137 99,182
12 ,098 ,818 100,000
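The percentages in this table follow from the eigenvalues alone. With 12 standardized variables (after deleting X17) the total variance is 12, so each component's share is its eigenvalue divided by 12. The short sketch below uses the initial eigenvalues from the table; the function names are hypothetical.

```python
EIGENVALUES = [3.723, 2.320, 1.689, 1.267, 0.946, 0.574,
               0.489, 0.342, 0.228, 0.187, 0.136, 0.098]

def variance_table(eigenvalues):
    """Rows of (component, eigenvalue, % of variance, cumulative %)."""
    p = len(eigenvalues)  # trace of a p x p correlation matrix
    rows, cumulative = [], 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        pct = 100 * lam / p
        cumulative += pct
        rows.append((k, lam, pct, cumulative))
    return rows

def latent_root_count(eigenvalues, threshold=1.0):
    """Number of components retained by the latent root (eigenvalue > 1) criterion."""
    return sum(lam > threshold for lam in eigenvalues)

for k, lam, pct, cum in variance_table(EIGENVALUES)[:4]:
    print(f"{k}: {lam:.3f} {pct:.2f}% {cum:.2f}%")
```

Small differences in the second decimal place come from the eigenvalues being rounded to three decimals in the table.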
Scree Plot
If we had used the scree test criterion, we would have extracted more factors: as the figure shows, the inflection point comes after the sixth factor.
Interpreting the Factors
Three processes
• Estimate the factor matrix
• Factor rotation: orthogonal or oblique
  - Orthogonal: best suited when the aim is data reduction (QUARTIMAX, VARIMAX, EQUIMAX)
  - Oblique: best suited when the aim is to obtain theoretically meaningful factors or constructs (OBLIMIN)
• Factor interpretation and respecification
  - Factor loadings ≥ 0.30, preferably ≥ 0.50
  - Avoid cross-loadings
  - Communality ≥ 0.50
  - Respecify the factor model if needed
  - Label the factors
Interpreting the Factors
• Orthogonal VARIMAX rotation was used because of its simplicity and wide usage.
• Orthogonal rotation is suggested for data reduction purposes.
• We treated factor loadings greater than 0.40 as significant.
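For reference, VARIMAX is the orthomax rotation with gamma = 1 (gamma = 0 gives QUARTIMAX). The sketch below is a commonly used SVD-based iteration, assuming NumPy, with optional Kaiser normalization, which SPSS applies to VARIMAX by default. It is not SPSS's own code, so loadings may differ slightly because of convergence settings; the function name is hypothetical.

```python
import numpy as np

def varimax(loadings, gamma=1.0, normalize=True, max_iter=100, tol=1e-6):
    """Orthomax rotation of a p x k loading matrix; returns (rotated loadings, T)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Kaiser normalization: scale each row to unit length before rotating
    h = np.sqrt((L ** 2).sum(axis=1, keepdims=True)) if normalize else np.ones((p, 1))
    A = L / h
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient (up to a constant) of the orthomax criterion at the current rotation
        G = A.T @ (B ** 3 - (gamma / p) * B @ np.diag((B ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        T = u @ vt  # nearest orthogonal matrix to the gradient
        if crit and s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return (A @ T) * h, T
```

Because T is orthogonal, each variable's communality (row sum of squared loadings) is unchanged by the rotation; only the distribution of variance across factors changes, as in the "Rotation Sums of Squared Loadings" columns.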
Table 10: Component Matrix (components 1-4)
X6 -,571 ,567
X7 ,468 ,646
X8 ,867
X9 ,836
X10 ,493 ,535
tr_x11 ,699 -,499
X12 ,535 ,689
X13 ,662 -,450
X14 ,408 ,838
X15
X16 ,795
X18 ,877

Table 11: Rotated Component Matrix (components 1-4)
X6 ,831
X7 ,882
X8 ,954
X9 ,920
X10 ,784
tr_x11 ,576 ,663
X12 ,908
X13 -,797
X14 ,942
X15
X16 ,861
X18 ,932
Rotation
• The unrotated factor solution had many cross-loadings, so rotation was necessary.
• X11 Product Line cross-loaded on both factor 1 and factor 3, and the two loadings were less than 0.10 apart.
• We therefore tried other rotation methods to remedy this inconsistency, but they also produced cross-loadings for X11.
• As a result, we decided to eliminate X11.
• X15 had no loading above 0.40, so we eliminated this variable too.
• After eliminating X11 and X15, the total variance explained by the four-factor solution increased to 81%.
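The screening rule applied in these bullets (a loading counts only above 0.40, and two significant loadings closer than 0.10 signal a problematic cross-loading) can be written as a small check. The example rows are illustrative: they reuse loadings from Table 11, but the factor positions and the small X15 loadings are hypothetical, because suppressed loadings are not reported.

```python
def loading_problems(loadings, cutoff=0.40, min_gap=0.10):
    """Flag variables with no loading above `cutoff` or top loadings closer than `min_gap`.

    loadings: {variable name: [loading on factor 1, ..., factor k]}
    """
    problems = {}
    for var, row in loadings.items():
        significant = sorted((abs(x) for x in row if abs(x) > cutoff), reverse=True)
        if not significant:
            problems[var] = "no significant loading"
        elif len(significant) > 1 and significant[0] - significant[1] < min_gap:
            problems[var] = "cross-loading"
    return problems

# Illustrative rows: hypothetical factor positions and hypothetical values for X15
EXAMPLE = {
    "X6": [0.0, 0.0, 0.831, 0.0],
    "tr_x11": [0.576, 0.0, 0.663, 0.0],
    "X15": [0.12, 0.05, 0.21, 0.08],
    "X18": [0.932, 0.0, 0.0, 0.0],
}
print(loading_problems(EXAMPLE))
```

Here tr_x11 is flagged because 0.663 - 0.576 = 0.087 < 0.10, and X15 because none of its loadings exceeds 0.40, mirroring the two eliminations described above.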
Rotated Component Matrix
Table 12: New Rotated Component Matrix
Variable Component Loading Communality
X18 - Delivery Speed 1 0,932 ,908
X9 - Complaint Resolution 1 0,929 ,886
X16 - Order & Billing 1 0,88 ,804
X12 - Salesforce Image 2 0,904 ,868
X7 - E-Commerce 2 0,884 ,793
X10 - Advertising 2 0,78 ,642
X8 - Technical Support 3 0,954 ,918
X14 - Warranty & Claims 3 0,948 ,921
X6 - Product Quality 4 0,861 ,746
X13 - Competitive Pricing 4 -0,827 ,714
Factors
• Factor 1: Delivery Speed, Complaint Resolution, Order & Billing (Sales Support)
• Factor 2: Salesforce Image, E-Commerce, Advertising (Recognition)
• Factor 3: Technical Support, Warranty & Claims (After Sales Services)
• Factor 4: Product Quality, Competitive Pricing (Quality & Price)
Validation of Factor Analysis
• Assess the generalizability of results
  - Split sample
  - Separate sample
  - CFA
• Detect influential observations (outliers)
Validation of Factor Analysis
Table 13: Rotated Component Matrix of Split Sample
Component
1 2 3 4
X18 - Delivery Speed ,927
X9 - Complaint Resolution ,912
X16 - Order & Billing ,866
X12 - Salesforce Image ,922
X7 - E-Commerce ,882
X10 - Advertising ,793
X14 - Warranty & Claims ,952
X8 - Technical Support ,948
X6 - Product Quality ,869
X13 - Competitive Pricing -,812
We used a split sample. When we ran the factor analysis on the split sample, the MSA value was 0,652 and Bartlett's test was significant, indicating that the factor structure can also be examined in the split sample. In addition, the rotated component matrix showed the same factor structure.
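A sketch of the split-sample check, assuming a random split into two halves (the slides do not say how the sample was divided). `factor_groups` groups variables by the factor on which they load most strongly, so two solutions count as matching even if the factors come out in a different order, as can happen between runs. All function names are hypothetical.

```python
import random

def split_sample(n, seed=0):
    """Randomly split case indices 0..n-1 into two halves."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return sorted(idx[: n // 2]), sorted(idx[n // 2 :])

def factor_groups(loadings):
    """Set of variable groups; each group shares the same dominant factor."""
    groups = {}
    for var, row in loadings.items():
        top = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(top, set()).add(var)
    return {frozenset(g) for g in groups.values()}

def same_structure(full, half):
    """True if both solutions assign the variables to the same groups."""
    return factor_groups(full) == factor_groups(half)
```

Passing the loadings of Table 12 and Table 13 as `full` and `half` would formalize the statement that both matrices show the same factor structure.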
Additional Uses of EFA Results
Data Reduction Options
• Surrogate variable
• Summated scales
  - Unidimensionality
  - Reliability: item-to-total correlations ≥ 0.50, inter-item correlations ≥ 0.30, Cronbach's α ≥ 0.70
  - Validity: convergent, discriminant, nomological
• Factor scores: we used factor scores to avoid the additional validation that summated scales require.
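Two of the options above are sketched below with NumPy: Cronbach's α for a summated scale, and factor scores by the regression method (the "Regression" option in SPSS; the slides do not say which method was used). `pca_loadings` is a hypothetical helper that extracts unrotated principal-component loadings. Rotated loadings can be passed to `regression_factor_scores` in the same way.

```python
import numpy as np

def cronbach_alpha(items):
    """items: n x k array (respondents x scale items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def pca_loadings(X, n_factors):
    """Unrotated principal-component loadings from the correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    vals, vecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_factors]     # largest first
    return vecs[:, order] * np.sqrt(vals[order])

def regression_factor_scores(X, loadings):
    """Regression-method scores: F = Z R^-1 L, with Z the standardized data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    coef = np.linalg.solve(R, loadings)            # factor score coefficient matrix
    return Z @ coef
```

With principal-component loadings, the resulting scores are standardized and uncorrelated, and an orthogonal rotation such as VARIMAX preserves this, which is one reason factor scores are convenient inputs to later analyses.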
Thank You for Listening!