exploratory factor analysis

AD 601

EXPLORATORY FACTOR ANALYSIS

Tutku Seçkin Çelik

Yusuf Öç

Research Question

• HBAT company

• Newsprint

• Magazine

• Are there any differences in perceptions of HBAT between its magazine-industry and newsprint-industry customers?

• We want to use exploratory factor analysis to reduce the dimensions of perceptions

What is EFA?

• An interdependence technique

• Primary purpose is to define the underlying structure among the variables in the analysis.

• A tool for analyzing the structure of interrelationships (correlations) among a large number of variables by defining sets of variables that are highly interrelated, known as factors.

• Also used for data reduction for further use.

HBAT Data

• 13 attributes about perceptions of HBAT were developed through focus groups, a pretest, and use in previous studies

• The sample consisted of 200 purchasing managers of companies buying from HBAT

• Respondents were asked to rate HBAT on the 13 attributes using a 0-10 graphic rating scale

• 0 indicates “poor” and 10 indicates “excellent”

Evaluation of Data

• No missing data

• Some outliers, but no variable has a standard deviation greater than 2.5, so we decided to keep the outliers

Evaluation of Data

• Multivariate detection of outliers: Mahalanobis D²

• We ran a regression to obtain the Mahalanobis values, then calculated their probabilities with the Compute Variable command, choosing CDF.CHISQ

• A case is a candidate for deletion if its Mahalanobis probability is less than 0.001. Since there was none, we decided that there are no outliers to delete in our dataset.
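The Mahalanobis check described above can be sketched outside SPSS as follows. This is a minimal illustration, not the authors' procedure: it assumes NumPy and SciPy are available, uses the number of variables as the chi-square degrees of freedom, and runs on synthetic data because the HBAT file is not part of these slides. The function name `mahalanobis_outliers` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.001):
    """Return squared Mahalanobis distances, tail probabilities, and an outlier mask."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # D^2 for each case: (x - mean)' S^-1 (x - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    # Upper-tail chi-square probability, i.e. 1 - CDF, with df = number of variables (assumption)
    p = chi2.sf(d2, df=X.shape[1])
    return d2, p, p < alpha

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 13))  # synthetic stand-in, NOT the HBAT data
data[0] = 8.0                      # plant one extreme case for illustration
d2, p, flagged = mahalanobis_outliers(data)
print(np.flatnonzero(flagged))     # indices of cases with probability below 0.001
```

Cases whose probability falls below 0.001 are the ones the slide treats as candidates for deletion.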

Assumptions

• Normality, homoscedasticity, and linearity

• Normality is necessary for statistical tests

• In the tests below, attributes with significance values greater than .05 are treated as normally distributed.

Table 2: Tests of Normality

                            Kolmogorov-Smirnov          Shapiro-Wilk
Variable                    Statistic   df    Sig.      Statistic   df    Sig.
X6 - Product Quality        .095        200   .000      .950        200   .000
X7 - E-Commerce             .122        200   .000      .962        200   .000
X8 - Technical Support      .046        200   .200*     .989        200   .114
X9 - Complaint Resolution   .045        200   .200*     .996        200   .844
X10 - Advertising           .078        200   .005      .984        200   .021
X11 - Product Line          .063        200   .049      .984        200   .025
X12 - Salesforce Image      .107        200   .000      .981        200   .007
X13 - Competitive Pricing   .091        200   .000      .971        200   .000
X14 - Warranty & Claims     .058        200   .093      .996        200   .824
X15 - New Products          .036        200   .200*     .996        200   .912
X16 - Order & Billing       .105        200   .000      .984        200   .022
X17 - Price Flexibility     .095        200   .000      .968        200   .000
X18 - Delivery Speed        .086        200   .001      .984        200   .026
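As a small illustration of the .05 decision rule, the Shapiro-Wilk significance values from Table 2 can be screened in a few lines. The variable names `shapiro_sig` and `ALPHA` are chosen here for illustration only.

```python
# Shapiro-Wilk significance values copied from Table 2
shapiro_sig = {
    "X6": .000, "X7": .000, "X8": .114, "X9": .844, "X10": .021,
    "X11": .025, "X12": .007, "X13": .000, "X14": .824, "X15": .912,
    "X16": .022, "X17": .000, "X18": .026,
}
ALPHA = 0.05

# A variable is treated as normally distributed when Sig. > .05
consistent_with_normal = [v for v, sig in shapiro_sig.items() if sig > ALPHA]
print(consistent_with_normal)  # ['X8', 'X9', 'X14', 'X15']
```

Only four of the thirteen attributes pass, which is why transformations were tried next.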

Assumptions

• Histograms

Assumptions

• Log, 1/x, square root, x², and x³ transformations did not help normalize the other variables; they only made the variables more complicated.

• We tried arcsine, but it did not transform the data

• So we decided to continue our analysis with the original variables, except X11 (used in its transformed form, tr_x11), while keeping in mind that most of the variables are not normally distributed.

Assumptions - Homogeneity

• According to the Levene statistics, only X13 Competitive Pricing (Sig. = .025) violates homogeneity of variance; all other variables have Sig. > .05

Table 3: Test of Homogeneity of Variances

Variable                            Levene Statistic   df1   df2   Sig.
X7 - E-Commerce                     .533               1     198   .466
X8 - Technical Support              .018               1     198   .892
X9 - Complaint Resolution           .002               1     198   .963
X10 - Advertising                   .775               1     198   .380
tr_x11 - transformed product line   1.228              1     198   .269
X12 - Salesforce Image              .080               1     198   .777
X13 - Competitive Pricing           5.116              1     198   .025
X14 - Warranty & Claims             2.403              1     198   .123
X15 - New Products                  .917               1     198   .339
X16 - Order & Billing               .451               1     198   .503
X17 - Price Flexibility             2.717              1     198   .101
X18 - Delivery Speed                .006               1     198   .939

Assumptions - Linearity

Objectives of EFA

• Specify the unit of analysis: what is being grouped?
  - Cases or respondents (Q type)
  - Variables (R type)

• Achieving data summarization vs. data reduction
  - Data summarization: identifying underlying dimensions
  - Data reduction: using factor loadings as the basis for subsequent analysis

• Variable selection
  - Consider the conceptual underpinnings and intuition as to the appropriateness of variables
  - Comprehensive & parsimonious

Designing a Factor Analysis

• Correlations among variables or respondents
  - R type – correlation matrix
  - Q type – factor matrix

• Variable selection and measurement issues
  - Metric variables; dummy code nonmetric ones if necessary
  - Reasonable number of variables

• Sample size
  - Preferably ≥ 100
  - More observations than variables
  - At least 50 observations
  - Number of observations per variable of at least 5:1

EFA

• An R type EFA was employed

• The aim is data reduction

• We looked at the correlation matrix of the variables

• Sample size is 200, which is more than the required 100

• There are more observations than variables, as suggested

• The number of observations per variable is approximately 15:1, which is more than the desired minimum of 5:1
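The sample-size rules of thumb listed above are simple arithmetic; a minimal sketch (the function name `check_sample_size` is hypothetical) makes the 15:1 figure explicit.

```python
def check_sample_size(n_obs, n_vars, min_obs=100, min_ratio=5.0):
    """Apply the rules of thumb from the slides to a given design."""
    ratio = n_obs / n_vars
    return {
        "enough_observations": n_obs >= min_obs,
        "more_obs_than_vars": n_obs > n_vars,
        "ratio": ratio,
        "ratio_ok": ratio >= min_ratio,
    }

result = check_sample_size(n_obs=200, n_vars=13)
print(round(result["ratio"], 1))  # 15.4 observations per variable
```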

Assumptions in Factor Analysis

Conceptual issues

• Conceptually valid & appropriate patterns

• Homogeneous sample with respect to underlying factor structure

Statistical issues

• Overall measures of intercorrelation
  - Correlations between variables ≥ 0.30
  - Small partial correlations (unexplained correlation when the effects of other variables are taken into account)
  - Anti-image correlation matrix (correlations ≥ 0.70)
  - Bartlett test of sphericity (significance < 0.05)
  - Measure of sampling adequacy (MSA) (MSA values > 0.50)

• Variable-specific measures of intercorrelation
  - MSA for each variable (MSA values > 0.50)

Assumptions in Factor Analysis

• The Bartlett test of sphericity is significant (Sig. = .000 < .05), indicating that the correlation matrix has significant correlations among at least some of the variables.

• Also, the MSA value is .648, which is above the desired .50, indicating the appropriateness of factor analysis.

Table 4: KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .648
Bartlett's Test of Sphericity     Approx. Chi-Square     1875.571
                                  df                     78
                                  Sig.                   .000
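For readers who want to see where the chi-square and df in this table come from, here is a minimal sketch of Bartlett's statistic computed from a correlation matrix. It assumes NumPy; the function name `bartlett_sphericity` and the 3×3 matrix are illustrative and are not the HBAT correlations.

```python
import numpy as np

def bartlett_sphericity(R, n_obs):
    """Bartlett's chi-square statistic and degrees of freedom for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of the determinant of R
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi_square, df

# Illustrative correlation matrix (made-up values)
R_toy = [[1.0, 0.5, 0.3],
         [0.5, 1.0, 0.4],
         [0.3, 0.4, 1.0]]
chi_toy, df_toy = bartlett_sphericity(R_toy, n_obs=200)

# With 13 variables the degrees of freedom are 13 * 12 / 2 = 78, as in the table
_, df_13 = bartlett_sphericity(np.eye(13), n_obs=200)
print(df_toy, df_13)  # 3 78
```

The statistic is then compared against a chi-square distribution with the returned degrees of freedom.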

Correlation Matrix

                                   X6       X7       X8       X9       X10      tr_x11   X12      X13      X14      X15      X16      X17      X18
X6 - Product Quality               1
X7 - E-Commerce                    -.034    1
X8 - Technical Support             .087     .041     1
X9 - Complaint Resolution          .09      .192**   .152*    1
X10 - Advertising                  -.054    .505**   .028     .234**   1
tr_x11 - transformed product line  .491**   .069     .166*    .576**   .145*    1
X12 - Salesforce Image             -.116    .788**   .086     .256**   .627**   .056     1
X13 - Competitive Pricing          -.448**  .177*    -.092    -.077    .099     -.484**  .200**   1
X14 - Warranty & Claims            .109     .103     .838**   .181*    .035     .232**   .163*    -.085    1
X15 - New Products                 .136     -.041    -.038    .09      .063     .144*    .009     -.121    .03      1
X16 - Order & Billing              .083     .217**   .121     .741**   .230**   .466**   .284**   -.06     .204**   .137     1
X17 - Price Flexibility            -.487**  .186**   -.029    .418**   .260**   -.309**  .272**   .470**   -.041    .047     .419**   1
X18 - Delivery Speed               .067     .241**   .132     .878**   .323**   .629**   .299**   -.055    .183**   .147*    .773**   .513**   1

Hair et al. suggest that if few correlations exceed 0.30, factor analysis may be inappropriate. Of the 78 distinct correlations among the 13 variables, 20 had absolute values higher than 0.30.
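The count can be reproduced from the lower triangle of the correlation matrix above. With 13 variables there are 13 × 12 / 2 = 78 distinct pairs (91 entries only if the diagonal is included). The sketch below uses the printed values with the significance stars dropped; the function name `count_strong_correlations` is hypothetical.

```python
def count_strong_correlations(lower_triangle, threshold=0.30):
    """Count off-diagonal correlations whose absolute value exceeds the threshold."""
    values = [r for row in lower_triangle for r in row]
    strong = sum(1 for r in values if abs(r) > threshold)
    return strong, len(values)

# Rows X7 ... X18 of the correlation matrix, entries below the diagonal only
lower_triangle = [
    [-0.034],
    [0.087, 0.041],
    [0.09, 0.192, 0.152],
    [-0.054, 0.505, 0.028, 0.234],
    [0.491, 0.069, 0.166, 0.576, 0.145],
    [-0.116, 0.788, 0.086, 0.256, 0.627, 0.056],
    [-0.448, 0.177, -0.092, -0.077, 0.099, -0.484, 0.200],
    [0.109, 0.103, 0.838, 0.181, 0.035, 0.232, 0.163, -0.085],
    [0.136, -0.041, -0.038, 0.09, 0.063, 0.144, 0.009, -0.121, 0.03],
    [0.083, 0.217, 0.121, 0.741, 0.230, 0.466, 0.284, -0.06, 0.204, 0.137],
    [-0.487, 0.186, -0.029, 0.418, 0.260, -0.309, 0.272, 0.470, -0.041, 0.047, 0.419],
    [0.067, 0.241, 0.132, 0.878, 0.323, 0.629, 0.299, -0.055, 0.183, 0.147, 0.773, 0.513],
]
print(count_strong_correlations(lower_triangle))  # (20, 78)
```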

Partial Correlations

• Partial correlations should be small, as opposed to correlations. In the anti-image correlation matrix, the off-diagonal values show the partial correlations.

• Only X8 and X14, and X17 and X18, had high partial correlations with each other.

• Variables with MSA values less than 0.50 should be deleted one by one.

• X17 Price Flexibility had a low individual MSA value of 0.442, so we deleted this variable, as strictly suggested by Hair et al.
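A minimal sketch of how overall and per-variable MSA values can be derived from a correlation matrix follows. It assumes NumPy and uses the standard KMO formula (squared correlations relative to squared correlations plus squared partial correlations). The function name `msa` and the equicorrelated example matrix are illustrative, not HBAT output.

```python
import numpy as np

def msa(R):
    """Return the overall KMO/MSA and the per-variable MSA values for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations between each pair of variables
    np.fill_diagonal(partial, 0.0)
    r = R.copy()
    np.fill_diagonal(r, 0.0)
    r2, q2 = r ** 2, partial ** 2
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return overall, per_variable

# Illustrative matrix: three variables, all pairwise correlations 0.5
R_toy = np.full((3, 3), 0.5)
np.fill_diagonal(R_toy, 1.0)
overall, per_variable = msa(R_toy)
lowest = int(np.argmin(per_variable))  # candidate for deletion if its MSA is below 0.50
print(round(overall, 3), lowest)
```

In the one-by-one procedure, the variable with the lowest MSA below 0.50 is dropped and the values are recomputed.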

Partial Correlations

                                   X6       X7       X8       X9       X10      tr_x11   X12      X13      X14      X15      X16      X17      X18
X6 - Product Quality               .859a
X7 - E-Commerce                    -.096    .692a
X8 - Technical Support             -.024    .019     .518a
X9 - Complaint Resolution          -.017    .055     -.092    .908a
X10 - Advertising                  -.038    -.022    -.084    .099     .746a
tr_x11 - transformed product line  .026     .025     .029     .013     -.209    .499a
X12 - Salesforce Image             .113     -.676    .068     -.071    -.437    .141     .655a
X13 - Competitive Pricing          .118     -.08     .071     .003     .021     .072     -.044    .923a
X14 - Warranty & Claims            .014     .012     -.839    .078     .129     -.049    -.144    -.063    .542a
X15 - New Products                 -.128    .089     .114     .085     -.025    -.086    -.045    .086     -.092    .567a
X16 - Order & Billing              -.089    .008     .11      -.208    .073     -.012    -.084    .071     -.137    -.029    .937a
X17 - Price Flexibility            .228     .041     -.012    .036     -.208    .904     .121     -.123    .017     -.115    -.069    .442a
X18 - Delivery Speed               -.088    -.057    .002     -.352    .095     -.871    -.088    .011     .001     .039     -.13     -.86     .586a

Correlations revisited

• After deleting X17, the KMO and Bartlett's test results showed that the overall MSA value rose from 0.648 to 0.695, and the Bartlett test again gave a significant result.

• The new correlation matrix had 15 correlations above 0.30, compared with 20 when X17 was included, although most of the correlations were significant.

Deriving Factors and Assessing Overall Fit

• Selecting the factor extraction method: common factor analysis vs. component factor analysis

Total variance = common variance + specific variance + error variance

Component analysis considers the total variance; most appropriate when:

• Data reduction is the primary concern (minimum number of factors to account for the maximum portion of total variance)

• Prior knowledge suggests that specific and error variance are small

Common factor analysis considers only common variance; used when:

• Data summarization is the primary objective

• Little is known about specific and error variance

Stage 4: Deriving Factors and Assessing Overall Fit

• Criteria for the number of factors to extract
  - Latent root criterion (component analysis): eigenvalues ≥ 1, if 20 < # of variables < 50
  - A priori criterion: set a predetermined # of factors
  - Percentage of variance criterion: extract until achieving a specified cumulative % of total variance (60% in social sciences)
  - Scree test criterion: factors before the inflection point
  - More factors if the respondents are heterogeneous

Deriving Factors and Assessing Overall Fit

• Principal component factor analysis

• Factor extraction used the latent root criterion, which only accepts factors with eigenvalues greater than 1

• The table below shows that, before rotation, the first factor accounts for 31%, the second 19%, the third 14%, and the fourth 11% of total variance (24%, 19%, 16%, and 16% after rotation)

• A total of 75% of total variance was explained by the four-factor solution

            Initial Eigenvalues                        Rotation Sums of Squared Loadings
Component   Total    % of Variance   Cumulative %      Total    % of Variance   Cumulative %
1           3.723    31.024          31.024            2.888    24.066          24.066
2           2.320    19.331          50.355            2.330    19.416          43.482
3           1.689    14.071          64.427            1.910    15.914          59.396
4           1.267    10.559          74.986            1.871    15.590          74.986
5           .946     7.879           82.865
6           .574     4.787           87.652
7           .489     4.078           91.730
8           .342     2.852           94.582
9           .228     1.902           96.484
10          .187     1.561           98.044
11          .136     1.137           99.182
12          .098     .818            100.000
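The percentages in this table follow directly from the eigenvalues: with 12 standardized variables the total variance is 12, so each eigenvalue divided by 12 gives its share. A short sketch of the latent root criterion applied to the printed eigenvalues (small differences from the table come from rounding of the eigenvalues):

```python
# Initial eigenvalues copied from the table (12 variables remain after dropping X17)
eigenvalues = [3.723, 2.320, 1.689, 1.267, 0.946, 0.574,
               0.489, 0.342, 0.228, 0.187, 0.136, 0.098]

total_variance = len(eigenvalues)  # each standardized variable contributes variance 1
pct = [100 * ev / total_variance for ev in eigenvalues]

# Latent root criterion: keep components with eigenvalue greater than 1
retained = [ev for ev in eigenvalues if ev > 1.0]
cumulative = sum(pct[:len(retained)])
print(len(retained), round(cumulative, 1))  # 4 75.0
```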

Scree Plot

If we had employed the scree test criterion, we would have come up with more factors. As the figure shows, the inflection point was after the sixth factor.

Interpreting the Factors

Three processes

• Estimate the factor matrix

• Factor rotation: orthogonal or oblique
  - Orthogonal: best suited when the aim is data reduction (QUARTIMAX, VARIMAX, EQUIMAX)
  - Oblique: best suited when the aim is to obtain theoretically meaningful factors or constructs (OBLIMIN)

• Factor interpretation and respecification
  - Factor loadings ≥ 0.30, preferably ≥ 0.50
  - Avoid cross-loadings
  - Communality ≥ 0.50
  - Respecify the factor model if needed
  - Label the factors

Interpreting the Factors

• Orthogonal VARIMAX rotation was used due to its simplicity and wide usage.

• For data reduction purposes, orthogonal rotation is suggested

• We treated factor loadings greater than 0.40 as significant
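To give a feel for what an orthogonal VARIMAX rotation does, here is a minimal sketch of the commonly used SVD-based iteration. It assumes NumPy and omits the Kaiser normalization that statistical packages often apply, so it will not necessarily reproduce SPSS output. The loading matrix `A` is made up for illustration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix orthogonally to maximize the varimax criterion."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)  # rotation matrix, starts as the identity
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like target of the varimax criterion
        target = Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):  # stop when the criterion no longer improves
            break
        d = d_new
    return L @ R, R

# Made-up unrotated loadings with cross-loadings on both components
A = np.array([[0.7, 0.5], [0.8, 0.4], [0.3, -0.6], [0.4, -0.7]])
rotated, R = varimax(A)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across factors becomes simpler.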

Table 10: Component Matrix / Table 11: Rotated Component Matrix

(loadings listed left to right across components 1-4; blank cells omitted)

Variable   Component Matrix    Rotated Component Matrix
X6         -.571   .567        .831
X7         .468    .646        .882
X8         .867                .954
X9         .836                .920
X10        .493    .535        .784
tr_x11     .699    -.499       .576    .663
X12        .535    .689        .908
X13        .662    -.450       -.797
X14        .408    .838        .942
X15
X16        .795                .861
X18        .877                .932

Rotation

• As you can see, the unrotated factor solution had many cross-loadings, so rotation was necessary.

• X11 Product Line cross-loaded on factors 1 and 3, and the two loadings differed by less than 0.10.

• Thus, we tried other rotation methods to remedy this inconsistency, but they also gave cross-loadings for X11.

• As a result, we decided to eliminate X11.

• Also, X15 had no loading above 0.40, so we eliminated this variable too.

• After the elimination of X11 and X15, total variance explained by the four-factor solution increased to 81%.
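The screening rules applied here (a 0.40 cutoff, and cross-loadings whose values differ by less than 0.10) can be sketched as a small helper. The function name `screen_loadings` is hypothetical, and because the slides suppress small loadings, the zeros below are placeholders rather than reported values.

```python
def screen_loadings(loadings, cutoff=0.40, min_gap=0.10):
    """Classify each variable from its loadings on factors 1..k."""
    report = {}
    for var, row in loadings.items():
        salient = sorted((abs(x) for x in row if abs(x) >= cutoff), reverse=True)
        if not salient:
            report[var] = "no salient loading"
        elif len(salient) > 1 and salient[0] - salient[1] < min_gap:
            report[var] = "cross-loading"
        else:
            report[var] = "ok"
    return report

# Rotated loadings; 0.0 stands in for suppressed (unreported) small loadings
loadings = {
    "X18": [0.932, 0.0, 0.0, 0.0],
    "tr_x11": [0.576, 0.0, 0.663, 0.0],
    "X15": [0.0, 0.0, 0.0, 0.0],
}
print(screen_loadings(loadings))
```

With these inputs, tr_x11 is reported as a cross-loading and X15 as having no salient loading, matching the two eliminations described above.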

Rotated Component Matrix

Table 12: New Rotated Component Matrix

                            Component
Variable                    1       2       3       4       Communality
X18 - Delivery Speed        .932                            .908
X9 - Complaint Resolution   .929                            .886
X16 - Order & Billing       .88                             .804
X12 - Salesforce Image              .904                    .868
X7 - E-Commerce                     .884                    .793
X10 - Advertising                   .78                     .642
X8 - Technical Support                      .954            .918
X14 - Warranty & Claims                     .948            .921
X6 - Product Quality                                .861    .746
X13 - Competitive Pricing                           -.827   .714

Factors

• Factor 1: Delivery Speed, Complaint Resolution, Order & Billing (Sales Support)

• Factor 2: Salesforce Image, E-Commerce, Advertising (Recognition)

• Factor 3: Technical Support, Warranty & Claims (After Sales Services)

• Factor 4: Product Quality, Competitive Pricing (Quality & Price)

Validation of Factor Analysis

• Assess the generalizability of results
  - Split sample
  - Separate sample
  - CFA

• Detect the influential observations - outliers

Validation of Factor Analysis

Table 13: Rotated Component Matrix of Split Sample

                            Component
Variable                    1       2       3       4
X18 - Delivery Speed        .927
X9 - Complaint Resolution   .912
X16 - Order & Billing       .866
X12 - Salesforce Image              .922
X7 - E-Commerce                     .882
X10 - Advertising                   .793
X14 - Warranty & Claims                     .952
X8 - Technical Support                      .948
X6 - Product Quality                                .869
X13 - Competitive Pricing                           -.812

We used a split sample. When we ran the factor analysis on the split sample, the MSA value was 0.652 and the Bartlett test gave a significant result, so the factor structure can also be examined in the split sample. In addition, the rotated component matrix showed the same factor structure.

Additional Uses of EFA Results

Data Reduction Options

• Surrogate variable

• Summated scales
  - Unidimensionality
  - Reliability: item-to-total correlations ≥ 0.50, inter-item correlations ≥ 0.30, Cronbach α ≥ 0.70
  - Validity: convergent, discriminant, nomological

• Factor scores: we used factor scores to avoid additional validations.
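For the summated-scale option, the reliability check relies on Cronbach's α. A minimal sketch of the standard formula using only the Python standard library is shown below; the function name `cronbach_alpha` and the item scores are made up for illustration and are not HBAT data.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item columns, each holding scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summated scale per respondent
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Three hypothetical items rated by five respondents
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # 0.987
```

Using factor scores, as done here, sidesteps this step because no summated scale is formed.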

Thank You for Listening!
