exploratory factor analysis

AD 601 EXPLORATORY FACTOR ANALYSIS Tutku Seçkin Çelik Yusuf Öç

Upload: mesut

Post on 25-Dec-2015


DESCRIPTION

MULTIVARIATE ANALYSIS

TRANSCRIPT

Page 1: Exploratory Factor Analysis

AD 601

EXPLORATORY FACTOR ANALYSIS

Tutku Seçkin Çelik

Yusuf Öç

Page 2: Exploratory Factor Analysis

Research Question

• HBAT company
    • Newsprint
    • Magazine

• Are there any differences in perceptions of HBAT between its customers in the magazine industry and those in the newsprint industry?

• We want to use exploratory factor analysis to reduce the dimensions of these perceptions.

Page 3: Exploratory Factor Analysis

What is EFA?

• An interdependence technique

• Primary purpose is to define the underlying structure among the variables in the analysis.

• A tool for analyzing the structure of interrelationships (correlations) among a large number of variables by defining sets of variables that are highly interrelated, known as factors.

• Also used for data reduction for further use.

Page 4: Exploratory Factor Analysis

HBAT Data

• 13 attributes about perceptions of HBAT were developed through focus groups, a pretest, and use in previous studies.

• The sample consisted of 200 purchasing managers of companies buying from HBAT.

• Respondents were asked to rate HBAT on the 13 attributes using a 0-10 graphic rating scale.

• 0 indicates “poor” and 10 indicates “excellent”.

Page 5: Exploratory Factor Analysis

Evaluation of Data

• No missing data

• Some outliers, but no variable has a standard deviation greater than 2.5, so we decided to keep the outliers

Page 6: Exploratory Factor Analysis

Evaluation of Data

• Multivariate outliers were also checked with Mahalanobis D²

• We ran a regression to obtain the Mahalanobis distances, then calculated their probabilities with the Compute Variable command, using CDF.CHISQ.

• An observation would be treated as an outlier if its Mahalanobis probability were less than 0.001. Since there was none, we concluded that there are no outliers to delete from our dataset.
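A minimal Python sketch of this check, not the SPSS procedure itself. It is restricted to two variables so that it needs only the standard library: for df = 2 the upper-tail chi-square probability has the closed form exp(-D²/2). The data points and function names are hypothetical, and we assume that in the full analysis the degrees of freedom equal the number of variables.

```python
import math

def mahalanobis_d2(points):
    """Squared Mahalanobis distance of each 2-D point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form (dx, dy) S^-1 (dx, dy)^T with the 2x2 inverse written out.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df > x); closed form valid for df = 2 only."""
    return math.exp(-x / 2)

# Hypothetical ratings on two attributes (not the HBAT data).
points = [(8.5, 3.9), (8.2, 2.7), (9.2, 3.4), (6.4, 3.3),
          (9.0, 3.4), (6.5, 2.8), (6.9, 3.7), (6.2, 3.3)]

d2 = mahalanobis_d2(points)
probs = [chi2_upper_tail_df2(v) for v in d2]
flags = [p < 0.001 for p in probs]  # True marks a multivariate outlier
print(flags)
```

With more than two variables the same logic applies, but the covariance matrix has to be inverted numerically and the chi-square probability taken from a statistics library.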

Page 7: Exploratory Factor Analysis

Assumptions

• Normality, homoscedasticity, and linearity

• Normality is necessary for the statistical tests

• In the tests below, an attribute is treated as normally distributed if its significance value is greater than .05.

Table 2: Tests of Normality

                             Kolmogorov-Smirnov(a)        Shapiro-Wilk
                             Statistic   df    Sig.       Statistic   df    Sig.
X6 - Product Quality         .095        200   .000       .950        200   .000
X7 - E-Commerce              .122        200   .000       .962        200   .000
X8 - Technical Support       .046        200   .200*      .989        200   .114
X9 - Complaint Resolution    .045        200   .200*      .996        200   .844
X10 - Advertising            .078        200   .005       .984        200   .021
X11 - Product Line           .063        200   .049       .984        200   .025
X12 - Salesforce Image       .107        200   .000       .981        200   .007
X13 - Competitive Pricing    .091        200   .000       .971        200   .000
X14 - Warranty & Claims      .058        200   .093       .996        200   .824
X15 - New Products           .036        200   .200*      .996        200   .912
X16 - Order & Billing        .105        200   .000       .984        200   .022
X17 - Price Flexibility      .095        200   .000       .968        200   .000
X18 - Delivery Speed         .086        200   .001       .984        200   .026
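As a small illustration of the decision rule, the Shapiro-Wilk significance values from Table 2 can be screened against the .05 level. This is only a sketch in Python; the dictionary simply restates the table, and the variable names are our own.

```python
ALPHA = 0.05

# Shapiro-Wilk Sig. values copied from Table 2.
shapiro_sig = {
    "X6": 0.000, "X7": 0.000, "X8": 0.114, "X9": 0.844, "X10": 0.021,
    "X11": 0.025, "X12": 0.007, "X13": 0.000, "X14": 0.824, "X15": 0.912,
    "X16": 0.022, "X17": 0.000, "X18": 0.026,
}

# Normality is not rejected when Sig. is greater than ALPHA.
normal = [name for name, sig in shapiro_sig.items() if sig > ALPHA]
print(normal)
```

Only four of the thirteen attributes pass this screen, which is why transformations were considered next.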

Page 8: Exploratory Factor Analysis

Assumptions

• Histograms

Page 9: Exploratory Factor Analysis

Assumptions

• Log, 1/x, square root, x², and x³ transformations did not help to normalize the other variables; they only made the variables more complicated.

• We tried arcsin, but it didn’t transform the data

• So we decided to continue our analysis with the original variables, except X11 (used in its transformed form, tr_x11), while keeping in mind that most of the variables are not normally distributed.

Page 10: Exploratory Factor Analysis

Assumptions - Homogeneity

• According to the Levene statistics, only X13 Competitive Pricing departs from homogeneity of variance (Sig. = .025 < .05); all the other variables have homogeneous variances.

Table 3: Test of Homogeneity of Variances

                                     Levene Statistic   df1   df2   Sig.
X7 - E-Commerce                      .533               1     198   .466
X8 - Technical Support               .018               1     198   .892
X9 - Complaint Resolution            .002               1     198   .963
X10 - Advertising                    .775               1     198   .380
tr_x11 - transformed product line    1.228              1     198   .269
X12 - Salesforce Image               .080               1     198   .777
X13 - Competitive Pricing            5.116              1     198   .025
X14 - Warranty & Claims              2.403              1     198   .123
X15 - New Products                   .917               1     198   .339
X16 - Order & Billing                .451               1     198   .503
X17 - Price Flexibility              2.717              1     198   .101
X18 - Delivery Speed                 .006               1     198   .939

Page 11: Exploratory Factor Analysis

Assumptions - Linearity

Page 12: Exploratory Factor Analysis

Objectives of EFA

• Specify the unit of analysis: what is being grouped?
    Cases or respondents (Q type)
    Variables (R type)

• Achieving data summarization vs. data reduction
    Data summarization: identifying underlying dimensions
    Data reduction: using factor loadings as the basis for subsequent analysis

• Variable selection
    Consider the conceptual underpinnings and intuition as to the appropriateness of variables
    Comprehensive & parsimonious

Page 13: Exploratory Factor Analysis

Designing a Factor Analysis

• Correlations among variables or respondents
    R type: correlation matrix
    Q type: factor matrix

• Variable selection and measurement issues
    Metric variables; if necessary, dummy-code nonmetric ones
    Reasonable number of variables

• Sample size
    ≥ 100
    More observations than variables
    At least 50 observations
    Number of observations per variable: at least 5:1

Page 14: Exploratory Factor Analysis

EFA

• An R type EFA was employed

• Aim is data reduction

• We looked at the correlation matrix of variables

• Sample size is 200, which is more than the required number of 100.

• There are more observations than variables, as suggested.

• Number of observations per variable is approximately 15:1, which is more than the desired limit of 5:1
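The sample-size rules can be checked with a few lines of Python. This is a sketch using the figures stated on these slides (200 respondents, 13 attributes); the variable names are our own.

```python
n_obs = 200   # purchasing managers in the HBAT sample
n_vars = 13   # perception attributes X6 to X18

ratio = n_obs / n_vars
checks = {
    "at least 100 observations": n_obs >= 100,
    "more observations than variables": n_obs > n_vars,
    "at least 5 observations per variable": ratio >= 5,
}

for rule, ok in checks.items():
    print(f"{rule}: {'met' if ok else 'not met'}")
print(f"observations per variable: {ratio:.1f}:1")
```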

Page 15: Exploratory Factor Analysis

Assumptions in Factor Analysis

Conceptual issues

• Conceptually valid & appropriate patterns

• Homogeneous sample with respect to the underlying factor structure

Statistical issues

• Overall measures of intercorrelation
    Correlations between variables ≥ 0.30
    Small partial correlations (the correlation left unexplained when the effects of other variables are taken into account)
    Anti-image correlation matrix (partial correlations ≥ 0.70 are considered high)
    Bartlett test of sphericity (significance < 0.05)
    Measure of sampling adequacy (MSA) (MSA values > 0.50)

• Variable-specific measures of intercorrelation
    MSA for each variable (MSA values > 0.50)

Page 16: Exploratory Factor Analysis

Assumptions in Factor Analysis

• The Bartlett test of sphericity is significant (Sig. = .000 < .05), indicating that the correlation matrix has significant correlations among at least some of the variables.

• Also, the overall MSA value is .648, which exceeds the desired .50 and indicates that factor analysis is appropriate.

Table 4: KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy         .648
Bartlett's Test of Sphericity    Approx. Chi-Square     1875.571
                                 df                     78
                                 Sig.                   .000
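For reference, the two statistics in Table 4 are usually defined as follows. This is the standard textbook form, not taken from these slides: n is the sample size, p the number of variables, R the correlation matrix, r_ij the correlations, and a_ij the partial correlations.

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert ,
\qquad
df = \frac{p(p - 1)}{2}

\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

With p = 13 variables the degrees of freedom are 13 · 12 / 2 = 78, which matches the df reported in Table 4.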

Page 17: Exploratory Factor Analysis

Correlation Matrix

                                    X6        X7        X8        X9        X10       tr_x11    X12       X13       X14       X15       X16       X17       X18
X6 - Product Quality                1
X7 - E-Commerce                     -0.034    1
X8 - Technical Support              0.087     0.041     1
X9 - Complaint Resolution           0.09      0.192**   0.152*    1
X10 - Advertising                   -0.054    0.505**   0.028     0.234**   1
tr_x11 - transformed product line   0.491**   0.069     0.166*    0.576**   0.145*    1
X12 - Salesforce Image              -0.116    0.788**   0.086     0.256**   0.627**   0.056     1
X13 - Competitive Pricing           -0.448**  0.177*    -0.092    -0.077    0.099     -0.484**  0.200**   1
X14 - Warranty & Claims             0.109     0.103     0.838**   0.181*    0.035     0.232**   0.163*    -0.085    1
X15 - New Products                  0.136     -0.041    -0.038    0.09      0.063     0.144*    0.009     -0.121    0.03      1
X16 - Order & Billing               0.083     0.217**   0.121     0.741**   0.230**   0.466**   0.284**   -0.06     0.204**   0.137     1
X17 - Price Flexibility             -0.487**  0.186**   -0.029    0.418**   0.260**   -0.309**  0.272**   0.470**   -0.041    0.047     0.419**   1
X18 - Delivery Speed                0.067     0.241**   0.132     0.878**   0.323**   0.629**   0.299**   -0.055    0.183**   0.147*    0.773**   0.513**   1

Hair et al. suggest that if few correlations exceed 0.30, factor analysis may be inappropriate. Of the 78 distinct correlations among the 13 variables, 20 had absolute values higher than 0.30.
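The count of 20 can be reproduced from the lower triangle of the correlation matrix. The Python sketch below restates the off-diagonal values from the matrix above; 13 variables give 13 · 12 / 2 = 78 distinct pairs (91 cells if the diagonal is included). Variable names are our own.

```python
# Lower triangle of the correlation matrix (diagonal omitted), one row per variable.
lower = {
    "X7":     [-0.034],
    "X8":     [0.087, 0.041],
    "X9":     [0.09, 0.192, 0.152],
    "X10":    [-0.054, 0.505, 0.028, 0.234],
    "tr_x11": [0.491, 0.069, 0.166, 0.576, 0.145],
    "X12":    [-0.116, 0.788, 0.086, 0.256, 0.627, 0.056],
    "X13":    [-0.448, 0.177, -0.092, -0.077, 0.099, -0.484, 0.200],
    "X14":    [0.109, 0.103, 0.838, 0.181, 0.035, 0.232, 0.163, -0.085],
    "X15":    [0.136, -0.041, -0.038, 0.09, 0.063, 0.144, 0.009, -0.121, 0.03],
    "X16":    [0.083, 0.217, 0.121, 0.741, 0.230, 0.466, 0.284, -0.06, 0.204, 0.137],
    "X17":    [-0.487, 0.186, -0.029, 0.418, 0.260, -0.309, 0.272, 0.470, -0.041,
               0.047, 0.419],
    "X18":    [0.067, 0.241, 0.132, 0.878, 0.323, 0.629, 0.299, -0.055, 0.183,
               0.147, 0.773, 0.513],
}

pairs = [r for row in lower.values() for r in row]
n_high = sum(1 for r in pairs if abs(r) >= 0.30)
print(f"{n_high} of {len(pairs)} correlations have |r| >= 0.30")
```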

Page 18: Exploratory Factor Analysis

Partial Correlations

• Partial correlations, unlike the correlations themselves, should be small. In the anti-image correlation matrix, the off-diagonal values show the partial correlations.

• Only X8 and X14, and X17 and X18, had high partial correlations with each other.

• Variables with MSA values less than 0.50 should be deleted one at a time.

• X17 Price Flexibility had the lowest individual MSA value (0.442), so we deleted this variable, as strictly suggested by Hair et al.

Page 19: Exploratory Factor Analysis

Partial Correlations

                                    X6        X7        X8        X9        X10       tr_x11    X12       X13       X14       X15       X16       X17       X18
X6 - Product Quality                .859a
X7 - E-Commerce                     -0.096    .692a
X8 - Technical Support              -0.024    0.019     .518a
X9 - Complaint Resolution           -0.017    0.055     -0.092    .908a
X10 - Advertising                   -0.038    -0.022    -0.084    0.099     .746a
tr_x11 - transformed product line   0.026     0.025     0.029     0.013     -0.209    .499a
X12 - Salesforce Image              0.113     -0.676    0.068     -0.071    -0.437    0.141     .655a
X13 - Competitive Pricing           0.118     -0.08     0.071     0.003     0.021     0.072     -0.044    .923a
X14 - Warranty & Claims             0.014     0.012     -0.839    0.078     0.129     -0.049    -0.144    -0.063    .542a
X15 - New Products                  -0.128    0.089     0.114     0.085     -0.025    -0.086    -0.045    0.086     -0.092    .567a
X16 - Order & Billing               -0.089    0.008     0.11      -0.208    0.073     -0.012    -0.084    0.071     -0.137    -0.029    .937a
X17 - Price Flexibility             0.228     0.041     -0.012    0.036     -0.208    0.904     0.121     -0.123    0.017     -0.115    -0.069    .442a
X18 - Delivery Speed                -0.088    -0.057    0.002     -0.352    0.095     -0.871    -0.088    0.011     0.001     0.039     -0.13     -0.86     .586a
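The diagonal entries marked "a" are the individual MSA values. A short Python sketch of the screening step, using those diagonal values and the 0.50 cutoff; the names `msa`, `below`, and `to_drop` are our own, and dropping the lowest value first is our reading of the one-at-a-time rule.

```python
CUTOFF = 0.50

# Individual MSA values from the diagonal of the anti-image correlation matrix.
msa = {
    "X6": 0.859, "X7": 0.692, "X8": 0.518, "X9": 0.908, "X10": 0.746,
    "tr_x11": 0.499, "X12": 0.655, "X13": 0.923, "X14": 0.542,
    "X15": 0.567, "X16": 0.937, "X17": 0.442, "X18": 0.586,
}

# Variables below the cutoff; only one is removed per pass, then MSA is recomputed.
below = {name: value for name, value in msa.items() if value < CUTOFF}
to_drop = min(below, key=below.get) if below else None
print(below, to_drop)
```

After a variable is removed, the MSA values have to be recomputed before deciding on any further deletion, because they change with the variable set.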

Page 20: Exploratory Factor Analysis

Correlations revisited

• After deleting X17, the KMO and Bartlett’s test results showed that the overall MSA value rose from 0.648 to 0.695, and the Bartlett test was again significant.

• The new correlation matrix contained 15 correlations greater than 0.30, compared with 20 when X17 was included; most of the correlations were still significant.

Page 21: Exploratory Factor Analysis

Deriving Factors and Assessing Overall Fit

• Selecting the factor extraction method: common factor analysis vs. component analysis

Total variance = common variance + specific variance + error variance

Component analysis considers the total variance; most appropriate when:
    • Data reduction is the primary concern (minimum number of factors to account for the maximum portion of total variance)
    • Prior knowledge suggests that specific and error variance is small

Common factor analysis considers only common variance; used when:
    • Data summarization is the primary objective
    • Little is known about specific and error variance

Page 22: Exploratory Factor Analysis

Stage 4: Deriving Factors and Assessing Overall Fit

• Criteria for number of factors to extract

Latent root criterion (component analysis): eigenvalues ≥ 1; most reliable when the number of variables is between 20 and 50

A priori criterion: set a predetermined number of factors

Percentage of variance criterion: extract factors until a specified cumulative percentage of total variance is reached (60% in the social sciences)

Scree test criterion: keep the factors before the inflection point

Extract more factors if the respondents are heterogeneous

Page 23: Exploratory Factor Analysis

Deriving Factors and Assessing Overall Fit

• Principal component factor analysis

• Factor extraction used the latent root criterion, which accepts only factors with eigenvalues greater than 1.

• The table below shows that, before rotation, the first factor accounts for 31%, the second 19%, the third 14%, and the fourth 11% of total variance (24%, 19%, 16%, and 16% after rotation).

• A total of 75% of total variance was explained with a four-factor solution.

             Initial Eigenvalues                         Rotation Sums of Squared Loadings
Component    Total    % of Variance    Cumulative %      Total    % of Variance    Cumulative %
1            3.723    31.024           31.024            2.888    24.066           24.066
2            2.320    19.331           50.355            2.330    19.416           43.482
3            1.689    14.071           64.427            1.910    15.914           59.396
4            1.267    10.559           74.986            1.871    15.590           74.986
5            .946     7.879            82.865
6            .574     4.787            87.652
7            .489     4.078            91.730
8            .342     2.852            94.582
9            .228     1.902            96.484
10           .187     1.561            98.044
11           .136     1.137            99.182
12           .098     .818             100.000
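The latent root criterion and the percentage columns can be reproduced from the initial eigenvalues alone. In this Python sketch each eigenvalue is divided by the number of variables (12), since every standardized variable contributes one unit of variance; small differences from the table come from rounding of the printed eigenvalues.

```python
# Initial eigenvalues from the table above (12 variables after dropping X17).
eigenvalues = [3.723, 2.320, 1.689, 1.267, 0.946, 0.574,
               0.489, 0.342, 0.228, 0.187, 0.136, 0.098]

n_vars = len(eigenvalues)
pct_variance = [100 * ev / n_vars for ev in eigenvalues]

# Latent root criterion: keep components with an eigenvalue greater than 1.
retained = [ev for ev in eigenvalues if ev > 1.0]
cumulative = sum(pct_variance[:len(retained)])

print(len(retained), [round(p, 1) for p in pct_variance[:4]], round(cumulative, 1))
```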

Page 24: Exploratory Factor Analysis

Scree Plot

If we had employed the scree test criterion, we would have come up with more factors. As you can see from the figure, the inflection point was after the sixth factor.

Page 25: Exploratory Factor Analysis

Interpreting the Factors

Three processes

• Estimate the factor matrix

• Factor rotation: orthogonal or oblique

    Orthogonal: best suited when the aim is data reduction
        • QUARTIMAX
        • VARIMAX
        • EQUIMAX

    Oblique: best suited when the aim is to obtain theoretically meaningful factors or constructs
        • OBLIMIN

• Factor interpretation and respecification
    Factor loadings ≥ 0.30, preferably ≥ 0.50
    Avoid cross-loadings
    Communality ≥ 0.50
    Respecify the factor model if needed
    Label the factors

Page 26: Exploratory Factor Analysis

Interpreting the Factors

• Orthogonal VARIMAX rotation was used due to its simplicity and wide usage.

• For data reduction purposes, orthogonal rotation is suggested.

• We treated factor loadings greater than 0.40 as significant.

Table 10: Component Matrix

          Component
          1        2        3        4
X6                 -.571             .567
X7        .468     .646
X8                          .867
X9        .836
X10       .493     .535
tr_x11    .699     -.499
X12       .535     .689
X13                .662              -.450
X14       .408              .838
X15
X16       .795
X18       .877

Table 11: Rotated Component Matrix

          Component
          1        2        3        4
X6                          .831
X7                 .882
X8                                   .954
X9        .920
X10                .784
tr_x11    .576              .663
X12                .908
X13                         -.797
X14                                  .942
X15
X16       .861
X18       .932

Page 27: Exploratory Factor Analysis

Rotation

• As you can see, the unrotated factor solution had so many cross-loadings that rotation was necessary.

• X11 Product Line cross-loaded on factors 1 and 3, and the two loadings differed by less than 0.10.

• We therefore tried other rotation methods to remedy this inconsistency, but they also gave cross-loadings for X11.

• As a result, we decided to eliminate X11.

• X15 also had no loading above 0.40, so we eliminated this variable too.

• After the elimination of X11 and X15, total variance explained by the four-factor solution increased to 81%.
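The two elimination rules can be sketched in Python using the rotated loadings shown in Table 11. Blank cells (loadings suppressed in the output) are simply omitted, the component positions are as we read them from the table, and the names `cross_loaded` and `unloaded` are our own.

```python
THRESHOLD = 0.40  # loadings at or above this size are treated as significant

# Displayed rotated loadings per variable: {component number: loading}.
rotated = {
    "X6": {3: 0.831}, "X7": {2: 0.882}, "X8": {4: 0.954}, "X9": {1: 0.920},
    "X10": {2: 0.784}, "tr_x11": {1: 0.576, 3: 0.663}, "X12": {2: 0.908},
    "X13": {3: -0.797}, "X14": {4: 0.942}, "X15": {},
    "X16": {1: 0.861}, "X18": {1: 0.932},
}

def significant(loadings):
    """Components on which a variable loads at or above the threshold."""
    return [c for c, value in loadings.items() if abs(value) >= THRESHOLD]

cross_loaded = [v for v, l in rotated.items() if len(significant(l)) > 1]
unloaded = [v for v, l in rotated.items() if not significant(l)]
print(cross_loaded, unloaded)
```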

Page 28: Exploratory Factor Analysis

Rotated Component Matrix

Table 12: New Rotated Component Matrix

                             Component
                             1        2        3        4        Communalities
X18 - Delivery Speed         .932                                .908
X9 - Complaint Resolution    .929                                .886
X16 - Order & Billing        .88                                 .804
X12 - Salesforce Image                .904                       .868
X7 - E-Commerce                       .884                       .793
X10 - Advertising                     .78                        .642
X8 - Technical Support                         .954              .918
X14 - Warranty & Claims                        .948              .921
X6 - Product Quality                                    .861     .746
X13 - Competitive Pricing                               -.827    .714

Page 29: Exploratory Factor Analysis

Factors

• Factor 1: Delivery Speed, Complaint Resolution, Order & Billing (Sales Support)

• Factor 2: Salesforce Image, E-Commerce, Advertising (Recognition)

• Factor 3: Technical Support, Warranty & Claims (After Sales Services)

• Factor 4: Product Quality, Competitive Pricing (Quality & Price)

Page 30: Exploratory Factor Analysis

Validation of Factor Analysis

• Assess the generalizability of results
    Split sample
    Separate sample
    CFA

• Detect the influential observations - outliers

Page 31: Exploratory Factor Analysis

Validation of Factor Analysis

Table 13: Rotated Component Matrix of Split Sample

                             Component
                             1        2        3        4
X18 - Delivery Speed         .927
X9 - Complaint Resolution    .912
X16 - Order & Billing        .866
X12 - Salesforce Image                .922
X7 - E-Commerce                       .882
X10 - Advertising                     .793
X14 - Warranty & Claims                        .952
X8 - Technical Support                         .948
X6 - Product Quality                                    .869
X13 - Competitive Pricing                               -.812

We used a split sample. When we ran the factor analysis on the split sample, the MSA value was 0.652 and the Bartlett test was significant, so the factor structure can also be examined in the split sample. In addition, the rotated component matrix showed the same factor structure.

Page 32: Exploratory Factor Analysis

Additional Uses of EFA Results

Data Reduction Options

• Surrogate variable

• Summated scales
    Unidimensionality
    Reliability: item-to-total correlations ≥ 0.50, inter-item correlations ≥ 0.30, Cronbach's α ≥ 0.70
    Validity: convergent, discriminant, nomological

• Factor scores: we used factor scores to avoid additional validations.
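For the summated-scale option, Cronbach's α can be computed from the item variances and the variance of the summed score. The Python sketch below uses the usual formula α = k/(k-1) · (1 - Σ item variances / variance of totals); the ratings are hypothetical and are not HBAT data.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summated scale per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 0-10 ratings on a three-item scale from five respondents.
items = [
    [8, 6, 9, 4, 7],
    [7, 6, 9, 5, 6],
    [9, 5, 8, 4, 7],
]

alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Items that move together almost perfectly push α toward 1, while unrelated items push it toward 0.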

Page 33: Exploratory Factor Analysis

Thank You for Listening!