Social Media Application
Goal: Data Reduction for Data Visualization
People
Variables
CLUSTER ANALYSIS
FACTOR ANALYSIS
Variable/Dimension Reduction
Cluster and Factor Analysis
For car buying, what matters to customers?
Question
Hypothesis
Data
Analytics
Charts
Answer
Brainstorm: Car Purchase
Surveys
Q  Rate on a scale of 1-Low to 9-High (randomized list)

 #  Attribute                 Shopper #1  New BMW  1971 Olds 442 Conv.
 1  Initial Price                  9         3             4
 2  Style                          7         8             9
 3  # of Miles on Car              7         9             4
 4  Reliability                    7         6             2
 5  Color                          5         7             9
 6  Comfort                        6         7             5
 7  Horsepower                     2         6             9
 8  Safety                         6         7             1
 9  Financing Terms                7         5             2
10  Country Origin                 1         7             7
11  Drive Type (Front, 4WD)        4         4             6
12  Miles Per Gallon (MPG)         6         7             5
13  Warranty Coverage              4         5             2
Survey: Attribute Ratings
Many more features, options….
Q  Rate on a scale of 1-9
1 Initial Price
2 Style
3 # of Miles on Car
4 Reliability
5 Color
6 Comfort
7 Horsepower
8 Safety
9 Financing Terms
10 Country Origin
11 Drive Type (Front, 4WD)
12 Miles Per Gallon (MPG)
13 Warranty Coverage
Survey: Attribute Ratings
[Chart: attributes 1 through 13]
round(cor(data), digits = 2)
Correlation Matrix
install.packages("corrgram")
library(corrgram)
corrgram(data)
Factor Analysis / Variable Reduction
Correlation Matrix
Correlated variables are grouped together and separated from other variables with low or no correlation
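The grouping idea can be sketched in base R. This is a minimal illustration on simulated ratings, not the survey data: the six variable names, the two hidden drivers (`looks`, `money`) and the noise level are all assumptions made up for the example.

```r
# Simulated ratings (NOT the survey data): two hidden drivers generate six variables.
set.seed(1)
n <- 200
looks <- rnorm(n)   # hypothetical driver behind style, color, comfort
money <- rnorm(n)   # hypothetical driver behind price, financing, mpg

ratings <- data.frame(
  style     = looks + rnorm(n, sd = 0.5),
  color     = looks + rnorm(n, sd = 0.5),
  comfort   = looks + rnorm(n, sd = 0.5),
  price     = money + rnorm(n, sd = 0.5),
  financing = money + rnorm(n, sd = 0.5),
  mpg       = money + rnorm(n, sd = 0.5)
)

# Correlation matrix, rounded to two digits for reading
corMat <- round(cor(ratings), digits = 2)
print(corMat)

# Average correlation inside each group of variables vs. between the groups
g1 <- c("style", "color", "comfort")
g2 <- c("price", "financing", "mpg")
within_g1 <- mean(corMat[g1, g1][upper.tri(diag(3))])
within_g2 <- mean(corMat[g2, g2][upper.tri(diag(3))])
between   <- mean(abs(corMat[g1, g2]))
print(c(within_g1 = within_g1, within_g2 = within_g2, between = between))
```

In the printed matrix the variables that share a driver form two blocks of high correlations, while the correlations across the blocks stay near zero. Factor analysis looks for exactly this kind of block structure.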
Factor Analysis
Factors: F1, F2, F3, …, FN
First & Second Principal Components
Z1 and Z2 are two linear combinations.
• Z1 has the highest variation (spread of values)
• Z2 has the lowest variation
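A small base R sketch of the two bullet points above. The two variables are simulated stand-ins named `calories` and `rating`; the means, slope and noise are assumptions chosen for illustration, not the data behind the slide.

```r
# Simulated stand-in for two negatively related variables (NOT the slide's data).
set.seed(42)
n <- 100
calories <- rnorm(n, mean = 105, sd = 20)
rating   <- 95 - 0.5 * calories + rnorm(n, sd = 8)
X <- cbind(calories, rating)

# prcomp() centers the data and returns components ordered by variance
pc <- prcomp(X)
Z  <- pc$x                      # columns PC1 and PC2 play the role of z1 and z2

var_z        <- apply(Z, 2, var)        # variance of z1 and z2
total_before <- sum(apply(X, 2, var))   # total variance of the original variables
total_after  <- sum(var_z)              # total variance of the components
cor_z        <- cor(Z[, 1], Z[, 2])     # z1 and z2 are uncorrelated

print(pc$rotation)   # weights of the two linear combinations
print(var_z)
```

The rotation only redistributes the variance: z1 takes as much of it as a single linear combination can, z2 takes the remainder, and the total is unchanged.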
[Figure: scatter plot of calories (axis 40 to 180) and rating (axis 0 to 100), with principal component directions z1 and z2]
Factor Analysis
Factors: F1, F2, F3, …, FN
b's: Factor Loadings
Packages
Library        PC Method     Rotation  Plot
psych          fa()          Yes       No
psych          principal()   Yes       No
stats (base)   princomp()    No        Yes
Principal Components Analysis
Model
model <- princomp(data, cor=TRUE)
summary(model)
biplot(model)
Output
# scree plot
plot(model, type="lines")
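What the scree plot draws can be checked by hand. The sketch below uses simulated data (the five variables `v1` to `v5` and the object names `data_sim` and `model_sim` are assumptions for illustration) and compares the component variances from `princomp()` with the eigenvalues of the correlation matrix.

```r
# Simulated data (NOT the survey): v1-v3 share one driver, v4-v5 are pure noise.
set.seed(7)
n <- 150
f <- rnorm(n)
data_sim <- data.frame(
  v1 = f + rnorm(n, sd = 0.6),
  v2 = f + rnorm(n, sd = 0.6),
  v3 = f + rnorm(n, sd = 0.6),
  v4 = rnorm(n),
  v5 = rnorm(n)
)

model_sim <- princomp(data_sim, cor = TRUE)

# Heights of the scree plot: the variance of each component
variances <- unname(model_sim$sdev^2)

# With cor = TRUE these are the eigenvalues of the correlation matrix
eigenvalues <- eigen(cor(data_sim))$values

# Share of total variance per component, and the running total
prop_var <- variances / sum(variances)
cum_var  <- cumsum(prop_var)
print(round(rbind(variances, prop_var, cum_var), 3))
```

Calling `plot(model_sim, type = "lines")` draws the `variances` row as a line. A sharp drop followed by a flat tail suggests how many components are worth keeping.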
Psych Package
Psych Package – fa
library(psych)
rmodel <- fa(r = corMat, nfactors = 3, rotate = "none", fm = "pa")
Each variable (circle) loads on both factors, so the variables cannot be
cleanly separated into different factors and the factors cannot be given
useful names.
[Figure: variables (circles) plotted on Factor 1 and Factor 2 axes]
Rotation
Rotations Courtesy of Professor Paul Berger
“CLASSIC CASE”
After rotation of ~45°
Now each variable loads on one factor and not at all on the other. This is an overly “dramatic” case.
• Not Correlated = Orthogonal
• Varimax = Orthogonal Rotation
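A rotation like the one described above can be sketched in base R. The loading matrix `L` is hypothetical (four made-up variables A to D that each load on both factors); it is not taken from the slides. The sketch first rotates by hand through 45 degrees, then calls base R's `varimax()` to confirm that the rotation matrix it returns is orthogonal.

```r
# Hypothetical unrotated loadings: every variable loads on both factors.
L <- matrix(c(0.7,  0.5,
              0.6,  0.6,
              0.6, -0.5,
              0.5, -0.6),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("F1", "F2")))

# Manual rotation of the factor axes by 45 degrees
theta <- pi / 4
R <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), ncol = 2, byrow = TRUE)
rotated <- L %*% R
print(round(rotated, 3))

# Smaller of the two absolute loadings per variable, before and after
cross_before <- apply(abs(L), 1, min)
cross_after  <- apply(abs(rotated), 1, min)

# An orthogonal rotation leaves each variable's communality unchanged
comm_before <- rowSums(L^2)
comm_after  <- rowSums(rotated^2)

# Base R varimax(): its rotation matrix is orthogonal as well
vm <- varimax(L)
vm_loadings   <- unclass(vm$loadings)
orthogonality <- crossprod(vm$rotmat)   # should be the 2 x 2 identity
print(round(vm_loadings, 3))
```

Before the rotation every variable has a loading of at least 0.5 on both factors. After it, each variable keeps one large loading and one small one, which is what makes the factors nameable.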
Rotation
Rotations Courtesy of Professor Paul Berger
Psych Package – fa
library(psych)
rmodel <- fa(r = corMat, nfactors = 3, rotate = "oblimin", fm = "pa")
Psych Package – principal
library(psych)
fit <- principal(ratings6, nfactors=4, rotate="none")
Psych Package – principal
library(psych)
fit <- principal(ratings6, nfactors=4, rotate="varimax")
corrgram(ratings6[, c(1,2,9,12,3,4,6,8,10,5,11,7,13)])
Orthogonal / No Correlation
3 Factor vs. 4 Factor
Style
Comfort
Color
Upgrade Packages
Reliability
Safety
Country Origin
Horsepower
Nice Dash
Miles Per Gallon
Initial Price
# of Miles on Car
Financing Options
Aaahh!!! Factor
Money
Perceptual Map
Factor Loadings
Brand Ratings
Weights
Average
Variance
Which One?
Which Car?
[Perceptual map: Price ($ to $$$) against Aaaah factor (BORING to Sweet!!!)]
Component Matrix (a)

                  Component
            1            2            3
D01       .714      -7.61E-02       .327
D02       .539        .226         -.145
D03       .796      -3.02E-02       .338
D04       .789       6.734E-02     -.379
D05       .712        .107         -.499
D06       .747      -2.02E-02      -.205
D07      6.412E-03    .795        -4.87E-02
D08      -.130        .841         3.175E-02
D09       .675      -4.47E-02       .512
D10     -5.09E-02     .701          .251
D11       .791       1.682E-02     6.907E-02

a. 3 components extracted.
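A loading table like this one can be read more easily with a few sums. The sketch below re-enters the component matrix by hand, assuming the rows of numbers correspond to D01 through D11 in order; the column names `C1` to `C3` and the derived quantities are added for illustration and are not part of the original output.

```r
# Component matrix entered by hand (assumption: rows are D01..D11 in order).
loadings <- matrix(c(
   0.714,    -0.0761,   0.327,
   0.539,     0.226,   -0.145,
   0.796,    -0.0302,   0.338,
   0.789,     0.06734, -0.379,
   0.712,     0.107,   -0.499,
   0.747,    -0.0202,  -0.205,
   0.006412,  0.795,   -0.0487,
  -0.130,     0.841,    0.03175,
   0.675,    -0.0447,   0.512,
  -0.0509,    0.701,    0.251,
   0.791,     0.01682,  0.06907),
  ncol = 3, byrow = TRUE,
  dimnames = list(sprintf("D%02d", 1:11), paste0("C", 1:3)))

# Communality: share of each variable's variance captured by the 3 components
communality <- rowSums(loadings^2)

# Sum of squared loadings per component, and its share of total variance
# (assumes standardized variables, so total variance = number of variables)
ss_loadings <- colSums(loadings^2)
prop_var    <- ss_loadings / nrow(loadings)

# Component on which each variable has its largest absolute loading
dominant <- apply(abs(loadings), 1, which.max)

print(round(communality, 3))
print(round(rbind(ss_loadings, prop_var), 3))
print(dominant)
```

Under these assumptions D07, D08 and D10 load mainly on component 2 and the other variables on component 1, while component 3 mostly splits the first group (positive for D01, D03, D09; negative for D04, D05, D06). That is the kind of pattern a rotation is meant to clean up.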
Factor Analysis Recap
Dimensionality Reduction
Applications Algorithms