social media application goal: data reduction for data visualization

Post on 19-Dec-2015



People: CLUSTER ANALYSIS

Variables: FACTOR ANALYSIS (Variable/Dimension Reduction)

Cluster and Factor Analysis

For car buying, what matters to customers?

Question → Hypothesis → Data → Analytics → Charts → Answer

Brainstorm: Car Purchase

Surveys

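Q: Rate on a scale of 1 (Low) to 9 (High) (randomized list)

| Q | Attribute | Shopper #1 | New BMW | 1971 Olds 442 Conv. |
|---|---|---|---|---|
| 1 | Initial Price | 9 | 3 | 4 |
| 2 | Style | 7 | 8 | 9 |
| 3 | # of Miles on Car | 7 | 9 | 4 |
| 4 | Reliability | 7 | 6 | 2 |
| 5 | Color | 5 | 7 | 9 |
| 6 | Comfort | 6 | 7 | 5 |
| 7 | Horsepower | 2 | 6 | 9 |
| 8 | Safety | 6 | 7 | 1 |
| 9 | Financing Terms | 7 | 5 | 2 |
| 10 | Country Origin | 1 | 7 | 7 |
| 11 | Drive Type (Front, 4WD) | 4 | 4 | 6 |
| 12 | Miles Per Gallon (MPG) | 6 | 7 | 5 |
| 13 | Warranty Coverage | 4 | 5 | 2 |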

Survey: Attribute Ratings

Many more features and options…

Q: Rate on a scale of 1 to 9

1 Initial Price

2 Style

3 # of Miles on Car

4 Reliability

5 Color

6 Comfort

7 Horsepower

8 Safety

9 Financing Terms

10 Country Origin

11 Drive Type (Front, 4WD)

12 Miles Per Gallon (MPG)

13 Warranty Coverage

Survey: Attribute Ratings (rating grid: attributes 1 to 13, each scored 1 to 9)

round(cor(data), digits = 2)

Correlation Matrix

install.packages("corrgram")
library(corrgram)
corrgram(data)

Factor Analysis / Variable Reduction

Correlation Matrix

Factor analysis groups correlated variables together and separates them from variables with low or no correlation.
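A minimal sketch of that grouping in base R, assuming simulated ratings (not the survey data) in which six hypothetical attributes are driven by two made-up latent factors. It uses `stats::factanal()` (maximum-likelihood factor analysis, a stand-in for the psych functions used later) and assigns each variable to the factor on which it loads most heavily.

```r
# Simulated ratings: NOT the survey data; two hypothetical latent factors
set.seed(42)
n <- 300
f_money   <- rnorm(n)  # hypothetical "money" factor
f_comfort <- rnorm(n)  # hypothetical "comfort" factor
ratings <- data.frame(
  price     = f_money   + rnorm(n, sd = 0.5),
  mpg       = f_money   + rnorm(n, sd = 0.5),
  financing = f_money   + rnorm(n, sd = 0.5),
  style     = f_comfort + rnorm(n, sd = 0.5),
  comfort   = f_comfort + rnorm(n, sd = 0.5),
  color     = f_comfort + rnorm(n, sd = 0.5)
)

# High correlations within each group, near zero across groups
print(round(cor(ratings), 2))

# Two-factor maximum-likelihood solution with an orthogonal (varimax) rotation
fa2 <- factanal(ratings, factors = 2, rotation = "varimax")
L <- unclass(loadings(fa2))
print(round(L, 2))

# Assign each variable to the factor with the largest absolute loading
group <- apply(abs(L), 1, which.max)
print(group)
```

Which factor gets the label 1 or 2 is arbitrary; what matters is that price, mpg and financing share one factor and style, comfort and color share the other.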

Factor Analysis

Factors: F1, F2, F3, …, FN

First & Second Principal Components

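Z1 and Z2 are two linear combinations of the original variables (here, calories and rating).

• Z1 has the highest variation (spread of values).

• Z2 is orthogonal to (uncorrelated with) Z1 and, with only two variables, has the lowest variation.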
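A minimal sketch of Z1 and Z2 in base R, assuming two simulated, negatively related variables named after the plot's axes (the values are made up, not the slide's data). `prcomp()` with `scale. = TRUE` standardizes first, which is a choice here because calories and rating are on different scales; the scores in `pca$x` are the linear combinations Z1 and Z2.

```r
# Simulated stand-ins for the plot's two variables (made-up values)
set.seed(1)
calories <- rnorm(200, mean = 110, sd = 20)
rating   <- 100 - 0.5 * calories + rnorm(200, sd = 8)
X <- cbind(calories, rating)

# Standardize, then find the principal directions
pca <- prcomp(X, center = TRUE, scale. = TRUE)
Z <- pca$x                     # column PC1 = Z1, column PC2 = Z2

# Z1 carries the most variance, Z2 the remainder
print(apply(Z, 2, var))

# Z1 and Z2 are uncorrelated
print(round(cor(Z[, 1], Z[, 2]), 10))

# Weights that define each linear combination of the (standardized) variables
print(pca$rotation)
```

Because both variables are standardized, the two variances add up to 2, the number of variables.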

(Plot: rating against calories, with the Z1 and Z2 axes overlaid.)

Factor Analysis

Each variable is linked to the factors F1, F2, F3, …, FN by weights (the b's), called factor loadings.
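The diagram is commonly written as a linear model. A sketch, assuming standardized variables $X_i$ and a unique (error) term $e_i$ that the slides do not name: the b's are the loadings, and for orthogonal factors the squared loadings in a row add up to that variable's communality $h_i^2$.

```latex
X_i = b_{i1}F_1 + b_{i2}F_2 + b_{i3}F_3 + \cdots + b_{iN}F_N + e_i,
\qquad
h_i^2 = \sum_{j=1}^{N} b_{ij}^2
```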

Packages

| Library | PC Method | Rotation | Plot |
|---|---|---|---|
| psych | fa(), principal() | Yes | No |
| stats (base R) | princomp() | No | Yes |

Principal Components Analysis

Model
model <- princomp(data, cor = TRUE)
summary(model)
biplot(model)

Output

# scree plot
plot(model, type = "lines")
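A runnable version of the PCA and scree-plot steps, assuming R's built-in `USArrests` data as a stand-in for the survey ratings (the slides' `data` is not available). The scree plot draws the variance of each component; dividing by the total gives the proportions that `summary()` reports, so they are computed explicitly as well.

```r
# USArrests (built into R) stands in for the survey ratings
data <- USArrests

# PCA on the correlation matrix, i.e. on standardized variables
model <- princomp(data, cor = TRUE)
print(summary(model))   # standard deviation and proportion of variance per component

# Component variances (eigenvalues) and their share of the total
eig <- model$sdev^2
prop_var <- eig / sum(eig)
print(round(prop_var, 3))

# Scree plot drawn as connected lines, then the biplot
screeplot(model, type = "lines")
biplot(model)
```

With `cor = TRUE` the component variances sum to the number of variables (4 here), which makes the scree plot easy to read against the "one variable's worth" of variance.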

Psych Package

Psych Package – fa
library(psych)
corMat <- cor(data)  # correlation matrix of the ratings
rmodel <- fa(r = corMat, nfactors = 3, rotate = "none", fm = "pa")

Each variable (circle) loads on both factors, so there is no clear way to separate the variables into different factors and give the factors useful names.

(Plot: unrotated loadings, Factor 1 vs. Factor 2.)

Rotation

Rotations courtesy of Professor Paul Berger


“CLASSIC CASE”

After rotation of ~45°

Now each variable loads on one factor and not at all on the other. This is an overly “dramatic” case.

• Not correlated = orthogonal
• Varimax = orthogonal rotation
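A minimal sketch of the idea with base R's `stats::varimax()`, assuming made-up loadings (not the slides' numbers): a clean two-group structure is turned by a hypothetical 30° so every variable loads on both factors (the slides' classic case is about 45°), and varimax then rotates the axes back. Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) does not change.

```r
# Hypothetical simple-structure loadings: V1, V2 on one factor; V3, V4 on the other
S <- matrix(c(0.80, 0.00,
              0.75, 0.00,
              0.00, 0.80,
              0.00, 0.70),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("V", 1:4), c("F1", "F2")))

# Turn the axes by a made-up 30 degrees so every variable loads on both factors
theta <- 30 * pi / 180
turn  <- matrix(c(cos(theta), -sin(theta),
                  sin(theta),  cos(theta)), ncol = 2, byrow = TRUE)
A <- S %*% turn
print(round(A, 2))          # unrotated: no clear grouping

# Varimax: orthogonal rotation (Kaiser-normalized by default)
rot <- varimax(A)
Lr  <- unclass(rot$loadings)
print(round(Lr, 2))         # each row now has one large and one near-zero loading

# Dominant factor per variable after rotation
dom <- apply(abs(Lr), 1, which.max)
print(dom)

# The rotation matrix is orthogonal, so t(R) %*% R is the identity
print(round(crossprod(rot$rotmat), 10))

# Communalities (row sums of squared loadings) are unchanged by the rotation
print(round(cbind(before = rowSums(A^2), after = rowSums(Lr^2)), 4))
```

Signs and the order of the rotated columns are arbitrary; only the grouping (V1 with V2, V3 with V4) is meaningful.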


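Psych Package – fa
library(psych)
# oblimin is an oblique rotation (factors may correlate); psych relies on the GPArotation package for it
rmodel <- fa(r = corMat, nfactors = 3, rotate = "oblimin", fm = "pa")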

Psych Package – principal
library(psych)
fit <- principal(ratings6, nfactors = 4, rotate = "none")

Psych Package – principal
library(psych)
fit <- principal(ratings6, nfactors = 4, rotate = "varimax")

# reorder the 13 columns so related variables sit next to each other
corrgram(ratings6[, c(1, 2, 9, 12, 3, 4, 6, 8, 10, 5, 11, 7, 13)])
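The column order above is typed by hand. A minimal sketch of deriving such an order from the loadings instead, assuming a hypothetical six-column simulated `ratings6` (not the slides' 13-attribute data) and base R's `factanal()` in place of `principal()`: variables are sorted by their dominant factor, then by loading strength within that factor.

```r
# Hypothetical simulated ratings whose columns arrive in a mixed-up order
set.seed(7)
n <- 300
f1 <- rnorm(n)  # hypothetical latent factor 1
f2 <- rnorm(n)  # hypothetical latent factor 2
ratings6 <- data.frame(
  style   = f2 + rnorm(n, sd = 0.5),
  price   = f1 + rnorm(n, sd = 0.5),
  color   = f2 + rnorm(n, sd = 0.5),
  mpg     = f1 + rnorm(n, sd = 0.5),
  comfort = f2 + rnorm(n, sd = 0.5),
  miles   = f1 + rnorm(n, sd = 0.5)
)

fit <- factanal(ratings6, factors = 2, rotation = "varimax")
L <- unclass(loadings(fit))

# Sort by dominant factor, then by the size of that loading (largest first)
dominant <- apply(abs(L), 1, which.max)
strength <- apply(abs(L), 1, max)
ord <- order(dominant, -strength)

ordered_vars <- colnames(ratings6)[ord]
print(ordered_vars)
print(round(cor(ratings6[, ord]), 2))   # correlated blocks now sit on the diagonal
# With corrgram installed: corrgram(ratings6[, ord])
```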

Orthogonal / No Correlation

3 Factor vs. 4 Factor


Style, Comfort, Color, Upgrade Packages, Reliability, Safety, Country Origin, Horsepower, Nice Dash, Miles Per Gallon, Initial Price, # of Miles on Car, Financing Options

Aaahh!!! Factor

Money

Perceptual Map

Factor Loadings, Brand Ratings, Weights, Average, Variance

Which One? Which Car?

Price: $$$ to $

Sweet!!! to BORING

Aaaah factor…

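Component Matrix (a)

| Variable | Component 1 | Component 2 | Component 3 |
|---|---|---|---|
| D01 | .714 | -7.61E-02 | .327 |
| D02 | .539 | .226 | -.145 |
| D03 | .796 | -3.02E-02 | .338 |
| D04 | .789 | 6.734E-02 | -.379 |
| D05 | .712 | .107 | -.499 |
| D06 | .747 | -2.02E-02 | -.205 |
| D07 | 6.412E-03 | .795 | -4.87E-02 |
| D08 | -.130 | .841 | 3.175E-02 |
| D09 | .675 | -4.47E-02 | .512 |
| D10 | -5.09E-02 | .701 | .251 |
| D11 | .791 | 1.682E-02 | 6.907E-02 |

a. 3 components extracted.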
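A small sketch that reads the Component Matrix programmatically, assuming the common rule of thumb (not stated in the slides) that a variable belongs to the component with its largest absolute loading. The loadings are typed in from the table; `rowSums(L^2)` gives each variable's communality across the three components and `colSums(L^2)` each component's sum of squared loadings.

```r
# Loadings copied from the Component Matrix (rows D01-D11, components 1-3)
L <- matrix(c(
   .714,      -7.61E-02,   .327,
   .539,       .226,      -.145,
   .796,      -3.02E-02,   .338,
   .789,       6.734E-02, -.379,
   .712,       .107,      -.499,
   .747,      -2.02E-02,  -.205,
   6.412E-03,  .795,      -4.87E-02,
  -.130,       .841,       3.175E-02,
   .675,      -4.47E-02,   .512,
  -5.09E-02,   .701,       .251,
   .791,       1.682E-02,  6.907E-02
), ncol = 3, byrow = TRUE,
dimnames = list(sprintf("D%02d", 1:11), paste0("Comp", 1:3)))

# Component with the largest absolute loading for each variable
dominant <- apply(abs(L), 1, which.max)
print(dominant)

# Communality: variance of each variable reproduced by the 3 components
communality <- rowSums(L^2)
print(round(communality, 3))

# Sum of squared loadings for each component
print(round(colSums(L^2), 3))
```

On these numbers D07, D08 and D10 go to component 2 and the other eight variables to component 1, so no variable is assigned to component 3; that lopsided, unrotated pattern is the situation the rotation slides address.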

Factor Analysis Recap

Dimensionality Reduction

Applications Algorithms
