European University of Brittany

Agrocampus

Factorial analysis of qualitative and quantitative data, both mixed and structured according to a hierarchy

Jérôme Pagès

Applied Mathematics dept.

[Map: Mont Saint-Michel, Rennes, Brittany]

Context of factorial analysis: one set of individuals described by several variables

1. Taking into account both quantitative and qualitative variables: Factor Analysis for Mixed Data (FAMD)

2. Taking into account a partition of the variables: Multiple Factor Analysis (MFA)

3. Taking into account a hierarchy defined on the variables: Hierarchical Multiple Factor Analysis (HMFA)

1. Taking into account both quantitative and qualitative variables: Factor Analysis for Mixed Data (FAMD)

Data

$I$ individuals (rows $i = 1, \dots, I$) described by:
- $K_1$ quantitative variables (standardized), columns $k = 1, \dots, K_1$, with entries $x_{ik}$;
- $Q$ qualitative variables, columns $q = 1, \dots, Q$, with entries $x_{iq}$.

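Data

The same table after complete disjunctive coding of the qualitative variables:
- $K_1$ quantitative variables (standardized), columns $k = 1, \dots, K_1$, with entries $x_{ik}$;
- $Q$ qualitative variables coded as $K_2$ indicators: variable $q$ ($q = 1, \dots, Q$) contributes one indicator per category $k_q = 1, \dots, K_q$, with entries $x_{ik_q}$, so that $K_2 = \sum_q K_q$.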
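To make complete disjunctive coding concrete, here is a minimal Python sketch, assuming NumPy is available. The function name `disjunctive_coding` and the toy variable `colour` are hypothetical and do not come from the slides.

```python
import numpy as np

def disjunctive_coding(column):
    """Return (indicator matrix, category labels) for one qualitative variable."""
    categories = sorted(set(column))
    # One indicator column per category: 1 if the individual has it, else 0.
    indicators = np.array(
        [[1.0 if value == c else 0.0 for c in categories] for value in column]
    )
    return indicators, categories

# Hypothetical qualitative variable observed on I = 5 individuals.
colour = ["red", "blue", "red", "green", "blue"]
Z, labels = disjunctive_coding(colour)
```

Each row of `Z` contains exactly one 1, so a variable with $K_q$ categories contributes $K_q$ columns to the table.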

Principal components analysis: representation of the variables and criterion

In $\mathbb{R}^I$, one variable = one axis. For a unit vector $v$ through the origin $O$, $\theta_{kv}$ denotes the angle between variable $k$ and $v$.

Criterion of standardized PCA: find $v$ maximizing

$$\sum_k r^2(k, v) = \sum_k \cos^2 \theta_{kv}$$
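The criterion of standardized PCA can be checked numerically. The sketch below, in Python with NumPy, uses simulated data and a hypothetical helper `pca_criterion`; it illustrates that the first principal component reaches the maximum of $\sum_k r^2(k, v)$, which equals the first eigenvalue of the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))              # simulated data, not from the slides
X[:, 1] += X[:, 0]                        # make two variables correlated
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized variables
n = Z.shape[0]

def pca_criterion(v, Z):
    """Sum over the variables k of r^2(k, v)."""
    v = (v - v.mean()) / v.std()
    r = Z.T @ v / len(v)   # correlations, since Z and v are standardized
    return float(np.sum(r ** 2))

# First eigenvalue and first principal component of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / n)
lambda1 = eigvals[-1]
F1 = Z @ eigvecs[:, -1]

best = pca_criterion(F1, Z)                    # equals lambda1
other = pca_criterion(rng.normal(size=n), Z)   # any other v does no better
```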

Multiple correspondence analysis: representation of the variables and criterion

In $\mathbb{R}^I$, variable $q$ = sub-space $E_q$. For a unit vector $v$ through the origin $O$, $\theta_{qv}$ denotes the angle between $v$ and $E_q$.

Criterion of MCA: find $v$ maximizing

$$\sum_q \eta^2(q, v) = \sum_q \cos^2 \theta_{qv}$$
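The equality between the correlation ratio $\eta^2(q, v)$ and $\cos^2 \theta_{qv}$ can be illustrated with a short Python sketch (NumPy assumed). The function names and the toy data are hypothetical; $E_q$ is taken here as the sub-space spanned by the indicators of variable $q$, and $v$ is centered before projecting.

```python
import numpy as np

def correlation_ratio(labels, v):
    """eta^2(q, v): between-category sum of squares over total sum of squares."""
    labels = np.asarray(labels)
    v = np.asarray(v, dtype=float)
    between = sum(
        np.sum(labels == c) * (v[labels == c].mean() - v.mean()) ** 2
        for c in np.unique(labels)
    )
    return float(between / np.sum((v - v.mean()) ** 2))

def cos2_with_subspace(labels, v):
    """Squared cosine between the centered v and the span of the indicators."""
    labels = np.asarray(labels)
    v = np.asarray(v, dtype=float)
    vc = v - v.mean()
    G = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(G, vc, rcond=None)  # orthogonal projection on E_q
    proj = G @ coef
    return float(proj @ proj / (vc @ vc))

labels = ["a", "a", "b", "b", "c", "c"]   # toy qualitative variable
v = [1.0, 2.0, 4.0, 5.0, 9.0, 7.0]        # toy quantitative variable
eta2 = correlation_ratio(labels, v)
cos2 = cos2_with_subspace(labels, v)      # same value as eta2
```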

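Factor analysis for mixed data (FAMD)

Criterion: in $\mathbb{R}^I$, find $v$ maximizing

$$\sum_k r^2(k, v) + \sum_q \eta^2(q, v) = \sum_k \cos^2 \theta_{kv} + \sum_q \cos^2 \theta_{qv}$$

where $\theta_{kv}$ is the angle between the quantitative variable $k$ and $v$, and $\theta_{qv}$ is the angle between $v$ and the sub-space $E_q$ of the qualitative variable $q$.

B. Escofier (1979), M. Tenenhaus (1985), G. Saporta (1990)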
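One common way to obtain this criterion from a single PCA is to standardize the quantitative variables and to divide each centered indicator by the square root of its category proportion. That encoding is an assumption of the sketch below, not something stated on the slides, and all function names and data are hypothetical. Under that assumption, the first principal component reaches the maximum of the criterion, equal to the first eigenvalue.

```python
import numpy as np

def eta2(labels, v):
    """Correlation ratio between a qualitative variable and v."""
    labels = np.asarray(labels)
    between = sum(
        np.mean(labels == c) * (v[labels == c].mean() - v.mean()) ** 2
        for c in np.unique(labels)
    )
    return float(between / v.var())

def famd_criterion(v, quant, quali):
    """Sum of r^2(k, v) over quantitative and eta^2(q, v) over qualitative variables."""
    r2 = sum(np.corrcoef(quant[:, k], v)[0, 1] ** 2 for k in range(quant.shape[1]))
    return float(r2 + sum(eta2(q, v) for q in quali))

def famd_table(quant, quali):
    """Standardized quantitative columns + indicators scaled by 1/sqrt(proportion)."""
    blocks = [(quant - quant.mean(axis=0)) / quant.std(axis=0)]
    for q in quali:
        q = np.asarray(q)
        G = (q[:, None] == np.unique(q)[None, :]).astype(float)
        p = G.mean(axis=0)                   # category proportions
        blocks.append((G - p) / np.sqrt(p))  # centered, rescaled indicators
    return np.hstack(blocks)

rng = np.random.default_rng(1)
quant = rng.normal(size=(30, 2))             # simulated quantitative variables
quali = [np.array(["A", "B", "C"] * 10), np.array(["u", "v"] * 15)]

X = famd_table(quant, quali)
eigvals, eigvecs = np.linalg.eigh(X.T @ X / X.shape[0])
lambda1 = eigvals[-1]
F1 = X @ eigvecs[:, -1]                      # first principal component
value_at_F1 = famd_criterion(F1, quant, quali)
```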

Representations provided by FAMD

[Figure: factor maps on the axes F1 and F2, showing the individuals (i1, i2), the categories (A, B, C) and the quantitative variables (k1, k2, k3).]

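2. Taking into account a partition of the variables: Multiple Factor Analysis (MFA)

MFA applied to groups of variables: quantitative, qualitative or mixed

Data: $I$ individuals (rows $i = 1, \dots, I$) described by
- $J_1$ quantitative groups ($j = 1, \dots, J_1$); group $j$ contains the variables $k = 1, \dots, K_j$, with entries $x_{ik}$;
- $J_2$ qualitative groups ($j = 1, \dots, J_2$) in complete disjunctive coding; group $j$ contains the variables $q = 1, \dots, Q_j$, and variable $q$ has the indicators $k = 1, \dots, K_q$, with entries $x_{ik}$.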

Weighting the groups of variables

Several groups of active variables in a single analysis

Question: how to balance their influence?

Weighting the variables in MFA

Reference example: two groups of quantitative variables

[Figure: the variables in $\mathbb{R}^I$; set 1: 2 var., set 2: 3 var.]

Reference example

PCA of the 5 variables, without considering the sets

[Figure: the variables in $\mathbb{R}^I$ (set 1: 2 var., set 2: 3 var.) with the 1st principal component.]

Reference example

Weighting the sets of variables in MFA: balancing the maximum axial inertia

Each variable of the set $j$ is weighted by $1/\lambda_1^j$

$\lambda_1^j$: 1st eigenvalue of the PCA applied to set $j$.

[Figure: the weighted variables in $\mathbb{R}^I$ (set 1: 2 var., set 2: 3 var.), with the values .5, .5 and 1, 1 marked on the figure.]
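The effect of this weighting can be sketched in Python (NumPy assumed) on simulated data shaped like the reference example: one set of 2 variables and one set of 3. The helper names are hypothetical. A weight of $1/\lambda_1^j$ on each variable is implemented here by multiplying its column by $1/\sqrt{\lambda_1^j}$, which is equivalent for an unweighted PCA of the resulting table.

```python
import numpy as np

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_eigenvalue(Z):
    """First eigenvalue of the PCA of the centered table Z (individuals weighted 1/I)."""
    return float(np.linalg.eigvalsh(Z.T @ Z / Z.shape[0])[-1])

rng = np.random.default_rng(0)
sets = [standardize(rng.normal(size=(40, 2))),   # set 1: 2 var. (simulated)
        standardize(rng.normal(size=(40, 3)))]   # set 2: 3 var. (simulated)

lambdas = [first_eigenvalue(Z) for Z in sets]

# Weighting each variable by 1/lambda1 = scaling its column by 1/sqrt(lambda1).
weighted = [Z / np.sqrt(l) for Z, l in zip(sets, lambdas)]

# After weighting, the maximum axial inertia of every set equals 1.
balanced = [first_eigenvalue(Z) for Z in weighted]

# Global analysis: PCA of the juxtaposed weighted sets.
global_lambda1 = first_eigenvalue(np.hstack(weighted))
```

With this balancing, no single set can generate the first global axis on its own: the first global eigenvalue lies between 1 and the number of sets.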

MFA is based on a weighted factor analysis applied to all active sets of variables

The groups of variables can be:
- quantitative (standardized or not)
- qualitative
- mixed

Criterion (case of 2 groups: $K_1$ quantitative variables, $Q_2$ qualitative variables):

$$\frac{1}{\lambda_1^1} \sum_{k \in K_1} r^2(k, v) + \frac{1}{Q_2 \lambda_1^2} \sum_{q \in Q_2} \eta^2(q, v)$$

Equivalences of MFA when each group contains a single variable:
- Quantitative variables: standardized PCA
- Qualitative variables: MCA
- Mixed data: FAMD

MFA provides:

Firstly: classical results of factor analysis

[Figure: factor maps on the axes F1 and F2, showing the individuals (i1, i2), the categories (A, B, C) and the quantitative variables (k1, k2, k3).]

Specific representations (see HMFA)

3. Taking into account a hierarchy defined on the variables: Hierarchical Multiple Factor Analysis (HMFA)

Sorted napping: a holistic approach in sensory evaluation

A set of products is given (a, b, c, d, e, f)

Task 1: napping

Position the products on the tablecloth in such a way that:
- two products are very near one another if they seem identical (for you),
- two products are distant from one another if they seem different (for you).

This must be done according to your own criteria.

Sorted napping: a holistic approach in sensory evaluation

A set of products is given (a, b, c, d, e, f)

Task: sorted napping

As the panellist forms his “tablecloth” (or “nappe”), he is asked to make groups of products, i.e. to put in the same group the products that he perceives as similar.

Data structure: one sorted napping

Napping: coordinates on the tablecloth, 2 quantitative var. (X, Y; not standardized)

Sorting: 1 qualitative var. (C)

Sorted napping = napping (X, Y) + sorting (C)

Data structure: case of several sorted nappings

Columns: X1, Y1, C1, X2, Y2, C2

Sorted napping 1 = napping 1 (X1, Y1) + sorting 1 (C1)

Sorted napping 2 = napping 2 (X2, Y2) + sorting 2 (C2)
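The two-level hierarchy on the columns can be written down as a nested structure. The Python sketch below is only one possible encoding, with a hypothetical helper `columns_of`; the node and column names follow the slide.

```python
# Hierarchy on the variables: sorted nappings at the top level,
# napping (quantitative) and sorting (qualitative) nodes below them.
hierarchy = {
    "sorted napping 1": {"napping 1": ["X1", "Y1"], "sorting 1": ["C1"]},
    "sorted napping 2": {"napping 2": ["X2", "Y2"], "sorting 2": ["C2"]},
}

def columns_of(node):
    """Collect the columns under a node (a dict of sub-nodes or a list of columns)."""
    if isinstance(node, list):
        return list(node)
    return [col for child in node.values() for col in columns_of(child)]

all_columns = columns_of(hierarchy)
second_panellist = columns_of(hierarchy["sorted napping 2"])
```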

Example: two sorted nappings

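Balancing in HMFA

HMFA (all the variables): $\lambda_1 = 1.955$
- MFA of sorted napping 1: $\lambda_1 = 1.866$
  - PCA of napping 1 (X1, Y1): $\lambda_1 = 200$
  - MCA of sorting 1 (C1): $\lambda_1 = 1$
- MFA of sorted napping 2: $\lambda_1 = 1.956$
  - PCA of napping 2 (X2, Y2): $\lambda_1 = 249$
  - MCA of sorting 2 (C2): $\lambda_1 = 1$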

Sorted nappings seen through their MFA: first step of HMFA

[Figure: the raw sorted nappings.]

Individuals factor map

[Figure: products a, b, c, d, e, f on the axes F1 (64.54 %) and F2 (30.35 %); $\lambda_1 = 1.955$.]

Decomposition of the inertia for the first axis: efficiency of the balancing

[Figure: the hierarchy annotated with the inertia of the first axis ($\lambda_1 = 1.955$) carried by each node: .977 and .978 for the two sorted nappings; .514, .505, .464 and .472 for the nappings and sortings below them.]

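Representation of the nodes (= sets of variables) in HMFA

Data: the set $K_j$ of variables gathered in the node $j$ (table with rows $i = 1, \dots, I$, columns $k = 1, \dots, K_j$ and entries $x_{ik}$)

Scalar products: the matrix $W_j$ (rows $i = 1, \dots, I$, columns $l = 1, \dots, I$, entries $W_j(i, l)$)

In $\mathbb{R}^I$: the cloud $N_{K_j}$ of the variables of the node $j$

In $\mathbb{R}^{I^2}$: the point $W_j$, which belongs to the cloud $N_J$ of the nodes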

Representing $N_J$ with HMFA

$v_s$: axis in $\mathbb{R}^I$; $w_s$: $W$ associated to $v_s$, in $\mathbb{R}^{I^2}$

inertia of $N_{K_j}$ projected upon $v_s$ = co-ordinate of $W_j$ upon $w_s$

Projected inertia of the whole set of the variables $K_j$ onto $v_s$ = a measure of relationship between one variable $v_s$ and a set of variables $K_j$:

$$Lg(v_s, K_j), \qquad 0 \le Lg(v_s, K_j) \le 1$$
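As a rough numerical illustration of this measure, the Python sketch below (NumPy assumed) computes the projected inertia for a node made of standardized quantitative variables weighted by $1/\lambda_1^j$, as in MFA. That restriction, the function name `lg` and the simulated data are assumptions of the sketch; qualitative or mixed nodes are not covered.

```python
import numpy as np

def lg(v, Z):
    """Projected inertia onto v of the variables of Z, each weighted by 1/lambda1."""
    n = Z.shape[0]
    lambda1 = np.linalg.eigvalsh(Z.T @ Z / n)[-1]  # first eigenvalue of the node's PCA
    v = (v - v.mean()) / v.std()                   # v as a standardized variable
    r = Z.T @ v / n                                # correlations r(k, v)
    return float(np.sum(r ** 2) / lambda1)

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))                       # simulated node with 3 variables
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# The first principal component of the node itself reaches the upper bound 1.
first_pc = Z @ np.linalg.eigh(Z.T @ Z / 40)[1][:, -1]
lg_max = lg(first_pc, Z)

# Any other variable gives a value between 0 and 1.
lg_random = lg(rng.normal(size=40), Z)
```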

Can be applied to each node of the hierarchy

[Figure: individuals factor map (products a to f) on the axes F1 (64.54 %) and F2 (30.35 %).]

[Figure: Lg square = relationship square, on the axes F1 (64.54 %) and F2 (30.35 %), showing the nodes napping 1, sorting 1, sorted napping 1, napping 2, sorting 2 and sorted napping 2.]

Superimposed representation of the partial clouds of individuals (e.g. the $J$ partial clouds associated to the highest partition in HMFA)

Data: groups $j = 1, \dots, J$; group $j$ contains the variables $1, \dots, K_j$. Individual $i$ ($i = 1, \dots, I$) is seen through each group as a partial individual $i^1, \dots, i^j, \dots, i^J$.

Each group $j$ defines a space $\mathbb{R}^{K_j}$ containing the cloud $N_I^j$.

$N_I^j$: partial cloud (individuals seen through group $j$)

Superimposed representation of the $J$ partial clouds of individuals (MFA): geometrical framework

$$\mathbb{R}^K = \bigoplus_j \mathbb{R}^{K_j}$$

$N_I^j$: partial cloud, in $\mathbb{R}^{K_j}$ (points $i^j$)

$N_I$: mean cloud, in $\mathbb{R}^K$ (points $i$)

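Superimposed representation of the $J$ partial clouds of individuals (MFA): principle

[Figure: in $\mathbb{R}^K$, the partial points $i^1$ and $i^j$ of the clouds $N_I^1$ and $N_I^j$, the mean point $i$ of $N_I$, and a principal axis $u_s$ of $N_I$.]

Partial clouds are projected onto principal axes of the mean cloud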
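The projection of the partial clouds can be sketched as follows in Python (NumPy assumed), for quantitative groups only and on simulated data. Each partial point keeps the coordinates of its own group and zeros elsewhere in $\mathbb{R}^K$. The dilation of the partial points by $J$, which puts each mean point at the centroid of its partial points, is a convention assumed here rather than something shown on the slide; the function names are hypothetical.

```python
import numpy as np

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def mfa_weight(Z):
    """Scale a group so that its first eigenvalue equals 1 (MFA weighting)."""
    lambda1 = np.linalg.eigvalsh(Z.T @ Z / Z.shape[0])[-1]
    return Z / np.sqrt(lambda1)

def partial_coordinates(groups, n_axes=2):
    """Project the mean cloud and the partial clouds onto the axes of the mean cloud."""
    X = np.hstack(groups)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / X.shape[0])
    U = eigvecs[:, ::-1][:, :n_axes]      # principal axes u_s of the mean cloud
    mean_coords = X @ U
    partial, start = [], 0
    for Z in groups:
        stop = start + Z.shape[1]
        Xj = np.zeros_like(X)             # group j embedded in R^K, zeros elsewhere
        Xj[:, start:stop] = Z
        partial.append(len(groups) * (Xj @ U))   # dilation by J (assumed convention)
        start = stop
    return mean_coords, partial

rng = np.random.default_rng(4)
groups = [mfa_weight(standardize(rng.normal(size=(20, k)))) for k in (2, 3)]
mean_coords, partial = partial_coordinates(groups)
```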

Representation of the partial clouds associated to the two highest nodes

[Figure: products a to f on the axes F1 (64.54 %) and F2 (30.35 %).]

[Figure: the same map with the partial points of each product, labelled a-1, a-2, ..., f-1, f-2.]

Conclusion

HMFA is a factor analysis devoted to multiple tables in which a set of individuals is described by several sets of variables organized according to a hierarchy

The variables can be quantitative or categorical

The core of the method is a weighted factor analysis; it works:
- as a PCA for quantitative variables
- as MCA for categorical variables

It provides results:
- usual in any factor analysis: representation of individuals, of variables, etc.
- specific to the hierarchy defined on the variables: representation of partial points, of nodes, etc.

The analyses were performed with an R package dedicated to exploratory analysis

See the LMA² site