mv7.cluster analysis

8/13/2019 MV7.Cluster Analysis

1/16

Cluster Analysis

M. Thenmozhi, Professor

Department of Management Studies, IIT Madras

[email protected]


2/16

CLUSTER ANALYSIS

Searches for the natural groupings among objects described by p variables.

Within each cluster - high homogeneity, but between clusters - high heterogeneity.

12/6/2013 2DoMS, [email protected]


3/16

Purpose

Data reduction - information from the entire population reduced to information about specific smaller sub-groups

Segmenting a market - on the basis of a number of variables

Identifying similar test markets, similar firms, products

Grouping personality profiles

Product positioning - brands into groups

When:

Large sample of data consisting of many variables - data recorded on a continuous scale as well as on a categorical scale.



4/16

Shampoo buying behaviour

Degree of importance measured on 8 variables: brand name, price, availability, brand image, Co. name, advertising, retailer recommendations, family income.

Result - Five Clusters



5/16

Key patterns and classification

1. High importance to price, brand, image, family, advt., influence, availability
   Classification: Conservative

2. High importance to price; moderate product & Co. image; less dependence on advtg. & retailer reco.
   Classification: Value for money

3. High brand image; moderate advtg. & Co. name; low family influence & retailer reco.
   Classification: Brand conscious & personal choice

4. High brand image, loyalty, family influence and low price
   Classification: Habitual purchaser

5. High availability and low brand loyalty
   Classification: Switcher


6/16

Stage 1: Partitioning

How should inter-object similarity be measured?

Correlation coefficient/Distance

What procedure(algorithm) should be used to place similar objects into groups?

Hierarchical or Non-Hierarchical

How many Clusters?



7/16

Hierarchical

Agglomerative: single linkage rule, complete linkage rule, average linkage rule, Ward's method

Divisive



8/16

Agglomerative method: Hierarchical clustering procedure which starts with each object in a separate cluster. Subsequently, the clusters closest together are aggregated.

Single linkage method: Procedure based on minimum distance. Finds the 2 objects with the shortest distance and places them in one cluster - the process continues until all objects are placed in one cluster.
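The agglomerative procedure with the single linkage rule can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function name `single_linkage`, the `n_clusters` stopping parameter and the five sample points are hypothetical, and distances are taken to be Euclidean.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, n_clusters=1):
    """Agglomerative clustering using the minimum-distance rule.

    Starts with every object in its own cluster and repeatedly merges
    the two closest clusters until n_clusters remain.
    Returns (clusters, merges); each merge is (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges

# Hypothetical objects measured on two variables.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
clusters, merges = single_linkage(points, n_clusters=3)
print(clusters)
```

Calling the function with `n_clusters=1` continues the process until all objects sit in one cluster, as the slide describes; the recorded merge distances are what a dendrogram would display.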



9/16

Complete linkage rule: Maximum distance

Average linkage rule: Average distance

Ward's method: Distance b/w 2 clusters is the sum of squares b/w the 2 clusters, summed over all variables.

Centroid method: Distance b/w 2 clusters is the distance b/w their centroids - less affected by outliers, requires metric data.
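The linkage rules differ only in how the distance between two clusters is computed. The sketch below puts them side by side; the function `cluster_distance` and the sample clusters are hypothetical. Ward's criterion is written in one common form, the increase in within-cluster sum of squares caused by the merge, which is an assumed reading of the definition above.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Mean of each variable over the members of a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))

def cluster_distance(a, b, rule):
    """Distance between clusters a and b under the named linkage rule."""
    pairwise = [euclidean(p, q) for p in a for q in b]
    if rule == "single":      # minimum distance
        return min(pairwise)
    if rule == "complete":    # maximum distance
        return max(pairwise)
    if rule == "average":     # average distance
        return sum(pairwise) / len(pairwise)
    if rule == "centroid":    # distance between centroids
        return euclidean(centroid(a), centroid(b))
    if rule == "ward":        # assumed form: increase in within-cluster sum of squares
        na, nb = len(a), len(b)
        return na * nb / (na + nb) * euclidean(centroid(a), centroid(b)) ** 2
    raise ValueError(f"unknown rule: {rule}")

# Two hypothetical clusters of two objects each.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
for rule in ("single", "complete", "average", "centroid", "ward"):
    print(rule, round(cluster_distance(a, b, rule), 3))
```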



10/16

Stage 2:

Interpretation - naming the clusters, using average scores (for each variable/cluster) - significance

Stage 3:

Profiling - describing the characteristics of each cluster - explaining how they may differ on relevant dimensions
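Interpretation relies on the average score of each variable within each cluster. A minimal sketch of that computation follows; the function `cluster_profiles`, the ratings and the cluster labels are all hypothetical and serve only to show the shape of the result.

```python
from collections import defaultdict

def cluster_profiles(rows, labels):
    """Return {cluster label: [mean of each variable in that cluster]}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }

# Hypothetical importance ratings on three variables
# (price, brand image, availability) for four respondents.
rows = [(9, 3, 5), (8, 2, 6), (2, 9, 4), (3, 8, 5)]
labels = [1, 1, 2, 2]  # assumed cluster membership from Stage 1
profiles = cluster_profiles(rows, labels)
print(profiles)
```

In this made-up data, cluster 1 scores high on price and cluster 2 on brand image, which is the kind of contrast used to name clusters.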



11/16

Mahalanobis distance: Standardized form of Euclidean distance (E.D.). Data is standardized by scaling responses in terms of standard deviations, and adjustments are made for inter-correlation b/w variables.

Euclidean Distance

$$D_{15} = \sqrt{(X_{11} - X_{15})^2 + (X_{21} - X_{25})^2 + \cdots + (X_{n1} - X_{n5})^2}$$

X11: respondent score (respondent 1 on variable 1)
n: number of variables
Result: matrix of inter-respondent distances.

If variables are categorical - distance - no. of questions on which 2 respondents gave the same answers
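The matrix of inter-respondent distances and the matching count for categorical answers can be sketched as follows. The function names and the sample responses are hypothetical. Python indexes from 0, so the slide's distance between respondents 1 and 5 would be `D[0][4]` in a sample with at least five respondents.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two respondents' score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(respondents):
    """Symmetric matrix of inter-respondent Euclidean distances."""
    return [[euclidean(a, b) for b in respondents] for a in respondents]

def matching_answers(a, b):
    """Number of questions on which two respondents gave the same answer.

    Note: a higher count means the respondents are more alike.
    """
    return sum(1 for x, y in zip(a, b) if x == y)

# Hypothetical scores of three respondents on two metric variables.
scores = [(1, 2), (4, 6), (1, 2)]
D = distance_matrix(scores)
print(D[0][1])  # distance between the first and second respondent

# Hypothetical categorical answers to three questions.
r1 = ["yes", "no", "A"]
r2 = ["yes", "yes", "A"]
print(matching_answers(r1, r2))
```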


12/16

Dendrogram: A tree graph - graphical representation of the results of a clustering procedure in which the vertical axis consists of the objects or individuals & the horizontal axis represents the number of clusters formed.
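One way to read a dendrogram is to cut it at a chosen distance and count the branches that remain. The sketch below assumes the merge distances never decrease from one merge to the next (true for single, complete and average linkage); the function name and the merge heights are hypothetical.

```python
def clusters_at_height(n_objects, merge_heights, cut):
    """Number of clusters left when the tree is cut at distance `cut`.

    Every merge that happened at or below the cut joins two clusters
    into one, so each such merge reduces the count by one.
    """
    merged = sum(1 for h in merge_heights if h <= cut)
    return n_objects - merged

# Hypothetical merge distances for 5 objects (4 merges in total).
heights = [0.5, 0.5, 5.3, 5.7]
for cut in (0.1, 1.0, 6.0):
    print(cut, clusters_at_height(5, heights, cut))
```

A large jump between consecutive merge distances, as between 0.5 and 5.3 here, is often taken as a hint about how many clusters to keep.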



13/16

Validity:

Two separate samples - check for similarity of results
Use two different C.A. programs
Discriminant analysis
ANOVA



14/16

Group Means & Significance level for two groups

Variable   Mean (Cluster 1)   Mean (Cluster 2)   F ratio   Level of significance
X1          4.94               8.82              55.4      0.0001
X2          6.11               3.10              34.1      0.0001
X3         13.52              17.81              68.6      0.0001
X4         10.87               9.88               2.6      0.1160
X5          5.52               5.92               1.0      0.3212
X6          5.35               5.07               0.4      0.5183
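An F ratio of this kind comes from a one-way ANOVA on a single variable across the clusters: between-cluster mean square divided by within-cluster mean square. The sketch below shows the arithmetic on made-up scores; the function `f_ratio` and the data are hypothetical and do not reproduce the figures in the table, whose raw data is not given.

```python
def f_ratio(*groups):
    """One-way ANOVA F ratio for one variable measured in several groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical scores on one variable for two clusters.
cluster_1 = [4, 5, 6]
cluster_2 = [8, 9, 10]
print(f_ratio(cluster_1, cluster_2))
```

A large F ratio with a small significance level indicates that the cluster means differ on that variable, which is why such variables are used for labeling.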



15/16

X1, X2, X3 - Interpretation - Labeling
X4, X5, X6 - Not significant

E.I.D. Parry

200 respondents: two most crucial factors considered while buying a particular brand of genset. 60% - initial cost of genset, running & maintenance cost.

Cluster analysis performed - ranking on 9-point scale - most important to least important.

Sample size 50: split into two. Euclidean distance, single linkage rule.



16/16

Six clusters

Cluster   Initial cost   Running & maintenance cost
1         Very low       Very low
2         Low            Very low
3         Moderate       Low
4         Low            Moderate
5         High           Very low
6         High           Moderate

