mv7.cluster analysis
-
8/13/2019 MV7.Cluster Analysis
Cluster Analysis
M. Thenmozhi, Professor
Department of Management Studies, IIT Madras
-
CLUSTER ANALYSIS
Searches for the natural groupings among objects described by p variables.
Within each cluster: high homogeneity; between clusters: high heterogeneity.
12/6/2013 2DoMS, [email protected]
-
Purpose
Data reduction - information from the entire population is reduced to information about specific, smaller sub-groups
Segmenting a market on the basis of a number of variables
Identifying similar test markets, similar firms, products
Grouping personality profiles
Product positioning - grouping brands
When:
Large sample of data consisting of many variables - data recorded on a continuous scale as well as on a categorical scale.
-
Shampoo buying behaviour
Degree of importance measured on 8 variables: brand name, price, availability, brand image, Co. name, advertising, retailer recommendations, family income.
Result - five clusters
-
Key patterns and classification
1. High importance to price, brand, image, family, advt., influence, availability
   Classification: Conservative
2. High importance to price; moderate product & Co. image; less dependence on advtg. & retailer reco.
   Classification: Value for money
3. High brand image; moderate advtg. & Co. name; low family influence & retailer reco.
   Classification: Brand conscious & personal choice
4. High brand image, loyalty, family influence and low price
   Classification: Habitual purchaser
5. High availability and low brand loyalty
   Classification: Switcher
-
Stage 1: Partitioning
How should inter-object similarity be measured?
Correlation coefficient / distance
What procedure (algorithm) should be used to place similar objects into groups?
Hierarchical or non-hierarchical
How many clusters?
-
Hierarchical
Agglomerative: single linkage rule, complete linkage rule, average linkage rule, Ward's method
Divisive
-
Agglomerative method: Hierarchical clustering procedure which starts with each object in a separate cluster. Subsequently, the clusters closest together are aggregated.
Single linkage method: Procedure based on minimum distance. Finds the 2 objects with the shortest distance and places them in one cluster; the process continues until all objects are placed in one cluster.
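The single linkage procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production routine: the function name `single_linkage`, the `n_clusters` stopping parameter and the five sample points are all assumptions made for the example.

```python
from math import dist

def single_linkage(points, n_clusters=1):
    """Agglomerative clustering with the single linkage (minimum distance) rule.

    Returns the final clusters (lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]  # each object starts in its own cluster
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # aggregate the two closest clusters
        del clusters[b]
    return clusters, history

# Hypothetical scores of five objects on two variables
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
clusters, history = single_linkage(points, n_clusters=3)
print(clusters)
```

Setting `n_clusters=1` runs the process to the end, until all objects sit in one cluster; the merge history then holds the information a dendrogram would display.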
-
Complete linkage rule: Maximum distance
Average linkage rule: Average distance
Ward's method: Distance b/w 2 clusters is the sum of squares b/w the 2 clusters, summed over all variables.
Centroid method: Distance b/w 2 clusters is the distance b/w their centroids - less affected by outliers, requires metric data
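The rules above differ only in how the distance between two clusters is computed. The sketch below puts them side by side. All function names and the two toy clusters are assumptions, and Ward's criterion is read here as the increase in the within-cluster sum of squares caused by a merge, which is one common formulation rather than necessarily the one intended on the slide.

```python
from math import dist

def centroid(cluster):
    """Mean of each variable over the members of a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))

def pair_distances(a, b):
    """All Euclidean distances between a member of a and a member of b."""
    return [dist(p, q) for p in a for q in b]

def complete_linkage(a, b):
    return max(pair_distances(a, b))        # maximum distance

def average_linkage(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)                  # average distance

def centroid_distance(a, b):
    return dist(centroid(a), centroid(b))   # distance between centroids

def within_ss(cluster):
    """Sum of squared deviations from the centroid, summed over all variables."""
    c = centroid(cluster)
    return sum((p[k] - c[k]) ** 2 for p in cluster for k in range(len(c)))

def ward_increase(a, b):
    # Assumed reading of Ward's method: growth in within-cluster SS if a and b merge
    return within_ss(a + b) - within_ss(a) - within_ss(b)

# Two hypothetical clusters with two members each
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]
print(complete_linkage(a, b), average_linkage(a, b),
      centroid_distance(a, b), ward_increase(a, b))
```

Any of these functions could replace the minimum-distance step in an agglomerative loop; the merging logic itself stays the same.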
-
Stage 2:
Interpretation - naming the clusters, using average scores (for each variable/cluster) - significance
Stage 3:
Profiling - describing the characteristics of each cluster - explaining how they may differ on relevant dimensions
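The average scores used for interpretation are simply the mean of each variable within each cluster. A minimal sketch, assuming cluster labels have already been assigned; the function name `cluster_profiles` and the data are hypothetical.

```python
def cluster_profiles(rows, labels):
    """Average score on each variable within each cluster."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

# Hypothetical scores of four respondents on two variables, with assigned clusters
rows = [(7, 2), (9, 4), (2, 8), (4, 6)]
labels = [1, 1, 2, 2]
profiles = cluster_profiles(rows, labels)
print(profiles)
```

Comparing the rows of `profiles` variable by variable is what suggests a name for each cluster.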
-
Euclidean Distance
D_{15} = \sqrt{(X_{11} - X_{15})^2 + (X_{21} - X_{25})^2 + \dots + (X_{n1} - X_{n5})^2}
X_{11}: score of respondent 1 on variable 1; n: number of variables
Result: matrix of inter-respondent distances.
Mahalanobis distance: Standardized form of Euclidean distance (E.D.). Data are standardized by scaling responses in terms of standard deviations, and adjustments are made for inter-correlation b/w variables.
If variables are categorical, distance is based on the no. of questions on which 2 respondents gave the same answers.
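As a rough illustration of the measures above, the sketch below builds a matrix of inter-respondent Euclidean distances and counts matching answers for categorical data. The function names and the sample responses are assumptions; the Mahalanobis adjustment for inter-correlation is not shown.

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two respondents' scores over all n variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(rows):
    """Matrix of inter-respondent distances."""
    return [[euclidean(r, s) for s in rows] for r in rows]

def matching_count(x, y):
    """Number of questions on which two respondents gave the same answer."""
    return sum(a == b for a, b in zip(x, y))

# Hypothetical scores: three respondents, three variables
scores = [(1, 2, 3), (4, 6, 3), (1, 2, 3)]
D = distance_matrix(scores)
print(D[0][1])   # distance between respondents 1 and 2

# Hypothetical categorical answers from two respondents
same = matching_count(("a", "b", "c"), ("a", "c", "c"))
print(same)
```

Note that the matching count grows as respondents become more alike, so it behaves as a similarity; one simple way to turn it into a distance is to subtract it from the number of questions.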
-
Dendrogram: A tree graph - graphical representation of the results of a clustering procedure in which the vertical axis consists of the objects or individuals and the horizontal axis represents the number of clusters formed.
-
Validity:
Two separate samples - check for similarity of results
Use two different C.A. programs
Discriminant analysis
ANOVA
-
Group means and significance levels for two groups
Variable  Cluster 1 mean  Cluster 2 mean  F ratio  Level of significance
X1        4.94            8.82            55.4     0.0001
X2        6.11            3.10            34.1     0.0001
X3        13.52           17.81           68.6     0.0001
X4        10.87           9.88            2.6      0.1160
X5        5.52            5.92            1.0      0.3212
X6        5.35            5.07            0.4      0.5183
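An F ratio like those in the table comes from a one-way ANOVA of one variable across the clusters. The sketch below shows the calculation on made-up scores; the function name `f_ratio` and the data are assumptions, and it does not reproduce the figures in the table, whose raw data are not given. The significance level would additionally need the F distribution, which is left out here.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio for a single variable across clusters."""
    n = sum(len(g) for g in groups)   # total number of respondents
    k = len(groups)                   # number of clusters
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # variation of cluster means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variation of scores around their own cluster mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores on one variable for the members of two clusters
cluster_1 = [4.0, 5.0, 6.0]
cluster_2 = [8.0, 9.0, 10.0]
f = f_ratio([cluster_1, cluster_2])
print(f)
```

A large F ratio means the cluster means differ by much more than the spread inside the clusters, which is why such variables are the ones used for labeling.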
-
X1, X2, X3 - Interpretation - Labeling
X4, X5, X6 - Not significant
E.I.D. Parry
200 respondents: two most crucial factors considered while buying a particular brand of genset.
60% - initial cost of genset, running & maintenance cost
Cluster analysis performed - ranking on a 9-point scale - most important to least important.
Sample size 50: split into two.
Euclidean distance, single linkage rule.
-
Six clusters  Initial cost  Running & maintenance cost
1             Very low      Very low
2             Low           Very low
3             Moderate      Low
4             Low           Moderate
5             High          Very low
6             High          Moderate