Data Mining Techniques Clustering
Purpose
• In clustering analysis, there is no pre-classified data
• Instead, clustering analysis is a process where a set of objects is partitioned into several clusters
• All members in one cluster are similar to each other and different from the members of other clusters, according to some similarity metric (e.g., the opposite of distance between objects)
Cluster Analysis
[Figure: customers (objects) plotted by the variables X (Income) and Y (Age), with a cluster marked]
Cluster Analysis
Data Matrix (n objects × p variables)
Dissimilarity Matrix (n × n)
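As a minimal sketch of how a data matrix becomes a dissimilarity matrix, the Python below computes every pairwise distance between n objects described by p variables. Euclidean distance, the function name `dissimilarity_matrix` and the three sample objects are assumptions chosen for illustration.

```python
import math

def dissimilarity_matrix(data):
    """Return the n x n matrix of Euclidean distances between the rows of data."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric, so each pair is computed once.
            d[i][j] = d[j][i] = math.dist(data[i], data[j])
    return d

# Hypothetical data matrix: n = 3 objects, p = 2 variables.
data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = dissimilarity_matrix(data)
for row in D:
    print(row)
```

The diagonal is always zero and the matrix is symmetric, so only the n(n-1)/2 entries below (or above) the diagonal carry information.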
Attribute Types Involved in Cluster Analysis
• Interval Variables
– An interval variable contains continuous measurements (e.g., height, weight, temperature, cost, etc.) which follow a linear scale
– It is essential that intervals keep the same importance throughout the scale
• Nominal Variables
– A nominal variable takes on more than two states. For example, the eye color of a person can be blue, brown, green or grey
– These states may be coded as 1, 2, ..., M; however, their order and the interval between any two states do not have any meaning
Attribute Types Involved in Cluster Analysis
• Ordinal Variables
– An ordinal variable takes on more than two states. For example, you may ask someone to convey his/her appreciation of some paintings in terms of the following categories: 1=detest, 2=dislike, 3=indifferent, 4=like and 5=admire
– The states of an ordinal variable are ordered in a meaningful sequence. However, the intervals between consecutive states are not equally spaced
• Binary Variables
– Binary variables have only two possible states. For example, the gender of a person is either female or male
Dissimilarity (Distance) Measure
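One common way to measure dissimilarity for each attribute type can be sketched in Python as follows. The function names and the specific conventions (range scaling for interval variables, simple matching for nominal and binary variables, rank scaling for ordinal variables) are assumptions for illustration; other conventions exist.

```python
def interval_dissim(a, b, value_range):
    """Absolute difference, scaled to [0, 1] by the variable's range."""
    return abs(a - b) / value_range

def nominal_dissim(a, b):
    """Simple matching: 0 if the two states are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0

def ordinal_dissim(a, b, num_states):
    """Map ranks 1..M onto [0, 1], then compare them like interval values."""
    za = (a - 1) / (num_states - 1)
    zb = (b - 1) / (num_states - 1)
    return abs(za - zb)

# A binary variable treated as symmetric: a mismatch counts as 1.
binary_dissim = nominal_dissim

# Hypothetical examples, one per attribute type.
d_height = interval_dissim(160, 180, value_range=50)  # heights in cm
d_eyes = nominal_dissim("blue", "brown")
d_rating = ordinal_dissim(2, 4, num_states=5)          # 2=dislike vs 4=like
d_gender = binary_dissim("female", "female")
print(d_height, d_eyes, d_rating, d_gender)
```

Scaling every per-variable dissimilarity to [0, 1] makes it possible to average them when objects are described by variables of mixed types.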
Categorization of Clustering Methods
• Exclusive vs. Non-Exclusive (Overlapping)
• Hierarchical Methods vs. Partitioning Methods
• Hierarchical Methods
– Single Link Method
– Complete Link Method
• Partitioning Methods
– Kohonen Self-Organizing Feature Maps
– K-Means Methods
– K-Medoids Methods (PAM, CLARA, CLARANS)
– Density-Based Methods
– …
Hierarchical Methods
Dissimilarity Matrix (5 × 5)
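The difference between the single link and complete link methods can be sketched in Python as follows. The 5 × 5 dissimilarity matrix here is hypothetical (five made-up objects on a line at positions 0, 3, 7, 12 and 18), and `agglomerate` is an illustrative helper, not a library function.

```python
def agglomerate(D, k, linkage):
    """Merge the two closest clusters repeatedly until k clusters remain.

    D is a symmetric dissimilarity matrix. linkage is min for single link
    (distance between the closest members) or max for complete link
    (distance between the farthest members).
    """
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Hypothetical dissimilarity matrix for objects 0..4.
D = [
    [0, 3, 7, 12, 18],
    [3, 0, 4, 9, 15],
    [7, 4, 0, 5, 11],
    [12, 9, 5, 0, 6],
    [18, 15, 11, 6, 0],
]
single = agglomerate(D, 2, min)
complete = agglomerate(D, 2, max)
print("single link:  ", single)    # [[0, 1, 2, 3], [4]]
print("complete link:", complete)  # [[0, 1], [2, 3, 4]]
```

On this matrix single link chains objects together one nearest neighbor at a time, while complete link keeps the clusters compact, so the two methods return different partitions from the same input.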
K-Means Methods
Sensitive to outliers!
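The outlier sensitivity can be illustrated with a minimal Python sketch of K-Means, assuming Euclidean distance and the standard alternation between an assignment step and a centroid update step. The sample points, the starting centroids and the function name `k_means` are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Two well-separated hypothetical groups of three points each.
group_a = [(1, 1), (2, 1), (1, 2)]
group_b = [(8, 8), (9, 8), (8, 9)]
start = [(1, 1), (8, 8)]

clean_labels, _ = k_means(group_a + group_b, start)
noisy_labels, _ = k_means(group_a + group_b + [(100, 100)], start)
print(clean_labels)  # [0, 0, 0, 1, 1, 1]
print(noisy_labels)  # [0, 0, 0, 0, 0, 0, 1]
```

With the single outlier added, the mean of the second cluster is dragged so far toward (100, 100) that the second group ends up closer to the first centroid. The two real groups are merged and the outlier takes a cluster of its own.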
Exercise 7
Use Single Link, Complete Link and K-Means to cluster the following data:
Object  X   Y
1       22  60
2       40  25
3       60  30
4       64  66
5       80  30
6       82  55
Number of clusters = 2