definition finding groups of objects such that the objects in a group will be similar (or related)...
TRANSCRIPT
![Page 1: Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)](https://reader036.vdocuments.us/reader036/viewer/2022062520/5697c0101a28abf838ccb032/html5/thumbnails/1.jpg)
Definition
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
Applications
• Group related documents for browsing
• Group genes and proteins that have similar functionality
• Group stocks with similar price fluctuations
• Reduce the size of large data sets
• Group users with similar buying mentalities
Clustering is ambiguous
There is no correct or incorrect solution for clustering.
How many clusters?
[Figure panels: Two Clusters; Four Clusters; Six Clusters]
Challenges faced
• Scalability
• Ability to deal with different types of attributes
• Noise & outliers
• Complex shapes and types of data
• Incremental clustering and insensitivity to the order of input records
• High dimensionality
• Constraint-based clustering
• Interpretability and usability
Types of Data
Data Matrix
• n objects with p variables.
• The structure is in the form of a relational table, or n x p matrix.
Dissimilarity Matrix
• Object-by-object structure.
• Stores a collection of proximities that are available for all pairs of n objects.
• d(i, j) is the dissimilarity between objects i and j.
• d(i, j) = d(j, i) and d(i, i) = 0
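The relationship between the two structures can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the slide does not name a distance measure, so Euclidean distance is an assumption here, and the function name `dissimilarity_matrix` and the sample data are hypothetical.

```python
import math


def dissimilarity_matrix(data):
    """Build the n x n dissimilarity matrix for an n x p data matrix.

    `data` is a list of n rows, each a list of p numeric variables.
    Euclidean distance is assumed as the dissimilarity measure.
    """
    n = len(data)
    d = [[0.0] * n for _ in range(n)]  # the diagonal stays 0: d(i, i) = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(data[i], data[j])
            d[i][j] = dist
            d[j][i] = dist  # symmetry: d(i, j) = d(j, i)
    return d


# Hypothetical 3 x 2 data matrix: 3 objects, 2 variables each.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = dissimilarity_matrix(data)
for row in d:
    print(row)
```

Only the upper triangle is computed; the lower triangle is filled in by symmetry, which is why many tools store a dissimilarity matrix as a triangular table.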
Types of Data
• Interval-Scaled Variables
• Binary Variables
• Nominal
• Ordinal
• Ratio-Scaled Variables
• Variables of Mixed Types
Interval-Scaled Variables
Interval-scaled variables contd…
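Binary variables
• A binary variable has only two states: 0 and 1.
• Dissimilarity between two objects described by binary variables is computed from a 2×2 contingency table, where p = q + r + s + t is the total number of variables.

| | Object j = 1 | Object j = 0 | Sum |
|---|---|---|---|
| Object i = 1 | q | r | q+r |
| Object i = 0 | s | t | s+t |
| Sum | q+s | r+t | p |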
Dissimilarity between binary variables

| Name | Gender | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4 |
|---|---|---|---|---|---|---|---|
| Jack | M | Y | N | P | N | N | N |
| Mary | F | Y | N | P | N | P | N |
| Jim | M | Y | Y | N | N | N | N |

D(Jack, Mary) = 0.33
D(Jack, Jim) = 0.67
D(Mary, Jim) = 0.75
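The three values can be reproduced with a short Python sketch. It rests on assumptions the slide does not spell out: Y and P are coded as 1 and N as 0, Gender is left out of the comparison, and the asymmetric binary dissimilarity (r + s) / (q + r + s) is used, which ignores the negative matches t. The function names are hypothetical.

```python
def contingency(x, y):
    """Count the four cells of the 2x2 contingency table for two 0/1 vectors."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # 1 in x, 0 in y
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # 0 in x, 1 in y
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both 0
    return q, r, s, t


def asymmetric_dissimilarity(x, y):
    """(r + s) / (q + r + s): negative matches (t) are ignored."""
    q, r, s, _t = contingency(x, y)
    denom = q + r + s
    return (r + s) / denom if denom else 0.0


# Assumed coding: Fever, Cough, Test-1..Test-4 with Y/P -> 1 and N -> 0.
patients = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim":  [1, 1, 0, 0, 0, 0],
}

for a, b in [("Jack", "Mary"), ("Jack", "Jim"), ("Mary", "Jim")]:
    print(a, b, round(asymmetric_dissimilarity(patients[a], patients[b]), 2))
```

Under this coding, Jack and Mary differ in one attribute and share two positives, giving 1/3; the other two pairs work out the same way.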
Categorical Variables
Ordinal
• Similar to nominal variables, but the values are ordered in some sequence.
• E.g., the rank of employees can be assistant, associate, or full.
Ratio-Scaled Variables
• Make a positive measurement on a non-linear scale.
• E.g., growth of bacteria, radioactivity.
Variables of Mixed Types
Other types of data
Types of clustering
• Hierarchical clustering (BIRCH): a set of nested clusters organized as a hierarchical tree.
• Partitional clustering (k-means, k-medoids): a division of data objects into non-overlapping (distinct) subsets (i.e., clusters) such that each data object is in exactly one subset.
• Density-based (DBSCAN): based on density functions.
• Grid-based (STING): based on a multiple-level granularity structure.
• Model-based (SOM): hypothesize a model for each of the clusters and find the best fit of the data to the given model.
Partitional Clustering
[Figure panels: Original Points; A Partitional Clustering]
Hierarchical Clustering
[Figure panels, each over points p1 to p4: Traditional Hierarchical Clustering; Non-traditional Hierarchical Clustering; Traditional Dendrogram; Non-traditional Dendrogram]
Clustering Algorithms
• Partitional: K-means, K-medoids
• Hierarchical: Agglomerative, Divisive
K-Means Algorithm
Each cluster is represented by the mean value of the objects in the cluster.
Input: a set of n objects, the number of clusters k
Output: a set of k clusters
Algorithm:
1. Randomly select k samples and mark them as the initial cluster means.
2. Repeat:
   a. Assign (or reassign) each sample to the cluster to which it is most similar, based on the mean of the cluster.
   b. Update each cluster's mean.
   Until no change.
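The loop above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: similarity is taken to be Euclidean distance, an empty cluster simply keeps its previous mean, and the function name `kmeans`, the `seed` and `max_iter` parameters, and the sample points are all hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (tuples of numbers) into k groups.

    Returns (means, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    means = rng.sample(points, k)  # k distinct samples as the initial means
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: recompute each cluster's mean from its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old mean
                means[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return means, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(points, k=2)
print(means)
print(labels)
```

Which cluster receives index 0 depends on the random initial selection, so only the grouping, not the numbering, is meaningful.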
K-Means (Array)
Step 1: Randomly assign objects to k clusters.
Step 2: Find the mean of each cluster.
Step 3: Re-assign objects to the cluster with the closest mean.
Step 4: Go to step 2. Repeat until no change.
Example 1
Given: {2,3,6,8,9,12,15,18,22}. Assume k=3.
Solution:
Randomly partition the given data set:
  K1 = 2,8,15   mean = 8.33
  K2 = 3,9,18   mean = 10
  K3 = 6,12,22  mean = 13.33
Reassign:
  K1 = 2,3,6,8,9    mean = 5.6
  K2 = (empty)      mean taken as 0
  K3 = 12,15,18,22  mean = 16.75
Reassign:
  K1 = 3,6,8,9      mean = 6.5
  K2 = 2            mean = 2
  K3 = 12,15,18,22  mean = 16.75
Reassign:
  K1 = 6,8,9        mean = 7.67
  K2 = 2,3          mean = 2.5
  K3 = 12,15,18,22  mean = 16.75
Reassign (12 is now closer to 7.67 than to 16.75, so it moves to K1):
  K1 = 6,8,9,12     mean = 8.75
  K2 = 2,3          mean = 2.5
  K3 = 15,18,22     mean = 18.33
Reassign: no change.
STOP
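The array variant can be traced step by step with a small Python sketch run on the data of Example 1. It assumes absolute difference as the distance, treats the mean of an empty cluster as 0 as the example does, and breaks ties in favour of the lower-numbered cluster; the function name `kmeans_from_partition` is hypothetical.

```python
def kmeans_from_partition(clusters, empty_mean=0.0, max_iter=100):
    """1-D k-means that starts from an initial partition.

    `clusters` is a list of k lists of numbers. Returns every partition
    visited, beginning with the initial one and ending with the stable one.
    """
    k = len(clusters)
    points = sorted(x for c in clusters for x in c)
    history = [[sorted(c) for c in clusters]]
    for _ in range(max_iter):
        current = history[-1]
        # Mean of each cluster; an empty cluster gets `empty_mean` (assumption).
        means = [sum(c) / len(c) if c else empty_mean for c in current]
        new = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - means[i]))
            new[nearest].append(x)
        if new == current:  # no point changed cluster: stop
            break
        history.append(new)
    return history


history = kmeans_from_partition([[2, 8, 15], [3, 9, 18], [6, 12, 22]])
for step, clusters in enumerate(history):
    print(step, clusters)
```

With this starting partition the trace needs four reassignments before it stabilises: in the last one, 12 moves from the third cluster to the first because it lies 4.33 from the mean 7.67 but 4.75 from the mean 16.75.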
Example 2
Given {2,4,10,12,3,20,30,11,25}. Assume k=2.
Solution:
K1 = 2,3,4,10,11,12
K2 = 20,25,30
Advantages
• K-means is relatively scalable and efficient in processing large data sets.
• The computational complexity of the algorithm is O(nkt), where
  n: the total number of objects
  k: the number of clusters
  t: the number of iterations
  Normally: k << n and t << n
Disadvantages
• Can be applied only when the mean of a cluster is defined.
• Users need to specify k.
• K-means is not suitable for discovering clusters with non-convex shapes or clusters of very different size.
• It is sensitive to noise and outlier data points (they can influence the mean value).
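The last disadvantage is easy to see numerically. The sketch below uses a hypothetical cluster and a hypothetical outlier value of 200; the median is shown only as a point of comparison for a more robust centre and is not part of k-means itself.

```python
from statistics import mean, median

cluster = [12, 15, 18, 22]
with_outlier = cluster + [200]  # hypothetical outlier added to the cluster

# The mean is dragged far towards the outlier...
print(mean(cluster), mean(with_outlier))
# ...while the median barely moves.
print(median(cluster), median(with_outlier))
```

A single extreme point shifts the cluster mean from 16.75 to 53.4, which in turn changes which points are assigned to the cluster in the next iteration.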
K-Means (graph)
Step 1: Form k centroids, randomly.
Step 2: Calculate the distance between the centroids and each object. Use the Euclidean distance to determine the minimum distance:
$$d(A,B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
Step 3: Assign objects to the k clusters based on minimum distance.
Step 4: Calculate the centroid of each cluster using
$$C = \left(\frac{x_1 + x_2 + \dots + x_n}{n},\ \frac{y_1 + y_2 + \dots + y_n}{n}\right)$$
Go to step 2. Repeat until no change in centroids.
Example 1
There are four types of medicines and each has two attributes, as shown below. Find a way to group them into 2 groups based on their features.

| Medicine | Weight | pH |
|---|---|---|
| A | 1 | 1 |
| B | 2 | 1 |
| C | 4 | 3 |
| D | 5 | 4 |
Solution
Plot the values on a graph.
Mark any k centroids.
Calculate the Euclidean distance of each point from the centroids. The zero entries show that the initial centroids were placed at A and B:

| D | A | B | C | D |
|---|---|---|---|---|
| Centroid 1 | 0 | 1 | 3.61 | 5 |
| Centroid 2 | 1 | 0 | 2.83 | 4.24 |

Based on minimum distance, we assign points to clusters:
K1 = A
K2 = B, C, D
Calculate new centroids. For K2:
$$C_2 = \left(\frac{2+4+5}{3},\ \frac{1+3+4}{3}\right) = \left(\frac{11}{3},\ \frac{8}{3}\right)$$
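This first iteration can be checked with a few lines of Python. The sketch assumes the initial centroids sit at A = (1, 1) and B = (2, 1), which is what the zero entries in the distance matrix imply; the variable names are hypothetical.

```python
import math

medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [medicines["A"], medicines["B"]]  # assumed initial centroids

# Distance matrix: one row per centroid, one column per medicine (A, B, C, D).
D = [[round(math.dist(c, p), 2) for p in medicines.values()] for c in centroids]
print(D)

# Assign each medicine to the nearest centroid (0 -> K1, 1 -> K2).
assignment = {
    name: min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))
    for name, p in medicines.items()
}
print(assignment)

# New centroid of K2: the coordinate-wise mean of its members.
k2 = [medicines[name] for name, c in assignment.items() if c == 1]
new_c2 = tuple(sum(v) / len(k2) for v in zip(*k2))
print(new_c2)
```

K1 contains only A, so its centroid stays at (1, 1); only the centroid of K2 moves.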
Marking the new centroids
Continue the iteration, until there is no change in the centroids or clusters.
Final solution
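Because the final grouping was shown only as a plot, the following Python sketch iterates the medicine example to convergence. It is an illustration under the assumption that the initial centroids are A = (1, 1) and B = (2, 1); a different starting choice could take a different path. The function name `kmeans_2d` is hypothetical.

```python
import math


def kmeans_2d(points, centroids, max_iter=100):
    """Iterate assignment and centroid update until no assignment changes.

    `points` maps a name to (x, y); `centroids` is the initial list of (x, y).
    Returns (assignment, centroids), with assignment[name] a cluster index.
    """
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = {
            name: min(range(len(centroids)),
                      key=lambda i: math.dist(centroids[i], p))
            for name, p in points.items()
        }
        if new_assignment == assignment:  # clusters unchanged: stop
            break
        assignment = new_assignment
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
assignment, centroids = kmeans_2d(medicines, [medicines["A"], medicines["B"]])
print(assignment)
print(centroids)
```

Under that assumption, B moves to the first cluster in the second iteration, and the procedure settles on the groups {A, B} and {C, D} with centroids (1.5, 1) and (4.5, 3.5).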
Example 2
Use the K-means algorithm to create two clusters. Given:
Example 3
Group the below points into 3 clusters.