Clustering By: Avshalom Katz

Post on 16-Dec-2015

We will be talking about…

• What is Clustering?
• Different Kinds of Clustering
• What is DBSCAN?
• Pseudocode
• Example of Clustering
• Definitions of parameters
• Complexity

What is Clustering?

• Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense.

Different types of Clustering

• Biology
• Information retrieval
• Climate
• Business
• Clustering for utility
• Summarization

Example

DIFFERENT KINDS OF CLUSTERS

Well Separated

Prototype based

Graph based

Density based

Shared property (conceptual clusters)

DBSCAN: Introduction

Density-Based Spatial Clustering of Applications with Noise

• Since society started using databases, the amount of information we work with has grown enormously. As a result, automatic algorithms are now applied in every field.

Database Example

Density-Based Spatial Clustering of Applications with Noise

There are four main steps in the algorithm, and the algorithm takes two parameters:

• 1. MINPTS: the minimum number of points required for a region to count as dense.

• 2. EPS: the radius around a point within which density is checked.

Definition 1

• Find all adjacent points. A point is "adjacent" to a point P only if its distance from P is no greater than EPS. All the points adjacent to P are collected into the set Neps(P).
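
As a rough illustration of the neighbourhood Neps(P), here is a minimal Python sketch. It assumes points are 2-D tuples and that distance is Euclidean. The function name `region_query` is borrowed from the pseudocode later in the slides, and treating a distance of exactly EPS as adjacent is an assumption about the boundary case. `math.dist` needs Python 3.8 or later.

```python
import math


def region_query(points, p, eps):
    """Return Neps(p): every point in `points` within distance eps of p."""
    # Linear scan over all points; a real system would use a spatial index.
    return [q for q in points if math.dist(p, q) <= eps]


# Hypothetical sample data, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbours = region_query(points, (0.0, 0.0), eps=1.5)
print(neighbours)
```

Note that P itself is always a member of Neps(P), because its distance to itself is zero.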

Definition 2

• Directly density-reachable: a point p is directly density-reachable from a point q if p is in Neps(q) and q is a core point, that is, Neps(q) contains at least MINPTS points.
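
The check in Definition 2 might be sketched as follows. This is a sketch under assumptions, not the author's code: points are 2-D tuples, distance is Euclidean, and `is_core` and `directly_density_reachable` are hypothetical helper names. It follows the original DBSCAN formulation, in which the core condition (at least MINPTS neighbours) is tested on q, the point we start from.

```python
import math


def region_query(points, p, eps):
    """Neps(p): all points within eps of p (p included)."""
    return [x for x in points if math.dist(p, x) <= eps]


def is_core(points, q, eps, min_pts):
    """q is a core point if its eps-neighbourhood holds at least min_pts points."""
    return len(region_query(points, q, eps)) >= min_pts


def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in Neps(q) and q is a core point."""
    return math.dist(p, q) <= eps and is_core(points, q, eps, min_pts)


# Hypothetical sample data: (2, 0) is a border point next to the core point (1, 0).
points = [(0, 0), (1, 0), (0, 1), (2, 0)]
print(directly_density_reachable(points, (2, 0), (1, 0), eps=1.0, min_pts=3))
print(directly_density_reachable(points, (1, 0), (2, 0), eps=1.0, min_pts=3))
```

The two calls give different answers because the relation is not symmetric: a border point can be reached from a core point, but not the other way round.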

Definition 3

• Density-reachable: a point p is density-reachable from a point q if there is a sequence of points whose first point is q and whose last point is p, such that each point in the sequence is directly density-reachable from the one before it.
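
One way to test density-reachability is a breadth-first search that only continues through core points. The sketch below assumes 2-D tuples, Euclidean distance, and a chain that starts at q and ends at p; `density_reachable` is a hypothetical name, and the linear-scan `region_query` stands in for whatever index a real implementation would use.

```python
import math
from collections import deque


def region_query(points, p, eps):
    """Neps(p): all points within eps of p (p included)."""
    return [x for x in points if math.dist(p, x) <= eps]


def density_reachable(points, p, q, eps, min_pts):
    """True if a chain of directly density-reachable steps leads from q to p."""
    visited = {q}
    queue = deque([q])
    while queue:
        current = queue.popleft()
        neighbours = region_query(points, current, eps)
        if len(neighbours) < min_pts:
            continue  # not a core point, so the chain cannot continue through it
        for n in neighbours:
            if n == p:
                return True
            if n not in visited:
                visited.add(n)
                queue.append(n)
    return False


# Hypothetical data: five points on a line; the two end points are border points.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
print(density_reachable(points, (4, 0), (1, 0), eps=1.0, min_pts=3))
print(density_reachable(points, (1, 0), (4, 0), eps=1.0, min_pts=3))
```

The end point (4, 0) can be reached from the core point (1, 0), but nothing can be reached from (4, 0) because it is not a core point.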

Definition 4

• Density-connected: two points P and Q are density-connected if there is a single point O from which both are density-reachable, possibly in different directions. For example, in the diagram below we can see that P and Q are both density-reachable from O. Therefore, P and Q are density-connected.
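
Density-connectivity can be checked by brute force: try every point O and see whether both P and Q fall in the set reachable from it. The sketch below is deliberately naive and assumes 2-D tuples and Euclidean distance; `reachable_from` and `density_connected` are hypothetical names.

```python
import math


def region_query(points, p, eps):
    """Neps(p): all points within eps of p (p included)."""
    return [x for x in points if math.dist(p, x) <= eps]


def reachable_from(points, o, eps, min_pts):
    """Set of points density-reachable from o (empty if o is not a core point)."""
    reached, expanded, frontier = set(), set(), [o]
    while frontier:
        current = frontier.pop()
        if current in expanded:
            continue
        expanded.add(current)
        neighbours = region_query(points, current, eps)
        if len(neighbours) >= min_pts:  # only core points extend the chain
            reached.update(neighbours)
            frontier.extend(neighbours)
    return reached


def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    return any({p, q} <= reachable_from(points, o, eps, min_pts) for o in points)


# Hypothetical data: a line of five points plus one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (10, 0)]
print(density_connected(points, (0, 0), (4, 0), eps=1.0, min_pts=3))
print(density_connected(points, (0, 0), (10, 0), eps=1.0, min_pts=3))
```

In this data the two border points (0, 0) and (4, 0) are density-connected through the core points between them, even though neither is density-reachable from the other.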

Definition 5

• A cluster C with respect to EPS and MINPTS is a non-empty subset of the database that satisfies the two conditions below:

1. If p is a member of C, Neps(p) contains at least MINPTS points, and q is density-reachable from p, then q is also a member of C.

2. If p and q are both members of C, then p and q are density-connected to each other.

Definition 6

• Given the clusters found in the database, every point that does not belong to any cluster is called "noise".

DBSCAN (Eps = ε, MinPts = 3)

[Figure: the example points, labelled A to V and X, with the radius ε marked and a legend entry for noise.]

• number of adjacent: 5; stack: B,C,D,E,F; current ClusterId: green
• number of adjacent: 8; stack: C,D,E,F,G,H,I; current ClusterId: green
• number of adjacent: 8; stack: D,E,F,G,H,I; current ClusterId: green
• number of adjacent: 7; stack: E,F,G,H,I; current ClusterId: green
• number of adjacent: 9; stack: F,G,H,I,J; current ClusterId: green
• number of adjacent: 9; stack: G,H,I,J; current ClusterId: green
• number of adjacent: 6; stack: H,I,J; current ClusterId: green
• number of adjacent: 7; stack: I,J; current ClusterId: green
• number of adjacent: 7; stack: J; current ClusterId: green
• number of adjacent: 5; stack: (empty); current ClusterId: green
• number of adjacent: (not shown); stack: (empty); current ClusterId: purple
• number of adjacent: 0; stack: (empty); current ClusterId: purple
• number of adjacent: 3; stack: O,P,Q; current ClusterId: purple
• number of adjacent: 2; stack: P,Q; current ClusterId: purple
• number of adjacent: 5; stack: Q,R,S,T; current ClusterId: purple
• number of adjacent: 1; stack: (empty); current ClusterId: purple

Pseudocode of the algorithm

DBSCAN (SetOfPoints, Eps, MinPts)
  // SetOfPoints is UNCLASSIFIED
  ClusterId := nextId(NOISE);
  FOR i FROM 1 TO SetOfPoints.size DO
    Point := SetOfPoints.get(i);
    IF Point.ClId = UNCLASSIFIED THEN
      IF ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) THEN
        ClusterId := nextId(ClusterId)
      END IF
    END IF
  END FOR
END; // DBSCAN

ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
  seeds := SetOfPoints.regionQuery(Point, Eps);
  IF seeds.size < MinPts THEN // no core point
    SetOfPoints.changeClId(Point, NOISE);
    RETURN False;
  ELSE // all points in seeds are density-reachable from Point
    SetOfPoints.changeClIds(seeds, ClId);
    seeds.delete(Point);
    WHILE seeds <> Empty DO
      currentP := seeds.first();
      result := SetOfPoints.regionQuery(currentP, Eps);
      IF result.size >= MinPts THEN
        FOR i FROM 1 TO result.size DO
          resultP := result.get(i);
          IF resultP.ClId IN {UNCLASSIFIED, NOISE} THEN
            IF resultP.ClId = UNCLASSIFIED THEN
              seeds.append(resultP);
            END IF;
            SetOfPoints.changeClId(resultP, ClId);
          END IF; // UNCLASSIFIED or NOISE
        END FOR;
      END IF; // result.size >= MinPts
      seeds.delete(currentP);
    END WHILE; // seeds <> Empty
    RETURN True;
  END IF
END; // ExpandCluster
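
The pseudocode above could be translated into Python roughly as follows. This is a minimal, unoptimised sketch rather than a reference implementation. It assumes points are 2-D tuples with Euclidean distance, stores cluster ids in a separate `labels` list instead of on each point, uses `None` for UNCLASSIFIED and `-1` for NOISE, and numbers clusters from 0. The region query is a plain linear scan, and `math.dist` needs Python 3.8 or later.

```python
import math

UNCLASSIFIED = None  # point not visited yet
NOISE = -1           # point that belongs to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def expand_cluster(points, labels, i, cluster_id, eps, min_pts):
    """Grow a cluster from point i; return False if i is not a core point."""
    seeds = region_query(points, i, eps)
    if len(seeds) < min_pts:            # no core point
        labels[i] = NOISE
        return False
    for j in seeds:                     # all seeds are density-reachable from i
        labels[j] = cluster_id
    seeds.remove(i)
    while seeds:
        current = seeds.pop(0)          # seeds.first() followed by seeds.delete()
        result = region_query(points, current, eps)
        if len(result) >= min_pts:      # current is a core point too
            for j in result:
                if labels[j] is UNCLASSIFIED:
                    seeds.append(j)     # still needs its own region query
                    labels[j] = cluster_id
                elif labels[j] == NOISE:
                    labels[j] = cluster_id  # former noise becomes a border point
    return True


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is UNCLASSIFIED:
            if expand_cluster(points, labels, i, cluster_id, eps, min_pts):
                cluster_id += 1
    return labels


# Hypothetical data: two small squares and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

With this data the two squares come out as clusters 0 and 1, and the isolated point is labelled as noise.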

Example

Define the value of parameter EPS by MINPTS:
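
The slide's figure is not reproduced in this transcript. One heuristic that fits the heading is the sorted k-distance graph from the original DBSCAN paper: fix k from MINPTS, compute each point's distance to its k-th nearest neighbour, sort these distances in descending order, and pick EPS near the first sharp bend in the curve. Whether this is what the slide showed is an assumption. The sketch below only computes the sorted distances; `sorted_k_distances` is a hypothetical name, and reading the bend off the curve is left to the user.

```python
import math


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest other point, largest first."""
    k_dists = []
    for p in points:
        # Distances to every other point, nearest first (brute force).
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        k_dists.append(dists[k - 1])
    return sorted(k_dists, reverse=True)


# Hypothetical data: three close points and one outlier.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(sorted_k_distances(points, k=2))
```

Points with a much larger k-distance than the rest (here the outlier) sit to the left of the bend and would end up as noise for an EPS chosen at the bend.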

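The complexity

With a spatial index, a region query inside ExpandCluster() takes O(log N) on average on a database of size N. At most one region query is run for each of the N points, so the overall complexity is O(N * log(N)). Without an index, each region query scans all N points and the total becomes O(N^2).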
