density-based clustering · main clustering approaches partitioning method →constructs partitions...

Post on 24-Jun-2020

9 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Density-Based ClusteringIzabela Moise, Evangelos Pournaras, Dirk Helbing

Izabela Moise, Evangelos Pournaras, Dirk Helbing 1

Reminder

Unsupervised data miningX Clustering→ k -Means

Izabela Moise, Evangelos Pournaras, Dirk Helbing 2

Main Clustering Approaches

• Partitioning method→ constructs partitions of data points→ evaluates the partitions by some criterion→ k -means, k -medoids

• Density-based method:→ based on connectivity and density functions→ DBSCAN, DJCluster

Izabela Moise, Evangelos Pournaras, Dirk Helbing 3

Density-Based Clustering

Izabela Moise, Evangelos Pournaras, Dirk Helbing 4

Density-Based Clustering

Density-Based Clustering

locates regions of high density that are separated from one anotherby regions of low density.

Izabela Moise, Evangelos Pournaras, Dirk Helbing 4

Main principles

• Two parameters:1. maximum radius of the neighbourhood→ Eps2. minimum number of points in an Eps neighbourhood of a point→ MinPts

• NEps(p) : {q ∈ D s.t . dist(p, q) ≤ Eps}• Key idea: the density of the neighbourhood has to exceed

some threshold.

• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5

Main principles

• Two parameters:1. maximum radius of the neighbourhood→ Eps2. minimum number of points in an Eps neighbourhood of a point→ MinPts

• NEps(p) : {q ∈ D s.t . dist(p, q) ≤ Eps}• Key idea: the density of the neighbourhood has to exceed

some threshold.

• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5

Main principles

• Two parameters:1. maximum radius of the neighbourhood→ Eps2. minimum number of points in an Eps neighbourhood of a point→ MinPts

• NEps(p) : {q ∈ D s.t . dist(p, q) ≤ Eps}• Key idea: the density of the neighbourhood has to exceed

some threshold.

• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5

Core, Border and Noise/Outlier

1

1Jing Gao, SUNY BuffaloIzabela Moise, Evangelos Pournaras, Dirk Helbing 6

Directly Density-Reachable

Directly density-reachable:→ A point p is directly density-reachable from a point q wrt. Eps,MinPts if:

1. p ∈ NEps(q) and

2. |NEps(q)| ≥ MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 7

Directly Density-Reachable

Directly density-reachable:→ A point p is directly density-reachable from a point q wrt. Eps,MinPts if:

1. p ∈ NEps(q) and

2. |NEps(q)| ≥ MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 7

Density-Reachable

• Density-reachable:→ A point p is density-reachable from a point q wrt. Eps,MinPts if there is a chain of points p1, ..., pn, withp1 = q, pn = p, s.t .pi+1 is directly density reachable from pi

• transitive but not symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 8

Density-Connected

Density-connected:→ A point p is density-connected from a point q wrt. Eps, MinPts ifthere is a point o s.t. p and q are density-reachable from o wrt. Epsand MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9

Density-Connected

Density-connected:→ A point p is density-connected from a point q wrt. Eps, MinPts ifthere is a point o s.t. p and q are density-reachable from o wrt. Epsand MinPts→ symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9

Density-Connected

Density-connected:→ A point p is density-connected from a point q wrt. Eps, MinPts ifthere is a point o s.t. p and q are density-reachable from o wrt. Epsand MinPts→ symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9

DBSCAN - Density-Based Spatial Clustering of Applicationswith Noise

Izabela Moise, Evangelos Pournaras, Dirk Helbing 10

Main Principles

One of the most cited clustering algorithms

Main principle:

a cluster is defined as a maximal set of density-connected points.

• Discovers clusters of arbitrary shapes (spherical, elongated,linear), and noise

• Works with spatial datasets:→ geomarketing, tomography, satellite images

• Requires only two parameters (no prior knowledge of thenumber of clusters)

Izabela Moise, Evangelos Pournaras, Dirk Helbing 11

Definition: Cluster

2

2Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 12

Definition: Noise

3

3Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 13

The Algorithm

1. Randomly select a point p

2. Retrieve all points density-reachable from p wrt. Eps andMinPts

3. If p is a core point, a cluster is formed

4. If p is a border point, no points are density-reachable from p→visit the next data point

5. Continue the process until all points have been processed

Izabela Moise, Evangelos Pournaras, Dirk Helbing 14

Selecting Eps and MinPts

The two parameters can be determined by a heuristic

Observation:• For points in a cluster their k -th nearest neighbours are at

roughly the same distance.

• Noise points have the k -th nearest neighbour at fartherdistance.

Izabela Moise, Evangelos Pournaras, Dirk Helbing 15

4

4Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 16

5

5Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 17

6

6Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 18

Pros and Cons

Pros:

X discovers clusters of arbitrary shapes

X handles noise

X needs density parameters as termination condition

Izabela Moise, Evangelos Pournaras, Dirk Helbing 19

Pros and Cons

Cons:

X cannot handle varying densities

X sensitive to parameters→ hard to determine the correct set ofparameters

X sampling affects density measures

Izabela Moise, Evangelos Pournaras, Dirk Helbing 20

top related