A genetic clustering algorithm for data with non-spherical-shape clusters

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Lin Yu Tseng, Shiueng Bien Yang

Department of Information Management

Pattern Recognition 33 (2000) 1251-1259

Outline

Motivation
Objective
Introduction
The basic concept of genetic strategy
The genetic clustering algorithm
Experiments
Concluding remarks and summary
Personal opinions
Review

Motivation

Some problems with traditional clustering methods:
How many clusters are there?
How should the threshold distance d be chosen in neighborhood clustering?
How can non-spherical-shape clusters be found?

Objective

To solve these problems of traditional clustering algorithms:
A genetic clustering algorithm that can find non-spherical-shape clusters.
It groups objects according to their similarities and automatically finds the proper number of clusters k.

Introduction

Clustering methods can broadly be classified into two categories:
Hierarchical: agglomerative and divisive
Non-hierarchical: e.g., k-means

Introduction

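Problems with most of these clustering algorithms:
How many clusters are there?
How can non-spherical-shape clusters be handled?
What distance threshold should be used for merging?

GA clustering algorithm: clustering can be treated as a search problem, which is exactly what a genetic algorithm does.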

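Basic concept of the classical genetic algorithm

The classical GA runs the following loop:
Encoding scheme: represent each candidate solution as a string.
Fitness evaluation: score every string in the population.
Termination test: if the end condition is met (YES), halt; otherwise (NO), continue.
Parent selection: choose strings to reproduce.
Crossover operators: combine pairs of parents into offspring.
Mutation operators: randomly alter the offspring, then return to fitness evaluation.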
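The loop above can be sketched in Python. This is a minimal, hypothetical illustration on a toy problem (maximizing the number of 1 bits in a string, often called OneMax), not the paper's clustering GA; the function name `run_ga`, the one-point crossover, and all parameter values are assumptions chosen for the example.

```python
import random


def run_ga(n_bits=20, pop_size=30, pc=0.8, pm=0.05, max_gens=100, seed=0):
    """Hypothetical classic GA loop on OneMax: fitness = number of 1 bits."""
    rng = random.Random(seed)

    def fitness(s):
        return sum(s)

    # Encoding: each individual is a fixed-length list of bits.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(max_gens):
        # Fitness evaluation of the current population.
        fits = [fitness(s) for s in pop]
        gen_best = pop[fits.index(max(fits))]
        if fitness(gen_best) > fitness(best):
            best = gen_best
        # Termination test: halt once a perfect string appears.
        if fitness(best) == n_bits:
            break
        # Parent selection: fitness-proportional (roulette wheel).
        # ASSUMPTION: +1 shifts the weights so an all-zero population still works.
        parents = rng.choices(pop, weights=[f + 1 for f in fits], k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            # Crossover (ASSUMPTION: one-point), applied with probability pc.
            if rng.random() < pc:
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children.extend([a[:], b[:]])
        # Mutation: flip each bit independently with probability pm.
        pop = [[1 - bit if rng.random() < pm else bit for bit in s] for s in children]
    return best
```

Roulette-wheel selection is used here because it matches the fitness-proportional reproduction described later for CLUSTERING; the population size should be even so that parents pair up.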

The genetic clustering algorithm

The algorithm CLUSTERING consists of two stages:
First stage (nearest neighbor): the n objects O1, O2, …, On are split into m small clusters C1, C2, …, Cm.
Second stage (GA clustering): the small clusters are merged into the final clusters.

First Stage

Step 1: Find the nearest neighbor of each object Oi.

Step 2: Compute dav, the average of the nearest-neighbor distances.

Open question: what is the meaning of the parameter u?
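Steps 1 and 2 can be sketched as follows, assuming the objects are points in Euclidean space (the slides do not name the distance measure) and using a brute-force O(n²) search; the function names are hypothetical.

```python
import math


def nearest_neighbor_distances(objects):
    """Step 1 (hypothetical helper): distance from each O_i to its nearest other object.

    ASSUMPTION: Euclidean distance; requires at least two objects.
    """
    n = len(objects)
    return [min(math.dist(objects[i], objects[j]) for j in range(n) if j != i)
            for i in range(n)]


def average_nn_distance(objects):
    """Step 2 (hypothetical helper): d_av, the mean nearest-neighbor distance."""
    nn = nearest_neighbor_distances(objects)
    return sum(nn) / len(nn)
```

For the three points (0, 0), (1, 0), and (5, 0), the nearest-neighbor distances are 1, 1, and 4, so dav = 2.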

First Stage

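Step 3: Compute the n × n adjacency matrix A, where

$$A(i,j) = \begin{cases} 1, & \text{if } \|O_i - O_j\| \le d, \\ 0, & \text{otherwise}, \end{cases} \qquad 1 \le i, j \le n.$$

Step 4: Find the connected components of the graph defined by A, and denote them C1, C2, …, Cm.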
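Steps 3 and 4 can be sketched as below. This assumes Euclidean distance, and the threshold d is simply passed in; how it is derived (the slides mention dav and a parameter u) is left to the caller. The function names are hypothetical.

```python
import math


def adjacency_matrix(objects, d):
    """Step 3 (hypothetical helper): A[i][j] = 1 if ||O_i - O_j|| <= d, else 0.

    ASSUMPTION: Euclidean distance.
    """
    n = len(objects)
    return [[1 if math.dist(objects[i], objects[j]) <= d else 0 for j in range(n)]
            for i in range(n)]


def connected_components(A):
    """Step 4 (hypothetical helper): connected components of the graph given by A."""
    n = len(A)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        # Iterative depth-first search from an unvisited object.
        stack, comp = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if A[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    return components
```

Each returned component is a list of object indices and plays the role of one small cluster Ci.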

Second Stage

The initialization step: population, coding, Dinter and Dintra.

The three phases of the GA: reproduction phase, crossover phase, mutation phase.

Second Stage

Compute the m × m distance matrix D, whose entry D(i,j) is the distance between clusters Ci and Cj.
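The slide does not show how the distance between two small clusters is defined. The sketch below assumes single-linkage distance (the closest pair of objects, one from each cluster) purely for illustration; `cluster_distance_matrix` is a hypothetical name, and clusters are given as lists of object indices.

```python
import math


def cluster_distance_matrix(objects, clusters):
    """Hypothetical helper: m x m matrix D, D[i][j] = distance between C_i and C_j.

    ASSUMPTION: single-linkage (closest pair) Euclidean distance; the slide
    does not specify the cluster distance.
    """
    m = len(clusters)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dij = min(math.dist(objects[a], objects[b])
                      for a in clusters[i] for b in clusters[j])
            D[i][j] = D[j][i] = dij
    return D
```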

Second Stage

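The initialization step: the population consists of 50 strings. Each string has length m, with one bit for each cluster in {C1, C2, …, Cm}.

For each string Ri, two sets are defined: Ui contains the clusters whose bit is 1, and U’i contains the clusters whose bit is 0. For example, with m = 5:

R1 = 1 1 1 0 0: U1 = {C1, C2, C3}; U’1 = {C4, C5}
R2 = 1 0 1 1 0: U2 = {C1, C3, C4}; U’2 = {C2, C5}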
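Decoding a string into Ui and U’i is a direct reading of its bits. The sketch below assumes a uniformly random initial population (the slides give only its size, 50); the function names are hypothetical, and cluster labels are 1-based to match C1, …, Cm.

```python
import random


def decode(bits):
    """Hypothetical helper: split R_i into (U_i, U'_i) = (clusters with bit 1, with bit 0).

    Cluster labels are 1-based to match C1, ..., Cm.
    """
    ones = {k + 1 for k, b in enumerate(bits) if b == 1}
    zeros = {k + 1 for k, b in enumerate(bits) if b == 0}
    return ones, zeros


def init_population(m, size=50, seed=None):
    """Hypothetical helper: `size` random bit strings of length m.

    ASSUMPTION: bits are drawn uniformly at random; the slides give only the size.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(size)]
```

For example, `decode([1, 1, 1, 0, 0])` gives ({1, 2, 3}, {4, 5}), matching U1 and U’1 above.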

Second Stage

The intra-distance Dintra and the inter-distance Dinter of a string are computed from its two sets, for example:

U1={C1, C2, C3} ; U’1={C4, C5, C7}
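The slides do not give the formulas for Dintra and Dinter. One plausible reading, assumed here only for illustration, is that Dintra averages D over pairs of clusters inside Ui and Dinter averages D over pairs with one cluster in Ui and one in U’i; `intra_inter` is a hypothetical helper that uses 0-based cluster indices into the matrix D.

```python
from itertools import combinations


def intra_inter(D, U, U_prime):
    """Hypothetical helper: return (D_intra, D_inter) for one string.

    ASSUMPTION: both are plain averages of entries of the m x m cluster
    distance matrix D; U and U_prime hold 0-based cluster indices.
    """
    pairs = list(combinations(sorted(U), 2))
    d_intra = sum(D[j][k] for j, k in pairs) / len(pairs) if pairs else 0.0
    cross = [(j, k) for j in U for k in U_prime]
    d_inter = sum(D[j][k] for j, k in cross) / len(cross) if cross else 0.0
    return d_intra, d_inter
```

Under this reading, a string scores well when the clusters it proposes to merge are close to each other (small Dintra) and far from the remaining clusters (large Dinter), which is what SCORE rewards.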

Second Stage

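Reproduction phase: the fitness function is

$$\mathrm{SCORE}(R_i) = w \cdot D_{\mathrm{inter}}(R_i) - D_{\mathrm{intra}}(R_i), \qquad w \in [1, 3],$$

and each string Ri is reproduced with probability

$$P(R_i) = \frac{\mathrm{SCORE}(R_i)}{\sum_{j=1}^{N} \mathrm{SCORE}(R_j)},$$

where N is the number of strings in the population.

Crossover phase: crossover probability pc = 0.8 (for example, between R1 = 1 1 1 0 0 and R2 = 1 0 1 1 0).

Mutation phase: mutation probability pm = 0.1.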
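One generation of the three phases can be sketched as below. The slides give the fitness function and the probabilities pc = 0.8 and pm = 0.1, but not the operators themselves, so one-point crossover and independent bit-flip mutation are assumptions; all function names are hypothetical.

```python
import random


def score(d_inter, d_intra, w):
    """SCORE(R_i) = w * D_inter(R_i) - D_intra(R_i), with w in [1, 3]."""
    return w * d_inter - d_intra


def reproduction_probabilities(scores):
    """P(R_i) = SCORE(R_i) / sum_j SCORE(R_j); only meaningful if all scores are positive."""
    total = sum(scores)
    return [s / total for s in scores]


def crossover(a, b, rng, pc=0.8):
    """ASSUMPTION: one-point crossover, applied with probability pc."""
    if rng.random() < pc and len(a) > 1:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def mutate(bits, rng, pm=0.1):
    """ASSUMPTION: each bit is flipped independently with probability pm."""
    return [1 - b if rng.random() < pm else b for b in bits]


def next_generation(pop, scores, rng):
    """Hypothetical helper: roulette-wheel reproduction, then crossover and mutation.

    Expects an even-sized population (50 in the slides); with an odd size the
    last selected parent is dropped.
    """
    parents = rng.choices(pop, weights=reproduction_probabilities(scores), k=len(pop))
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        children.extend(crossover(a, b, rng))
    return [mutate(c, rng) for c in children]
```

Because SCORE can be negative when w·Dinter < Dintra, the reproduction probability is only well defined for positive scores; the slides do not say how negative scores are handled.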

Merge_Sets_Finding Algorithm

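Step 1: Sort the strings by fitness so that SCORE(R1) ≥ SCORE(R2) ≥ … ≥ SCORE(RN). Set i = 1 and U = U1.

Step 2: Choose Ri and merge the clusters in Ui into one cluster.

Step 3: Choose the smallest l > i such that U ∩ Ul = ∅. IF no such l exists THEN go to Step 4 (the remaining strings are discarded) ELSE set U = U ∪ Ul, i = l, and go to Step 2 (merge).

Step 4: End.

Example: R1 = {C1, C2, C3}, R2 = {C3, C4, C6}, R3 = {C4, C5}. R1 is chosen first; R2 is skipped because it shares C3 with R1; R3 is disjoint from R1 and is chosen next.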
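Merge_Sets_Finding can be sketched as a greedy pass over the strings in order of decreasing fitness. The input format (a list of (score, Ui) pairs) and the function name are assumptions; the usage below takes the three sets from the slide with hypothetical scores that only fix their order.

```python
def merge_sets_finding(strings):
    """Hypothetical sketch: pick disjoint cluster sets, best fitness first.

    ASSUMPTION: `strings` is a list of (score, U) pairs, where U is the set of
    small clusters that a string proposes to merge.
    """
    ranked = sorted(strings, key=lambda s: s[0], reverse=True)  # Step 1
    if not ranked:
        return []
    i = 0
    chosen = [ranked[0][1]]      # Step 2: choose R_1
    used = set(ranked[0][1])     # U = U_1
    while True:
        # Step 3: smallest l > i whose set is disjoint from U.
        l = next((k for k in range(i + 1, len(ranked))
                  if used.isdisjoint(ranked[k][1])), None)
        if l is None:
            return chosen        # Step 4: the remaining strings are discarded
        used |= ranked[l][1]     # U = U ∪ U_l
        i = l
        chosen.append(ranked[l][1])  # Step 2 again: merge U_l
```

With scores 3.0, 2.0, and 1.0 for {C1, C2, C3}, {C3, C4, C6}, and {C4, C5}, the sketch returns the first and third sets, as in the example.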

Experiments - 1

Noise: distance > 2·dav

Figure: original data.

Experiments - 1

Figures: u = 1.2, 8 clusters; 7 clusters.

Experiments - 1

Figures: 6 clusters; u = 1.5 or 2, 5 clusters.

Experiments - 1

Figures: u = 1.2, w = 2, 4 clusters (best); 3 clusters.

Experiments - 1

Figures: 2 clusters; 4 clusters (direct GA).

Experiments - 1

Figure: 4 clusters (k-means).

Experiments - 2

Figures: original data; 4 clusters; 3 clusters; 2 clusters.

Experiments - 3

Figures: original data; 4 clusters.

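Concluding remarks and summary

A genetic clustering algorithm, CLUSTERING, is proposed:
It can find non-spherical-shape clusters.
It clusters automatically, without a preset number of clusters.
A binary search is used to find the proper interval for w.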

Personal opinions

The proper number of clusters is decided by the value of w.

Review

Using GCA for automatic clustering:
Split: nearest neighbor (NN).
Merge: Merge_Sets_Finding Algorithm.