
A FICTION WORK ON FUZZY BASED CANCER GENE IDENTIFICATION THROUGH CLUSTERING

V. Sridevi 1, P. Vidhya 2

1 Assistant Professor, Department of Computer Applications, Dr. N.G.P. Arts and Science College, Coimbatore-48

2 Research Scholar, Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore-48

ABSTRACT:

Cancer disease prediction is a growing research area. Algorithms such as K-Means, Kernel K-Means and Fuzzy C-Means have been used to find the cancer-affected genes in the sample datasets DLBCL, MLL, SRBTCM and EWS, but they do not identify the cancer genes with sufficient accuracy. A Modified Fuzzy C-Means algorithm is therefore proposed to capture the cancer genes. This paper presents cancer identification methods and evaluates them on computation time, classification accuracy and the ability to reveal biologically meaningful gene information. It focuses on cancer gene identification with the Modified Fuzzy C-Means algorithm, which ran efficiently in this work and gave better results.

INTRODUCTION:

Cancer gene expression data is expected to hold the keys to the main problems of cancer diagnosis. The DNA microarray technique, a recent and powerful development, makes it possible to monitor the expression of thousands of genes simultaneously. With this large quantity of gene expression data available, researchers began to explore cancer detection based on gene expression. Several methods proposed in recent years have given good results, but many issues still need to be clearly explained.

In this paper, the Modified Fuzzy C-Means algorithm is used to find the cancer-affected genes in the sample datasets. When Modified Fuzzy C-Means is compared with Kernel Based Clustering, Modified Fuzzy C-Means performs better: it takes less time and identifies the genes correctly. The performance of each algorithm is evaluated from the results it produces.

PROBLEM DEFINITION

DNA microarray technologies, which simultaneously monitor the expression patterns of thousands of genes, are the most popular means of obtaining gene expression data.

V Sridevi et al , International Journal of Computer Science & Communication Networks,Vol 4(6),180-186

180

ISSN:2249-5789

Microarray cancer data, organized as samples versus genes, are used to classify tissue samples as benign or malignant, or into cancer subtypes. The same data are essential for identifying possible gene markers for each cancer subtype, which helps in diagnosing the particular cancer type.

Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients, or between classes of functionally related genes, are sometimes not clearly defined. The primary goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results. A cancer identification system is developed that identifies possible gene markers and applies the modified kernel based fuzzy clustering algorithm to identify the disease.

CONTRIBUTIONS

The main contributions of this work are as follows. New strategies are investigated and new clustering methods are developed to improve the accuracy and robustness of clustering results. The proposed algorithm performs efficiently when compared with Fuzzy C-Means, Kernel Based Fuzzy Clustering and the Modified Kernel Based Fuzzy Clustering algorithm. Computation time and accuracy are also evaluated, and the proposed algorithm scores comparatively high on these performance criteria.

FUZZY LOGIC

Fuzzy logic is an emerging and widely used technique. Fuzzy sets make it possible to model imprecise and qualitative knowledge and to handle uncertainty at various stages. Fuzzy logic is the logic of fuzzy sets: a fuzzy set has a continuous range of truth values between zero and one. Propositions in fuzzy logic have a degree of truth, and membership in a fuzzy set can be full, absent, or any degree in between.
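Graded membership can be illustrated with a triangular membership function, one common choice of fuzzy set shape; the paper does not name one, so the function `triangular_membership` and its parameters a, b, c are hypothetical and serve only as a sketch.

```python
def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Degree (0..1) to which x belongs to a triangular fuzzy set.

    Illustrative assumption: a and c are the feet of the triangle
    (membership 0) and b is its peak (membership 1).
    """
    if not (a < b < c):
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0                    # fully excluded
    if x == b:
        return 1.0                    # fully included
    if x < b:
        return (x - a) / (b - a)      # rising edge: partial membership
    return (c - x) / (c - b)          # falling edge: partial membership


# Hypothetical fuzzy set "moderately expressed" on a normalized 0..1 scale.
for level in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(level, triangular_membership(level, 0.0, 0.5, 1.0))
```

The endpoints give 0.0, the peak gives 1.0 and the points in between give 0.5, matching the "fully inclusive, fully exclusive, or some degree in between" behaviour described above.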

FUZZY CLUSTERING

Combining fuzzy logic with data mining techniques has become one of the key constituents of soft computing for handling the challenges posed by huge collections of natural data. The essential idea of fuzzy clustering is a non-unique partitioning of the data into a collection of clusters: each data point is assigned a membership value for each of the clusters. This allows the clusters to grow into their natural shapes.
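The difference between a hard partition and a non-unique (fuzzy) one is visible in the membership matrix itself. The sketch below uses a hypothetical helper, `is_fuzzy_partition`, to check the two properties usually required of an FCM-style partition: every membership lies in [0, 1] and each point's memberships sum to 1. A hard partition passes as well, as the special case where every row holds a single 1.

```python
import numpy as np


def is_fuzzy_partition(u: np.ndarray, tol: float = 1e-9) -> bool:
    """Check that u (n points x c clusters) is a valid fuzzy partition.

    Memberships must lie in [0, 1] and each row must sum to 1.
    """
    in_range = np.all((u >= -tol) & (u <= 1 + tol))
    rows_sum_to_one = np.allclose(u.sum(axis=1), 1.0, atol=tol)
    return bool(in_range and rows_sum_to_one)


# Hard (crisp) partition: each point belongs to exactly one cluster.
hard = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
# Fuzzy partition: the third point is shared equally by both clusters.
fuzzy = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
# Invalid: total membership of the point exceeds 1.
bad = np.array([[0.7, 0.7]])

print(is_fuzzy_partition(hard), is_fuzzy_partition(fuzzy))  # True True
print(is_fuzzy_partition(bad))                              # False
```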

FUZZY C-MEANS

Fuzzy clustering is a class of cluster analysis algorithms in which the allocation of data points to clusters is not "hard" (all-or-nothing) but "fuzzy", in the same sense as fuzzy logic. Clustering is a mathematical tool that attempts to discover structures or patterns in a data set, where the objects inside each cluster show a certain degree of similarity. Fuzzy clustering allows each feature vector to belong to more than one cluster with different membership degrees (between 0 and 1), so the boundaries between clusters are vague, or fuzzy.

Algorithm:

Let X = {x1, x2, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of cluster centers. The FCM algorithm is implemented in the following steps:

1) Randomly select c cluster centers.

2) Calculate the fuzzy memberships µij using equation [4.4.2].

3) Compute the fuzzy centers vj using equation [4.4.3].

4) Repeat steps 2) and 3) until the minimum value of J is achieved or ||U(k+1) - U(k)|| < β.

where k is the iteration step, β ∈ [0, 1] is the termination criterion, U = (µij)n×c is the fuzzy membership matrix, J is the objective function and m > 1 is the fuzzifier.
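Equations [4.4.2] and [4.4.3] are not reproduced in this paper. For orientation, the standard FCM formulation (Bezdek's objective with the sum-to-one constraint and the alternating updates) is written below in the symbols defined above, with ℓ as the inner summation index because k already denotes the iteration step. That the cited equations match this standard form is an assumption.

```latex
J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} \mu_{ij} = 1 \;\; \forall i, \qquad \mu_{ij} \in [0, 1]

\mu_{ij} = \left[ \sum_{\ell=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_\ell \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}
```

The membership update is what step 2) computes and the center update is what step 3) computes; alternating the two never increases J_m.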

The FCM formulation requires that the total membership of each data feature vector is normalized to 1 and that an element cannot belong to more clusters than exist. The fuzzifier m must be greater than 1, since at m = 1 the optimization problem reduces to the crisp (hard) case. In the literature, m = 2 is the most commonly used value for the fuzzifier.
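The four steps can be sketched in NumPy as follows. This is the standard FCM loop, not the paper's Modified Fuzzy C-Means, whose changes are not described in this text. The function name `fuzzy_c_means`, the use of randomly chosen data points as initial centers, the default β, the iteration cap and the synthetic data are all assumptions; only the membership-change test of step 4) is used as the stopping rule.

```python
import numpy as np


def fuzzy_c_means(x, c, m=2.0, beta=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means (a sketch, not the paper's modified variant).

    x: (n, d) data; c: number of clusters; m > 1: fuzzifier;
    beta: stop once ||U(k+1) - U(k)|| < beta. Returns (U, V, J).
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: use c randomly chosen data points as the initial centers.
    v = x[rng.choice(len(x), size=c, replace=False)]
    u = np.zeros((len(x), c))
    for _ in range(max_iter):
        # Step 2: u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1)); epsilon avoids 0/0.
        d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2) + 1e-12
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        # Step 3: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        w = u_new ** m
        v = (w.T @ x) / w.sum(axis=0)[:, None]
        # Step 4: stop when the membership matrix stops changing.
        converged = np.linalg.norm(u_new - u) < beta
        u = u_new
        if converged:
            break
    # Objective J = sum_ij u_ij^m ||x_i - v_j||^2 for the returned (U, V).
    d = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2)
    j = float(np.sum((u ** m) * d ** 2))
    return u, v, j


# Usage: two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
u, v, j = fuzzy_c_means(data, c=2)
print(np.round(v, 2))
```

Lowering m toward 1 makes the returned memberships crisper, in line with the remark that m = 1 is the hard case; m = 2 is used as the default here because the text notes it is the most common choice.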

Fig.3.3 Data Feature Vectors

Fig.3.4 FCM Clustering


Fig.3.5 Membership Functions obtained by FCM

The FCM algorithm thus yields a clustered image whose partition depends on the number of clusters used. The abnormal region of the input image is grouped into one particular cluster, which can then be easily extracted.
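Extracting one cluster from a fuzzy result usually means defuzzifying first. The sketch below assigns each item to its maximum-membership cluster and returns the members of one chosen cluster; the max-membership rule and the helper name `extract_cluster` are assumptions, not details taken from the paper.

```python
import numpy as np


def extract_cluster(u, cluster):
    """Indices of items whose highest membership is in `cluster`.

    u is the (n x c) fuzzy membership matrix produced by FCM.
    """
    labels = np.argmax(u, axis=1)          # hard label = max-membership cluster
    return np.flatnonzero(labels == cluster)


u = np.array([
    [0.90, 0.10],
    [0.30, 0.70],
    [0.55, 0.45],
    [0.20, 0.80],
])
print(extract_cluster(u, 1))  # [1 3]
```

Item 2 (0.55 versus 0.45) shows the cost of this rule: a nearly shared item is forced into a single cluster, so a membership threshold may be preferable when borderline items matter.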

EXPERIMENTAL RESULTS:

The following results were obtained with the modified fuzzy c-means algorithm.

The MLL dataset is used with a training set of 30 and 8 clusters. The resulting cluster view is shown in Fig.5.9. The attained partition coefficient (PC) is 0.6428 and the classification entropy (CE) is 0.8100.
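PC and CE are validity indices computed from the membership matrix: PC approaches 1 and CE approaches 0 as the partition becomes crisper. The sketch below uses the usual Bezdek definitions with the natural logarithm; the paper does not state its exact definitions or log base, so both are assumptions, as are the function names.

```python
import numpy as np


def partition_coefficient(u):
    """PC = (1/n) * sum_ij u_ij^2; ranges from 1/c (fuzziest) to 1 (crisp)."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u ** 2) / u.shape[0])


def classification_entropy(u):
    """CE = -(1/n) * sum_ij u_ij * ln(u_ij); 0 for a crisp partition.

    Written as u * ln(1/u); entries with u_ij = 0 contribute 0, the limit
    of u * ln(u) as u -> 0.
    """
    u = np.asarray(u, dtype=float)
    safe = np.where(u > 0, u, 1.0)          # ln(1/1) = 0 masks zero entries
    return float(np.sum(u * np.log(1.0 / safe)) / u.shape[0])


crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
uniform = np.full((2, 2), 0.5)
print(partition_coefficient(crisp), classification_entropy(crisp))  # 1.0 0.0
# Maximally fuzzy two-cluster case: PC = 0.5 and CE = ln 2 (about 0.693).
print(partition_coefficient(uniform), classification_entropy(uniform))
```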


With the modified fuzzy c-means algorithm on the DLBCL dataset, using a training set of 20 and 3 clusters, the evaluation time is 0.15686 s. The evaluation time of k-means is 0.633 times higher than that of fuzzy c-means.

DATASET   GENE SELECTION RANGE   NO. OF CLUSTERS   TIME (Sec)   PC       CE
DLBCL     20                     3                 0.16422      0.7357   0.4661
                                 5                 0.15686      0.6583   0.6799
          25                     3                 0.1531       0.7581   0.4503
          30                     3                 0.16858      0.7342   0.4904
MLL       20                     3                 0.16116      0.7330   0.4915
                                 5                 0.15759      0.6888   0.6185
          25                     5                 0.18612      0.6785   0.6473
                                 7                 0.19544      0.6612   0.7419
          30                     3                 0.17595      0.7248   0.4979
SRBTCM    20                     3                 0.17268      0.6837   0.5655
                                 5                 0.16916      0.6484   0.7222
                                 4                 0.17164      0.6688   0.6526
          25                     6                 0.16094      0.6583   0.7411
          30                     3                 0.16198      0.6346   0.6362
                                 5                 0.1829       0.5512   0.896
EWS       30                     6                 0.17642      0.6056   0.8417
                                 8                 0.16982      0.6428   0.8100
          40                     7                 0.1712       0.5757   0.9131
                                 9                 0.16805      0.5966   0.9286

Fuzzy c-means is therefore a better algorithm than k-means for identifying cancer genes, and the modified fuzzy c-means algorithm gives better results than fuzzy c-means.

Overall Performance

                                PARAMETERS
ALGORITHM                       TIME       ACCURACY   PERFORMANCE
k-means                         2.889368   7.69       91.52533
Kernel Based Fuzzy Clustering   4.43915    8.58       94.186
Fuzzy C-Means                   2.205545   6.65       93.36081
Modified Fuzzy C-Means          0.159944   9.59       95.95383


CONCLUSION

In this paper, four algorithms, namely K-Means, Kernel Based Fuzzy Clustering, Fuzzy C-Means and Modified Fuzzy C-Means, are used to find the cancer-affected genes in the sample datasets DLBCL, MLL, SRBTCM and EWS. Because the existing algorithms do not perform well in identifying cancer genes, the Modified Fuzzy C-Means algorithm is proposed to capture them.
