A FICTION WORK ON FUZZY BASED CANCER GENE
IDENTIFICATION THROUGH CLUSTERING
V. Sridevi 1, P.Vidhya 2
1Assistant Professor, Department of Computer Applications
Dr. N.G.P.Arts and Science College, Coimbatore-48 [email protected]
2Research Scholar, Department of Computer Science,
Dr. N.G.P. Arts and Science College, Coimbatore-48. 2 [email protected]
ABSTRACT:
Cancer prediction is a growing research area. Algorithms such as K-Means, Kernel K-Means and Fuzzy C-Means have been used to find the cancer-affected genes in the sample datasets DLBCL, MLL, SRBTCM and EWS, but they do not identify the cancer genes with sufficient accuracy. A Modified Fuzzy C-Means algorithm is therefore proposed to identify the cancer genes. This paper presents the cancer gene identification methods and evaluates them on computation time, classification accuracy and ability to reveal biologically meaningful gene information. It highlights cancer gene identification with the Modified Fuzzy C-Means algorithm, which is used efficiently in this work and gives better results than the other methods.
INTRODUCTION:
Cancer gene expression data are believed to hold the keys to the main problems of cancer diagnosis. The recently introduced DNA microarray technique is powerful because it allows simultaneous monitoring of thousands of gene expressions. With this large quantity of gene expression data available, researchers have started to explore the possibilities of cancer detection using gene expression data. Several methods have been proposed in recent years with good results, but many issues still need to be addressed.
In this paper, the Modified Fuzzy C-Means algorithm is used to find the cancer-affected genes in the sample datasets. When Modified Fuzzy C-Means is compared with Kernel Based Clustering, Modified Fuzzy C-Means performs better, with lower computation time and more accurate gene identification. The performance of these algorithms is evaluated using their results on the sample datasets.
PROBLEM DEFINITION
DNA microarray technologies are widely used to produce gene expression data, since they simultaneously monitor the
V Sridevi et al , International Journal of Computer Science & Communication Networks,Vol 4(6),180-186
180
ISSN:2249-5789
expression pattern of thousands of genes. Microarray cancer data, organized in a samples-versus-genes fashion, are being exploited for the classification of tissue samples into benign and malignant classes or their subtypes. Such data are also essential for identifying possible gene markers for each cancer subtype, which helps in diagnosing the particular cancer type.
Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The primary goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results. A cancer identification system is developed that identifies possible gene markers and applies the modified kernel based fuzzy clustering algorithm to identify the disease.
CONTRIBUTIONS
The main contributions of this work are as follows. New strategies are investigated and new clustering methods are developed to improve the accuracy and robustness of clustering results. The proposed algorithm is compared with Fuzzy C-Means, Kernel Based Fuzzy Clustering and the Modified Kernel Based Fuzzy Clustering Algorithm, and performs well against them. Computation time and accuracy are also evaluated, and the proposed algorithm scores comparatively high on these performance criteria.
FUZZY LOGIC
Fuzzy logic is an emerging technique that is widely used. Fuzzy sets make it possible to model imprecise and qualitative knowledge and to handle uncertainty at various stages. Fuzzy logic is the logic of fuzzy sets; a fuzzy set admits a continuous range of truth values between zero and one. Propositions in fuzzy logic have a degree of truth, and membership in fuzzy sets can be fully inclusive, fully exclusive, or some degree in between.
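As a minimal sketch of a graded membership, the function below evaluates a triangular membership function. The function name, the triangular shape and the "highly expressed" set with its breakpoints are illustrative assumptions and do not come from this paper.

```python
def triangular_membership(x, low, peak, high):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0                           # fully exclusive
    if x <= peak:
        return (x - low) / (peak - low)      # rising edge
    return (high - x) / (high - peak)        # falling edge


# Hypothetical fuzzy set "highly expressed" over a normalized expression level.
for level in (0.2, 0.7, 1.0, 1.3):
    print(level, triangular_membership(level, 0.4, 1.0, 1.6))
```

A value of 0 corresponds to full exclusion, 1 to full inclusion, and anything in between to partial membership.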
FUZZY CLUSTERING
The combination of fuzzy logic with data mining techniques has become one of the key constituents of soft computing in handling the challenges posed by huge collections of natural data. The essential idea in fuzzy clustering is the non-unique partitioning of the data into a collection of clusters: each data point is assigned a membership value for each of the clusters. This allows the clusters to grow into their natural shapes.
FUZZY C-MEANS
Fuzzy clustering is a class of
algorithms for cluster analysis in which the
allocation of data points to clusters is not
"hard" (all-or-nothing) but "fuzzy" in the
same sense as fuzzy logic. Clustering is a
mathematical tool that attempts to discover
structures or certain patterns in a data set,
where the objects inside each cluster show a
certain degree of similarity. Fuzzy clustering
allows each feature vector to belong to more
than one cluster with different membership
degrees (between 0 and 1) and unclear or
fuzzy boundaries between clusters.
Algorithm:
The FCM algorithm is implemented using the following steps. Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
1) Randomly select 'c' cluster centers.
2) Calculate the fuzzy membership 'µij' using equation [4.4.2].
3) Compute the fuzzy centers 'vj' using equation [4.4.3].
4) Repeat steps 2) and 3) until the minimum 'J' value is achieved or ||U(k+1) - U(k)|| < β,
where 'k' is the iteration step, 'β' is the termination criterion in [0, 1], 'U = (µij)n×c' is the fuzzy membership matrix, 'J' is the objective function, and 'm' is the fuzzifier, m > 1.
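For reference, the standard FCM formulation is given here in the notation of the steps above. It is assumed, not confirmed, to correspond to the objective function J and to equations [4.4.2] and [4.4.3]; the distance symbol d_ij is introduced only for this illustration.

```latex
J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} \mu_{ij} = 1 \ \text{for every } i

\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}},
\qquad d_{ij} = \lVert x_i - v_j \rVert

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}
```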
The conditions given in the equations above require that the total membership of each data feature vector is normalized to 1 and that an element cannot belong to more clusters than exist. The fuzzifier m should be greater than 1, since at m = 1 the optimization problem reduces to the crisp (hard) case. In the literature, m = 2 is the most commonly used value for the fuzzifier.
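The four steps above can be sketched in NumPy as follows. This is a minimal illustration of standard FCM under stated assumptions, not the modified algorithm evaluated in this paper: the function name, the default values of `beta`, `max_iter` and `seed`, and the synthetic two-blob data are all hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, beta=1e-5, max_iter=300, seed=0):
    """Standard FCM on X (n samples x d features); returns (centers V, memberships U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: randomly pick c distinct data points as the initial centers.
    V = X[rng.choice(n, size=c, replace=False)].astype(float)
    U = np.zeros((n, c))
    for _ in range(max_iter):
        # Step 2: memberships from the distances d_ij = ||x_i - v_j||.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: centers as membership-weighted means of the data.
        W = U_new ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop once ||U(k+1) - U(k)|| < beta.
        converged = np.linalg.norm(U_new - U) < beta
        U = U_new
        if converged:
            break
    return V, U


# Synthetic data (illustrative only): two well-separated 2-D blobs.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.3, (20, 2)),
               data_rng.normal(3.0, 0.3, (20, 2))])
V, U = fuzzy_c_means(X, c=2)
print(U.sum(axis=1)[:3])  # memberships of each point sum to 1
```

Taking `U.argmax(axis=1)` turns the fuzzy partition into hard labels when a single cluster per sample is needed.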
Fig.3.3 Data Feature Vectors
Fig.3.4 FCM Clustering
Fig.3.5 Membership Functions obtained by FCM
Thus the FCM algorithm yields the clustered image based on the number of clusters used. The abnormal region of the input image is grouped into one particular cluster, which can then be easily extracted.
EXPERIMENTAL RESULTS:
The following results were obtained with the modified fuzzy c-means algorithm.
The MLL dataset is used with 30 training sets and 8 clusters. The resulting cluster view is shown in Fig.5.9. The attained value of PC is 0.6428 and of CE is 0.8100.
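PC and CE are not defined in the text. The sketch below assumes they denote the partition coefficient and the classification entropy, two validity indices commonly used for fuzzy partitions. The function names and the two toy membership matrices are hypothetical.

```python
import numpy as np


def partition_coefficient(U):
    """PC = (1/n) * sum_ij u_ij^2; 1/c for a fully fuzzy partition, 1 for a crisp one."""
    return float(np.sum(U ** 2) / U.shape[0])


def classification_entropy(U, eps=1e-12):
    """CE = -(1/n) * sum_ij u_ij * ln(u_ij); 0 for a crisp partition, ln(c) when fully fuzzy."""
    U = np.clip(U, eps, 1.0)  # avoid log(0)
    return float(-np.sum(U * np.log(U)) / U.shape[0])


# Toy membership matrices (rows = samples, columns = clusters).
crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
fuzzy = np.array([[0.5, 0.5], [0.5, 0.5]])
print(partition_coefficient(crisp), classification_entropy(crisp))
print(partition_coefficient(fuzzy), classification_entropy(fuzzy))
```

Under these definitions, a PC closer to 1 and a CE closer to 0 indicate a less ambiguous partition.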
Using the modified fuzzy c-means algorithm on the DLBCL dataset with a training set of 20 and 3 clusters, the time taken for evaluation is 0.15686 sec. The time taken for evaluation in k-means is 0.633 times higher than in fuzzy c-means.
DATASET   GENE SELECTION RANGE   NO. OF CLUSTERS   TIME (Sec)   PC       CE
DLBCL     20                     3                 0.16422      0.7357   0.4661
DLBCL     20                     5                 0.15686      0.6583   0.6799
DLBCL     25                     3                 0.1531       0.7581   0.4503
DLBCL     30                     3                 0.16858      0.7342   0.4904
MLL       20                     3                 0.16116      0.7330   0.4915
MLL       20                     5                 0.15759      0.6888   0.6185
MLL       25                     5                 0.18612      0.6785   0.6473
MLL       25                     7                 0.19544      0.6612   0.7419
MLL       30                     3                 0.17595      0.7248   0.4979
SRBTCM    20                     3                 0.17268      0.6837   0.5655
SRBTCM    20                     5                 0.16916      0.6484   0.7222
SRBTCM    20                     4                 0.17164      0.6688   0.6526
SRBTCM    25                     6                 0.16094      0.6583   0.7411
SRBTCM    30                     3                 0.16198      0.6346   0.6362
SRBTCM    30                     5                 0.1829       0.5512   0.896
EWS       30                     6                 0.17642      0.6056   0.8417
EWS       30                     8                 0.16982      0.6428   0.8100
EWS       40                     7                 0.1712       0.5757   0.9131
EWS       40                     9                 0.16805      0.5966   0.9286
So, fuzzy c-means is a better algorithm for identifying cancer genes than k-means, and the modified fuzzy c-means algorithm gives better results than fuzzy c-means.
Overall Performances
                                 PARAMETERS
ALGORITHM                        TIME       ACCURACY   PERFORMANCE
k-means                          2.889368   7.69       91.52533
Kernel Based Fuzzy Clustering    4.43915    8.58       94.186
Fuzzy C-Means                    2.205545   6.65       93.36081
Modified Fuzzy C-Means           0.159944   9.59       95.95383
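To make the timing comparison concrete, the short sketch below expresses each reported TIME value as a multiple of the Modified Fuzzy C-Means time. It assumes all four values share the same time unit; the variable names are illustrative.

```python
# TIME values copied from the Overall Performances table.
times = {
    "k-means": 2.889368,
    "Kernel Based Fuzzy Clustering": 4.43915,
    "Fuzzy C-Means": 2.205545,
    "Modified Fuzzy C-Means": 0.159944,
}

baseline = times["Modified Fuzzy C-Means"]
ratios = {name: t / baseline for name, t in times.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Modified Fuzzy C-Means time")
```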
CONCLUSION
In this paper, several algorithms are used to find the cancer-affected genes in the sample datasets: K-Means, Kernel based fuzzy clustering, fuzzy c-means and modified fuzzy c-means. The sample datasets used for this research work are DLBCL, MLL, SRBTCM and EWS. The existing algorithms do not perform well in identifying the cancer genes, so the modified fuzzy c-means algorithm is proposed to identify them.