
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 400-411, © IAEME

SUBGRAPH RELATIVE FREQUENCY APPROACH FOR EXTRACTING INTERESTING SUBSTRUCTURES FROM MOLECULAR DATA

Mr. M. A. Srinuvasu¹, Dr. P. Padmaja², Mr. Y. Dharmateja³

Department of CSE & IT, GITAM University, Visakhapatnam-530045, INDIA

ABSTRACT

An unseen molecule in molecular data is classified using the substructures of the molecule. Mining interesting substructures in molecular data for classification yields subgraphs that characterize the different classes. In this paper, the authors propose a Subgraph Relative Frequency (SRF) method that screens each frequent subgraph to determine whether a frequently occurring substructure is interesting. SRF thus discovers interesting subgraphs for each class, using relative frequencies. To classify an unknown molecule, SRF first finds the subgraphs of the molecule and then calculates their interestingness for each class, based on their weights. The performance of SRF is compared against MISMOC and found to be just as accurate. MISMOC requires probability calculations based on absolute frequencies, which increases its complexity; the proposed method reduces this complexity by calculating only the relative frequency to determine interestingness. The method was evaluated on a small predefined molecular dataset and the results were analysed; the performance of the proposed SRF approach was found to be satisfactory and efficient.

Keywords: Frequent subgraph, graph mining, interestingness, molecular structure classification,

SRF, MISMOC.

I. INTRODUCTION

Data mining helps discover non-trivial patterns that are difficult to find manually. It has recently attracted considerable attention from database practitioners and researchers because of its applicability in many areas, such as decision support, market strategy and financial forecasting. Database technology has been used with great success in traditional data processing, but with the ability to store enormous amounts of business data, it is important to find ways to mine that data directly from the database and extract nuggets that can be leveraged for business advantage. If the data can be mined directly, it can be used to find abstractions or relations that improve the understanding of the data and help in making business decisions. Transactional mining (association rules, decision trees, etc.) can be used effectively to find non-trivial patterns in categorical and unstructured data. For applications that have an inherent structure (e.g. chemical compounds, proteins), graph mining is appropriate, because mapping structured data into other representations would lose that structure. Various kinds of data, such as social network data, protein data and other bioinformatics data, can be represented effectively as graphs [1]. A graph representation provides a natural way to express relationships within data. Graph-based data mining expresses data in the form of graphs and focuses on the discovery of interesting subgraph patterns [2][3]. Graphs are increasingly used to model a wide range of scientific data, and this widespread usage has generated considerable interest in mining patterns from graph databases. Graph mining uses the natural structure of the application domain and mines over that structure.

Graph mining algorithms include SUBDUE [4] (Holder et al., KDD'94), which uses an incomplete beam search, and WARMR [5] (Dehaspe et al., KDD'98), which is based on inductive logic programming. Graph-theory-based approaches for mining frequent subgraphs are classified into two groups: the apriori-based approach [6] and the pattern-growth approach [7][8]. Apriori-based frequent subgraph mining algorithms include FSG [6] and FFSM (Fast Frequent Subgraph Mining), while pattern-growth algorithms include gSpan [9], MoFa [12] and Gaston [12]. These algorithms differ in their search order (DFS or BFS), in how they eliminate duplicate subgraphs (passive or active), and in the order in which patterns are discovered (path, tree or graph). Graph mining also covers classification and clustering. Graph clustering relies on similarity measures of two kinds: feature-based similarity and structure-based similarity. Graph classification has four types of approaches [10]: local structure-based, graph pattern-based, kernel-based and boosting. MISMOC [13] is a method for discovering interesting molecular substructures for classification, and RF_MISMOC [14] improves MISMOC by using relative frequencies. Molecule structures are represented in SMILES notation [15].

Application areas include drug discovery, protein folding, comparative genomics, cancer risk assessment and gene evolution. The size and number of molecular structure databases have grown rapidly owing to advances in X-ray diffraction and nuclear magnetic resonance (NMR) technologies. Databases of nucleotides, genomes, proteins, nucleic acids, etc. continue to grow in size and diversity, and there is an increasing need for techniques to mine these data for interesting patterns.

II. LITERATURE STUDY

L. B. Holder et al. [4] present a method for substructure discovery in the SUBDUE system. The algorithm begins with the substructures matching a single vertex in the graph, selects the best substructures and expands their instances by one neighbouring edge in all possible ways. It retains the best substructures in a list until the total amount of computation exceeds a given limit. The evaluation of each substructure is guided by the minimum description length principle and by background-knowledge rules provided by the user.

M. Kuramochi et al. [6] observe that, as data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the requirements of these domains. An alternative is to use graphs to model the database objects. They describe a computationally efficient algorithm for finding all frequent subgraphs in large graph databases and evaluate it with experiments on synthetic datasets as well as a chemical compound dataset. The empirical results show that the algorithm scales linearly with the number of input transactions and discovers frequent subgraphs from a set of graph transactions reasonably fast, even though it must deal with computationally hard problems, such as canonical labelling of graphs and subgraph isomorphism, that do not arise in traditional frequent itemset discovery.

Chan et al. [7] describe an inductive method that is capable of detecting the inherent patterns in an ordered event sequence and making predictions about the attributes of future events. Unlike previous AI-based prediction methods, the proposed method is particularly effective in discovering knowledge in ordered event sequences even when the data are noisy. The method can be divided into three phases: (i) detection of underlying patterns in an ordered event sequence; (ii) construction of sequence-generation rules based on the detected patterns; and (iii) use of these rules to predict the attributes of future events.

K. C. C. Chan et al. [8] give a method for the efficient acquisition of classification rules from training instances that may contain inconsistent, incorrect or missing information. The algorithm consists of three phases: (i) the detection of inherent patterns in a set of noisy training data; (ii) the construction of classification rules based on these patterns; and (iii) the use of these rules to predict the class membership of an object. Because it can handle uncertainty in the learning process, the algorithm can be employed in real-world problem domains involving noisy data.

X. Yan et al. [9] proposed gSpan (graph-based substructure pattern mining), a method for frequent graph-based pattern mining in graph datasets and the first algorithm to use depth-first search (DFS) in frequent subgraph mining. It builds on two techniques: (i) DFS lexicographic order and (ii) the minimum DFS code, which together form a novel canonical labelling system that supports DFS search. gSpan discovers all the frequent subgraphs without candidate generation and false-positive pruning.

Winnie W. M. Lam et al. [13] describe a novel technique called mining interesting substructures in molecular data for classification (MISMOC), which can discover interesting frequent subgraphs not just for characterizing a molecular class but also for distinguishing it from the others. Using a test statistic, MISMOC screens each frequent subgraph to determine whether it is interesting. For those that are, their degrees of interestingness are determined using an information-theoretic measure. When an unseen molecule is classified, its structure is matched against the interesting subgraphs in each class, and a total interestingness measure for classifying the molecule into a particular class is determined from the interestingness of each matched subgraph.

Maryam Kohzadi et al. [14] propose a technique called RF_MISMOC (Relative Frequency MISMOC) for computing the interestingness of patterns in each class. It improves the performance of the base algorithm by selecting equal numbers of interesting indicator patterns for each class and by determining an optimum threshold value for selecting indicator patterns. This is an improvement over the original MISMOC algorithm.

III. RELATED WORK

1. The MISMOC algorithm [13] is used for chemical molecule classification by treating each molecule as a graph. MISMOC searches for the frequent subgraphs using an existing algorithm such as FSG [6] or gSpan [9].

MISMOC relies on the following probabilistic calculations.

Discovering interesting frequent subgraphs:

Interestingness measures as a function of the weight of evidence:

Based on the mutual information measure, the weight of evidence is

Classification using a total interestingness measure:

These are the probabilistic calculations in the MISMOC algorithm.

2. RF_MISMOC [14] (Relative Frequency MISMOC) is derived from the technique of mining interesting substructures in molecular data for classification. Its aim is to improve MISMOC's graph-based classification by using the relative frequency of the interesting patterns instead of their absolute frequency. Its formula uses numIntrsPatterns, the number of all interesting patterns in one class; numClass, the number of classes in the problem; and F(x).

Interesting patterns are determined as follows. First, the training data are converted to sequences of ones and zeros. Second, the IODLG algorithm is applied to find the patterns whose frequency exceeds minsup. Next, the value of the parameter d is computed for every frequent pattern, a threshold between 1 and 2 is chosen, and the frequent patterns whose d exceeds the threshold are selected. Then minIntrs, the minimum number of interesting patterns selected in any class, is determined. Finally, the frequent patterns are sorted by d, and the first minIntrs patterns are selected from each class.
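The d-based screening and the RF_MISMOC selection steps above can be sketched in Python. This is a minimal sketch under assumptions the source does not confirm: d is taken to be Haberman's adjusted residual of the "pattern present" cell in a present/absent × class contingency table (a standard test statistic of the kind MISMOC uses for screening, since its own formulas are not reproduced above); the Table I frequencies are taken to count the molecules of each eight-molecule class that contain the subgraph; and the IODLG mining step is not shown. The function names `adjusted_residuals` and `select_indicator_patterns` are hypothetical.

```python
import math


def adjusted_residuals(present_counts, class_sizes):
    """Adjusted residual of the 'pattern present' cell for each class.

    ASSUMPTION: d is Haberman's adjusted residual of a 2 x p table
    (pattern present/absent x class); this is not the paper's formula.
    """
    n = sum(class_sizes)
    row = sum(present_counts)  # molecules containing the pattern
    d = []
    for observed, col in zip(present_counts, class_sizes):
        expected = row * col / n  # expected count under independence
        variance = expected * (1 - row / n) * (1 - col / n)
        d.append((observed - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return d


def select_indicator_patterns(d_values, threshold):
    """RF_MISMOC-style selection of an equal number of patterns per class.

    d_values: {class_label: {pattern: d}}; threshold chosen in [1, 2].
    """
    # Keep, per class, the patterns whose d exceeds the threshold.
    passing = {cls: {p: d for p, d in pats.items() if d > threshold}
               for cls, pats in d_values.items()}
    # minIntrs: the smallest number of interesting patterns in any class.
    min_intrs = min(len(pats) for pats in passing.values())
    # Sort by d (descending) and keep the first minIntrs patterns per class.
    return {cls: sorted(pats, key=pats.get, reverse=True)[:min_intrs]
            for cls, pats in passing.items()}


# Frequencies (classes 1-3) of some subgraphs from Table I, 8 molecules per class.
freqs = {"S4": [5, 1, 1], "S5": [5, 0, 1], "S6": [1, 5, 1],
         "S8": [0, 1, 4], "S9": [1, 1, 4]}
class_sizes = [8, 8, 8]
d_values = {f"class {k + 1}": {} for k in range(3)}
for name, counts in freqs.items():
    for k, d in enumerate(adjusted_residuals(counts, class_sizes)):
        d_values[f"class {k + 1}"][name] = d

print(select_indicator_patterns(d_values, threshold=1.5))
```

With a threshold of 1.5 only one pattern passes in class 2, so minIntrs is 1 and each class keeps its single highest-d pattern.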

Illustrative example:

To explain the discovery of frequent subgraphs, suppose we are given the three classes of artificial molecular data shown in Fig. 1. Each of the three classes contains eight molecules represented in SMILES notation [15], and each molecule consists of atoms connected by bonds. These molecules can be represented as labelled molecular graphs, with each node representing an atom and each edge a bond.

Given the set of graph data shown in Fig. 1, frequent subgraphs can be discovered in each of classes 1, 2 and 3 using a graph-mining algorithm such as FSG [6]. The molecular structures are converted into subgraph structures by FSG, the interesting relative frequencies are then found, and the unseen molecules are classified. The unseen molecular structures are shown in Fig. 2(a) and 2(b).
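The mapping from SMILES strings to labelled molecular graphs, with atoms as nodes and bonds as edges, can be sketched with a small parser. It is a simplified sketch rather than a full SMILES implementation: it assumes only bracket atoms, the organic-subset atoms (B, C, N, O, P, S, F, Cl, Br, I and their aromatic forms), branches, single-digit ring closures and the bond symbols -, =, #, :, / and \, and it treats the stereo marks / and \ as plain single bonds. The function name `smiles_to_graph` is hypothetical.

```python
import re

# Supported tokens (a subset of SMILES): bracket atoms, two-letter organic
# atoms, one-letter organic atoms (aromatic lowercase included), bond
# symbols, branch parentheses and single-digit ring-closure labels.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:/\\]|[()]|\d")
BOND_LABEL = {"-": "single", "/": "single", "\\": "single",
              "=": "double", "#": "triple", ":": "aromatic"}


def smiles_to_graph(smiles):
    """Parse a simple SMILES string into (atom_labels, edges).

    Each edge is (atom_index, atom_index, bond_label). Hydrogens stay
    implicit; charges and explicit H stay inside bracket-atom labels.
    """
    atoms, edges = [], []
    branch_stack = []   # atoms to return to when a branch closes
    prev = None         # atom the next atom will bond to
    bond = None         # pending explicit bond symbol
    open_rings = {}     # ring-closure digit -> (atom index, bond symbol)
    pos = 0
    for match in TOKEN.finditer(smiles):
        if match.start() != pos:
            raise ValueError(f"unsupported SMILES at position {pos}: {smiles}")
        pos = match.end()
        tok = match.group()
        if tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok in BOND_LABEL:
            bond = tok
        elif tok.isdigit():
            if tok in open_rings:  # close the ring back to the opening atom
                start, start_bond = open_rings.pop(tok)
                edges.append((start, prev, BOND_LABEL[bond or start_bond or "-"]))
            else:                  # open a ring at the current atom
                open_rings[tok] = (prev, bond)
            bond = None
        else:                      # an atom: add a node and bond it to prev
            atoms.append(tok[1:-1] if tok.startswith("[") else tok)
            idx = len(atoms) - 1
            if prev is not None:
                edges.append((prev, idx, BOND_LABEL[bond or "-"]))
            prev, bond = idx, None
    if pos != len(smiles) or open_rings or branch_stack:
        raise ValueError(f"unsupported or unbalanced SMILES: {smiles}")
    return atoms, edges


atoms, edges = smiles_to_graph("ClC(Cl)Cl")
print(atoms)
print(edges)
```

A cheminformatics toolkit such as RDKit is the usual choice for full SMILES support; the sketch only shows the atom-node, bond-edge mapping described in the text.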


Fig.1. Training molecular data

Let us consider the following unseen molecules.

C[I+](O)O[Mo](N=O)([U]1[U][U][U][U][U]1)([No]1[No][No]1)C(=O)C(F)(F)F

Fig. 2(a)

C[Pt](C)C([Co]SC#N)[Co++](N=[N+]=[NH2+])([Pu](N)[Pu]=O)[Y](N)=O

Fig. 2(b)

Fig. 2. Unseen molecules


III. METHODOLOGY

The subgraph relative frequency (SRF) methodology classifies unknown molecules. SRF first takes the training molecule data and finds the subgraphs using the FSG [6] algorithm. From the FSG output it obtains the relative frequencies and, by comparing the relative frequency values against a threshold, identifies the interesting frequent subgraphs. An unknown molecule that is not in the training data is then classified as belonging to a particular class. The block diagram is shown in Fig. 3.

Fig.3. SRF Block Diagram

Subgraph Relative Frequency (SRF)

The unseen-molecule classification problem addressed in this paper can be stated as follows. Let G be the predefined molecular structure data (the training molecule data set of Fig. 1), containing n molecules that are pre-classified into p classes. The problem is to discover interesting patterns in the data so that an “unseen” molecule not in G can be correctly classified into one of the p classes. The n molecules of G can be represented as $G_1, G_2, \ldots, G_n$, where $G_i = G_i(V_i, E_i)$, $i \in \{1, \ldots, n\}$, with vertices representing atoms and edges representing bonds between atoms. The p classes into which the n molecules are classified are denoted $C^{(1)}, \ldots, C^{(p)}$, where $C^{(i)} = \{G^{(i)}_1, \ldots, G^{(i)}_{c_i}\} \subseteq G$, $i = 1, \ldots, p$.

In the following we present the details of the SRF technique, which aims to improve the accuracy of molecular graph classification. SRF performs several tasks. It first searches for the frequent subgraphs using the existing algorithm FSG [6]. Next, it calculates the relative frequency values and identifies the interesting frequent subgraphs, and finally it classifies the unseen molecule.

A. Discovering frequent subgraphs

Several graph mining algorithms can be used to discover frequent subgraphs in a molecular database; SRF uses the FSG graph mining algorithm. Given the molecular data set $G = \{G_1, \ldots, G_n\}$, the algorithm discovers a set of frequent subgraphs $F^{(1)}, \ldots, F^{(p)}$, where $F^{(i)} = \{F^{(i)}_1, \ldots, F^{(i)}_{n_i}\}$, $i = 1, \ldots, p$, for each of the corresponding p classes $C^{(1)}, \ldots, C^{(p)}$.

The FSG algorithm finds all the frequent subgraphs in each class of molecular graphs using an apriori-based approach. Briefly, FSG [6] works as follows. For each $C^{(i)}$ in G, $i = 1, \ldots, p$, FSG first finds the sets of frequent one-edge and two-edge subgraphs. Starting from these, it iteratively generates candidate subgraphs of increasing size. FSG counts the frequency of each candidate, prunes the candidates that do not satisfy the support threshold, and uses the same support condition to prune the lattice of frequent subgraphs. Finally, the frequent subgraphs $F^{(1)}, F^{(2)}, \ldots, F^{(p)}$ are generated for each class, where $F^{(i)}$ contains all the frequent k-subgraphs of $C^{(i)}$. Let $g_k$ be a k-subgraph with k edges, $D_k$ the set of all candidate subgraphs with k edges, and $F^{(i)}_k$ the set of frequent k-subgraphs for class $C^{(i)}$; the FSG algorithm is summarized in Fig. 4.

Fig. 3 (SRF block diagram): Training molecule data → Finding sub-graphs by FSG → Obtains relative frequency from FSG table → SRF condition gets the interesting frequency subgraphs → Classify the unknown molecule


Fig. 4. Algorithm of FSG
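The first level of FSG can be made concrete: a one-edge subgraph of a labelled molecular graph is identified by its two atom labels and its bond label, and its support in a class is the number of molecules that contain it. The sketch below computes only the frequent one-edge subgraphs of each class for an assumed absolute support threshold `min_support`; the later FSG levels (two-edge seeds, candidate generation, canonical labelling and subgraph-isomorphism counting) are not shown. The (atom labels, edge list) graph format, the toy data and the function names are hypothetical.

```python
from collections import Counter


def edge_key(atoms, edge):
    """Canonical label of a one-edge subgraph: sorted atom labels plus bond."""
    i, j, bond = edge
    a, b = sorted((atoms[i], atoms[j]))
    return (a, bond, b)


def frequent_one_edge_subgraphs(classes, min_support):
    """Return {class: {edge_key: support}} for the frequent one-edge subgraphs.

    classes: {class_label: [(atoms, edges), ...]} molecular graphs per class.
    Support = number of molecules in the class containing the edge at least once.
    """
    result = {}
    for cls, graphs in classes.items():
        support = Counter()
        for atoms, edges in graphs:
            # Use a set so each molecule counts at most once per edge type.
            support.update({edge_key(atoms, e) for e in edges})
        result[cls] = {k: s for k, s in support.items() if s >= min_support}
    return result


# Toy data (hypothetical): two classes, molecules given as (atoms, edges).
toy = {
    "class1": [(["N", "O"], [(0, 1, "double")]),
               (["C", "N", "O"], [(0, 1, "single"), (1, 2, "double")])],
    "class2": [(["C", "Cl"], [(0, 1, "single")]),
               (["C", "N"], [(0, 1, "single")])],
}
print(frequent_one_edge_subgraphs(toy, min_support=2))
```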

B. Discovering interesting frequent subgraphs by SRF

The aim of FSG [6] is to discover the frequent subgraphs $F^{(i)} = \{F^{(i)}_1, \ldots, F^{(i)}_{n_i}\}$, $i = 1, \ldots, p$, in each of the corresponding graph classes $C^{(1)}, \ldots, C^{(p)}$. A subgraph that appears frequently in one class may also do so in another, and such frequent subgraphs are not interesting for classification. In this paper we present SRF, a methodology for identifying the frequent subgraphs that are interesting and useful for classification. It is based on relative frequency values computed with simple arithmetic, and its algorithm is given in Fig. 5. The relative frequency of a frequent subgraph $F_j$ for class $C^{(i)}$ is its frequency in $C^{(i)}$ minus its least frequency over all of the classes:

$$r^{(i)}_j = f^{(i)}_j - \min_{k = 1, \ldots, p} f^{(k)}_j,$$

where $f^{(i)}_j$ denotes the frequency of $F_j$ in $C^{(i)}$; the calculation is repeated for every frequent subgraph and every class. The interesting frequent subgraphs, shown in Table I, are found using D, the highest relative frequency value of a subgraph over the classes minus its second-highest relative frequency value. If D ≥ 30% of the class size (2.4 for the eight-molecule classes of Fig. 1), the subgraph is classified as interesting for the class with the highest relative frequency. The set of interesting frequent subgraphs discovered from each of $C^{(1)}, \ldots, C^{(p)}$ is denoted $F'^{(i)} = \{F'^{(i)}_1, \ldots, F'^{(i)}_{n'_i}\}$, $i = 1, \ldots, p$, respectively.


Fig.5. Algorithm of SRF
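The relative-frequency and D rules above amount to a few lines of arithmetic. The sketch applies them, as stated in the text, to the frequencies of some subgraphs in Table I, assuming eight molecules per class (Fig. 1) so that the 30% threshold is 2.4; the function names are hypothetical. When two classes tie for the highest relative frequency, D is 0 and the subgraph is not interesting.

```python
def relative_frequencies(freqs):
    """Relative frequency per class: frequency minus the least frequency."""
    least = min(freqs)
    return [f - least for f in freqs]


def interesting_class(freqs, class_size, ratio=0.30):
    """Return the index of the class a subgraph is interesting for, or None.

    D = highest relative frequency minus second-highest relative frequency;
    the subgraph is interesting when D >= ratio * class_size.
    """
    rel = relative_frequencies(freqs)
    ranked = sorted(range(len(rel)), key=lambda k: rel[k], reverse=True)
    d = rel[ranked[0]] - rel[ranked[1]]
    return ranked[0] if d >= ratio * class_size else None


# Frequencies (class 1, class 2, class 3) for some subgraphs of Table I.
table_freqs = {
    "S1": [7, 8, 7],
    "S4": [5, 1, 1],
    "S6": [1, 5, 1],
    "S8": [0, 1, 4],
    "S10": [1, 4, 4],
}
for name, freqs in table_freqs.items():
    cls = interesting_class(freqs, class_size=8)
    label = f"class {cls + 1}" if cls is not None else "not interesting"
    print(name, relative_frequencies(freqs), label)
```

For example, S4 (ClC(Cl)Cl) has relative frequencies 4, 0, 0, so D = 4 ≥ 2.4 and it is interesting for class 1, whereas S10 has 0, 3, 3 and D = 0.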

C. Classification of an unseen molecule

Given the interesting frequent subgraphs $F'^{(1)}, \ldots, F'^{(p)}$ discovered for the corresponding p classes $C^{(1)}, \ldots, C^{(p)}$, an “unseen” molecule graph G, which is not in the training graph data, is classified by matching it against the subgraphs in each $F'^{(i)}$, $i = 1, \ldots, p$.

SRF computes the total interestingness of classifying G into $C^{(i)}$ as the sum of the interestingness of the interesting frequent subgraphs $F'^{(i)}_j$ that characterize G:

$$I^{(i)}(G) = I\left(G \in C^{(i)} / G \notin C^{(i)} \mid G \text{ is characterized by } F'^{(i)}_1, \ldots, F'^{(i)}_{n'_i}\right) = \sum_{j} I\left(G \in C^{(i)} / G \notin C^{(i)} \mid G \text{ is characterized by } F'^{(i)}_j\right)$$

The total interestingness of classifying G into each of $C^{(1)}, \ldots, C^{(p)}$ is determined, and SRF assigns G to the class that gives the greatest total interestingness measure.
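The decision rule, summing each class's interestingness over the matched subgraphs and assigning the class with the greatest total, can be sketched as follows. Following the worked SRF examples, the per-subgraph interestingness is taken to be the relative frequency; the sketch assumes the matched subgraphs are already known (subgraph matching is not shown) and uses the relative-frequency columns of Table I for the subgraphs S1, S8, S10, S11 and S13 listed for Fig. 2(a). The function `classify` is hypothetical.

```python
def classify(matched, rel_freq, num_classes):
    """Return (best class index, per-class totals).

    matched: names of the subgraphs found in the unseen molecule.
    rel_freq: {subgraph: [relative frequency per class]}.
    """
    totals = [0] * num_classes
    for name in matched:
        for k, value in enumerate(rel_freq[name]):
            totals[k] += value  # add this subgraph's contribution to class k
    best = max(range(num_classes), key=lambda k: totals[k])
    return best, totals


# Relative-frequency columns of Table I for the subgraphs listed for Fig. 2(a).
rel_freq = {
    "S1": [0, 1, 0],
    "S8": [0, 1, 4],
    "S10": [0, 3, 3],
    "S11": [3, 0, 1],
    "S13": [0, 0, 0],
}
best, totals = classify(["S1", "S8", "S10", "S11", "S13"], rel_freq, 3)
print(f"totals per class: {totals}; assigned to class {best + 1}")
```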


Table I: Interestingness of FSG (for each class: frequency and relative frequency)

Subgraph  SMILES notation                 Class 1     Class 2     Class 3
                                          Freq Rel    Freq Rel    Freq Rel
S1        N=O                             7    0      8    1      7    0
S2        C[Pt](C)C                       2    0      5    3      6    4
S3        N[Pu+][Pu]=O                    4    2      5    3      3    0
S4        ClC(Cl)Cl                       5    4      1    0      1    0
S5        OS(O)(=O)=O                     5    5      0    0      1    1
S6        OP(O)(O)=O                      1    0      5    4      1    0
S7        N=[N+]=[N-]                     1    0      3    2      1    0
S8        F[C](F)(F)=O                    0    0      1    1      4    4
S9        BB\B=B\BB(BOB=O)\B=B\B=B/B      1    0      1    0      4    3
S10       [U]1[U][U][U][U][U]1            1    0      4    3      4    3
S11       C[I](O)O                        5    3      2    0      3    1
S12       N[Y]=O                          3    2      2    1      1    0
S13       [No]1[No][No]1                  4    0      4    0      4    0
S14       [Ir]1[Ir][Ir][Ir]1              4    0      5    1      4    0
S15       N#[S]                           3    1      2    0      4    2


Comparison of the proposed SRF with MISMOC and RF_MISMOC

In this paper, SRF addresses the discovery of interesting subgraphs among those found by FSG [6] and the classification of “unseen” molecules, using relative frequency values. This is done with simple arithmetic calculations, which reduces the time complexity compared with MISMOC. MISMOC [13] performs the same tasks, discovering the interestingness of the FSG [6] subgraphs and classifying “unseen” molecules, but it uses absolute frequency values and probability calculations, which makes it more complex. On the examples studied, the two methodologies give the same classification results. RF_MISMOC [14] improved the performance of the MISMOC graph classification algorithm by using the relative frequency of patterns in each class, selecting an equal number of interesting indicator patterns per class, and determining an optimal threshold value for selecting indicator patterns. In this paper, relative frequency is used and “unseen” molecules are classified into a class, which is not done in RF_MISMOC. Thus the SRF approach discovers interesting relative frequencies and classifies “unseen” molecules efficiently.

IV. EXPERIMENTAL RESULTS

For the above example, the interestingness of the subgraphs is calculated with both SRF and MISMOC as follows.

Consider the unknown molecule of Fig. 2(a):

C[I+](O)O[Mo](N=O)([U]1[U][U][U][U][U]1)([No]1[No][No]1)C(=O)C(F)(F)F

Fig. 2(a)

It contains the subgraphs S1, S8, S10, S11 and S13 from Table I.

Using MISMOC, the class 1 interestingness, the sum of the class 1 d values of the subgraphs S1, S8, S10, S11 and S13 found in the unknown molecule, is −11.326. The corresponding sums of class 2 and class 3 d values are −9.104 and 2.390. Based on these interestingness values, the unknown molecule is classified into Class 3.

Using SRF, the class 1 interestingness is 2 + 0 + 0 + 0 + 0 = 2 (the relative values of S1, S8, S10, S11 and S13 in class 1 are 2, 0, 0, 0 and 0, respectively). The class 2 interestingness is 1 + 1 + 1 + 3 + 0 = 6, and the class 3 interestingness is 3 + 0 + 4 + 3 + 0 = 10. Based on these interestingness values, the unknown molecule is classified into Class 3.


Consider the unknown molecule of Fig. 2(b):

C[Pt](C)C([Co]SC#N)[Co++](N=[N+]=[NH2+])([Pu](N)[Pu]=O)[Y](N)=O

Fig. 2(b)

It contains the subgraphs S2, S3, S10, S7, S12 and S15 from Table I.

Using MISMOC, the class 1 interestingness, the sum of the class 1 d values of the subgraphs S2, S3, S10, S7, S12 and S15 found in the unknown molecule, is −8.034. The corresponding sums of class 2 and class 3 d values are −0.719 and −16.704. Based on these interestingness values, the unknown molecule is classified into Class 2.

Using SRF, the class 1 interestingness, the sum of the class 1 relative values of the subgraphs S2, S3, S10, S7, S12 and S15 found in the unknown molecule, is 4. The corresponding class 2 and class 3 sums are 8 and 6. Based on these interestingness values, the unknown molecule is classified into Class 2. Classification into class 1 is carried out in the same way.

V. CONCLUSION

In this paper, we introduced a new graph-mining technique called SRF (subgraph relative frequency) for discovering the interesting subgraphs of unknown molecules from graph databases. Instead of the absolute frequency of subgraphs, SRF uses their relative frequency values. SRF is an effective way to discover the interesting frequent subgraphs of a class and to classify unseen molecules by their substructures, and because it replaces probability calculations with simple arithmetic, it has lower time complexity than previous methods such as MISMOC.

VI. REFERENCES

[1] J. A. Bondy, Graph Theory With Applications. New York: Elsevier, 1976.

[2] D. Conklin, S. Fortier, and J. Glasgow, “Knowledge discovery in molecular databases,” IEEE Trans. Knowl. Data Eng., vol. 5, no. 6, pp. 985–987, Dec. 1993.

[3] Y. Yoshida, Y. Ohta, K. Kobayashi, and N. Yugami, “Mining interesting patterns using estimated frequencies from subpatterns and superpatterns,” Lecture Notes in Computer Science, vol. 2843, pp. 494–501, 2003.


[4] L. B. Holder, D. J. Cook, and S. Djoko, “Substructure discovery in the SUBDUE system,” in Proc. AAAI Workshop Knowl. Discov. Databases, 1994, pp. 169–180.

[5] R. D. King, A. Srinivasan, and L. Dehaspe, “Warmr: A data mining tool for chemical data,” J. Comput.-Aided Mol. Des., vol. 15, no. 2, pp. 173–181, 2001.

[6] M. Kuramochi and G. Karypis, “Frequent sub-graph discovery,” in Proc. 1st IEEE Int. Conf. Data Mining (ICDM), 2001, pp. 313–320.

[7] K. C. C. Chan, A. K. C. Wong, and D. K. Y. Chiu, “Learning sequential patterns for probabilistic inductive prediction,” IEEE Trans. Syst., Man Cybern., vol. 24, no. 10, pp. 1532–1547, Oct. 1994.

[8] K. C. C. Chan and A. K. C. Wong, “APACS: A system for automated pattern analysis and classification,” Comput. Intell.: Int. J., vol. 6, pp. 119–131, 1990.

[9] X. Yan and J. Han, “gSpan: Graph-based substructure pattern mining,” in Proc. IEEE Int. Conf. Data Mining, 2002, pp. 721–724.

[10] I. Fischer and T. Meinl, “Graph-based molecular data mining - An overview,” in Proc. IEEE Int. Conf. Syst., Man Cybern., 2004, vol. 5, pp. 4578–4582.

[11] M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis, “Frequent substructure-based approaches for classifying chemical compounds,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 8, pp. 1036–1050, Aug. 2005.

[12] K. Lakshmi and T. Meyyappan, “Frequent Subgraph Mining Algorithms - A Survey and Framework for Classification.”

[13] W. W. M. Lam and K. C. C. Chan, “Discovering Interesting Molecular Substructures for Molecular Classification,” IEEE Transactions on Nanobioscience, vol. 9, no. 2, June 2010.

[14] M. Kohzadi and M. R. Keyvanpour, “RF_MISMOC: Improvement of MISMOC graph based classification algorithm,” Elsevier Ltd., 2011, ISSN 1877-7058, doi:10.1016/j.proeng.2011.08.1012.

[15] http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

[16] M. Siva Parvathi and B. Maheswari, “Minimal Dominating Functions of Corona Product Graph of a Cycle with a Complete Graph,” International Journal of Computer Engineering & Technology (IJCET), vol. 4, no. 4, 2013, pp. 248–256, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

[17] L. Lengyel, “The Role of Graph Transformations in Validating Domain-Specific Properties,” International Journal of Computer Engineering & Technology (IJCET), vol. 3, no. 3, 2012, pp. 406–425, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

[18] R. H. Doshi, H. B. Bhadka, and R. Mehta, “Development of Pattern Knowledge Discovery Framework using Clustering Data Mining Algorithm,” International Journal of Computer Engineering & Technology (IJCET), vol. 4, no. 3, 2013, pp. 101–112, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

