
    Classifier Based Text Mining For Radial Basis Function

M. GOVINDARAJAN
Lecturer (Senior Scale)
Department of CSE
Annamalai University, Annamalai Nagar - 608002
Tamil Nadu, INDIA

RM. CHANDRASEKARAN
Professor
Department of CSE
Annamalai University, Annamalai Nagar - 608002
Tamil Nadu, INDIA

Abstract: - Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. In neural networks that address classification problems, the training set, the testing set, and the learning rate are the key elements: the collections of input/output patterns used to train the network and to assess its performance, and the rate at which weight adjustments are made. This paper describes a proposed radial basis function (RBF) neural network classifier that performs cross validation on the original RBF neural network in order to optimize classification accuracy and training time. The feasibility and benefits of the proposed approach are demonstrated by means of two data sets, mushroom and weather.symbolic. It is shown that for mushroom (a large dataset) the accuracy with the proposed RBF neural network was on average around 1.4% less than with the original RBF neural network, with the larger improvement in speed; for weather.symbolic (a smaller dataset) the accuracy was on average around 35.7% less than with the original RBF neural network, with the smaller improvement in speed. The algorithm is independent of specific data sets, so many of its ideas and solutions can be transferred to other classifier paradigms.

Keywords: Radial Basis Function, Classification accuracy, Text mining, Time complexity.

1 Introduction

In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain. In this article we start with the following assumptions.

1.1 Radial basis function networks

Radial Basis Function (RBF) networks [13] are also feedforward, but have only one hidden layer. Like the MLP, RBF networks can learn arbitrary mappings; the primary difference is in the hidden layer. RBF hidden layer units have a receptive field with a centre, that is, a particular input value at which they produce a maximal output, and their output tails off as the input moves away from this point. Generally, the hidden units have a Gaussian transfer function.


    2.1 Related Work

This article focuses on the training time and classification accuracy of the RBF neural network under cross validation. Cross validation methods are described in [4]; in general, a filter approach was described there. The problem of training time and classification accuracy for neural networks is discussed in [5], [6]. Here, we discuss examples of the combination of the RBF and PRBF algorithms. Altogether, we investigated five datasets where cross validation methods are applied to optimize the RBF algorithm. The following steps are carried out to classify with the radial basis function network [3]:

1. The input layer is used simply to input the data.

2. A Gaussian activation function is used at the hidden layer.

3. A linear activation function is used at the output layer.

The objective is to have the hidden nodes learn to respond only to a subset of the input, namely, that where the Gaussian function is centered. This is usually accomplished via supervised learning. When RBF functions are used as the activation functions on the hidden layer, the nodes can be sensitive to a subset of the input values.
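As a concrete illustration of the three-layer structure just described, the following is a minimal sketch (not the authors' implementation) of the forward pass of such a network; the function name, parameter shapes, and toy values are illustrative assumptions.

```python
# Minimal sketch of the three-layer RBF network described above:
# input layer -> Gaussian hidden layer -> linear output layer.
import numpy as np

def rbf_forward(x, centres, sigmas, weights, bias):
    # x: (d,) input pattern; centres: (h, d) hidden-node centres;
    # sigmas: (h,) radii; weights: (h, k) hidden-to-output; bias: (k,).
    d = np.linalg.norm(centres - x, axis=1)        # distance to each centre
    a = np.exp(-(d ** 2) / (2.0 * sigmas ** 2))    # Gaussian hidden activations
    return a @ weights + bias                      # linear output layer

# Toy usage with arbitrary (untrained) parameters.
rng = np.random.default_rng(0)
y = rbf_forward(rng.normal(size=4),
                centres=rng.normal(size=(3, 4)),
                sigmas=np.ones(3),
                weights=rng.normal(size=(3, 2)),
                bias=np.zeros(2))
print(y)  # activations of the two output nodes
```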

    2.2 Motivation for a New Approach

A radial function, or radial basis function (RBF), is a class of functions whose value decreases (or increases) with the distance from a central point; the Gaussian activation function is an RBF. An RBF network is typically an NN with three layers: the input layer is used simply to input data, a Gaussian activation function is used at the hidden layer, and a linear activation function is used at the output layer. Our proposed RBF, in contrast, is a class that implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or a linear regression (numeric class problems) on top of that. Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal, it uses the given number of clusters per class. It standardizes all numeric attributes to zero mean and unit variance.
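The following sketch mirrors that recipe under stated assumptions: scikit-learn stands in for whatever implementation the authors used, and the radius heuristic, function names, and parameters (fit_prbf, n_centres, and so on) are inventions for illustration only.

```python
# Sketch of the proposed classifier's recipe: standardize attributes,
# pick centres by k-means, map inputs to normalized Gaussian activations,
# and fit a logistic regression on top (discrete class problems).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def fit_prbf(X, y, n_centres=10, seed=0):
    scaler = StandardScaler().fit(X)   # zero mean, unit variance
    Xs = scaler.transform(X)
    km = KMeans(n_clusters=n_centres, n_init=10, random_state=seed).fit(Xs)
    # A simple radius heuristic (an assumption, not from the paper).
    sigma = km.transform(Xs).min(axis=1).mean() + 1e-8

    def features(Z):
        D = km.transform(scaler.transform(Z))         # distances to centres
        Phi = np.exp(-(D ** 2) / (2.0 * sigma ** 2))  # Gaussian activations
        return Phi / Phi.sum(axis=1, keepdims=True)   # normalized

    clf = LogisticRegression(max_iter=1000).fit(features(X), y)
    return lambda Z: clf.predict(features(Z))
```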

3 Classification with radial basis function neural network

Supervised training involves providing an ANN with specified input and output values and allowing it to iteratively reach a solution; the MLP and RBF networks employ this supervised mode of learning. The RBF design involves deciding on the centres and the sharpness (standard deviation) of the Gaussians. Generally, the centres and SDs (standard deviations) are decided first by examining the vectors in the training data. RBF networks are then trained in a similar way to the MLP: the output layer weights are trained using the delta rule. The MLP is the most widely applied neural network technique; RBF networks have the advantage that one can add extra units, with centres near parts of the input that are difficult to classify. Simple perceptrons, MLPs, and RBF networks are supervised networks. In the unsupervised mode, the network adapts purely in response to its inputs, and such networks can learn to pick out structure in their input; one of the most popular models in the unsupervised framework is the self-organizing map (SOM). Radial basis function (RBF) networks combine a number of different concepts from approximation theory, clustering, and neural network theory. A key advantage of RBF networks for practitioners is the clear and understandable interpretation of the functionality of the basis functions; fuzzy rules may also be extracted from RBF networks for deployment in an expert system. The RBF networks used here may be defined as follows.

1) RBF networks have three layers of nodes: input layer, hidden layer, and output layer.

2) Feed-forward connections exist between input and hidden layers, between input and output layers (shortcut connections), and between hidden and output layers. Additionally, there are connections between a bias node and each output node. A scalar weight is associated with the connection between any two nodes.

3) The activation of each input node (fanout) is equal to its external input, a_i = x_i^(p), where x_i^(p) is the i-th element of the external input vector (pattern) x^(p) of the network (p denotes the number of the pattern).

4) Each hidden node (neuron) j determines the Euclidean distance d_j = ||c_j - x^(p)|| between its own weight (centre) vector c_j and the activations of the input nodes, i.e., the external input vector x^(p). The distance d_j is used as the input of a radial basis function in order to determine the activation a_j of the node.


Here, Gaussian functions are employed, a_j = exp(-d_j^2 / (2 r_j^2)); the parameter r_j of node j is the radius of the basis function, and the vector c_j is its centre. Any other function which satisfies the conditions derived from the theorems of Schoenberg or Micchelli could be used instead; localized basis functions such as the Gaussian or the inverse multiquadric are usually preferred.

5) Each output node (neuron) k computes its activation as a weighted sum, y_k = sum_j w_jk a_j + b_k (with corresponding terms for the shortcut connections from the input nodes). The external output vector of the network consists of the activations of the output nodes.

The activation of a hidden node is high if the current input vector of the network is similar (depending on the value of the radius) to the centre of its basis function. The centre of a basis function can, therefore, be regarded as a prototype of a hyperspherical cluster in the input space of the network; the radius of the cluster is given by the value of the radius parameter. A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin. Such functions are used in function approximation, time series prediction, and control; in artificial neural networks, radial basis functions are utilized as activation functions.
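Collecting items 3)-5) in one place, and omitting the shortcut and bias connections for brevity, the network computes (notation as above):

\[
a_i = x_i^{(p)}, \qquad
a_j = \exp\left(-\frac{\lVert \mathbf{x}^{(p)} - \mathbf{c}_j \rVert^2}{2 r_j^2}\right), \qquad
y_k = \sum_j w_{jk}\, a_j + b_k .
\]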

    4 Optimization of RBF Algorithm

In this section, a schematic overview of cross-validation-based RBF optimization is given. Then, the standard techniques are sketched and our innovative extensions are described in detail.

    4.1 Overview

From an algorithmic perspective, optimization here means searching for parameter values that minimize a cost criterion. Such a search can be used to solve a wide range of optimization tasks; in our case, only the most important parameters of the neural network are optimized.

4.2 Standard Methods of Cross Validation

The development of the new approach was guided by the idea that well-known cross validation methods should be applied as far as possible. To keep the runtime of the cross validation low, only the most important parameters are optimized. We discuss techniques [2] for estimating runtime and classifier accuracy, such as (i) holdout and (ii) k-fold cross validation.

Holdout: The given data are randomly partitioned into two independent sets, a training set and a test set. Random subsampling is a variation of the holdout method in which the holdout method is repeated k times; the overall accuracy estimate is taken as the average of the accuracies obtained from each iteration.

K-fold cross validation: The initial data are randomly partitioned into k mutually exclusive subsets or folds, s1, s2, ..., sk, each of approximately equal size. Training and testing are performed k times. The accuracy estimate is the overall number of correct classifications from the k iterations, divided by the total number of samples in the initial data.

In stratified cross validation, the folds are stratified so that the class distribution of the samples in each fold is approximately the same as that in the initial data.

Bootstrapping: The training instances are sampled uniformly with replacement.

Leave-one-out: k-fold cross validation with k set to s, the number of initial samples. In general, stratified 10-fold cross validation is recommended for estimating classifier accuracy (even if computation power allows using more folds) due to its relatively low bias and variance. The use of such techniques to estimate classifier accuracy increases the overall computation time, yet is useful for selecting among several classifiers.
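A minimal sketch of the stratified 10-fold procedure just recommended, assuming a classifier object with scikit-learn-style fit/predict methods and NumPy arrays (the paper's own implementation is not shown):

```python
# Estimate classifier accuracy with stratified k-fold cross validation:
# folds preserve the class distribution of the initial data.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_accuracy(clf, X, y, n_folds=10, seed=0):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    correct = 0
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])   # train on the other k-1 folds
        correct += np.sum(clf.predict(X[test_idx]) == y[test_idx])
    # Overall correct classifications divided by total number of samples.
    return correct / len(y)
```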

Techniques that increase classifier accuracy:

(i) Bagging (or bootstrap aggregation)

(ii) Boosting

    5 Experimental results

    In this section we demonstrated the properties and

    advantages of our approach by means of two data sets

    like mushroom, weather symbolic. The performance of

    classification algorithms is usually examined by

    evaluating the accuracy of the classification. However,

    since classification is often a fuzzy problem, the correct

    answer may depend on the user. Traditional algorithm

    evaluation approaches such as determining the space

    and time overhead can be used, but these approaches

    are usually secondary. Classification accuracy [13] is

    usually calculated determining the percentage of tuples

    placed in the correct class. This ignores the fact that

    there also may be a cost associated with an incorrect

    assignment to the wrong class. This perhaps should also

    7th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING and DATA BASES (AIKED'08),University of Cambridge, UK, Feb 20-22, 2008

    ISSN: 1790-5109 Page 479 ISBN: 978-960-6766-41-1

    http://en.wikipedia.org/wiki/Origin_(mathematics)http://en.wikipedia.org/wiki/Function_approximationhttp://en.wikipedia.org/wiki/Time_series_predictionhttp://en.wikipedia.org/wiki/Time_series_predictionhttp://en.wikipedia.org/wiki/Control_theoryhttp://en.wikipedia.org/wiki/Artificial_neural_networkhttp://en.wikipedia.org/wiki/Artificial_neural_networkhttp://en.wikipedia.org/wiki/Artificial_neural_networkhttp://en.wikipedia.org/wiki/Artificial_neural_networkhttp://en.wikipedia.org/wiki/Control_theoryhttp://en.wikipedia.org/wiki/Time_series_predictionhttp://en.wikipedia.org/wiki/Time_series_predictionhttp://en.wikipedia.org/wiki/Function_approximationhttp://en.wikipedia.org/wiki/Origin_(mathematics)
  • 8/10/2019 Classifier Based Text Mining For Radial Basis Function

    5/6

    be determined. We examine the Performance of

    classification much as is done with information retrieval

    systems. With only two classes, there are four possible

    outcomes with the classification. The upper left and

    lower right quadrants are correct actions. The remaining

    two quadrants are incorrect actions.
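As a small illustration (an assumption-laden sketch, not the authors' code), the four outcomes can be tallied in a two-by-two confusion matrix whose diagonal holds the correct actions:

```python
# Two-class evaluation: rows are actual classes, columns are predictions.
import numpy as np

def confusion_2x2(y_true, y_pred):
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])
m = confusion_2x2(y_true, y_pred)
print(m)                      # diagonal entries are correct actions
print(np.trace(m) / m.sum())  # classification accuracy
```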

Table 1 Properties of data sets

Dataset            Instances   Attributes
Mushroom           8124        23
Weather.symbolic   14          5


Table 2 Training Time (seconds)

Dataset            Proposed RBF (PRBF)   Original RBF (ORBF)   Faster by
Mushroom           217.23                246.61                29.38
Weather.symbolic   0.05                  0.06                  0.01

Fig.1 Training time of PRBF and ORBF on the two datasets (bar chart)

Table 3 Classification Accuracy

Dataset            % Correct, 10-fold CV (PRBF)   % Correct (ORBF)   Difference
Mushroom           65.5465                        66.9498            1.4033 %
Weather.symbolic   64.2857                        100                35.7143 %

Fig.2 Classification accuracy of PRBF and ORBF on the two datasets (bar chart)

    6 Conclusions

In this work we developed a text mining classifier using neural network methods and measured its training time on two data sets, mushroom and weather.symbolic. First, we utilized our text mining algorithms, including text mining techniques based on classification of the data in the two collections. After that, we employed an existing neural network to measure the training time on the two data sets. Experimental results show that for mushroom (the large dataset) the accuracy with the proposed RBF neural network was on average around 1.4% less than with the original RBF neural network, with the larger improvement in speed. For weather.symbolic (the smaller dataset) the accuracy with the proposed RBF neural network was on average around 35.7% less than with the original RBF neural network, with the smaller improvement in speed.

    Acknowledgement

The authors gratefully acknowledge the authorities of Annamalai University for the facilities offered and the encouragement to carry out this work. This work is supported in part by a Career Award for Young Teachers (CAYT) grant received by the first author from the All India Council for Technical Education, New Delhi. The authors would also like to thank the reviewers for their valuable remarks.

References:

[1] Guobin Ou, Yi Lu Murphey, "Multi-class pattern classification using neural networks", Pattern Recognition 40 (2007).

[2] M. Govindarajan, RM. Chandrasekaran, "Classifier Based Text Mining for Neural Network", Proceedings of the XII International Conference on Computer, Electrical and System Science and Engineering, May 24-26, Vienna, Austria, waset.org, 2007, pp. 200-205.

[3] Oliver Buchtala, Manuel Klimek and Bernhard Sick, "Evolutionary Optimization of Radial Basis Function Classifiers for Data Mining Applications", IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, no. 5, October 2005.

[4] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Elsevier, 2003, pp. 303-311, 322-325.

[5] Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung, "Intrusion Detection: Support Vector Machines and Neural Networks", Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, New Mexico 87801, IEEE, 2002.

[6] N. Jovanovic, V. Milutinovic, and Z. Obradovic, "Foundations of Predictive Data Mining", 2002.

[7] Yochanan Shachmurove (Department of Economics, The City College of the City University of New York and The University of Pennsylvania), Dorota Witkowska (Department of Management, Technical University of Lodz), "Utilizing Artificial Neural Network Model to Predict Stock Markets", CARESS Working Paper #00-11, September 2000.

[8] Bharath, Ramachandran, Neural Network Computing, McGraw-Hill, Inc., New York, 1994, pp. 4-43.

[9] Luger, George F., and Stubblefield, William A., Artificial Intelligence: Structures and Strategies for Complex Problem Solving (2nd Edition), Benjamin/Cummings Publishing Company, Inc., California, 1993, pp. 516-527.

[10] Andrew T. Wilson, "Off-line Handwriting Recognition Using Artificial Neural Networks".

[11] Skapura, David M., Building Neural Networks, ACM Press, New York, pp. 29-33.

[12] Bhavit Gyan (University of Canterbury), Kevin E. Voges (University of Canterbury), Nigel K. Ll. Pope (Griffith University), "Artificial Neural Networks in Marketing from 1999 to 2003: A Region of Origin and Topic Area Analysis".

[13] Margaret H. Dunham, Data Mining: Introductory and Advanced Topics, Pearson Education, 2003, pp. 106-112.
