
CHAPTER 3

PROPOSED IMPROVED BEE COLONY OPTIMIZATION

BASED ON ROUGH SET ALGORITHM

3.1 INTRODUCTION

When processing large databases, two major obstacles are

encountered: i) numerous samples ii) high dimensionality of the feature space.

For example, documents are represented by several thousands of words;

images are composed of millions of pixels, where each word or pixel is here

understood as a feature. Currently, processing methods are often unable to handle such high dimensional data, mostly due to the difficulty of processing, storing and transmitting it within a reasonable time. To reduce the computational time, it is a common practice to project the

data onto a smaller, latent space. Moreover, such a space is often beneficial

for further investigation due to noise reduction and desired feature extraction

properties. Smaller dimensions are also advantageous when visualizing and

analyzing the data. Thus, in order to extract desirable information,

dimensionality reduction or feature selection methods are often applied. A

feature selection algorithm should be seen as a computational approach to a

definition of relevance, although in many cases these definitions are followed

in a somewhat loose sense.


3.2 EXISTING METHODS

Rough Set Theory based Approaches

In rough set based feature selection, the goal is to omit attributes

(features) from decision systems such that objects in different decision classes

can still be discerned. A popular way to evaluate attribute subsets with respect

to this criterion is based on the notion of dependency degree. In the standard

approach, attributes are expected to be qualitative; in the presence of

quantitative attributes, the methodology can be generalized using fuzzy rough

sets, to handle gradual (in)discernibility among attribute values more naturally. However, both the extended approach and its crisp counterpart exhibit a strong sensitivity to noise; a change in a single object

may significantly influence the outcome of the reduction procedure. Jensen

and Shen (2008) proposed an extension of the fuzzy-rough feature selection

methodology, based on interval-valued fuzzy sets, as a means to counter this

problem through the representation of missing values in an intuitive way.

Pan et al (2008) proposed a hybrid feature selection approach based

on Rough sets and Bayesian network classifiers. In this approach, the

classification result of a Bayesian network is used as the criterion for the

optimal feature subset selection. The Bayesian network classifier used here is

a kind of naive Bayesian classifier. It is employed to implement classification

by learning the samples consisting of a set of texture features. In order to

simplify feature reduction using Rough Sets, a discretization method based on C-means clustering is also presented. This approach is applied to extract

residential areas from panchromatic SPOT5 images. Experimental results

show that the proposed method not only improves classification quality but

also reduces computational cost.


Chiang and Ho (2008) presented a novel rough-based feature

selection method for gene expression data analysis. This method can find the

relevant features without requiring the number of clusters to be known a priori

and identify the centers that approximate to the correct ones. They also

introduced a prediction scheme that combines the rough-based feature

selection method with a radial basis function neural network. Further, the effects of different feature selection methods and classifiers on this prediction process are evaluated using the Naive Bayes and linear support vector

machine as classifiers and the performance is compared with other feature

selection methods, including information gain and principal component

analysis. The performance is demonstrated by several published datasets. The

results show that the proposed method can achieve better classification

accuracy.

Shang and Shen (2008) presented a methodological approach for

developing image classifiers that work by exploiting the technical potential of

both fuzzy-rough feature selection and neural network-based classification.

The use of fuzzy-rough feature selection allows the induction of low-

dimensionality feature sets from sample descriptions of real-valued feature

patterns of a (typically much) higher dimensionality. The employment of a

neural network, trained using the induced subset of features, ensures the

runtime classification performance. The reduction of feature sets reduces the

sensitivity of such a neural network-based classifier to its structural

complexity. It also minimises the impact of feature measurement noise to the

classification accuracy. This method is evaluated by applying the approach to

classifying real medical cell images, supported with comparative studies.

Cornelis and Jensen (2008) considered a more flexible

methodology based on the recently introduced Vaguely Quantified Rough Set

(VQRS) model. This method can handle both crisp (discrete-valued) and


fuzzy (real-valued) data and encapsulates the existing noise-tolerant data

reduction approach using Variable Precision Rough Sets (VPRS), as well as

the traditional rough set model, as special cases.

Xie et al (2008) used the VPRS model as a tool to support Group

Decision-Making in credit risk management. It was considered that the

classification in decision tables consisting of risk exposure may be partially

erroneous and a variable precision factor was used to adjust the classification

error. In this work, firstly VPRS and AHP were combined to obtain the

weight of conditional attribute sets decided by each decision-maker. Then, the

Integrated Risk Exposure of attributes is obtained based on the three VPRS-

based models. To verify the effectiveness of these methods, an illustrative

example is presented. The experimental results suggest that the VPRS-based

IRE has advantages in recognizing important attributes.

Fazayeli et al (2008) studied the Rough Set theory as a method of

feature selection based on tolerant classes that extends the existing equivalent

classes. The determination of initial tolerant classes is a challenging and

important task for accurate feature selection and classification. The

Expectation-Maximization clustering algorithm is applied to determine

similar objects. This method generates fewer features with either a higher or

the same accuracy compared to two existing methods namely, Fuzzy Rough

Feature Selection and Tolerance-based Feature Selection, on a number of

benchmarks from the UCI repository.

Song et al (2008) proposed a semi-supervised dimensionality

reduction framework which can efficiently handle the unlabeled data. Under the framework, several classical methods, such as principal component analysis, linear discriminant analysis, maximum margin criterion, locality preserving projections and their corresponding kernel versions, can be seen as

special cases. For high-dimensional data, a low-dimensional embedding result


can be given for both discriminating multi-class sub-manifolds and preserving local manifold structure. Experiments show that these algorithms can significantly improve the accuracy rates of the corresponding supervised and unsupervised approaches.

Yao and Zhao (2008) addressed attribute reduction in decision-theoretic rough set models with respect to different classification properties, such as decision-monotonicity, confidence, coverage, generality and cost. It is important to note that many of these properties can be truthfully reflected by a single measure in the Pawlak rough set model. On the other hand, they need to be considered separately in probabilistic models. A straightforward extension of the measure is unable to evaluate these properties. This study provides a new insight into the problem of attribute reduction.

Jensen and Shen (2009) proposed an extension of the fuzzy-rough

feature selection methodology, based on interval-valued fuzzy sets, as a means to counter this problem via the representation of missing values in an intuitive way. Jensen et al (2009) proposed another approach, based on fuzzy-rough sets. The algorithm is experimentally evaluated against leading

classifiers, including fuzzy and rough rule inducers and shown to be effective.

Zainal et al (2009) investigated the effectiveness of rough set theory in identifying important features in building an intrusion detection system. Rough set theory was also used to classify the data. KDD Cup 99 data was used for validating the results. Empirical results indicate that rough set is comparable to other feature selection techniques deployed by a few other researchers.

Swarm Intelligence based Approaches

Ke et al (2008) introduced a new approach based on ant colony

optimization (ACO) for attribute reduction. To verify the proposed algorithm,


numerical experiments are carried out on thirteen small or medium-sized

datasets and three gene expression datasets. The results demonstrate that this

algorithm can provide competitive solutions efficiently.

Liu et al (2009) introduced two nature inspired population-based

computational optimization techniques, Particle Swarm Optimization (PSO)

and Genetic Algorithm (GA) for rough set reduction. PSO discovers the best

feature combinations in an efficient way to observe the change of positive

region as the particles proceed throughout the search space. The performance

of the two algorithms is evaluated using some benchmark datasets and the

corresponding computational experiments are discussed. Empirical results indicate that both methods are suitable for all the considered problems, and the particle swarm optimization technique outperformed the genetic algorithm approach by obtaining a larger number of reducts for the datasets. A real world

application in fMRI data analysis is also illustrated which is helpful for

cognition research.

Bello et al (2009) achieved promising results solving the feature

selection problem through a joint effort between rough set theory and

evolutionary computation techniques. In particular, two new heuristic search

algorithms are introduced namely, Dynamic Mesh Optimization and another

approach which splits the search process carried out by swarm intelligence

methods.

Wang and Ma (2009) proposed an efficient algorithm, called the Feature Forest algorithm, for generating the reducts of a medical dataset. In the algorithm, the given dataset is transformed into a forest to form a discernibility string, which is the concatenation of some of the features, and the disjunctive normal form is computed to reduce features based on the feature

forest. In addition, experimental results on different datasets show that the


algorithms can efficiently reduce storage cost and be computationally

inexpensive.

Mishra et al (2009) proposed a novel method for dimensionality

reduction of a feature set by choosing a subset of the original features that

contains most of the essential information, using Ant Colony Optimization (ACO) hybridized with Rough Set Theory, called Rough ACO. This method is successfully applied to choose the best feature combinations and then apply the upper and lower approximations to find the reduced set of features from gene expression data.

As seen in the literature, the Rough Set theory has higher

performance than the other methods. However, it is not possible in the theory

to say whether two attribute values are similar and to what extent they are the

same; for example, two close values may only differ as a result of noise, but

in the standard RST-based approach they are considered to be as different as

two values of a different order of magnitude. Dataset discretization must take

place before reduction methods based on crisp rough sets can be applied. This

is often still inadequate, however, as the degrees of membership of values to

discretised values are not considered at all. To solve this problem, a number

of variations in this theory have been proposed. Among those, the Swarm

Intelligence (SI) based methods perform better than the others.

3.3 ROUGH SET THEORY

Rough set theory (Pawlak, 1991) is an extension of conventional

set theory that supports approximations in decision making. Rough Set

Attribute Reduction (Chouchoulas and Shen, 2001) provides a filter-based

tool by which knowledge may be extracted from a domain in a concise way,

retaining the information content whilst reducing the amount of knowledge

involved. Central to RSAR is the concept of indiscernibility. Let I = (U,A) be


an information system, where U is a non-empty set of finite objects (the universe) and A is a non-empty finite set of attributes such that a : U → V_a for every a ∈ A. With any P ⊆ A, there is an associated equivalence relation IND(P):

IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\}                (3.1)

The partition of U generated by IND(P) is denoted U/P and can be calculated as follows:

U/P = \otimes \{U/IND(\{a\}) : a \in P\}                (3.2)

where

A \otimes B = \{X \cap Y : X \in A,\ Y \in B,\ X \cap Y \neq \emptyset\}                (3.3)

If (x, y) ∈ IND(P), then x and y are indiscernible by attributes from P. The equivalence classes of the P-indiscernibility relation are denoted [x]_P.

Let X ⊆ U. The P-lower approximation \underline{P}X and P-upper approximation \overline{P}X of the set X can now be defined as:

\underline{P}X = \{x \mid [x]_P \subseteq X\}                (3.4)

\overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\}                (3.5)

Let P and Q be equivalence relations over U; then the positive, negative and boundary regions can be defined as:

POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X                (3.6)

NEG_P(Q) = U - \bigcup_{X \in U/Q} \overline{P}X                (3.7)

BND_P(Q) = \bigcup_{X \in U/Q} \overline{P}X - \bigcup_{X \in U/Q} \underline{P}X                (3.8)

The positive region contains all objects of U that can be classified

to classes of U/Q using the knowledge in attribute P.

An important issue in data analysis is discovering dependencies

between attributes. Intuitively, a set of attributes Q depends totally on a set of

attributes P, denoted P ⇒ Q, if all attribute values from Q are uniquely

determined by values of attributes from P. If there exists a functional

dependency between values of Q and P, then Q depends totally on P.

Dependency can be defined in the following way:

For P, Q ⊆ A, it is said that Q depends on P in a degree k (0 ≤ k ≤ 1), denoted P ⇒_k Q, if

k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}                (3.9)

If k = 1, Q depends totally on P, if 0 < k < 1, Q depends partially

(in a degree k) on P and if k = 0 then Q does not depend on P. Based on these

fundamentals, two important reduction methods have been discussed namely,

QuickReduct and Entropy-based method.
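To make the preceding definitions concrete, the following Python sketch computes the indiscernibility partition, the positive region and the dependency degree of equation (3.9) for a small decision table. It is an illustrative sketch only; the function names (partition, positive_region, dependency) and the toy table are assumptions introduced here, not part of the thesis.

```python
from collections import defaultdict

def partition(rows, attrs):
    """U/IND(P): group object indices by their values on the attributes in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks[key].append(i)
    return list(blocks.values())

def positive_region(rows, cond_attrs, dec_attrs):
    """POS_P(Q): objects whose P-equivalence class lies wholly inside one Q-class."""
    q_blocks = [set(b) for b in partition(rows, dec_attrs)]
    pos = set()
    for block in partition(rows, cond_attrs):
        b = set(block)
        if any(b <= q for q in q_blocks):
            pos |= b
    return pos

def dependency(rows, cond_attrs, dec_attrs):
    """gamma_P(Q) = |POS_P(Q)| / |U|, as in equation (3.9)."""
    return len(positive_region(rows, cond_attrs, dec_attrs)) / len(rows)

# Tiny hypothetical decision table: conditional attributes 0-2, decision attribute 3.
table = [
    (0, 1, 0, 'yes'),
    (0, 1, 1, 'yes'),
    (1, 0, 1, 'no'),
    (1, 1, 0, 'no'),
]
print(dependency(table, cond_attrs=[0, 1], dec_attrs=[3]))  # 1.0: {0, 1} fully discerns the classes
```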

3.4 QUICKREDUCT

The reduction of attributes is achieved by comparing equivalence

relations generated by sets of attributes. Attributes are removed so that the

reduced set provides the same quality of classification as the original. A


reduct is defined as a subset R of the conditional attribute set C such that \gamma_R(D) = \gamma_C(D). A given dataset may have many attribute reduct sets, so the set R of all reducts is defined as:

R = \{X : X \subseteq C,\ \gamma_X(D) = \gamma_C(D)\}                (3.10)

The intersection of all the sets in R is called the core, the elements

of which are those attributes that cannot be eliminated without introducing

more contradictions to the dataset. In RSAR, a reduct with minimum

cardinality is searched for; in other words, an attempt is made to locate a single element of the minimal reduct set R_min ⊆ R:

R_min = \{X : X \in R,\ \forall Y \in R,\ |X| \leq |Y|\}                (3.11)

The problem of finding a minimal reduct of an information system

has been the subject of much research (Alpigini et al 2002). The most basic

solution to locating such a reduct is to simply generate all possible reducts

and choose any one with minimal cardinality. Obviously, this is an expensive

solution to the problem and is only practical for very simple datasets. Most of

the time, only one minimal reduct is required. Therefore, all the calculations

involved in discovering the rest are pointless. To improve the performance of

the above method, an element of pruning can be introduced. By noting the

cardinality of any pre-discovered reducts, the current possible reduct can be

ignored if it contains more elements. However, a better approach is needed;

one that will avoid wastage of computational effort. The QuickReduct

algorithm as given below, attempts to calculate a minimal reduct without

exhaustively generating all possible subsets. It starts off with an empty set and

adds in turn, one at a time, those attributes that result in the greatest increase

in dependency, until this produces its maximum possible value for the dataset.


The QuickReduct Algorithm

QUICKREDUCT (C,D)

C, the set of all conditional features;

D, the set of decision features.

i.    R ← { }
ii.   do
iii.      T ← R
iv.       for each x ∈ (C − R)
v.            if γ_{R ∪ {x}}(D) > γ_T(D)
vi.               T ← R ∪ { x }
vii.      R ← T
viii. until γ_R(D) = γ_C(D)
ix.   return R
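A direct, illustrative Python transcription of the pseudocode above might look as follows, reusing the dependency() helper and toy decision table from the sketch in Section 3.3; the names and the greedy structure are assumptions of this sketch, not the author's implementation.

```python
def quickreduct(rows, cond_attrs, dec_attrs):
    """Greedy forward selection: repeatedly add the attribute giving the largest
    increase in dependency until gamma_R(D) equals gamma_C(D)."""
    gamma_full = dependency(rows, cond_attrs, dec_attrs)
    reduct = []
    gamma_best = 0.0
    while gamma_best < gamma_full:
        best_attr, best_gamma = None, gamma_best
        for a in cond_attrs:
            if a in reduct:
                continue
            g = dependency(rows, reduct + [a], dec_attrs)
            if g > best_gamma:
                best_attr, best_gamma = a, g
        if best_attr is None:          # no attribute improves the dependency any further
            break
        reduct.append(best_attr)
        gamma_best = best_gamma
    return reduct

print(quickreduct(table, cond_attrs=[0, 1, 2], dec_attrs=[3]))   # e.g. [0]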

Note that an intuitive understanding of QuickReduct implies that,

for a dimensionality of n, (n² + n)/2 evaluations of the dependency function

may be performed for the worst-case dataset. According to the QuickReduct

algorithm, the dependency of each attribute is calculated and the best

candidate chosen. The next best feature is added until the dependency of the

reduct candidate equals the consistency of the dataset (1 if the dataset is

consistent). This process, however, is not guaranteed to find a minimal reduct.

Using the dependency function to discriminate between candidates may lead

the search down a non-minimal path. It is impossible to predict which

combinations of attributes will lead to an optimal reduct based on changes in


dependency with the addition or deletion of single attributes. It does, however, result in a close-to-minimal reduct, which is still useful in reducing dataset dimensionality to a great extent.

3.5 PROPOSED BEE COLONY BASED RSAR (BeeRSAR)

Nature has inspired researchers to develop models for solving their problems. Optimization is one field in which these models are

frequently developed and applied. Genetic algorithm simulating natural

selection and genetic operators, Particle Swarm Optimization algorithm

simulating flocks of birds and schools of fish, Artificial Immune System

simulating the cell masses of immune system, ACO algorithm simulating

foraging behavior of ants and Artificial Bee Colony algorithm simulating

foraging behavior of honeybees are typical examples of nature inspired

optimization algorithms.

Artificial Bee Colony (ABC) algorithm, proposed by Karaboga

(2005) for real parameter optimization, is a recently introduced optimization

algorithm and simulates the foraging behaviour of bee colony for

unconstrained optimization problems (Basturk and Karaboga, 2006, Karaboga

and Basturk, 2007a, 2007b, 2008). For solving constrained optimization

problems, a constraint handling method was incorporated with the algorithm.

In a real bee colony, there are some tasks performed by specialized

individuals. These specialized bees try to maximize the nectar amount stored

in the hive by performing efficient division of labour and self-organization.

The minimal model of swarm-intelligent forage selection in a honey bee

colony, which the ABC algorithm adopts, consists of three kinds of bees: employed

bees, onlooker bees and scout bees. Half of the colony comprises employed

bees and the other half includes the onlooker bees. Employed bees are

responsible for exploiting the nectar sources explored before and giving


information to the other waiting bees (onlooker bees) in the hive about the

quality of the food source site which they are exploiting. Onlooker bees wait

in the hive and decide a food source to exploit depending on the information

shared by the employed bees. Scouts randomly search the environment in

order to find a new food source depending on an internal motivation or

possible external clues or at random. Main steps of the ABC algorithm

simulating these behaviours are given in the algorithm:

Bee Colony Optimization Algorithm

i) Initialize the food source positions.

ii) Each employed bee produces a new food source in her food

source site and exploits the better source.

iii) Each onlooker bee selects a source depending on the quality of

her solution, produces a new food source in selected source

site and exploits the better source.

iv) Determine the source to be abandoned and allocate its

employed bee as scout for searching new food sources.

v) Memorize the best food source found so far.

vi) Repeat steps 2-5 until the stopping criterion is met.

The above procedure can be implemented for feature reduction. Let

the bees select the feature subsets at random and calculate their fitness and

find the best one at each iteration. This procedure is repeated for a number of

iterations to find the optimal subset. Figure 3.1 demonstrates the steps in the

proposed bee colony based reduct algorithm.


Figure 3.1 Bee Colony based Reduct Algorithm

In the first step of the algorithm, each employed bee produces a feature subset at random. Consider a conditional feature set C containing N features. Then 'p' bees are chosen as the population; half of them are treated as employed bees and the remaining as onlooker bees. For each employed bee, N random numbers between 1 and N are generated and assigned to it. From these random numbers, the feature subset is constructed by performing a truncation operation and then extracting only the unique numbers from the set.

For example, consider the random numbers {1.45, 1.76, 3.33,

1.01}, where N=4. First, the truncation operation is performed. Then, the set


is modified as {1, 1, 3, 1}. From this result, the unique numbers alone are extracted as {1, 3}, representing the feature subset; that is, only the 1st and 3rd features are selected. A small sketch of this encoding is given below.
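A minimal sketch of this encoding step, under the assumption that bee positions are plain Python lists and that out-of-range values are clipped to [1, N]; decode_subset is a name introduced here purely for illustration.

```python
import random

def decode_subset(position, n_features):
    """Map a continuous bee position to a feature subset by truncating each
    coordinate to an integer index and keeping only the unique indices."""
    idx = [max(1, min(n_features, int(v))) for v in position]   # truncate and clip to [1, N]
    return sorted(set(idx))

print(decode_subset([1.45, 1.76, 3.33, 1.01], n_features=4))     # [1, 3]

# A random initial position for one employed bee, as described above:
N = 4
position = [random.uniform(1, N) for _ in range(N)]
print(decode_subset(position, N))
```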

In the second step of the algorithm, a new food source is produced for each employed bee, whose total number equals half the number of food sources, by:

v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})                (3.19)

where \phi_{ij} is a uniformly distributed real random number within the range [-1, 1], k is the index of a solution chosen randomly from the colony (k = int(rand × N) + 1), j = 1, . . ., D and D is the dimension of the problem. After

producing vi, this new solution is compared to solution xi and the employed

bee exploits the better source. In the third step of the algorithm, an onlooker

bee chooses a food source with the probability and produces a new source in

selected food source site. As for employed bee, the better source is decided to

be exploited.

The indiscernibility relation (dependency degree) is calculated for each feature subset as the objective value f_i. This value has to be maximized. From this objective value, the fitness value of each bee is calculated as given in the following equation:

fit_i = \begin{cases} 1/(1 + f_i) & \text{if } f_i \geq 0 \\ 1 + |f_i| & \text{otherwise} \end{cases}                (3.20)

The probability is calculated by means of fitness value using the

following equation.

P_i = \frac{fit_i}{\sum_{j=1}^{N} fit_j}                (3.21)

where fit_i is the fitness of the solution x_i. After all onlookers are distributed to

the sources, sources are checked whether they are to be abandoned. If the


number of cycles that a source cannot be improved is greater than a

predetermined limit, the source is considered to be exhausted.

The employed bee associated with the exhausted source becomes a

scout and makes a random search in the problem domain using the following

equation.

x_{ij} = x_j^{min} + (x_j^{max} - x_j^{min}) \times rand                (3.22)
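For illustration, the four update rules (3.19)-(3.22) could be written in Python roughly as follows; the helper names are hypothetical, and, as in the standard ABC algorithm, only one randomly chosen dimension is perturbed in (3.19). This is a sketch under those assumptions, not the thesis implementation.

```python
import random

def produce_neighbor(x, population, i):
    """Equation (3.19): v_ij = x_ij + phi_ij * (x_ij - x_kj) for one random dimension j."""
    v = list(x)
    k = random.randrange(len(population))
    while k == i:                                  # partner solution must differ from bee i
        k = random.randrange(len(population))
    j = random.randrange(len(x))
    phi = random.uniform(-1.0, 1.0)
    v[j] = x[j] + phi * (x[j] - population[k][j])
    return v

def fitness(f):
    """Equation (3.20): fit_i = 1/(1+f_i) if f_i >= 0, else 1 + |f_i|."""
    return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

def selection_probabilities(fits):
    """Equation (3.21): P_i = fit_i / sum_j fit_j, used by the onlooker bees."""
    total = sum(fits)
    return [fi / total for fi in fits]

def scout_position(lower, upper, dim):
    """Equation (3.22): re-initialise an abandoned source uniformly at random."""
    return [lower + (upper - lower) * random.random() for _ in range(dim)]
```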

The pseudocode of the proposed method is given as:

Bee Colony based RSAR(BeeRSAR) Algorithm

ROUGHBEE (C,D)

C, the set of all conditional features;

D, the set of decision features.

i) Select the initial parameter values for BCO

ii) Initialize the population (xi)

iii) Calculate the objective and fitness value

iv) Find the optimum feature subset as global.

v) do

a. Produce new feature subset (vi)

b. Apply the greedy selection between xi and vi

c. Calculate the fitness and probability values

d. Produce the solutions for onlookers

e. Apply the greedy selection for onlookers

f. Determine the abandoned solution and scouts


g. Calculate the cycle best feature subset

h. Memorize the best optimum feature subset

vi) repeat for maximum number of cycles

The following parameters are used in the proposed method :

The population size (number of bees) : 10

The dimension of the population : N

Lower bound : 1

Upper bound : N

Maximum number of iterations : 1000

The number of runs : 3

The computational complexity of the Bee Colony based RSAR algorithm is calculated as O(n² m log n), with 'n' the number of bees and 'm' the number of features. This is the complexity for the worst-case situation, in which the algorithm guarantees a near-optimal solution.

3.6 PROPOSED BEE COLONY BASED INDEPENDENT

QUICKREDUCT (BeeIQR)

As an extension of the previous approach, a novel Rough set

approach is proposed in this work, to find the reducts and to reduce the

computational complexity and also acquire the most accurate feature subset.

In this proposed method, initially the instances are grouped based on the

decision attribute, then the reduct is found for each class. The common

attributes from all these reduct sets are grouped to form the core reduct and

the remaining attributes are considered for further reduction. From each set of


reducts, the BCO algorithm based RSAR model is applied to obtain the final

reduct.

As discussed in Section 3.4, the problem of finding a minimal reduct of an information system has been the subject of much research. Exhaustively generating all possible reducts and choosing one of minimal cardinality is practical only for very simple datasets, and even with pruning much of the computational effort is wasted. The QuickReduct algorithm avoids the exhaustive search, requiring at most (n² + n)/2 evaluations of the dependency function for a dimensionality of n in the worst case, but it is not guaranteed to find a minimal reduct: using the dependency function to discriminate between candidates may lead the search down a non-minimal path, since it is impossible to predict which combinations of attributes will lead to an optimal reduct from the changes in dependency caused by adding or deleting single attributes. It does, however, result in a close-to-minimal reduct, which is still useful in greatly reducing dataset dimensionality. Figure 3.2 shows the steps involved in the proposed IQRBee algorithm.


Figure 3.2 IQRBee Algorithm

Normally, all the reduct algorithms start with an empty set and add attributes in turn, one at a time, which requires considerable computation. Here, the computation time is reduced as follows. Initially, the feature space is clustered based on the decision attribute and then the reduct is found for each cluster. For example, if there are M feature rows, NC conditional attributes and ND decision classes (unique values of the decision attribute), the feature rows are first clustered based on the decision attribute. For each cluster the reduct is obtained as Ri, where i = 1, 2, …, ND. From this set of reducts, the common attributes are taken out as the core reduct (Rc). Then the ABC algorithm is applied to select a random number of features from each reduct Ri, to find the optimum feature subset.

After choosing Rc, with the remaining attributes in each Ri, each employed bee produces a feature subset at random. Consider a domain which contains ND unique decision values; then the same number of bees (p) is chosen as the population size. From this population, half of the bees are considered as employed bees and the remaining are considered as


onlooker bees. For each employed bee, a random subset from one reduct set is

assigned. The random sets assigned to all the bees are combined to form the

feature subset. For example, consider a database containing 10 conditional attributes (c1, c2, …, c10) and 3 decision classes with 500 records. Initially the records are clustered into 3 groups based on the

decision attribute and then the reduct is applied for each group. For example,

consider that the reducts obtained are,

R1 = { c1,c3,c4,c9 }

R2 = { c3,c4,c8 }

R3 = { c3,c4,c6,c7,c10 }.

From these reducts, the common attributes are chosen as the core

reduct. In this example, Rc = {c3,c4}. These attributes are removed from each

reduct.

R1 = { c1,c9 }; R2 = { c8 }; R3 = { c6,c7,c10 }

In the next step, these three bees are employed to construct a reduct by selecting random subsets from these reducts, which are combined with the core to find the optimum one. For example,

Rc + Bee1 { c1 } + Bee2 { c8 } + Bee3 { c6, c10 } = { c3, c4, c1, c8, c6, c10 }

This reduct is then evaluated using BCO. A small illustrative sketch of the core-reduct construction is given below, after which the pseudocode of the proposed method is presented.
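The following Python sketch illustrates the core-reduct construction and the bee-wise combination described above, assuming the per-class reducts are given as lists of attribute names; core_and_residuals and candidate_reduct are illustrative names, not the thesis implementation.

```python
import random

def core_and_residuals(reducts):
    """Core reduct = attributes common to every per-class reduct;
    residuals = what remains of each reduct after the core is removed."""
    core = set.intersection(*[set(r) for r in reducts])
    residuals = [sorted(set(r) - core) for r in reducts]
    return sorted(core), residuals

def candidate_reduct(core, residuals):
    """Each bee contributes a random non-empty subset of one residual reduct;
    the union with the core forms the candidate passed to the BCO evaluation."""
    subset = set(core)
    for res in residuals:
        if res:
            k = random.randint(1, len(res))
            subset |= set(random.sample(res, k))
    return sorted(subset)

# Worked example from the text (attributes named c1..c10):
r1, r2, r3 = ['c1', 'c3', 'c4', 'c9'], ['c3', 'c4', 'c8'], ['c3', 'c4', 'c6', 'c7', 'c10']
core, residuals = core_and_residuals([r1, r2, r3])
print(core)                                # ['c3', 'c4']
print(candidate_reduct(core, residuals))   # e.g. ['c1', 'c10', 'c3', 'c4', 'c6', 'c8']
```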


Bee Colony based Independent QuickReduct Algorithm

IQRBEE (C,D)

C, the set of all conditional features; D, the set of decision features.

i) Cluster the domain and Find the reduct for each class

ii) Construct the core reduct and reduct sets

iii) Select the initial parameter values for ABC

iv) Initialize the population (xi)

v) Calculate the objective and fitness value

vi) Find the optimum feature subset as global.

vii) do

a. Produce new feature subset (vi)

b. Apply the greedy selection between xi and vi

c. Calculate the fitness and probability values

d. Produce the solutions for onlookers

e. Apply the greedy selection for onlookers

f. Determine the abandoned solution and scouts

g. Calculate the cycle best feature subset

h. Memorize the best optimum feature subset

viii) Repeat for maximum number of cycles

The following parameters are used in the proposed method :

The population size (number of bees) : p(number of Classes)

The dimension of the population : p×N

Lower bound : 1

Upper bound : N

Maximum number of iterations : 1000

The number of runs : 3


Here the reducts are found for clusters based on the decision attribute; thus the computational complexity can be reduced from O(n² m log n) to O((1/nc) n² m log n), where 'nc' is the number of clusters. As the number of clusters increases, it becomes easier to find the reduct using the improved BeeRSAR.

3.7 PROPOSED WEIGHTED BEE COLONY BASED RSAR

(WBeeRSAR)

Another limitation is that all the above feature selection algorithms construct the feature subset and evaluate its performance without considering the relevance of the individual attributes. Here, feature reduction is proposed along with a weight for each attribute. Initially, the information gain (Han and Kamber, 2001) is calculated for each attribute and maintained as its weight. The indiscernibility relation multiplied by the information gain value is calculated for each feature subset as the objective value f_i. Then the bees are allowed to select the feature subsets at random, calculate their fitness and find the best one at each iteration. Further, the computational complexity can be reduced from O((1/nc) n² m log n) to O((1/g)(1/nc) n² m log n), where 'g' is the information gain of the dataset.
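One possible reading of this weighting scheme is sketched below: the information gain of each attribute is computed in the usual entropy-based way and the dependency degree of a subset (the dependency() helper from the sketch in Section 3.3) is scaled by the mean weight of its attributes. The exact combination used in WBeeRSAR may differ, so this is an assumption-labelled illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr, dec_attr):
    """Information gain of one attribute with respect to the decision attribute
    (Han and Kamber, 2001), used here as the attribute's weight."""
    base = entropy([r[dec_attr] for r in rows])
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy([r[dec_attr] for r in subset])
    return base - remainder

def weighted_objective(rows, subset, dec_attr, weights):
    """Assumed objective f_i of WBeeRSAR: dependency degree of the subset
    scaled by the mean information-gain weight of its attributes."""
    w = sum(weights[a] for a in subset) / len(subset)
    return w * dependency(rows, list(subset), [dec_attr])
```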

3.8 PERFORMANCE ANALYSIS

The performance of the proposed approaches discussed in this

chapter has been tested with ten different medical datasets (Appendix A),

downloaded from the UCI machine learning data repository. Once the values are predicted for missing attributes, the reduced feature set is obtained from the three proposed methods based on Rough set theory: Rough Set Theory hybrid with Bee Colony Optimization (BeeRSAR), Weighted Bee Colony based RSAR (WBeeRSAR) and Bee Colony based Independent QuickReduct (BeeIQR). Table 3.1 shows the reduct results of the methods on the 10


different medical datasets discussed. It shows the size of the reduct found for

each method.

Table 3.1 Reducts found for the Datasets

Datasets          Features  RSAR  EBR  AntRSAR  GenRSAR  PSO-RSAR  BeeRSAR  WBeeRSAR
Dermatology           34     10    10   8-9      10-11    7-8        7         7
Cleveland Heart       13      7     7   6-7      6-7      6-7        6         6
HIV                   21     13    13   10-11    11-13    9-10       8         8
Lung Cancer           56      4     4   4        6-7      4          4         4
Wisconsin             09      5     5   5        5        4-5        4         4
Echocardiogram        12      8     8   6-7      7-8      6-7        5         5
Primary Tumor         17     12    12   10-11    10-12    10-11      10        10
Arrhythmia           279    212   205   162-175  165-180  160-170    155-170   154-169
SPECTF Heart          44     12    11   9-10     9-11     8-10       7-9       7-9
Cardiotocography      23     16    16   14-15    14-16    13-15      11        11-12

Table 3.2 shows the reducts obtained from the proposed method for each dataset. The underlined attributes in the final reduct are the wavers; that is, in some iterations they occur in the reduct and in other iterations they do not. Figure 3.3 shows the comparison on feature reduction between the proposed and the existing methods.

Table 3.2 Attributes in the Reduct from IQRBee

Datasets          Features  Core Reduct         Final Reduct (IQRBee)            No. of Attributes
Dermatology           34    { 1 }               { 1,3,5,8,15,24,33 }             6-7
Cleveland Heart       13    { 1 }               { 1,5,7,8,9 }                    5
HIV                   21    { 1,3,9,12,15 }     { 1,3,5,7,9,12,15 }              6-7
Lung Cancer           56    { 1,4 }             { 1,4,8,14 }                     4
Wisconsin             09    { 1,8 }             { 1,4,6,8 }                      4
Echocardiogram        12    { 3,9 }             { 3,7,9,11 }                     5
Primary Tumor         17    { 1,6,7,10 }        { 1,5,6,7,10,11,15,16,17 }       9
Cardiotocography      23    { 2,4,7,8,20 }      { 2,4,5,7,8,15,17,19,20 }        9
SPECTF Heart          44    { 2,22,25,30 }      { 2,15,22,25,30,35 }             6
Arrhythmia           279                                                         150-160


Figure 3.3 Reducts found for each Dataset

As illustrated in the results and in the figure, the proposed methods find better reducts than the other approaches. All the bee colony based methods reduce the feature set substantially, in most cases to half or less of its original size. For Dermatology, the total set of 34 features is reduced to 6 (a ratio of about 1/6); for Cleveland Heart, 13 features are reduced to 5 (about 1/3); for the HIV dataset, 21 features are reduced to 7 (about 1/3); for the Lung Cancer dataset, 56 features are reduced to 4 (about 1/13); for the Wisconsin dataset, 9 features are reduced to 4 (about 1/2); for the Echocardiogram dataset, 12 features are reduced to 5 (about 1/2); for Primary Tumor, 17 features are reduced to 9 (about 1/2); for Cardiotocography, 23 features are reduced to 9 (about 1/2); for the Arrhythmia dataset, 279 features are reduced to about 150; and for SPECTF Heart, 44 features are reduced to 6 (about 1/7). A proposed genetic algorithm based kNN classifier, named the GkNN classifier, is employed to analyze the classification performance.


Table 3.3 Comparison of Reducts based on Run Time Complexity (Seconds)

Datasets          Features  RSAR  EBR  AntRSAR  GenRSAR  PSO-RSAR  BeeRSAR  WBeeRSAR  IQRBee
Dermatology           34     72    85    97      120       99       107       67        57
Cleveland Heart       13     48    61    73       96       75        83       43        33
HIV                   21     62    75    87      110       89        97       57        47
Lung Cancer           56    125   138   150      173      152       160      120       110
Wisconsin             09     35    48    60       83       62        70       30        20
Echocardiogram        12     45    58    70       93       72        80       40        30
Primary Tumor         17     52    65    77      100       79        87       47        37
Arrhythmia           279    155   173   201      235      205       220      190       120
SPECTF Heart          44     50    62    74       82       75        85       60        42
Cardiotocography      23     70    83    95      118       97       105       65        55

Table 3.3 compares the run times of the reduct methods. As shown in the table, the traditional RSAR and EBR methods find the reduct faster than the swarm based Ant, Genetic and PSO methods. The BeeRSAR method is likewise slower than the RSAR and EBR methods. However, the modified versions of BCO, WBeeRSAR and IQRBee, overcome this limitation by achieving lower running times.

Table 3.4 shows the comparison of the classification accuracy of the proposed approaches with the existing methods. It clearly shows that the reducts from IQRBee reach greater accuracy than those of the other methods.


Table 3.4 Classification (%) Performance of Reducts

Method       Dermatology    Cleveland Heart  HIV            Lung Cancer    Wisconsin
IQRBee       92.36 ± 0.22   86.54 ± 0.36     86.29 ± 0.18   83.03 ± 0.18   88.70 ± 0.35
WBeeRSAR     90.65 ± 0.34   85.23 ± 0.81     86.03 ± 0.32   82.88 ± 0.12   86.44 ± 0.46
BeeRSAR      91.70 ± 0.74   84.70 ± 0.74     85.70 ± 0.74   82.37 ± 0.39   84.70 ± 0.74
PSORSAR      88.89 ± 0.52   84.89 ± 0.52     84.89 ± 0.52   79.63 ± 0.31   84.89 ± 0.52
AntRSAR      85.32 ± 0.34   86.32 ± 0.34     85.32 ± 0.34   79.53 ± 0.37   85.32 ± 0.34
GenRSAR      86.39 ± 0.42   85.39 ± 0.42     84.39 ± 0.42   78.89 ± 0.71   86.39 ± 0.42
EBR          78.89 ± 0.21   79.71 ± 0.17     77.76 ± 0.79   77.63 ± 0.28   81.12 ± 0.18
RSAR         76.03 ± 0.27   77.07 ± 0.31     75.07 ± 0.54   77.95 ± 0.14   78.60 ± 0.26

Method       Echocardiogram  Primary Tumor   Arrhythmia     SPECTF Heart   Cardiotocography
IQRBee       91.25 ± 0.11    85.45 ± 0.63    88.93 ± 0.81   84.30 ± 0.81   89.70 ± 0.53
WBeeRSAR     89.54 ± 0.14    84.32 ± 0.18    87.35 ± 0.23   83.88 ± 0.21   88.44 ± 0.64
BeeRSAR      88.70 ± 0.43    83.07 ± 0.47    85.50 ± 0.47   82.72 ± 0.93   87.70 ± 0.47
PSORSAR      87.19 ± 0.25    82.29 ± 0.25    84.93 ± 0.25   80.23 ± 0.13   85.89 ± 0.25
AntRSAR      85.23 ± 0.45    82.23 ± 0.43    84.24 ± 0.43   80.43 ± 0.73   85.32 ± 0.43
GenRSAR      85.93 ± 0.24    81.93 ± 0.24    82.93 ± 0.24   79.69 ± 0.17   84.39 ± 0.24
EBR          78.78 ± 0.12    78.17 ± 0.71    78.62 ± 0.97   77.53 ± 0.82   81.12 ± 0.81
RSAR         76.30 ± 0.72    76.70 ± 0.13    76.70 ± 0.45   76.45 ± 0.41   79.60 ± 0.62


In the feature subset selection step, the QuickReduct and EBR methods produced the same reduct every time, unlike GenRSAR, AntRSAR, PSORSAR and BeeRSAR, which found different reducts and sometimes different reduct cardinalities. On the whole, BeeRSAR and BeeIQR appear to outperform the other methods. Compared to the other methods, BeeRSAR consumes more time to find the reduct; BeeIQR resolves this issue by finding the optimum reduct in minimal time. As illustrated in the results, the proposed IQRBee method produces a smaller reduct than the others, which shows its superior performance, with accuracies of 92.3% on Dermatology, 86.5% on Cleveland Heart, 86.3% on HIV, 83% on Lung Cancer and 88.7% on the Wisconsin Breast Cancer database. Next to IQRBee, BeeRSAR achieves the highest accuracy. The other optimization algorithms, GenRSAR, AntRSAR and PSORSAR, are at the next level, with classification accuracies mostly between 85% and 89%. The EBR and the standard rough set algorithms are at the lowest level, with classification accuracies of around 75-81%.

Based on the performance and results of the proposed Bee Colony

Optimization method for feature Selection developed and reported in this

Thesis, a paper entitled “A Novel Rough Set Reduct Algorithm for Medical

Domain based on Bee Colony Optimization” is published in the Journal of

Computing, Vol.2, Issue 6, June 2010, pp. 49-54.

The proposed Independent Rough Set approach hybrid with Bee

Colony Optimization for feature Selection is analyzed and based on the

conclusion of the analysis, a paper entitled “An Independent Rough Set

Approach Hybrid with Artificial Bee Colony Algorithm for Dimensionality

Reduction” is published in the American Journal of Applied Sciences, Vol.8,

Issue 3, March 2011, pp. 261-266.


Based on the Weighted Bee Colony based Reduct approach proposed in this Thesis, a paper entitled “A Weighted Bee Colony Optimization hybrid with Rough Set Reduct Algorithm for Feature Selection in the Medical Domain” is published in the International Journal of Granular Computing, Rough Sets and Intelligent Systems, Vol. 2, Issue 2, pp. 123-140, 2011.

3.9 CONCLUSION

Feature selection is a main research direction of rough set applications. However, this technique often fails to find the best reducts. This work demonstrates the fundamental concepts of rough set theory and explains two basic reduct methods, namely QuickReduct and Entropy-Based Reduct. These methods can produce a close-to-minimal set, but not an optimal one. Swarm intelligence methods have been used to guide the search towards minimal reduct sets. Here, three different computational intelligence based reduct methods have been discussed: GenRSAR, AntRSAR and PSO-RSAR. Though these methods perform well, their results are not consistent since they depend on many random parameters. In this work, a Bee Colony Optimization algorithm

hybrid with Rough set theory has been proposed to find minimal reducts

which does not require random parameter assumptions. As an extension, a novel approach of Rough Set-based Attribute Reduction is proposed for feature selection to obtain a more accurate reduct. Initially, the instances are grouped based on the class attribute. Then the reduct is found for each group. The intersection operation is performed to select the common attributes from all these reducts to generate the core reduct. With the remaining attributes, the BCO algorithm based RSAR model is applied to obtain the final reduct. Experiments are carried out on ten different medical datasets from the UCI machine learning repository. The performance of the reducts is analyzed with the GkNN classifier and compared with six different algorithms. The results show that

the proposed BeeRSAR method achieves a maximum accuracy of 91%, the WBeeRSAR attains a maximum accuracy of 89% and IQRBee achieves a maximum accuracy of 92%, which outpaces the other existing methods.