DETERMINATION OF ABNORMALITIES IN ULTRASOUND KIDNEY IMAGES
USING ASSOCIATION RULE BASED NEURAL NETWORK
ABIRAMI.P
Reg.No: 91009534002
Of
P.S.N.A COLLEGE OF ENGINEERING AND TECHNOLOGY
DINDIGUL-624 622
A PROJECT REPORT
Submitted to the
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
In partial fulfillment of the requirements for
the award of the degree
Of
MASTER OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
ANNA UNIVERSITY: TIRUCHIRAPPALLI-620024
JUNE 2011
ANNA UNIVERSITY: TIRUCHIRAPPALLI
TIRUCHIRAPPALLI-620 024
BONAFIDE CERTIFICATE
Certified that this project titled “DETERMINATION OF
ABNORMALITIES IN ULTRASOUND KIDNEY IMAGES USING
ASSOCIATION RULE BASED NEURAL NETWORK” is the bonafide work
of “P.ABIRAMI (91009534002)” who carried out the research under my
supervision. Certified further, that to the best of my knowledge the work reported
herein does not form part of any other project report or dissertation on the basis of
which a degree or award was conferred on an earlier occasion on this or any other
candidate.
SIGNATURE SIGNATURE
Dr. R. SARAVANAN M.E., Ph.D.,          Mr. S. SATHEESBABU M.E.,
HEAD OF THE DEPARTMENT                 ASSOCIATE PROFESSOR
Dept of Computer Science & Engg,       Dept of Computer Science & Engg,
PSNA College of Engg & Tech.,          PSNA College of Engg & Tech.,
Dindigul.                              Dindigul.
Submitted for viva-voce examination held on ……………..2011
INTERNAL EXAMINER EXTERNAL EXAMINER
ABSTRACT
The objective of this work is to develop an automatic diagnosis system for
detecting kidney diseases based on association rules (AR) and neural network
(NN). The proposed method distinguishes two categories, namely normal and
abnormal (medical renal disease or cortical cyst). For each segmented ultrasound
kidney image, 20 features are extracted. AR is used to reduce the number of
features, and NN is used for intelligent classification of the US kidney images.
The Apriori algorithm is used for association mining, which reduces the 20
features to 12. The neural network classifies the kidney images as normal or
abnormal. The combined AR and NN model yields a fast automatic diagnostic system.
ACKNOWLEDGEMENT
At the outset I wholeheartedly thank the Almighty, who has been my strength in
times of weakness and hope in times of despair, the sole creator of all the creations in
this world and hence this project. I thank my parents who have encouraged me with
good spirit by their incessant prayers to complete this project.
I would like to express my sincere thanks to our management for providing the
facilities needed for the successful completion of my project work.
I express my sincere thanks to our beloved Principal Dr. S. Sakthivel B.E.,
M.Sc (Engg.), MBA., Ph.D., for permitting me to do the project work. I would like to
cordially thank our Head of the Department Dr.R. Saravanan M.E., Ph.D., for his
kind co-operation and advice.
I am indebted to our internal guide Mr. S. SatheesBabu M.E., for the keen
interest shown by him in my project and for the comforting words of encouragement
offered from time to time.
I extend my thanks to Mrs. K. DhanaLakshmi M.E., for her wholehearted support
towards the successful completion of this project. Finally, I thank all faculty
members, non-teaching staff members, friends and all my well-wishers who directly
and indirectly supported this work.
TABLE OF CONTENTS
CHAPTER TITLE PAGE NO
ABSTRACT iii
LIST OF TABLES viii
LIST OF FIGURES ix
LIST OF ABBREVIATIONS x
1 INTRODUCTION 1
1.1 Data mining 1
1.1.1 Data mining architecture 1
1.1.2 Steps in data mining 2
1.1.3 Association rules 3
1.2 Neural Networks 4
1.2.1 Advantages 4
1.2.2 Types of Neural Network 4
1.2.2.1 Single Layer Perceptron 4
1.2.2.2 Multi Layer Perceptron 5
1.3 Image processing 6
2 LITERATURE SURVEY 8
3 SYSTEM ANALYSIS 12
3.1 Objective 12
3.2 Existing system 12
3.3 Drawbacks of existing system 12
3.4 Proposed system 13
3.5 Base paper comparative study 13
3.6 Tool analysis 14
4 SYSTEM DESIGN 16
4.1 Module Design 16
4.1.1 Feature extraction 16
4.1.1.1 First order gray level feature 16
4.1.1.2 Second order gray level feature 16
4.1.1.3 Power Spectral feature 20
4.1.1.4 Gabor feature 22
4.1.2 Apriori algorithm 23
4.1.3 Classification using MLP-BP 25
5 SYSTEM IMPLEMENTATION 28
5.1 Software Requirements 28
5.2 Hardware Requirements 28
5.3 Implementation of feature extraction 28
5.4 Implementation of Apriori algorithm 29
5.5 Classification using MLP-BP 30
6 SYSTEM TESTING 31
6.1 Testing objectives and purpose 31
6.2 System testing 32
6.2.1 Unit testing 32
6.2.2 Integration testing 32
6.2.3 Validation testing 34
7 CONCLUSION 36
REFERENCE 37
APPENDIX-I
LIST OF TABLES
Table No. Description Page No.
4.1 Matrix format of test image 22
4.2 General format of GLCM 23
4.3 GLCM for δ=1 and θ=0° 23
4.4 GLCM for δ=1 and θ=90° 24
LIST OF FIGURES
Figure No. Description Page No.
1.1 Data Mining architecture 3
1.2 Multi layer perceptron 7
4.1 System design 20
4.2 Threshold Logic unit 31
LIST OF ABBREVIATIONS
ANN Artificial Neural Networks
NN Neural Network
AR Association Rules
MLP Multi Layer Perceptron
BP Back Propagation
SLP Single Layer Perceptron
TLU Threshold Logic Unit
CHAPTER 1
INTRODUCTION
1.1 DATA MINING
Data mining is the process of extracting patterns from data. It is seen as an
increasingly important tool by modern businesses for transforming data into
business intelligence, giving them an informational advantage. It is currently
used in a wide range of profiling practices, such as marketing, surveillance,
fraud detection, and scientific discovery.
Data mining techniques can be implemented rapidly on existing software and
hardware platforms to enhance the value of existing information resources, and
can be integrated with new products and systems as they are brought on-line.
Data mining is ready for application in the business community because it is
supported by three technologies that are now sufficiently mature:
Massive data collection
Powerful multiprocessor computers
Data mining algorithms
1.1.1 DATA MINING ARCHITECTURE
Data mining, the extraction of hidden predictive information from large
databases, is a powerful new technology with great potential to help companies
focus on the most important information in their data warehouses. Data mining
software is one of a number of analytical tools for analyzing data. It allows
users to analyze data from many different dimensions or angles, categorize it,
and summarize the relationships identified. Technically, data mining is the
process of finding correlations or patterns among dozens of fields in large
relational databases.
Fig 1.1 Data mining architecture (components: user interface, pattern
evaluation, data mining engine, database or data warehouse, knowledge base)
1.1.2 STEPS IN DATA MINING
Data Selection: We may not need all the data collected in the first step, so in
this step we select only the data we think will be useful for data mining.
Data Cleaning: The collected data may contain errors, missing values, and noisy
or inconsistent entries, so different techniques are applied to remove such
anomalies.
Data Transformation: Even after cleaning, the data are not ready for mining;
they must be transformed into forms appropriate for mining. The techniques used
to accomplish this include smoothing, aggregation and normalization.
Data Mining: Now we are ready to apply data mining techniques on the data to
discover the interesting patterns. Techniques like clustering and association
analysis are among the many different techniques used for data mining.
Pattern Evaluation and Knowledge Presentation: This step involves
visualization, transformation, and the removal of redundant patterns from the
patterns generated.
Decisions / Use of Discovered Knowledge: This step helps the user make use of
the acquired knowledge to take better decisions.
1.1.3 ASSOCIATION RULES
Association rule mining finds interesting associations and/or correlation
relationships among large sets of data items. Association rules show
attribute-value conditions that occur frequently together in a given dataset. A
typical and widely used example of association rule mining is Market Basket
Analysis. The various algorithms are as follows:
Apriori algorithm
Eclat algorithm
FP-growth algorithm
One-attribute rule
Zero-attribute rule
1.2 NEURAL NETWORKS
An Artificial Neural Network (ANN) is an information processing paradigm
that is inspired by the way biological nervous systems, such as the brain,
process information. The key element of this paradigm is the novel structure of
the information processing system. It is composed of a large number of highly
interconnected processing elements working in unison to solve specific
problems. ANNs, like people, learn by example. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a
learning process.
1.2.1 ADVANTAGES
A neural network can perform tasks that a linear program cannot.
When an element of the neural network fails, the network can continue
without any problem because of its parallel nature.
A neural network learns and does not need to be reprogrammed.
It can be applied to a wide range of applications.
It can be implemented without major difficulty.
1.2.2 TYPES OF NEURAL NETWORK
1.2.2.1 SINGLE LAYER PERCEPTRON
The earliest kind of neural network is a single-layer perceptron network, which
consists of a single layer of output nodes; the inputs are fed directly to the
outputs via a series of weights. In this way it can be considered the simplest
kind of feed-forward network. The sum of the products of the weights and the
inputs is calculated in each node, and if the value is above some threshold
(typically 0) the neuron fires and takes the activated value (typically 1);
otherwise it takes the deactivated value (typically -1).
1.2.2.1.1 ADVANTAGES
Easy to set up and train.
Outputs are a weighted sum of inputs: an interpretable representation.
1.2.2.1.2 LIMITATIONS
Can only represent a limited set of functions.
Decision boundaries must be hyperplanes.
Can only perfectly separate linearly separable data.
1.2.2.2 MULTILAYER PERCEPTRON
A multilayer perceptron (MLP) is a feedforward artificial neural network
model that maps sets of input data onto a set of appropriate outputs. An MLP
consists of multiple layers of nodes in a directed graph, with each layer fully
connected to the next one. Except for the input nodes, each node is a neuron
with a nonlinear activation function. MLP utilizes a supervised learning
technique called backpropagation for training the network.
Fig 1.2 Multi layer perceptron
1.2.2.3 ADVANTAGES
Adaptive learning: an ability to learn how to do tasks based on the
data given for training or initial experience.
MLPs/neural networks do not make any assumptions regarding the
underlying probability density functions or other probabilistic
information about the pattern classes under consideration, in
contrast to probability-based models.
They yield the required decision function directly via training.
A two-layer backpropagation network with sufficient hidden nodes
has been proven to be a universal approximator.
1.3 IMAGE PROCESSING
Image processing is any form of signal processing for which the input is an
image, such as a photograph or video frame; the output may be either an image
or a set of characteristics or parameters related to the image. Most
image-processing techniques involve treating the image as a two-dimensional
signal and applying standard signal-processing techniques to it.
Image processing converts an image signal, which can be either digital or
analog, into a physical image or a set of image characteristics. The most
common type of image processing is photography, in which an image is captured
using a camera to create a digital or analog image.
CHAPTER 2
LITERATURE SURVEY
[1] An expert system for detection of breast cancer based on association
rules and neural network - Murat Karabatak, Firat University (2009)
This work presents an automatic diagnosis system for detecting breast cancer
based on association rules (AR) and neural network (NN). Feature extraction is
the key for pattern recognition and classification: the best classifier will
perform poorly if the features are not chosen well, and a feature extractor
should reduce the feature vector to a lower dimension which contains most of
the useful information from the original vector. So AR is used for reducing the
dimension of the breast cancer database and NN is used for intelligent
classification, and the proposed AR + NN system performance is compared with
the NN model. The dimension of the input feature space is reduced from nine to
four by using AR. In the test stage, the 3-fold cross validation method was
applied to the Wisconsin breast cancer database to evaluate the proposed
system's performance.
[2] A hybrid fuzzy-neural system for computer-aided diagnosis of
ultrasound kidney images using prominent features - K. Bommanna Raja,
M. Madheswaran & K. Thyagarajah
The objective of this work is to develop and implement a computer-aided
decision support system for an automated diagnosis and classification of
ultrasound kidney images. The proposed method distinguishes three kidney
categories, namely normal, medical renal disease and cortical cyst. For each
pre-processed ultrasound kidney image, 36 features are extracted.
decision support systems, optimized multi-layer back propagation network and
hybrid fuzzy-neural system have been developed with these features for
classifying the kidney categories. The performance of the hybrid fuzzy-neural
system is compared with the optimized multi-layer back propagation network in
terms of classification efficiency, training and testing time. The results obtained
show that fuzzy-neural system provides higher classification efficiency with
minimum training and testing time. It has also been found that instead of using
all 36 features, ranking the features enhances the classification efficiency.
The outputs of the decision support systems are validated with a medical expert
to measure the actual efficiency. The overall discriminating capability of the
systems is assessed with the performance evaluation measure, f-score. It has been
observed that the performance of fuzzy neural system is superior compared to
optimized multi-layer back propagation network. Such hybrid fuzzy-neural
system with feature extraction algorithms and pre-processing scheme helps in
developing computer-aided diagnosis system for ultrasound kidney images and
can be used as a secondary observer in clinical decision making.
[3] P. Rajendran, M.Madheswaran. Hybrid medical image classification
using AR mining with decision tree algorithm. JOURNAL OF
COMPUTING, JANUARY 2010.
The main focus of image mining in the proposed method is concerned
with the classification of brain tumor in the CT scan brain images. The major
steps involved in the system are: pre-processing, feature extraction, association
rule mining and hybrid classifier. The pre-processing step has been done using
the median filtering process and edge features have been extracted using canny
edge detection technique. The two image mining approaches with a hybrid
manner have been proposed in this paper. The frequent patterns from the CT
scan images are generated by frequent pattern tree (FP-Tree) algorithm that
mines the association rules. The decision tree method has been used to classify
the medical images for diagnosis. This system enhances the classification
process to be more accurate. The hybrid method improves the efficiency of the
proposed method over traditional image mining methods. The experimental
results on a prediagnosed database of brain images showed 97% sensitivity and
95% accuracy. Physicians can make use of this accurate decision tree
classification phase to classify brain images into normal, benign and
malignant for effective medical diagnosis.
[4] Haiwei Pan, Jianzhong Li, and Zhang Wei. Mining Interesting
Association Rules in Medical Images (2007).
Image mining is more than just an extension of data mining to the image
domain; it is an interdisciplinary endeavor. Very few researchers have
systematically investigated this field. Mining association rules in medical images is an
important part in domain-specific application image mining because there are
several technical aspects which make this problem challenging. In this paper,
we extend the concept of association rule based on object and image in medical
images, and propose two algorithms to discover frequent item-sets and mine
interesting association rules from medical images. We describe how to
incorporate the domain knowledge into the algorithms to enhance the
interestingness. Some interesting results are obtained by our program and we
believe many of the problems we come across are likely to appear in other
domains.
CHAPTER 3
SYSTEM ANALYSIS
3.1 OBJECTIVE
To produce an accurate classification of ultrasound kidney images using a
neural network. The Apriori algorithm is used for association mining to select
the most relevant features of the given image.
3.2 EXISTING SYSTEM
Techniques such as association rule based neural networks have been used for
the classification of malignant and benign patterns in digitized mammograms. A
back-propagation neural network has been used for the classification of
suspicious lesions extracted using a fuzzy rule-based detection system,
obtaining higher accuracy.
A comparative study of radial basis function (RBF) and multilayer
perceptron (MLP) based neural networks for the classification of breast
abnormalities using texture features concluded that MLP obtained 4%
higher accuracy than RBF.
3.3 DRAWBACKS OF EXISTING SYSTEM
The segmentation and feature extraction technique for the classification of
microcalcifications achieves a low classification rate (78%) on the DDSM
database.
The accuracy of the system may be high on the training data set and may drop
on the test data.
3.4 PROPOSED SYSTEM
In the proposed system the features are extracted from the kidney image.
There are four feature extraction techniques: First order gray level
statistical features, Second order gray level statistical features, Power
spectral features and Gabor features.
Different features are extracted to study the gray level intensity distribution
of the kidney region. In total, 20 features are extracted from the segmented
kidney images.
The Apriori algorithm reduces the number of features to 12. As the features are
reduced from 20 to 12 before being passed to the MLP, the classification
accuracy is increased.
MLP classifies the images into 3 categories:
Normal
Medical renal disease
Cortical cyst.
3.5 BASE PAPER COMPARATIVE STUDY
The base paper presents an automatic diagnosis system for detecting breast
cancer based on association rules (AR) and neural network (NN). Feature
extraction is the key for pattern recognition and classification; the best
classifier will perform poorly if the features are not chosen well, and a
feature extractor should reduce the feature vector to a lower dimension which
contains most of the useful information from the original vector. So AR is used
for reducing the dimension of the breast cancer database and NN is used for
intelligent classification, and the AR + NN system performance is compared with
the NN model. The dimension of the input feature space is reduced from nine to
four by using AR. In the test stage, 3-fold cross validation was applied to the
Wisconsin breast cancer database to evaluate the proposed system's performance.
The existing system is modified by using ultrasound kidney images as the
input: feature extraction techniques are applied to extract the features, and
the Apriori algorithm is then used to select the relevant features. MLP-BP
classifies the given image as normal or abnormal.
3.6 TOOL ANALYSIS
MathWorks MATLAB 7.9 is a high-level technical computing language that
provides an interactive environment for algorithm development and a modern
tool for data analysis. Compared with traditional programming languages
(C/C++, Java, Pascal, FORTRAN), MATLAB can reduce the solution time for
typical tasks by an order of magnitude and greatly simplifies the development
of new algorithms.
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB allows
matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs written
in other languages, including C, C++, and FORTRAN.
Although MATLAB is intended primarily for numerical computing, an optional
toolbox uses the MuPAD symbolic engine, allowing access to symbolic
computing capabilities. An additional package, Simulink, adds graphical multi-
domain simulation and Model-Based Design for dynamic and embedded
systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and
economics. MATLAB is widely used in academic and research institutions as
well as industrial enterprises.
CHAPTER 4
SYSTEM DESIGN
4.1 MODULE DESIGN
4.1.1 FEATURE EXTRACTION
The feature extraction techniques are applied to the segmented kidney images,
and each technique is explained below.
4.1.1.1 FIRST ORDER GRAY LEVEL STATISTICAL FEATURE
The first order gray level statistical features are estimated from the
preprocessed ultrasound kidney images. The features are Mean, Dispersion,
Variance, Energy, Skewness and Kurtosis. Variance is the sum of the differences
between the intensity of the central pixel and its neighborhood.
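As a sketch of how such first order statistics can be computed (Python is used for illustration here, although the report's implementation is in MATLAB; the exact normalizations used in the report may differ, and the 16 pixel values below are simply the test image of Table 4.1 flattened):

```python
import math

def first_order_features(pixels):
    """First order gray level statistics over a flat list of intensities,
    using standard moment-based definitions (an assumption; the report
    may normalize differently)."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4) if std else 0.0
    # Energy: sum of squared normalized histogram entries
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    energy = sum((c / n) ** 2 for c in hist.values())
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy}

feats = first_order_features([0, 0, 1, 1, 0, 0, 1, 1, 0, 2, 2, 2, 2, 2, 3, 3])
print(feats["mean"])  # 20 / 16 = 1.25
```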
4.1.1.2 SECOND ORDER GRAY LEVEL STATISTICAL FEATURE
The spatial gray level dependency matrix is one of the most widely used
techniques for statistical texture description. All known visually distinct texture
pairs can be discriminated using this method.
GLCM ALGORITHM
Texture is one of the important characteristics used in identifying objects or
regions of interest in an image. Texture contains important information about
the structural arrangement of surfaces. The textural features based on gray-tone
spatial dependencies have a general applicability in image classification. The
three fundamental pattern elements used in human interpretation of images are
spectral, textural and contextual features. Spectral features describe the average
tonal variations in various bands of the visible and/or infrared portion of an
electromagnetic spectrum. Textural features contain information about the
spatial distribution of tonal variations within a band. The fourteen textural
features proposed by Haralick contain information about image texture
characteristics such as homogeneity, gray-tone linear dependencies, contrast,
number and nature of boundaries present and the complexity of the image.
Contextual features contain information derived from blocks of pictorial data
surrounding the area being analyzed. The (i,j)th entry of the co-occurrence
matrix represents the probability of going from a pixel with gray level i to
another with gray level j under predefined angles. Usually, for statistical
texture analysis, these angles are defined as 0°, 45°, 90° and 135°.
Test image
Table 4.1 Matrix format of test image
0 0 1 1
0 0 1 1
0 2 2 2
2 2 3 3
General form of GLCM
Table 4.2 General form of GLCM
Gray
tone
0 1 2 3
0 #(0,0) #(0,1) #(0,2) #(0,3)
1 #(1,0) #(1,1) #(1,2) #(1,3)
2 #(2,0) #(2,1) #(2,2) #(2,3)
3 #(3,0) #(3,1) #(3,2) #(3,3)
Table 4.3 GLCM for δ=1 and θ=0°
4 2 1 0
2 4 0 0
1 0 6 1
0 0 1 2
Table 4.4 GLCM for δ=1 and θ=90°
6 0 2 0
0 4 2 0
2 2 2 2
0 0 2 0
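The construction of these matrices can be sketched as follows (Python is used for illustration; the report's implementation is in MATLAB). Counting each pixel pair in both directions, as the symmetric GLCM does, reproduces Tables 4.3 and 4.4 from the test image of Table 4.1:

```python
def glcm(image, dx, dy, levels):
    """Symmetric gray level co-occurrence matrix for displacement (dx, dy).
    δ=1, θ=0° corresponds to (dx=1, dy=0); θ=90° to (dx=0, dy=1)."""
    rows, cols = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                m[j][i] += 1  # count both directions (symmetric GLCM)
    return m

test_image = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 2, 2, 2],
              [2, 2, 3, 3]]

print(glcm(test_image, 1, 0, 4))  # matches Table 4.3
print(glcm(test_image, 0, 1, 4))  # matches Table 4.4
```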
Energy:
One approach to generating texture features is to use local kernels to detect
various types of texture. After the convolution with the specified kernel, the
texture energy measure (TEM) is computed by summing the absolute values in
a local neighborhood:
TEM(i, j) = Σ(m,n)∈N |C(m, n)|   (4.1)
where C is the image convolved with the kernel and N is the local neighborhood
around (i, j).
If n kernels are applied, the result is an n-dimensional feature vector at each
pixel of the image being analyzed.
Correlation:
Correlation is a measure of image linearity.
Correlation = [Σi Σj (i · j) P[i, j] − μx μy] / (σx σy)   (4.2)
μx = Σi i Σj P[i, j],   σx² = Σi (i − μx)² Σj P[i, j]   (and similarly μy, σy)   (4.3)
Correlation will be high if an image contains a considerable amount of linear
structure.
Entropy:
Entropy is a measure of information content. It measures the randomness of
intensity distribution.
Entropy = − Σi Σj P[i, j] log P[i, j]   (4.4)
Such a matrix corresponds to an image in which there are no preferred gray
level pairs for the distance vector d.
Entropy is highest when all entries in P[i,j] are of similar magnitude, and small
when the entries in P[i,j] are unequal.
Homogeneity
A homogeneous image will result in a co-occurrence matrix with a combination
of high and low P[i,j]’s.
Homogeneity = Σi Σj P[i, j] / (1 + |i − j|)   (4.5)
Where the range of gray levels is small the P [i, j] will tend to be clustered
around the main diagonal. A heterogeneous image will result in an even spread
of P [i, j]’s.
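Putting the energy, entropy and homogeneity definitions together, a minimal sketch (Python is used for illustration; the logarithm base and the exact homogeneity form vary in the literature, so treat those choices as assumptions):

```python
import math

def glcm_features(P):
    """Texture features from a normalized co-occurrence matrix P[i][j]
    whose entries sum to 1 (standard-form definitions; an assumption)."""
    n = len(P)
    energy = sum(P[i][j] ** 2 for i in range(n) for j in range(n))
    entropy = -sum(P[i][j] * math.log2(P[i][j])
                   for i in range(n) for j in range(n) if P[i][j] > 0)
    homogeneity = sum(P[i][j] / (1 + abs(i - j))
                      for i in range(n) for j in range(n))
    return energy, entropy, homogeneity

# Normalize the δ=1, θ=0° GLCM of the test image (its entries sum to 24).
G = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
total = sum(sum(row) for row in G)
P = [[v / total for v in row] for row in G]
print(glcm_features(P))
```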
4.1.1.3 POWER SPECTRAL FEATURE
The spectral features are estimated using the fast Fourier transform. They are
used for various analysis, diagnosis and evaluation tasks in biological
systems. An important application of power spectral features is to detect and
characterize binary images.
The periodogram computes the power spectrum for the entire input signal:
PS[k] = |F(signal)[k]|² / N   (4.6)
where F(signal) is the Fourier transform of the signal and N is the
normalization factor, which Igor's DSP Periodogram operation defaults to the
number of samples in the signal. The calculation of the periodogram is
improved by spectral windowing, and Igor's DSP Periodogram operation supports
the same windows as the FFT operation. The result of the periodogram is often
normalized by a multiplication factor to make the result satisfy Parseval's
theorem:
Σn |signal[n]|² = (1/N) Σm |F(signal)[m]|²   (4.7)
which presumes the two-sided frequency-domain FFT result is computed from the
time-domain signal data, and where N is again the number of time-domain values
in the signal.
Normalization of the periodogram result to meet this criterion follows several
different conventions in the literature (and depends on the average power of
any spectral windowing function and also on whether the periodogram is one- or
two-sided), so the DSP Periodogram operation allows the user to specify the
desired normalization using the /NOR parameter:
DSPPeriodogram/NOR=(numpnts(signal)/2) signal
When using a window function, the amount of power in the signal is reduced. A
compensating multiplier of 1/average(window[i]^2) should be applied to the
result. For a Hanning window this value is theoretically 0.375. Because the
normalization factor is a denominator, you would divide N by 0.375 to
compensate for the Hanning window:
DSPPeriodogram/NOR=(numpnts(signal)/(2*0.375)) signal
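Independently of Igor's built-in operation, the basic periodogram of Eq. (4.6) can be sketched directly (Python is used for illustration; a plain DFT is written out for self-containment, and an FFT would give the same values faster):

```python
import cmath, math

def periodogram(signal):
    """Power spectrum |F(signal)|^2 / N, mirroring Eq. (4.6)."""
    N = len(signal)
    spectrum = []
    for k in range(N):
        F = sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
        spectrum.append(abs(F) ** 2 / N)
    return spectrum

# A pure cosine of frequency 1 cycle per record concentrates its power
# at bins k=1 and k=N-1; the spectrum also satisfies Parseval's theorem.
sig = [math.cos(2 * math.pi * n / 8) for n in range(8)]
print([round(p, 6) for p in periodogram(sig)])
```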
4.1.1.4 GABOR FEATURE
A Gabor filter can be seen as a sinusoidal plane wave of a particular frequency
and orientation, modulated by a Gaussian envelope. A 2D Gabor function g(x, y)
and its Fourier transform G(u, v) are defined as
g(x, y) = (1 / (2π σx σy)) exp[ −(1/2)(x²/σx² + y²/σy²) + 2πjWx ]   (4.8)
where j = √−1 and W is the frequency of the modulated sinusoid, and
G(u, v) = exp{ −(1/2)[ (u − W)²/σu² + v²/σv² ] }   (4.9)
where σu = 1/(2πσx) and σv = 1/(2πσy).
A self-similar filter dictionary can be obtained by associating an appropriate
scale factor α and a rotation parameter θ with the mother wavelet g(x, y). M
and N represent the scales and orientations of the Gabor wavelets:
gmn(x, y) = α^(−m) g(x′, y′),  x′ = α^(−m)(x cos θ + y sin θ),
y′ = α^(−m)(−x sin θ + y cos θ)   (4.10)
where θ = nπ/K and K is the total number of orientations.
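A discrete sketch of the real part of such a kernel (Python is used for illustration; the kernel size, σ values, frequency W and orientation θ below are illustrative assumptions, not the report's parameters):

```python
import math

def gabor_kernel(size, sigma_x, sigma_y, W, theta):
    """Real part of a 2D Gabor kernel: a Gaussian envelope modulating a
    sinusoid of frequency W along an axis rotated by theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates by theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            row.append(g * math.cos(2 * math.pi * W * xr))
        kernel.append(row)
    return kernel

k = gabor_kernel(size=7, sigma_x=2.0, sigma_y=2.0, W=0.25, theta=0.0)
print(k[3][3])  # kernel center: exp(0) * cos(0) = 1.0
```

Convolving the image with such kernels at several scales and orientations and taking statistics of the filtered outputs yields the Gabor features.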
4.1.2 ASSOCIATION RULES USING APRIORI ALGORITHM
Association rules find interesting associations or relationships among large
sets of data items. They show attribute-value conditions that occur frequently
together in a given dataset, and capture all possible rules that explain the
presence of some attributes according to the presence of other attributes. For
example, the rule {onions, potatoes} -> {burger} found in the sales data of a
supermarket would indicate that if a customer buys onions and potatoes
together, he or she is likely to also buy a burger. Such information can be
used as the basis for decisions about marketing activities such as
promotional pricing or product placements.
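For instance, the support and confidence of such a rule can be computed as follows (Python is used for illustration; the four toy transactions are made-up values):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by support of the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

T = [{"onions", "potatoes", "burger"},
     {"onions", "potatoes", "burger"},
     {"onions", "potatoes"},
     {"milk", "bread"}]

print(support({"onions", "potatoes", "burger"}, T))       # 2/4 = 0.5
print(confidence({"onions", "potatoes"}, {"burger"}, T))  # 2/3
```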
4.1.2.1 APRIORI ALGORITHM
The Apriori Algorithm is an influential algorithm for mining frequent itemsets
for Boolean association rules.
Apriori()
  L1 = {large 1-itemsets}
  k = 2
  while Lk-1 ≠ Ø do
  begin
    Ck = apriori_gen(Lk-1)
    for all transactions t in D do
    begin
      Ct = subset(Ck, t)
      for all candidates c ∈ Ct do
        c.count = c.count + 1
    end
    Lk = {c ∈ Ck | c.count ≥ minsup}
    k = k + 1
  end
Apriori first scans the transaction database D in order to count the support of
each item i in I and determines the set of large 1-itemsets. Then an iteration
is performed to compute the set of 2-itemsets, 3-itemsets, and so on. The kth
iteration consists of two steps: the first step generates the candidate set Ck
from the large itemsets Lk-1; the second step scans the database to compute the
support count of each candidate set. The candidate generation algorithm is
given as follows.
apriori_gen(Lk-1)
  Ck = Ø
  for all itemsets X ∈ Lk-1 and Y ∈ Lk-1 do
    if X1 = Y1 ∧ ... ∧ Xk-2 = Yk-2 ∧ Xk-1 < Yk-1 then
    begin
      C = X1 X2 ... Xk-1 Yk-1
      add C to Ck
    end
Different features are extracted to study the gray level intensity distribution of
kidney region. Totally 20 features are extracted from the segmented kidney
images. Apriori algorithm reduces the number of features to 12 and this is given
as the input to the MLP-BP.
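A compact runnable sketch of this level-wise procedure (Python is used for illustration; this version folds candidate generation into the main loop, omits the subset-pruning step of the full apriori_gen, and the transactions and minimum support below are made-up values):

```python
def apriori(transactions, minsup):
    """Level-wise frequent itemset mining: generate candidates from L(k-1),
    count their support in one pass, keep those meeting minsup."""
    items = sorted({i for t in transactions for i in t})
    L = [frozenset([i]) for i in items
         if sum(i in t for t in transactions) >= minsup]
    frequent = list(L)
    k = 2
    while L:
        # join step: merge itemsets sharing their first k-2 items
        C = {a | b for a in L for b in L
             if len(a | b) == k and sorted(a)[:k - 2] == sorted(b)[:k - 2]}
        counts = {c: sum(c <= t for t in transactions) for c in C}
        L = [c for c, n in counts.items() if n >= minsup]
        frequent.extend(L)
        k += 1
    return frequent

T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(sorted(s) for s in apriori(T, minsup=3)))
```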
4.1.3 CLASSIFICATION USING MLP-BP
A neural network consists of an interconnected group of artificial neurons.
Multilayer perceptron neural network with back propagation is used for
classification. The intelligent classification is realized in this layer by using
Features, which are obtained from AR. The initial weights are Random .The
number of neuron on the layers
Input: 12
Hidden: 2
Output: 1
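The forward pass of this 12-2-1 network can be sketched as follows (Python is used for illustration; the sigmoid activation and random placeholder weights are assumptions, since in the report the weights are learned by backpropagation):

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(features, w_hidden, w_out):
    """Forward pass of a 12-2-1 MLP: 12 inputs (the AR-selected features),
    2 hidden neurons, 1 output neuron."""
    hidden = [sigmoid(sum(f * w for f, w in zip(features, ws)))
              for ws in w_hidden]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_out)))

w_hidden = [[random.uniform(-1, 1) for _ in range(12)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]
y = mlp_forward([0.5] * 12, w_hidden, w_out)
print(0.0 < y < 1.0)  # a sigmoid output always lies in (0, 1)
```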
THRESHOLD
Scientists trying to understand the working of the human brain think of it as
networks of neurons. Each neuron conducts a signal or not depending on its
input, its weights and a threshold value; in scientific terms, neurons fire or
not depending on the summed strength of their inputs across synapses of
various strengths. Initially, a neural network has random weights and thus
does not perform the given task well. With practice, the weights and the
threshold values of the neurons in the network keep being adjusted. After a
while, when the weights are adjusted, we call the neural network trained, and
it can do the task well.
As can be seen from the diagram, a TLU has various inputs X1, X2... Xn. These
inputs are multiplied with the corresponding weights and added together. If this
sum is greater than the threshold value, the output is a high (1). Otherwise, the
result is low (0).
To start with, the weights in the TLU and the threshold value are randomly
decided. Then the TLU is presented with the expected output for a particular
input, and the actual output of the TLU for that input is noted. Usually,
because the weights are random, the TLU responds in error. This error is used
to adjust the weights so that the TLU produces the required output for the
given input. Similarly, all the expected values in the training set are used to
adjust the weights.
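A runnable sketch of this error-driven adjustment (Python is used for illustration; the learning rate, epoch count and the logical-AND training set are illustrative assumptions, AND being a convenient linearly separable task a TLU can learn):

```python
def train_tlu(samples, epochs=20, lr=0.1):
    """Perceptron-style training: for each sample, compare the TLU output
    with the target and nudge the weights and threshold by the error."""
    w = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(xi * wi for xi, wi in zip(x, w)) > threshold else 0
            err = target - out
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            threshold -= lr * err  # threshold moves opposite to the weights
    return w, threshold

# Train on logical AND; the trained TLU reproduces the truth table.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, th = train_tlu(AND)
print([1 if sum(xi * wi for xi, wi in zip(x, w)) > th else 0
       for x, _ in AND])  # [0, 0, 0, 1]
```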
Fig 4.1 Threshold logic unit
Once the TLU is trained, it will respond correctly for all inputs in the training
set. Also, now that the TLU is trained, we can use it to calculate the output for
inputs not in the training set.
The threshold value is varied from 0.1 to 1.0 in increments of 0.05. The
classification efficiency is measured for each setting and the best possible
threshold value is assigned. The input image is classified as normal if the
achieved MLP-BP output value is less than or equal to 0.35, as medical renal
disease if the value is greater than 0.35 and less than 0.75, and as cortical
cyst if the value is greater than 0.75.
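The threshold selection can be sketched as a simple sweep. The scores and labels below are hypothetical, not the project's data; the fragment only illustrates stepping the threshold from 0.1 to 1.0 by 0.05 and keeping the value with the highest classification accuracy.

```python
# Sketch of threshold selection: sweep 0.1 to 1.0 in steps of 0.05 and
# keep the threshold giving the best accuracy on (hypothetical) outputs.
scores = [0.10, 0.20, 0.30, 0.55, 0.60, 0.80, 0.90]   # MLP-BP output values
labels = [0, 0, 0, 1, 1, 1, 1]                        # 0 = normal, 1 = abnormal

best_threshold, best_accuracy = None, -1.0
t = 0.10
while t <= 1.0 + 1e-9:
    predictions = [0 if s <= t else 1 for s in scores]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    if accuracy > best_accuracy:
        best_threshold, best_accuracy = t, accuracy
    t += 0.05

print(round(best_threshold, 2), best_accuracy)        # 0.3 1.0
```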
4.2 SYSTEM DESIGN
Fig 4.2 Data flow diagram: input image, feature extraction (first order gray
level statistical features, second order gray level statistical features,
power spectral features, Gabor features), association rule generation using
the Apriori algorithm, classification as normal or abnormal (medical renal
disease / cortical cyst)
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 SOFTWARE REQUIREMENTS
Operating System: Windows XP (any platform that supports MATLAB)
Language: MATLAB
Version: MATLAB 7.9
5.2 HARDWARE REQUIREMENTS
Processor: Pentium IV, 2.7 GHz
RAM: 1 GB DDR or more
Hard Disk: 250 GB
5.3 IMPLEMENTATION OF FEATURE EXTRACTION
The segmented ultrasound kidney images are taken as the input and different
feature extraction techniques are applied to them. The four feature extraction
techniques are: first order gray level statistical features, second order gray
level statistical features, power spectral features, and Gabor features.
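As a small illustration of the first of these families, the fragment below computes first order gray level statistics (mean, standard deviation, skewness) over a hypothetical patch of gray levels. The project computes these in MATLAB; this Python sketch and its sample values are assumptions for illustration only.

```python
# First order gray level statistics over a hypothetical image patch:
# these depend only on the histogram of gray levels, not on pixel layout.
import math

patch = [10, 12, 10, 14, 200, 12, 11, 10, 13]   # hypothetical gray levels

n = len(patch)
mean = sum(patch) / n
variance = sum((g - mean) ** 2 for g in patch) / n
std_dev = math.sqrt(variance)
skewness = sum((g - mean) ** 3 for g in patch) / (n * std_dev ** 3)

print(round(mean, 2), round(std_dev, 2), round(skewness, 2))
```

The single bright outlier (200) pulls the skewness strongly positive, which is the kind of asymmetry this feature is meant to capture.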
5.4 IMPLEMENTATION OF APRIORI ALGORITHM
In data mining, association rule learning is a popular and well-researched
method for discovering interesting relations between variables in large
databases. Apriori is the best-known algorithm for mining association rules.
It uses a breadth-first search strategy to count the support of itemsets, and
a candidate generation function that exploits the downward closure property of
support.
Apriori is applied only to the training data sets; for the test data, the
MLP-BP classifier is used directly, without running Apriori again.
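The support counting and downward-closure steps can be sketched as follows. The transactions below are hypothetical; this Python fragment does not reflect how the project actually encodes its 20 features, it only illustrates that a candidate itemset is generated from frequent subsets and kept when its support meets the minimum.

```python
# Minimal Apriori sketch: count support of 1-itemsets, then generate
# 2-itemset candidates from frequent 1-itemsets (downward closure) and
# keep those meeting the minimum support.
from itertools import combinations

transactions = [
    {"f1", "f2", "f3"},
    {"f1", "f2"},
    {"f1", "f4"},
    {"f2", "f3"},
]
min_support = 2  # absolute support count

def support(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions)

# Frequent 1-itemsets.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items
            if support(frozenset([i])) >= min_support]

# Candidate 2-itemsets: both 1-item subsets must already be frequent.
candidates = [a | b for a, b in combinations(frequent, 2)]
frequent2 = [c for c in candidates if support(c) >= min_support]

print(sorted(sorted(c) for c in frequent2))   # [['f1', 'f2'], ['f2', 'f3']]
```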
5.5 CLASSIFICATION USING MLP-BP
A multilayer perceptron neural network with back propagation is used for
classification. The intelligent classification is realized in this layer using
the features obtained from AR. The classification efficiency is measured for
each threshold setting and the best possible threshold value is assigned. The
input image is classified as normal if the achieved MLP-BP output value is
less than or equal to 0.35, as medical renal disease if the value is greater
than 0.35 and less than 0.75, and as cortical cyst if the value is greater
than 0.75.
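The category decision applied to the MLP-BP output can be written directly from the cut-off values above. The text does not state which category an output of exactly 0.75 falls into; this sketch assigns it to cortical cyst, which is an assumption.

```python
# Mapping the MLP-BP output value to a kidney category using the
# cut-offs stated above (0.35 and 0.75).
def categorize(output_value):
    if output_value <= 0.35:
        return "normal"
    if output_value < 0.75:
        return "medical renal disease"
    return "cortical cyst"   # assumption: exactly 0.75 falls here

print([categorize(v) for v in (0.2, 0.5, 0.9)])
# ['normal', 'medical renal disease', 'cortical cyst']
```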
CHAPTER 6
SYSTEM TESTING
6.1 TESTING OBJECTIVES AND PURPOSE
Testing is the major quality measure employed during software
development. After the coding phase, the computer programs are executed for
testing purposes. Testing must not only uncover errors introduced during
coding, but also locate errors committed during the earlier phases. Thus the
aim of testing is to uncover requirements, design or coding errors in the
program.
No system design is ever perfect. Communication problems,
programmer negligence or time constraints create errors that must be
eliminated before the system is ready for user acceptance testing. A system is
tested for on-line response, transaction volume stress, recovery from failure,
and usability. Following system testing is acceptance testing, or running the
system live with real data by the actual users.
Testing is the one step in the software engineering process that could
be viewed as destructive rather than constructive. Testing requires that the
developer discard the preconceived notions of the “correctness” of the software
just developed and overcome the conflict of interest that occurs when errors are
uncovered.
6.2 SYSTEM TESTING
This is the phase in which the bugs in the programs were found and
corrected. One of the goals during dynamic testing is to produce a test suite
in which the computed results are compared with the desired outputs. This is
applied to ensure that modifications to the program do not have any side
effects; this type of testing is called regression testing.
Testing removes residual bugs and improves the
reliability of the program. The basic types of testing are:
Unit testing
Integration testing
Validation testing
6.2.1 Unit Testing
This is the first level of testing. Here, the different modules are
tested against the specifications produced during the design of the modules.
Unit testing verifies the code of a single program module in an isolated
environment. It first focuses on the modules independently of one another to
locate errors.
After coding, each task was tested and run individually. All
unnecessary code was removed, and it was ensured that all the modules worked
as the programmer expected. Logical errors that were found were corrected. By
running all the modules independently and verifying the outputs of each module
in the presence of staff, it was concluded that the program was functioning as
expected.
6.2.2 Integration Testing
Data can be lost across an interface; one module can have an adverse
effect on another; and sub-functions, when combined, may not produce the
desired major function. Integration testing is a systematic technique for
constructing the program structure while at the same time conducting tests to
uncover errors associated with the interfaces. The objective is to take
unit-tested modules and build them into a whole. Here correction is difficult
because the vast expanse of the entire program complicates the isolation of
causes. Thus, in the integration testing step, all the errors uncovered are
corrected before the next testing steps.
Problem
The user should not need to enter the number of input, hidden and output layer
neurons; these should be generated automatically, from the Apriori output for
the training data set and from the feature extraction stage for the test data.
Solution:
6.2.3 Validation Testing
This provides the final assurance that the software meets all
functional, behavioral and performance requirements. The software is completely
assembled as a package. Validation succeeds when the software functions in a
manner in which the user expects.
Validation refers to the process of using software in a live environment in
order to find errors. During the course of validating the system, failures may occur
and sometimes the coding has to be changed according to the requirement. Thus
the feedback from the validation phase generally produces changes in the software.
Once the application was made free of all logical and interface errors,
inputting dummy data ensured that the software developed satisfied all the
requirements of the user.
CHAPTER 7
CONCLUSION
The system determines the abnormalities in ultrasound kidney images based
on association rules and neural network. The MLP-BP classifies US kidney
images as normal, medical renal disease or cortical cyst. The intelligent
classification is realized in this layer using features obtained from AR. The
combination of association rules and neural network produces higher
classification accuracy over the three kidney categories than the existing
system.
This system may be enhanced with the following tasks in the future
o Different types of neural networks can be implemented and their performance
compared.
o The number of hidden layers can be increased to improve performance.
APPENDIX-I
CONFERENCE DETAILS