DETERMINATION OF ABNORMALITIES IN ULTRASOUND KIDNEY IMAGES
USING ASSOCIATION RULE BASED NEURAL NETWORK
ABIRAMI.P
Reg.No: 91009534002
Of
P.S.N.A COLLEGE OF ENGINEERING AND TECHNOLOGY
DINDIGUL-624 622
A PROJECT REPORT
Submitted to the
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
In partial fulfillment of the requirements for
the award of the degree
Of
MASTER OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
ANNA UNIVERSITY: TIRUCHIRAPPALLI-620024
JUNE 2011
ANNA UNIVERSITY: TIRUCHIRAPPALLI
TIRUCHIRAPPALLI-620 024
BONAFIDE CERTIFICATE
Certified that this project titled “DETERMINATION OF
ABNORMALITIES IN ULTRASOUND KIDNEY IMAGES USING
ASSOCIATION RULE BASED NEURAL NETWORK” is the bonafide work
of “P.ABIRAMI (91009534002)” who carried out the research under my
supervision. Certified further, that to the best of my knowledge the work reported
herein does not form part of any other project report or dissertation on the basis of
which a degree or award was conferred on an earlier occasion on this or any other
candidate.
SIGNATURE SIGNATURE
Dr. R. SARAVANAN M.E., Ph.D.,          Mr. S. SATHEESBABU M.E.,
HEAD OF THE DEPARTMENT                 ASSOCIATE PROFESSOR
Dept of Computer Science & Engg,       Dept of Computer Science & Engg,
PSNA College of Engg & Tech.,          PSNA College of Engg & Tech.,
Dindigul.                              Dindigul.
Submitted for viva-voce examination held on ……………..2011
INTERNAL EXAMINER EXTERNAL EXAMINER
ABSTRACT
The objective of this work is to develop an automatic diagnosis system for
detecting kidney diseases based on association rules (AR) and neural network
(NN). The proposed method distinguishes two categories, namely normal and
abnormal (medical renal disease or cortical cyst). For each segmented ultrasound
kidney image, 20 features are extracted. AR is used to reduce the number of
features, and NN is used for intelligent classification of the US kidney images.
The Apriori algorithm is used for association mining, which reduces the 20
features to 12. The neural network classifies the kidney images as normal or
abnormal. The combined AR and NN model yields a fast automatic diagnostic system.
ACKNOWLEDGEMENT
At the outset I wholeheartedly thank the Almighty, who has been my strength in
times of weakness and hope in times of despair, the sole creator of all the creations in
this world and hence this project. I thank my parents who have encouraged me with
good spirit by their incessant prayers to complete this project.
I would like to express my sincere thanks to our management for providing the
facilities needed for the successful completion of my project work.
I express my sincere thanks to our beloved Principal Dr. S. Sakthivel B.E.,
M.Sc (Engg.), MBA., Ph.D., for permitting me to do the project work. I would like to
cordially thank our Head of the Department Dr.R. Saravanan M.E., Ph.D., for his
kind co-operation and advice.
I am indebted to our internal guide Mr. S. SatheesBabu M.E., for the keen
interest shown by him in my project and for the comforting words of encouragement
offered from time to time.
I extend my thanks to Mrs. K. DhanaLakshmi M.E., for her wholehearted support
towards the successful completion of this project. Finally, I thank all faculty
members, non-teaching staff members, friends and all my well-wishers who directly
and indirectly supported this work.
TABLE OF CONTENTS
CHAPTER TITLE PAGE NO
ABSTRACT iii
LIST OF TABLES viii
LIST OF FIGURES ix
LIST OF ABBREVIATIONS x
1 INTRODUCTION 1
1.1 Data mining 1
1.1.1 Data mining architecture 1
1.1.2 Steps in data mining 2
1.1.3 Association rules 3
1.2 Neural Networks 4
1.2.1 Advantages 4
1.2.2 Types of Neural Network 4
1.2.2.1 Single Layer Perceptron 4
1.2.2.2 Multi Layer Perceptron 5
1.3 Image processing 6
2 LITERATURE SURVEY 8
3 SYSTEM ANALYSIS 12
3.1 Objective 12
3.2 Existing system 12
3.3 Drawbacks of existing system 12
3.4 Proposed system 13
3.5 Base paper comparative study 13
3.6 Tool analysis 14
4 SYSTEM DESIGN 16
4.1 Module Design 16
4.1.1 Feature extraction 16
4.1.1.1 First order gray level feature 16
4.1.1.2 Second order gray level feature 16
4.1.1.3 Power Spectral feature 20
4.1.1.4 Gabor feature 22
4.1.2 Apriori algorithm 23
4.1.3 Classification using MLP-BP 25
5 SYSTEM IMPLEMENTATION 28
5.1 Software Requirements 28
5.2 Hardware Requirements 28
5.3 Implementation of feature extraction 28
5.4 Implementation of Apriori algorithm 29
5.5 Classification using MLP-BP 30
6 SYSTEM TESTING 31
6.1 Testing objectives and purpose 31
6.2 System testing 32
6.2.1 Unit testing 32
6.2.2 Integration testing 32
6.2.3 Validation testing 34
7 CONCLUSION 36
REFERENCE 37
APPENDIX-I
LIST OF TABLES
Table No. Description Page No.
4.1 Matrix format of test image 22
4.2 General format of GLCM 23
4.3 GLCM for δ=1 and θ=0° 23
4.4 GLCM for δ=1 and θ=90° 24
LIST OF FIGURES
Figure No. Description Page No.
1.1 Data Mining architecture 3
1.2 Multi layer perceptron 7
4.1 System design 20
4.2 Threshold Logic unit 31
LIST OF ABBREVIATIONS
ANN Artificial Neural Networks
NN Neural Network
AR Association Rules
MLP Multi Layer Perceptron
BP Back Propagation
SLP Single Layer Perceptron
TLU Threshold Logic Unit
CHAPTER 1
INTRODUCTION
1.1 DATA MINING
Data mining is the process of extracting patterns from data. It is seen as an
increasingly important tool by modern businesses for transforming data into
business intelligence, giving them an informational advantage. It is currently
used in a wide range of profiling practices, such as marketing, surveillance,
fraud detection, and scientific discovery.
Data mining techniques can be implemented rapidly on existing software and
hardware platforms to enhance the value of existing information resources, and
can be integrated with new products and systems as they are brought on-line.
Data mining is ready for application in the business community because it is
supported by three technologies that are now sufficiently mature:
Massive data collection
Powerful multiprocessor computers
Data mining algorithms
1.1.1 DATA MINING ARCHITECTURE
Data mining, the extraction of hidden predictive information from large
databases, is a powerful new technology with great potential to help companies
focus on the most important information in their data warehouses. Data mining
software is one of a number of analytical tools for analyzing data. It allows
users to analyze data from many different dimensions or angles, categorize it,
and summarize the relationships identified. Technically, data mining is the
process of finding correlations or patterns among dozens of fields in large
relational databases.
Fig 1.1 Data mining architecture (components: user interface, pattern
evaluation, data mining engine, database or data warehouse, knowledge base)
1.1.2 STEPS IN DATA MINING
Data Selection: We may not need all the data collected in the first step, so in
this step we select only the data we think will be useful for data mining.
Data Cleaning: The collected data may contain errors, missing values, and noisy
or inconsistent entries, so different techniques are applied to remove such
anomalies.
Data Transformation: Even after cleaning, the data are not ready for mining;
they must be transformed into forms appropriate for mining. The techniques used
to accomplish this include smoothing, aggregation and normalization.
Data Mining: Now we are ready to apply data mining techniques on the data to
discover the interesting patterns. Techniques like clustering and association
analysis are among the many different techniques used for data mining.
Pattern Evaluation and Knowledge Presentation: This step involves
visualization, transformation, and the removal of redundant patterns from the
patterns generated.
Decisions / Use of Discovered Knowledge: This step helps the user make use of
the acquired knowledge to take better decisions.
1.1.3 ASSOCIATION RULES
Association rule mining finds interesting associations and/or correlation
relationships among large sets of data items. Association rules show
attribute-value conditions that occur frequently together in a given dataset. A
typical and widely used example of association rule mining is Market Basket
Analysis. The various algorithms are as follows:
Apriori algorithm
Eclat algorithm
FP-growth algorithm
One-attribute rule
Zero-attribute rule
1.2 NEURAL NETWORKS
An Artificial Neural Network (ANN) is an information processing paradigm
that is inspired by the way biological nervous systems, such as the brain,
process information. The key element of this paradigm is the novel structure of
the information processing system. It is composed of a large number of highly
interconnected processing elements working in unison to solve specific
problems. ANNs, like people, learn by example. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a
learning process.
1.2.1 ADVANTAGES
A neural network can perform tasks that a linear program cannot.
When an element of the neural network fails, the network can continue
without any problem because of its parallel nature.
A neural network learns and does not need to be reprogrammed.
It can be applied to a wide range of applications.
It can be implemented without major difficulty.
1.2.2 TYPES OF NEURAL NETWORK
1.2.2.1 SINGLE LAYER PERCEPTRON
The earliest kind of neural network is a single-layer perceptron network, which
consists of a single layer of output nodes; the inputs are fed directly to the
outputs via a series of weights. In this way it can be considered the simplest
kind of feed-forward network. The sum of the products of the weights and the
inputs is calculated in each node, and if the value is above some threshold
(typically 0) the neuron fires and takes the activated value (typically 1);
otherwise it takes the deactivated value (typically -1).
1.2.2.1.1 ADVANTAGES
Easy to set up and train.
Outputs are a weighted sum of inputs: an interpretable representation.
1.2.2.1.2 LIMITATIONS
Can only represent a limited set of functions.
Decision boundaries must be hyperplanes.
Can only perfectly separate linearly separable data.
1.2.2.2 MULTILAYER PERCEPTRON
A multilayer perceptron (MLP) is a feedforward artificial neural network
model that maps sets of input data onto a set of appropriate outputs. An MLP
consists of multiple layers of nodes in a directed graph, with each layer fully
connected to the next one. Except for the input nodes, each node is a neuron
with a nonlinear activation function. MLP utilizes a supervised learning
technique called backpropagation for training the network.
Fig 1.2 Multi layer perceptron
1.2.2.3 ADVANTAGES
Adaptive learning: an ability to learn how to do tasks based on the
data given for training or initial experience.
MLPs/neural networks do not make any assumptions regarding the
underlying probability density functions or other probabilistic
information about the pattern classes under consideration, in
contrast to probability-based models.
They yield the required decision function directly via training.
A two-layer backpropagation network with sufficient hidden nodes
has been proven to be a universal approximator.
1.3 IMAGE PROCESSING
Image processing is any form of signal processing for which the input is an
image, such as a photograph or video frame; the output may be either an image
or a set of characteristics or parameters related to the image. Most
image-processing techniques involve treating the image as a two-dimensional
signal and applying standard signal-processing techniques to it.
Image processing converts an image signal, which can be either digital or
analog, into a physical image or a set of image characteristics. The most
common type of image processing is photography, in which an image is captured
using a camera to create a digital or analog image.
CHAPTER 2
LITERATURE SURVEY
[1] An expert system for detection of breast cancer based on association
rules and neural network - Murat Karabatak, Firat University (2009)
This work presents an automatic diagnosis system for detecting breast cancer
based on association rules (AR) and neural network (NN). Feature extraction is
the key for pattern recognition and classification: the best classifier will
perform poorly if the features are not chosen well, and a feature extractor
should reduce the feature vector to a lower dimension which contains most of
the useful information from the original vector. So AR is used for reducing the
dimension of the breast cancer database and NN is used for intelligent
classification, and the proposed AR + NN system performance is compared with
the NN model. The dimension of the input feature space is reduced from nine to
four by using AR. In the test stage, the 3-fold cross validation method was
applied to the Wisconsin breast cancer database to evaluate the proposed
system's performance.
[2] A hybrid fuzzy-neural system for computer-aided diagnosis of
ultrasound kidney images using prominent features - K. Bommanna Raja,
M. Madheswaran & K. Thyagarajah
The objective of this work is to develop and implement a computer-aided
decision support system for an automated diagnosis and classification of
ultrasound kidney images. The proposed method distinguishes three kidney
categories, namely normal, medical renal disease and cortical cyst. For each
pre-processed ultrasound kidney image, 36 features are extracted.
decision support systems, optimized multi-layer back propagation network and
hybrid fuzzy-neural system have been developed with these features for
classifying the kidney categories. The performance of the hybrid fuzzy-neural
system is compared with the optimized multi-layer back propagation network in
terms of classification efficiency, training and testing time. The results obtained
show that fuzzy-neural system provides higher classification efficiency with
minimum training and testing time. It has also been found that instead of using
all 36 features, ranking the features enhances the classification efficiency.
The outputs of the decision support systems are validated with a medical expert
to measure the actual efficiency. The overall discriminating capability of the
systems is assessed with the performance evaluation measure, f-score. It has been
observed that the performance of fuzzy neural system is superior compared to
optimized multi-layer back propagation network. Such hybrid fuzzy-neural
system with feature extraction algorithms and pre-processing scheme helps in
developing computer-aided diagnosis system for ultrasound kidney images and
can be used as a secondary observer in clinical decision making.
[3] P. Rajendran, M.Madheswaran. Hybrid medical image classification
using AR mining with decision tree algorithm. JOURNAL OF
COMPUTING, JANUARY 2010.
The main focus of image mining in the proposed method is concerned
with the classification of brain tumor in the CT scan brain images. The major
steps involved in the system are: pre-processing, feature extraction, association
rule mining and hybrid classifier. The pre-processing step has been done using
the median filtering process and edge features have been extracted using canny
edge detection technique. The two image mining approaches with a hybrid
manner have been proposed in this paper. The frequent patterns from the CT
scan images are generated by frequent pattern tree (FP-Tree) algorithm that
mines the association rules. The decision tree method has been used to classify
the medical images for diagnosis. This system enhances the classification
process to be more accurate. The hybrid method improves the efficiency of the
proposed method over traditional image mining methods. The experimental
results on a prediagnosed database of brain images showed 97% sensitivity and
95% accuracy. Physicians can make use of this accurate decision tree
classification phase to classify brain images into normal, benign and
malignant for effective medical diagnosis.
[4] Haiwei Pan, Jianzhong Li, and Zhang Wei. Mining Interesting
Association Rules in Medical Images (2007).
Image mining is more than just an extension of data mining to the image
domain; it is an interdisciplinary endeavor. Very few researchers have
systematically investigated this field. Mining association rules in medical images is an
important part in domain-specific application image mining because there are
several technical aspects which make this problem challenging. In this paper,
we extend the concept of association rule based on object and image in medical
images, and propose two algorithms to discover frequent item-sets and mine
interesting association rules from medical images. We describe how to
incorporate the domain knowledge into the algorithms to enhance the
interestingness. Some interesting results are obtained by our program and we
believe many of the problems we come across are likely to appear in other
domains.
CHAPTER 3
SYSTEM ANALYSIS
3.1 OBJECTIVE
To produce an accurate classification of ultrasound kidney images using a
neural network. The Apriori algorithm is used for association mining to select
the most relevant features of the given image.
3.2 EXISTING SYSTEM
Techniques such as association rule based neural networks have been used for
the classification of malignant and benign patterns in digitized mammograms. A
back-propagation neural network has been used for the classification of
suspicious lesions extracted using a fuzzy rule-based detection system,
obtaining higher accuracy.
A comparative study of radial basis function (RBF) and multilayer
perceptron (MLP) based neural networks for the classification of breast
abnormalities using texture features concluded that MLP obtained 4%
higher accuracy than RBF.
3.3 DRAWBACKS OF EXISTING SYSTEM
The segmentation and feature extraction technique for the classification of
microcalcifications achieves a low classification rate (78%) on the DDSM
database.
The accuracy of the system may be high on the training data set and may drop
on the test data.
3.4 PROPOSED SYSTEM
In the proposed system the features are extracted from the kidney image.
There are four feature extraction techniques: First order gray level
statistical features, Second order gray level statistical features, Power
spectral features and Gabor features.
Different features are extracted to study the gray level intensity distribution
of the kidney region. In total, 20 features are extracted from the segmented
kidney images.
The Apriori algorithm reduces the number of features to 12. As the features are
reduced from 20 to 12 before being passed to the MLP, the classification
accuracy is increased.
MLP classifies the images into 3 categories:
Normal
Medical renal disease
Cortical cyst.
3.5 BASE PAPER COMPARATIVE STUDY
The base paper presents an automatic diagnosis system for detecting breast
cancer based on association rules (AR) and neural network (NN). Feature
extraction is the key for pattern recognition and classification; the best
classifier will perform poorly if the features are not chosen well, and a
feature extractor should reduce the feature vector to a lower dimension which
contains most of the useful information from the original vector. So AR is used
for reducing the dimension of the breast cancer database and NN is used for
intelligent classification, and the AR + NN system performance is compared with
the NN model. The dimension of the input feature space is reduced from nine to
four by using AR. In the test stage, 3-fold cross validation was applied to the
Wisconsin breast cancer database to evaluate the proposed system's performance.
The existing system is modified by using ultrasound kidney images as the
input: feature extraction techniques are applied to extract the features, and
the Apriori algorithm is then used to select the relevant features. MLP-BP
classifies the given image as normal or abnormal.
3.6 TOOL ANALYSIS
MathWorks MATLAB 7.9 is a high-level technical computing language that
provides an interactive environment for algorithm development and a modern
tool for data analysis. Compared with traditional programming languages
(C/C++, Java, Pascal, FORTRAN), MATLAB can reduce the solution time for
typical tasks by an order of magnitude and greatly simplifies the development
of new algorithms.
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB allows
matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs written
in other languages, including C, C++, and FORTRAN.
Although MATLAB is intended primarily for numerical computing, an optional
toolbox uses the MuPAD symbolic engine, allowing access to symbolic
computing capabilities. An additional package, Simulink, adds graphical multi-
domain simulation and Model-Based Design for dynamic and embedded
systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and
economics. MATLAB is widely used in academic and research institutions as
well as industrial enterprises.
CHAPTER 4
SYSTEM DESIGN
4.1 MODULE DESIGN
4.1.1 FEATURE EXTRACTION
The feature extraction techniques are applied to the segmented kidney images,
and each technique is explained below.
4.1.1.1 FIRST ORDER GRAY LEVEL STATISTICAL FEATURE
The first order gray level statistical features are estimated from the
preprocessed ultrasound kidney images. The features are Mean, Dispersion,
Variance, Energy, Skewness and Kurtosis. Variance is the sum of the differences
between the intensity of the central pixel and its neighborhood.
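As a sketch of how such first order statistics can be computed (Python is used for illustration here, although the report's implementation is in MATLAB; the exact normalizations used in the report may differ, and the 16 pixel values below are simply the test image of Table 4.1 flattened):

```python
import math

def first_order_features(pixels):
    """First order gray level statistics over a flat list of intensities,
    using standard moment-based definitions (an assumption; the report
    may normalize differently)."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4) if std else 0.0
    # Energy: sum of squared normalized histogram entries
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    energy = sum((c / n) ** 2 for c in hist.values())
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy}

feats = first_order_features([0, 0, 1, 1, 0, 0, 1, 1, 0, 2, 2, 2, 2, 2, 3, 3])
print(feats["mean"])  # 20 / 16 = 1.25
```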
4.1.1.2 SECOND ORDER GRAY LEVEL STATISTICAL FEATURE
The spatial gray level dependency matrix is one of the most widely used
techniques for statistical texture description. All known visually distinct texture
pairs can be discriminated using this method.
GLCM ALGORITHM
Texture is one of the important characteristics used in identifying objects or
regions of interest in an image. Texture contains important information about
the structural arrangement of surfaces. The textural features based on gray-tone
spatial dependencies have a general applicability in image classification. The
three fundamental pattern elements used in human interpretation of images are
spectral, textural and contextual features. Spectral features describe the average
tonal variations in various bands of the visible and/or infrared portion of an
electromagnetic spectrum. Textural features contain information about the
spatial distribution of tonal variations within a band. The fourteen textural
features proposed by Haralick contain information about image texture
characteristics such as homogeneity, gray-tone linear dependencies, contrast,
number and nature of boundaries present and the complexity of the image.
Contextual features contain information derived from blocks of pictorial data
surrounding the area being analyzed. The (i,j)th entry of the co-occurrence
matrix represents the probability of going from a pixel with gray level i to
another with gray level j under predefined angles. Usually, for statistical
texture analysis, these angles are defined as 0°, 45°, 90° and 135°.
Test image
Table 4.1 Matrix format of test image
0 0 1 1
0 0 1 1
0 2 2 2
2 2 3 3
General form of GLCM
Table 4.2 General form of GLCM
Gray
tone
0 1 2 3
0 #(0,0) #(0,1) #(0,2) #(0,3)
1 #(1,0) #(1,1) #(1,2) #(1,3)
2 #(2,0) #(2,1) #(2,2) #(2,3)
3 #(3,0) #(3,1) #(3,2) #(3,3)
Table 4.3 GLCM for δ=1 and θ=0°
4 2 1 0
2 4 0 0
1 0 6 1
0 0 1 2
Table 4.4 GLCM for δ=1 and θ=90°
6 0 2 0
0 4 2 0
2 2 2 2
0 0 2 0
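The construction of these matrices can be sketched as follows (Python is used for illustration; the report's implementation is in MATLAB). Counting each pixel pair in both directions, as the symmetric GLCM does, reproduces Tables 4.3 and 4.4 from the test image of Table 4.1:

```python
def glcm(image, dx, dy, levels):
    """Symmetric gray level co-occurrence matrix for displacement (dx, dy).
    δ=1, θ=0° corresponds to (dx=1, dy=0); θ=90° to (dx=0, dy=1)."""
    rows, cols = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                m[j][i] += 1  # count both directions (symmetric GLCM)
    return m

test_image = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 2, 2, 2],
              [2, 2, 3, 3]]

print(glcm(test_image, 1, 0, 4))  # matches Table 4.3
print(glcm(test_image, 0, 1, 4))  # matches Table 4.4
```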
Energy:
One approach to generating texture features is to use local kernels to detect
various types of texture. After the convolution with the specified kernel, the
texture energy measure (TEM) is computed by summing the absolute values in
a local neighborhood:
TEM(i, j) = Σ(m,n)∈N |C(m, n)|   (4.1)
where C is the image convolved with the kernel and N is the local neighborhood
around (i, j).
If n kernels are applied, the result is an n-dimensional feature vector at each
pixel of the image being analyzed.
Correlation:
Correlation is a measure of image linearity.
Correlation = [Σi Σj (i · j) P[i, j] − μx μy] / (σx σy)   (4.2)
μx = Σi i Σj P[i, j],   σx² = Σi (i − μx)² Σj P[i, j]   (and similarly μy, σy)   (4.3)
Correlation will be high if an image contains a considerable amount of linear
structure.
Entropy:
Entropy is a measure of information content. It measures the randomness of
intensity distribution.
Entropy = − Σi Σj P[i, j] log P[i, j]   (4.4)
Such a matrix corresponds to an image in which there are no preferred gray
level pairs for the distance vector d.
Entropy is highest when all entries in P[i,j] are of similar magnitude, and small
when the entries in P[i,j] are unequal.
Homogeneity
A homogeneous image will result in a co-occurrence matrix with a combination
of high and low P[i,j]’s.
Homogeneity = Σi Σj P[i, j] / (1 + |i − j|)   (4.5)
Where the range of gray levels is small the P [i, j] will tend to be clustered
around the main diagonal. A heterogeneous image will result in an even spread
of P [i, j]’s.
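Putting the energy, entropy and homogeneity definitions together, a minimal sketch (Python is used for illustration; the logarithm base and the exact homogeneity form vary in the literature, so treat those choices as assumptions):

```python
import math

def glcm_features(P):
    """Texture features from a normalized co-occurrence matrix P[i][j]
    whose entries sum to 1 (standard-form definitions; an assumption)."""
    n = len(P)
    energy = sum(P[i][j] ** 2 for i in range(n) for j in range(n))
    entropy = -sum(P[i][j] * math.log2(P[i][j])
                   for i in range(n) for j in range(n) if P[i][j] > 0)
    homogeneity = sum(P[i][j] / (1 + abs(i - j))
                      for i in range(n) for j in range(n))
    return energy, entropy, homogeneity

# Normalize the δ=1, θ=0° GLCM of the test image (its entries sum to 24).
G = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
total = sum(sum(row) for row in G)
P = [[v / total for v in row] for row in G]
print(glcm_features(P))
```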
4.1.1.3 POWER SPECTRAL FEATURE
The spectral features are estimated using the fast Fourier transform. They are
used for various analysis, diagnosis and evaluation tasks in biological
systems. An important application of power spectral features is to detect and
characterize binary images.
The periodogram computes the power spectrum for the entire input signal:
PS[k] = |F(signal)[k]|² / N   (4.6)
where F(signal) is the Fourier transform of the signal and N is the
normalization factor, which Igor's DSP Periodogram operation defaults to the
number of samples in the signal. The calculation of the periodogram is
improved by spectral windowing, and Igor's DSP Periodogram operation supports
the same windows as the FFT operation. The result of the periodogram is often
normalized by a multiplication factor to make the result satisfy Parseval's
theorem:
Σn |signal[n]|² = (1/N) Σm |F(signal)[m]|²   (4.7)
which presumes the two-sided frequency-domain FFT result is computed from the
time-domain signal data, and where N is again the number of time-domain values
in the signal.
Normalization of the periodogram result to meet this criterion follows several
different conventions in the literature (and depends on the average power of
any spectral windowing function and also on whether the periodogram is one- or
two-sided), so the DSP Periodogram operation allows the user to specify the
desired normalization using the /NOR parameter:
DSPPeriodogram/NOR=(numpnts(signal)/2) signal
When using a window function, the amount of power in the signal is reduced. A
compensating multiplier of 1/average(window[i]^2) should be applied to the
result. For a Hanning window this value is theoretically 0.375. Because the
normalization factor is a denominator, you would divide N by 0.375 to
compensate for the Hanning window:
DSPPeriodogram/NOR=(numpnts(signal)/(2*0.375)) signal
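Independently of Igor's built-in operation, the basic periodogram of Eq. (4.6) can be sketched directly (Python is used for illustration; a plain DFT is written out for self-containment, and an FFT would give the same values faster):

```python
import cmath, math

def periodogram(signal):
    """Power spectrum |F(signal)|^2 / N, mirroring Eq. (4.6)."""
    N = len(signal)
    spectrum = []
    for k in range(N):
        F = sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
        spectrum.append(abs(F) ** 2 / N)
    return spectrum

# A pure cosine of frequency 1 cycle per record concentrates its power
# at bins k=1 and k=N-1; the spectrum also satisfies Parseval's theorem.
sig = [math.cos(2 * math.pi * n / 8) for n in range(8)]
print([round(p, 6) for p in periodogram(sig)])
```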
4.1.1.4 GABOR FEATURE
A Gabor filter can be seen as a sinusoidal plane wave of a particular frequency
and orientation, modulated by a Gaussian envelope. A 2D Gabor function g(x, y)
and its Fourier transform G(u, v) are defined as
g(x, y) = (1 / (2π σx σy)) exp[ −(1/2)(x²/σx² + y²/σy²) + 2πjWx ]   (4.8)
where j = √−1 and W is the frequency of the modulated sinusoid, and
G(u, v) = exp{ −(1/2)[ (u − W)²/σu² + v²/σv² ] }   (4.9)
where σu = 1/(2πσx) and σv = 1/(2πσy).
A self-similar filter dictionary can be obtained by associating an appropriate
scale factor α and a rotation parameter θ with the mother wavelet g(x, y). M
and N represent the scales and orientations of the Gabor wavelets:
gmn(x, y) = α^(−m) g(x′, y′),  x′ = α^(−m)(x cos θ + y sin θ),
y′ = α^(−m)(−x sin θ + y cos θ)   (4.10)
where θ = nπ/K and K is the total number of orientations.
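A discrete sketch of the real part of such a kernel (Python is used for illustration; the kernel size, σ values, frequency W and orientation θ below are illustrative assumptions, not the report's parameters):

```python
import math

def gabor_kernel(size, sigma_x, sigma_y, W, theta):
    """Real part of a 2D Gabor kernel: a Gaussian envelope modulating a
    sinusoid of frequency W along an axis rotated by theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates by theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            row.append(g * math.cos(2 * math.pi * W * xr))
        kernel.append(row)
    return kernel

k = gabor_kernel(size=7, sigma_x=2.0, sigma_y=2.0, W=0.25, theta=0.0)
print(k[3][3])  # kernel center: exp(0) * cos(0) = 1.0
```

Convolving the image with such kernels at several scales and orientations and taking statistics of the filtered outputs yields the Gabor features.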
4.1.2 ASSOCIATION RULES USING APRIORI ALGORITHM
Association rules find interesting associations or relationships among large
sets of data items. They show attribute-value conditions that occur frequently
together in a given dataset, and capture all possible rules that explain the
presence of some attributes according to the presence of other attributes. For
example, the rule {onions, potatoes} -> {burger} found in the sales data of a
supermarket would indicate that if a customer buys onions and potatoes
together, he or she is likely to also buy a burger. Such information can be
used as the basis for decisions about marketing activities such as
promotional pricing or product placements.
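For instance, the support and confidence of such a rule can be computed as follows (Python is used for illustration; the four toy transactions are made-up values):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by support of the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

T = [{"onions", "potatoes", "burger"},
     {"onions", "potatoes", "burger"},
     {"onions", "potatoes"},
     {"milk", "bread"}]

print(support({"onions", "potatoes", "burger"}, T))       # 2/4 = 0.5
print(confidence({"onions", "potatoes"}, {"burger"}, T))  # 2/3
```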
4.1.2.1 APRIORI ALGORITHM
The Apriori Algorithm is an influential algorithm for mining frequent itemsets
for Boolean association rules.
Apriori()
  L1 = {large 1-itemsets}
  k = 2
  while Lk-1 ≠ Ø do
  begin
    Ck = apriori_gen(Lk-1)
    for all transactions t in D do
    begin
      Ct = subset(Ck, t)
      for all candidates c ∈ Ct do
        c.count = c.count + 1
    end
    Lk = {c ∈ Ck | c.count ≥ minsup}
    k = k + 1
  end
Apriori first scans the transaction database D in order to count the support of
each item i in I and determines the set of large 1-itemsets. Then an iteration
is performed to compute the set of 2-itemsets, 3-itemsets, and so on. The kth
iteration consists of two steps: the first step generates the candidate set Ck
from the large itemsets Lk-1; the second step scans the database to compute the
support count of each candidate set. The candidate generation algorithm is
given as follows.
apriori_gen(Lk-1)
  Ck = Ø
  for all itemsets X ∈ Lk-1 and Y ∈ Lk-1 do
    if X1 = Y1 ∧ ... ∧ Xk-2 = Yk-2 ∧ Xk-1 < Yk-1 then
    begin
      C = X1 X2 ... Xk-1 Yk-1
      add C to Ck
    end
Different features are extracted to study the gray level intensity distribution of
kidney region. Totally 20 features are extracted from the segmented kidney
images. Apriori algorithm reduces the number of features to 12 and this is given
as the input to the MLP-BP.
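A compact runnable sketch of this level-wise procedure (Python is used for illustration; this version folds candidate generation into the main loop, omits the subset-pruning step of the full apriori_gen, and the transactions and minimum support below are made-up values):

```python
def apriori(transactions, minsup):
    """Level-wise frequent itemset mining: generate candidates from L(k-1),
    count their support in one pass, keep those meeting minsup."""
    items = sorted({i for t in transactions for i in t})
    L = [frozenset([i]) for i in items
         if sum(i in t for t in transactions) >= minsup]
    frequent = list(L)
    k = 2
    while L:
        # join step: merge itemsets sharing their first k-2 items
        C = {a | b for a in L for b in L
             if len(a | b) == k and sorted(a)[:k - 2] == sorted(b)[:k - 2]}
        counts = {c: sum(c <= t for t in transactions) for c in C}
        L = [c for c, n in counts.items() if n >= minsup]
        frequent.extend(L)
        k += 1
    return frequent

T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(sorted(s) for s in apriori(T, minsup=3)))
```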
4.1.3 CLASSIFICATION USING MLP-BP
A neural network consists of an interconnected group of artificial neurons.
Multilayer perceptron neural network with back propagation is used for
classification. The intelligent classification is realized in this layer by using
Features, which are obtained from AR. The initial weights are Random .The
number of neuron on the layers
Input: 12
Hidden: 2
Output: 1
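The forward pass of this 12-2-1 network can be sketched as follows (Python is used for illustration; the sigmoid activation and random placeholder weights are assumptions, since in the report the weights are learned by backpropagation):

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(features, w_hidden, w_out):
    """Forward pass of a 12-2-1 MLP: 12 inputs (the AR-selected features),
    2 hidden neurons, 1 output neuron."""
    hidden = [sigmoid(sum(f * w for f, w in zip(features, ws)))
              for ws in w_hidden]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_out)))

w_hidden = [[random.uniform(-1, 1) for _ in range(12)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]
y = mlp_forward([0.5] * 12, w_hidden, w_out)
print(0.0 < y < 1.0)  # a sigmoid output always lies in (0, 1)
```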
THRESHOLD
Scientists trying to understand the working of the human brain think of it as
networks of neurons. Each neuron conducts a signal or not depending on its
input, its weights and a threshold value; in scientific terms, neurons fire or
not depending on the summed strength of their inputs across synapses of
various strengths. Initially, a neural network has random weights and thus
does not perform the given task well. With practice, the weights and the
threshold values of the neurons in the network keep being adjusted. After a
while, when the weights are adjusted, we call the neural network trained, and
it can do the task well.
As can be seen from the diagram, a TLU has various inputs X1, X2... Xn. These
inputs are multiplied with the corresponding weights and added together. If this
sum is greater than the threshold value, the output is a high (1). Otherwise, the
result is low (0).
To start with, the weights in the TLU and the threshold value are randomly
decided. Then the TLU is presented with the expected output for a particular
input, and the actual output of the TLU for that input is noted. Usually,
because the weights are random, the TLU responds in error. This error is used
to adjust the weights so that the TLU produces the required output for the
given input. Similarly, all the expected values in the training set are used to
adjust the weights.
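A runnable sketch of this error-driven adjustment (Python is used for illustration; the learning rate, epoch count and the logical-AND training set are illustrative assumptions, AND being a convenient linearly separable task a TLU can learn):

```python
def train_tlu(samples, epochs=20, lr=0.1):
    """Perceptron-style training: for each sample, compare the TLU output
    with the target and nudge the weights and threshold by the error."""
    w = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(xi * wi for xi, wi in zip(x, w)) > threshold else 0
            err = target - out
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            threshold -= lr * err  # threshold moves opposite to the weights
    return w, threshold

# Train on logical AND; the trained TLU reproduces the truth table.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, th = train_tlu(AND)
print([1 if sum(xi * wi for xi, wi in zip(x, w)) > th else 0
       for x, _ in AND])  # [0, 0, 0, 1]
```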
Fig 4.1 Threshold logic unit
Once the TLU is trained, it will respond correctly for all inputs in the training
set. Also, now that the TLU is trained, we can use it to calculate the output for
inputs not in the training set.
The threshold value is varied from 0.1 to 1.0 in increments of 0.05. The
classification efficiency is measured for each setting and the best possible
threshold value is assigned. The input image is classified as normal if the
achieved MLP-BP output value is less than or equal to 0.35, as medical renal
disease if the value is greater than 0.35 and less than 0.75, and as cortical
cyst if the value is greater than 0.75.
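The threshold selection can be sketched as a simple sweep. The scores and labels below are hypothetical, not the project's data; the fragment only illustrates stepping the threshold from 0.1 to 1.0 by 0.05 and keeping the value with the highest classification accuracy.

```python
# Sketch of threshold selection: sweep 0.1 to 1.0 in steps of 0.05 and
# keep the threshold giving the best accuracy on (hypothetical) outputs.
scores = [0.10, 0.20, 0.30, 0.55, 0.60, 0.80, 0.90]   # MLP-BP output values
labels = [0, 0, 0, 1, 1, 1, 1]                        # 0 = normal, 1 = abnormal

best_threshold, best_accuracy = None, -1.0
t = 0.10
while t <= 1.0 + 1e-9:
    predictions = [0 if s <= t else 1 for s in scores]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    if accuracy > best_accuracy:
        best_threshold, best_accuracy = t, accuracy
    t += 0.05

print(round(best_threshold, 2), best_accuracy)        # 0.3 1.0
```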
4.2 SYSTEM DESIGN
Fig 4.2 Data flow diagram: input image, feature extraction (first order gray
level statistical features, second order gray level statistical features,
power spectral features, Gabor features), association rule generation using
the Apriori algorithm, classification as normal or abnormal (medical renal
disease / cortical cyst)
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 SOFTWARE REQUIREMENTS
Operating System: Windows XP (any platform that supports MATLAB)
Language: MATLAB
Version: MATLAB 7.9
5.2 HARDWARE REQUIREMENTS
Processor: Pentium IV, 2.7 GHz
RAM: 1 GB DDR or more
Hard Disk: 250 GB
5.3 IMPLEMENTATION OF FEATURE EXTRACTION
The segmented ultrasound kidney images are taken as the input and different
feature extraction techniques are applied to them. The four feature extraction
techniques are: first order gray level statistical features, second order gray
level statistical features, power spectral features, and Gabor features.
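As a small illustration of the first of these families, the fragment below computes first order gray level statistics (mean, standard deviation, skewness) over a hypothetical patch of gray levels. The project computes these in MATLAB; this Python sketch and its sample values are assumptions for illustration only.

```python
# First order gray level statistics over a hypothetical image patch:
# these depend only on the histogram of gray levels, not on pixel layout.
import math

patch = [10, 12, 10, 14, 200, 12, 11, 10, 13]   # hypothetical gray levels

n = len(patch)
mean = sum(patch) / n
variance = sum((g - mean) ** 2 for g in patch) / n
std_dev = math.sqrt(variance)
skewness = sum((g - mean) ** 3 for g in patch) / (n * std_dev ** 3)

print(round(mean, 2), round(std_dev, 2), round(skewness, 2))
```

The single bright outlier (200) pulls the skewness strongly positive, which is the kind of asymmetry this feature is meant to capture.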
5.4 IMPLEMENTATION OF APRIORI ALGORITHM
In data mining, association rule learning is a popular and well-researched
method for discovering interesting relations between variables in large
databases. Apriori is the best-known algorithm for mining association rules.
It uses a breadth-first search strategy to count the support of itemsets, and
a candidate generation function that exploits the downward closure property of
support.
Apriori is applied only to the training data sets; for the test data, the
MLP-BP classifier is used directly, without running Apriori again.
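The support counting and downward-closure steps can be sketched as follows. The transactions below are hypothetical; this Python fragment does not reflect how the project actually encodes its 20 features, it only illustrates that a candidate itemset is generated from frequent subsets and kept when its support meets the minimum.

```python
# Minimal Apriori sketch: count support of 1-itemsets, then generate
# 2-itemset candidates from frequent 1-itemsets (downward closure) and
# keep those meeting the minimum support.
from itertools import combinations

transactions = [
    {"f1", "f2", "f3"},
    {"f1", "f2"},
    {"f1", "f4"},
    {"f2", "f3"},
]
min_support = 2  # absolute support count

def support(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions)

# Frequent 1-itemsets.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items
            if support(frozenset([i])) >= min_support]

# Candidate 2-itemsets: both 1-item subsets must already be frequent.
candidates = [a | b for a, b in combinations(frequent, 2)]
frequent2 = [c for c in candidates if support(c) >= min_support]

print(sorted(sorted(c) for c in frequent2))   # [['f1', 'f2'], ['f2', 'f3']]
```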
5.5 CLASSIFICATION USING MLP-BP
A multilayer perceptron neural network with back propagation is used for
classification. The intelligent classification is realized in this layer using
the features obtained from AR. The classification efficiency is measured for
each threshold setting and the best possible threshold value is assigned. The
input image is classified as normal if the achieved MLP-BP output value is
less than or equal to 0.35, as medical renal disease if the value is greater
than 0.35 and less than 0.75, and as cortical cyst if the value is greater
than 0.75.
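The category decision applied to the MLP-BP output can be written directly from the cut-off values above. The text does not state which category an output of exactly 0.75 falls into; this sketch assigns it to cortical cyst, which is an assumption.

```python
# Mapping the MLP-BP output value to a kidney category using the
# cut-offs stated above (0.35 and 0.75).
def categorize(output_value):
    if output_value <= 0.35:
        return "normal"
    if output_value < 0.75:
        return "medical renal disease"
    return "cortical cyst"   # assumption: exactly 0.75 falls here

print([categorize(v) for v in (0.2, 0.5, 0.9)])
# ['normal', 'medical renal disease', 'cortical cyst']
```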
CHAPTER 6
SYSTEM TESTING
6.1 TESTING OBJECTIVES AND PURPOSE
Testing is the major quality measure employed during software
development. After the coding phase, the computer programs are executed for
testing purposes. Testing must not only uncover errors introduced during
coding, but also locate errors committed during the earlier phases. Thus the
aim of testing is to uncover requirements, design or coding errors in the
program.
No system design is ever perfect. Communication problems,
programmer negligence or time constraints create errors that must be
eliminated before the system is ready for user acceptance testing. A system is
tested for on-line response, transaction volume stress, recovery from failure,
and usability. Following system testing is acceptance testing, or running the
system live with real data by the actual users.
Testing is the one step in the software engineering process that could
be viewed as destructive rather than constructive. Testing requires that the
developer discard the preconceived notions of the “correctness” of the software
just developed and overcome the conflict of interest that occurs when errors are
uncovered.
6.2 SYSTEM TESTING
This is the phase in which the bugs in the programs were found and
corrected. One of the goals during dynamic testing is to produce a test suite
in which the computed results are compared with the desired outputs. This is
applied to ensure that modifications to the program do not have any side
effects; this type of testing is called regression testing.
Testing removes residual bugs and improves the
reliability of the program. The basic types of testing are:
Unit testing
Integration testing
Validation testing
6.2.1 Unit Testing
This is the first level of testing. Here, the different modules are
tested against the specifications produced during the design of the modules.
Unit testing verifies the code of a single program module in an isolated
environment. It first focuses on the modules independently of one another to
locate errors.
After coding, each task was tested and run individually. All
unnecessary code was removed, and it was ensured that all the modules worked
as the programmer expected. Logical errors that were found were corrected. By
running all the modules independently and verifying the outputs of each module
in the presence of staff, it was concluded that the program was functioning as
expected.
6.2.2 Integration Testing
Data can be lost across an interface; one module can have an adverse
effect on another; and sub-functions, when combined, may not produce the
desired major function. Integration testing is a systematic technique for
constructing the program structure while at the same time conducting tests to
uncover errors associated with the interfaces. The objective is to take
unit-tested modules and build them into a whole. Here correction is difficult
because the vast expanse of the entire program complicates the isolation of
causes. Thus, in the integration testing step, all the errors uncovered are
corrected before the next testing steps.
Problem
The user should not need to enter the number of input, hidden and output layer
neurons; these should be generated automatically, from the Apriori output for
the training data set and from the feature extraction stage for the test data.
Solution:
6.2.3 Validation Testing
This provides the final assurance that the software meets all
functional, behavioral and performance requirements. The software is completely
assembled as a package. Validation succeeds when the software functions in a
manner in which the user expects.
Validation refers to the process of using software in a live environment in
order to find errors. During the course of validating the system, failures may occur
and sometimes the coding has to be changed according to the requirement. Thus
the feedback from the validation phase generally produces changes in the software.
Once the application was made free of all logical and interface errors,
inputting dummy data ensured that the software developed satisfied all the
requirements of the user.
CHAPTER 7
CONCLUSION
The system determines the abnormalities in ultrasound kidney images based
on association rules and neural network. The MLP-BP classifies US kidney
images as normal, medical renal disease or cortical cyst. The intelligent
classification is realized in this layer using features obtained from AR. The
combination of association rules and neural network produces higher
classification accuracy over the three kidney categories than the existing
system.
This system may be enhanced with the following tasks in the future
o Different types of neural networks can be implemented and their performance
compared.
o The number of hidden layers can be increased to improve performance.
APPENDIX-I
CONFERENCE DETAILS