FORMULATING OFFLINE NONDESTRUCTIVE
VALIDATION OF SOLID DRUG SURFACE MORPHOLOGY
USING MICROSCOPIC MULTISPECTRAL HIGH
RESOLUTION IMAGING
___________________________________________________________________________
FAHIMA TAHIR
___________________________________________________________________________
DEPARTMENT OF COMPUTER SCIENCE
LAHORE COLLEGE FOR WOMEN UNIVERSITY, LAHORE-
PAKISTAN
2015
A THESIS SUBMITTED TO LAHORE COLLEGE FOR WOMEN
UNIVERSITY IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE
By
FAHIMA TAHIR
03-B/LCWU-5703
CERTIFICATE
This is to certify that the research work described in this thesis, submitted by
Ms. Fahima Tahir to the Department of Computer Science, Lahore College for Women
University, has been carried out under my/our direct supervision. I have personally gone
through the raw data and certify the correctness and authenticity of all results reported
herein. I further certify that the thesis data have not been used, in part or in full, in a
manuscript already submitted or in the process of submission in partial or complete
fulfillment of the award of any other degree from any other institution at home or abroad.
I also certify that the enclosed manuscript has been prepared under my supervision, and I
endorse its evaluation for the award of the Ph.D. degree through the official procedure of
the University.
________________
Dr. Muhammad Abuzar Fahiem
Supervisor
Date:
Verified By
________________
Name
Chairperson
Department of _______
Stamp
_________________
Controller of Examination
Stamp
Date: ___________
Dedicated to my parents
ACKNOWLEDGMENTS
All gratitude is to Almighty Allah and peace be upon the Holy Prophet Muhammad
(SAW). I heartily want to express my gratitude to everyone who has supported and
helped me to complete my Ph.D. dissertation.
I am deeply grateful to my supervisor Dr. Muhammad Abuzar Fahiem for extending
his dedicated support and advice throughout the research. He not only continuously
guided me with his precious knowledge but also remained a source of motivation and
encouragement for me.
I am especially thankful to my institute Lahore College for Women University for
facilitating me during this research.
I would also like to extend thanks to all of my friends and colleagues who were
honest critics and helped me with their fruitful comments and suggestions.
Last but not least, I am most indebted to my family. The prayers of my parents
remain constantly with me and led me towards the successful completion of this
research. I am sincerely thankful for the cooperation of my siblings, who have always
been there for me.
CONTENTS
List of Tables
List of Figures
List of Equations
List of Abbreviations
Abstract
Chapter no. 1 Introduction
1.1 Drug Dosage Forms
1.1.1 Solid Dosage Forms
1.1.1.1 Tablets
1.1.1.2 Capsules
1.1.2 Liquid Dosage Forms
1.1.3 Semisolid Dosage Forms
1.2 Substandard Medicines
1.2.1 Counterfeits
1.2.2 Expired
1.2.3 Environment Affected
Chapter no. 2 Review of Literature
2.1 Data Acquisition
2.1.1 Chromatographic Techniques
2.1.1.1 Thin Layer Chromatography
2.1.1.2 HPLC
2.1.2 Spectroscopic Techniques
2.1.2.1 Mass Spectrometry
2.1.2.2 NMR Spectroscopy
2.1.2.3 X-Ray Diffraction
2.1.2.4 Scanning Electron Microscopy
2.1.2.5 Vibrational Spectroscopic Techniques
2.1.3 Imaging Techniques
2.1.4 Spectral Imaging Techniques
2.1.4.1 Hyperspectral Imaging
2.1.4.2 Multispectral Imaging
2.2 Preprocessing Techniques
2.2.1 Smoothing
2.2.2 Normalization
2.2.3 Standard Normal Variate Correction
2.2.4 Multiplicative Scatter Correction
2.2.5 Savitzky-Golay Derivative Conversion
2.2.6 Image Enhancement
2.3 Feature Extraction Techniques
2.3.1 Low Level Feature Extraction
2.3.2 High Level Feature Extraction
2.3.3 Textural Feature Extraction
2.4 Feature Reduction Techniques
2.4.1 Information Gain
2.4.2 Symmetrical Uncertainty
2.4.3 One-R
2.4.4 Chi-Square
2.4.5 Gain Ratio
2.4.6 Relief-F
2.4.7 Principal Component Analysis
2.5 Classification Techniques
2.5.1 Pearson’s Correlation Coefficient
2.5.2 Euclidean Distance
2.5.3 K-Mean Clustering
2.5.4 Fuzzy Clustering
2.5.5 Partial Least Square Discriminant Analysis
2.5.6 Artificial Neural Networks
2.5.7 Naïve Bayes
2.5.8 K-Nearest Neighbor
2.5.9 Support Vector Machine
Chapter no. 3 Proposed Approach – Microscopic Imaging
3.1 Microscopic Imaging
3.1.1 Image Acquisition
3.1.2 Preprocessing
3.1.2.1 Grayscale Conversion
3.1.2.2 Contrast Enhancement
3.1.3 Feature Extraction
3.1.3.1 Gray-Level Co-Occurrence Matrix
3.1.3.2 Histogram Features
3.1.3.3 Run Length Matrix
3.1.3.4 Autoregressive Model
3.1.3.5 Wavelet Transformations
3.1.4 Feature Reduction
3.1.5 Classification
Chapter no. 4 Proposed Approach – Multispectral Analysis
4.1 Multispectral Analysis
4.1.1 Spectrum Acquisition
4.1.2 Preprocessing
4.1.3 Feature Extraction
4.1.3.1 Wavelet Transformation
4.1.4 Classification
Chapter no. 5 Analysis and Discussion
5.1 Microscopic Imaging
5.2 Multispectral Analysis
5.3 Hybrid
Chapter no. 6 Conclusion and Future Recommendations
References
Plagiarism Report
List of Publications and Reprints
LIST OF TABLES
Table No. Title
2.1 Comparison between various quality assessment techniques for drugs
2.2 Comparison between different researches for the analysis of medicines
3.1 Dataset description
3.2 List of top 15 selected features from CS, GR and RF
5.1 LOO results for all individual and combined datasets using 281 features
5.2 LOO results for all individual and combined datasets using top 15 features
5.3 LOO results for all individual and combined datasets using top 2 features
5.4 Accuracies for test datasets using 281 features
5.5 Accuracies for test datasets using top 15 selected features
5.6 Accuracies for test datasets using top 2 features
5.7 Results achieved by experiment I
5.8 Results achieved by experiment II
5.9 Results against combined datasets using experiment III
5.10 Test accuracies using hybrid approach for individual datasets
5.11 Test accuracies using hybrid approach for combined datasets
5.12 Comparison of highest accuracies achieved using MI, MA and hybrid approaches
LIST OF FIGURES
Figure No. Title
2.1 The electromagnetic spectrum
2.2 Two level decomposition for the computation of WC
3.1 Basic flow of the proposed MI approach
3.2 Detailed diagram of the proposed MI approach
3.3 The sample images of the DSPPs and NSPPs in each dataset (a) images contained in dataset H1 (b) images contained in dataset H2 (c) images contained in dataset H3 (d) images contained in dataset T1 (e) images contained in dataset T2 (f) images contained in dataset T3 (g) images contained in dataset M1 (h) images contained in dataset M2 (i) images contained in dataset M3
4.1 Flow of the proposed MA approach
4.2 Multispectral data for NSPP and DSPP within UV wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3.
4.3 Multispectral data for NSPP and DSPP within Visible wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3.
4.4 Multispectral data for NSPP and DSPP within IR wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3.
5.1 LOO results against all individual and combined datasets using 281 features
5.2 LOO results against all individual and combined datasets using top 15 features
5.3 LOO results against all individual and combined datasets using top 2 features
5.4 HV results against all individual and combined datasets using all 281 features
5.5 HV results against all individual and combined datasets using top 15 features
5.6 HV results against all individual and combined datasets using top 2 features
5.7 Comparison of accuracies against UV, IR and Visible for experiment I
5.8 Comparison of sensitivity against UV, IR and Visible for experiment I
5.9 Comparison of accuracies against UV, IR and Visible for experiment II
5.10 Comparison of sensitivity against UV, IR and Visible for experiment II
5.11 Comparison of accuracies against UV, IR and Visible for experiment III
5.12 Basic flow of the hybrid approach
5.13 HV results using hybrid approach for individual datasets
5.14 Test accuracies using hybrid approach for combined datasets
5.15 Comparison of accuracies achieved using all three approaches
LIST OF EQUATIONS
Equation No. Title
2.1 Formula for smoothing a signal
2.2 Formula for SNVC
2.3 Formula for the estimation of correlation coefficients of MSC
2.4 Formula for the corrections in MSC
2.5 Entropy calculation when y is independent variable
2.6 Entropy calculation when y is dependent on x
2.7 Formula for IG
2.8 Symmetrical uncertainty formula
2.9 Formula for chi-square
2.10 Formula for gain ratio
2.11 K-mean clustering
2.12 Centroid calculation for FC
3.1 Contrast enhancement formula
3.2 Formula for dataset representation
3.3 Formula for EC
LIST OF ABBREVIATIONS
ANN Artificial Neural Networks
APIs Active Pharmaceutical Ingredients
AR Autoregressive
BA Bioavailability
CA Cluster Analysis
CI Chemical Imaging
CLSM Confocal Laser Scanning Microscopy
CS Chi-Square
DESI Desorption Electrospray Ionization
DSPP Defective Solid Pharmaceutical Product
ED Euclidean Distance
EM Electromagnetic
ER Electromagnetic Radiation
ES Electromagnetic Spectrum
FC Fuzzy Clustering
FD First Derivative
FDA Food and Drug Administration
FIR Far Infrared
GLCM Gray-level Co-occurrence Matrix
GR Gain Ratio
HCA Hierarchical Cluster Analysis
HSI Hyperspectral Imaging
HPLC High Performance Liquid Chromatography
HV Holdout Validation
IE Image Enhancement
IG Information Gain
IP Image Processing
IR Infrared
KNN K-Nearest Neighbor
LDA Linear Discriminant Analysis
LOO Leave-one-out
LSLS Local Straight Line Screening
MI Microscopic Imaging
MIR Middle Infrared
ML Machine Learning
MLR Multiple Linear Regression
MN Max-min Normalization
MS Mass Spectrometry
MSC Multiplicative Scatter Correction
MSI Multispectral Imaging
NB Naïve Bayes
NDRA National Drug Regulatory Authority
NIR Near Infrared
NIRCI Near Infrared Chemical Imaging
NIRS Near Infrared Spectroscopy
NMR Nuclear Magnetic Resonance
NPRA National Pharmaceutical Regulatory Authority
NSPP Non-defective Solid Pharmaceutical Product
O/W Oil-in-water
OOS Out of Specification
PCA Principal Component Analysis
PLS Partial Least Square
PLS-DA Partial Least Square Discriminant Analysis
QDA Quadratic Discriminant Analysis
RF Relief-F
RGB Red-Green-Blue
RLM Run Length Matrix
RS Raman Spectroscopy
SD Second Derivative
SDC Savitzky-Golay Derivative Conversion
SEM Scanning Electron Microscopy
SFFC Spurious/Falsely-labeled/Falsified/Counterfeit
SIMCA Soft Independent Modelling of Class Analogy
SNVC Standard Normal Variate Correction
SP Signal Processing
SPP Solid Pharmaceutical Product
SS Semisolid
SU Symmetrical Uncertainty
SVM Support Vector Machine
TLC Thin Layer Chromatography
TOF Time of Flight
ToF-SIMS Time-of-Flight Secondary-Ion Mass Spectrometry
VS Vibrational Spectroscopy
W/O Water-in-oil
WC Wavelet Coefficients
WHO World Health Organization
WT Wavelet Transformation
XRD X-ray Diffraction
ABSTRACT
The non-destructive analysis of a Solid Pharmaceutical Product (SPP) is essential to
verify the quality without destroying the product. This analysis may be performed using
various image processing and signal processing techniques on images and multispectral
data. Based on this analysis, an SPP may be classified as defective or non-defective.
The SPPs categorized as defective are exposed to three environmental factors
(humidity, temperature and moisture) over different time periods, and the variations in
the data are analyzed to judge the effects of these factors on the classification of an SPP.
In this research, we propose two non-destructive methods to identify defective and
non-defective SPPs using their surface morphology. In the first approach, multiple textural
features are extracted from microscopic images of the surface of the defective and
non-defective SPPs. These textural features are based on the Gray-Level Co-occurrence
Matrix, Run Length Matrix, Histogram, Autoregressive Model and Haar Wavelet. In total,
281 textural features are extracted from the microscopic images. The features are reduced
using three feature reduction techniques: Chi-Square, Gain Ratio and Relief-F. Through
experimentation, we formulated three feature sets with 281, 15 and 2 features. We
used four classifiers, namely Support Vector Machine, K-Nearest Neighbors,
Naïve Bayes and Ensemble of Classifiers, to calculate the accuracy of the proposed
approach. The classifiers are evaluated using leave-one-out cross-validation and
holdout validation. We tested each classifier against all feature sets and compared the
results. The results showed that, in most cases, the Support Vector Machine performed
better than the other classifiers.
In the second approach, we used multispectral data and applied wavelet
transformations in conjunction with various machine learning techniques for
classification. The results showed that the spectrum extracted from the ultraviolet
wavelength range is more suitable for distinguishing between defective and
non-defective SPPs. Furthermore, the results indicated that the K-Nearest Neighbors
classifier or the Ensemble of Classifiers is the more appropriate classifier.
Finally, a hybrid of both approaches was tested. The analysis of the results
showed that the hybrid approach is better than the individual ones. An accuracy of 94%
is achieved using K-Nearest Neighbors when a combined dataset of SPPs affected by
all three environmental factors is used.
CHAPTER NO. 1
INTRODUCTION
In pharmacology, drugs are chemical substances used to change the physical condition
of patients in order to treat diseases. A physician prescribes them either for a short
period in the case of acute diseases or on a regular basis for chronic disorders.
According to the World Health Organization (WHO), the use of substandard drugs, such as
low-quality, expired and counterfeit drugs, is a real threat to the health of patients.
The identification of such substandard drugs is a common problem in developing as well
as developed countries. According to the US Food and Drug Administration (FDA), up to
25% of all drugs consumed in poor countries are thought to be counterfeit or
substandard. Many reports state that the drugs used to treat serious diseases such as
malaria, tuberculosis, AIDS or other infections are more often the target of
counterfeiting.
1.1 Drug Dosage Forms
For a particular disease, the choice of the drug delivery system and of the accurate
required amount is very important. The physical form and amount of a medication is
known as its dosage form. The dosage form also determines the route of drug
administration, that is, the path through which a drug is delivered to the site of
action in the body. Dosage forms are required to give the patient an accurate dose.
Over time, the evolution of medical therapies and drugs has also resulted in new dosage
forms. Drugs are available in different dosage forms such as:
Solid Dosage Forms
Liquid Dosage Forms
Semisolid Dosage Forms
1.1.1 Solid Dosage Forms
Solid dosage forms are medicines that contain an accurate dose and can be given to
the patient as a single unit (dose). They are administered orally in the form of tablets,
capsules or powders [1]. Solid medicines are a mixture of APIs with a combination
of different diluents, binders, lubricants, glidants and many other excipients.
Manufacturing solid medicines requires machines that are complicated and costly.
Capsules and tablets are the most common forms; they are widely used in the industry and
have similar manufacturing procedures. Solid dosage forms are easier to ship and
more stable than liquid drugs. They mostly have longer expiration dates.
1.1.1.1 Tablets
Tablets are the most popular dosage form. In the pharmaceutical industry, 70% of
medicines are manufactured in the form of tablets [1]. Tablets are available in different
shapes and sizes that are easy for patients to swallow. To identify and differentiate
between tablets, they are stamped with symbols, numbers and letters. Along with the
medicinal substances, tablets may also contain adjuncts, also known as excipients. For
example, to ensure efficient tableting, manufacturers may use binders, glidants,
lubricants, pigments and coatings. Generally, tablets are manufactured either by molding
or by compression [2]. The types of tablets are therefore:
i. Compressed Tablets
ii. Molded Tablets
Compressed tablets are prepared by a single compression using tablet machines. A
specific quantity of powdered or granulated tablet material is placed into special dies,
and the material is then compressed by the upper and lower punches of the machine under
high pressure (~tons/in²). It is the least complex, shortest and most effective method
for tablet production. After compressing the APIs, manufacturers can use excipients and
lubricants. Compressed tablets can be further divided into three categories: Multiple
Compressed Tablets, Chewable Tablets and Tablet Triturates. Multiple compressed tablets
are prepared by compressing the material more than once. They are also known as tablets
within a tablet (with cores and shells). Wet granulation and compression are used to
prepare chewable tablets. These tablets disintegrate when chewed, so they dissolve
rapidly in the mouth. Chewable tablets are commonly used for multivitamin tablets or
tablet formulations for children. Tablet triturates are small, usually cylindrical
tablets that contain a small amount of a potent drug. These tablets are completely
soluble in water, which is why any water-insoluble material is avoided in the
formulation process.
Tablets can also be made by molding instead of compression. Molds of different
shapes can be used for this kind of tablet preparation. Molded tablets can be prepared
either by tablet machinery or manually. The dampened tablet material is forced into the
mold; the formed tablet is then ejected from the mold and left to dry. Molding is
normally reserved for small-scale production.
1.1.1.2 Capsules
Pharmaceutical ingredients often have a bitter taste or an unpleasant odor, or can be
reactive to oxygen. Such drugs may require some kind of coating or encapsulation, and
capsules are the best and cheapest solution. Capsules are dosage forms in which the drug
substance is enclosed in a container. These containers can be hard or soft shells made of
gelatin or another water-soluble material [3]. Coating or enclosing the drug substance in
a capsule may affect the bioavailability (BA) of the drug. Capsules can be of two types:
i. Hard Gelatin Capsules
ii. Soft Gelatin Capsules
Powdered or dry ingredients enclosed in hard shells are known as hard gelatin
capsules. They are more versatile for controlled drug delivery than soft gelatin capsules.
These capsules consist of two pieces: a cap and a body. The gelatin shell starts to soften
after ingestion and begins to dissolve in the gastrointestinal tract. Once the capsule body
has dissolved, the encapsulated drug can disperse rapidly and easily, which increases its
BA. These capsules are supplied in a variety of sizes, ranging from ‘000’ for the
largest to ‘5’ for the smallest [3].
Soft gelatin capsules are used for encapsulating oils or APIs that can be dissolved or
suspended in oil. Plasticized gelatin is used to prepare soft gelatin capsules. Drugs like
Vitamin A, Vitamin E, Chlorotrianisene, Declomycin, Digoxin and Chloral hydrate are
usually prepared in soft gelatin capsules.
1.1.2 Liquid Dosage Forms
Liquid dosage forms cover solutions, syrups, emulsions, suspensions and many more.
Homogeneous mixtures of a solute dissolved in a solvent are known as solutions.
Syrups are aqueous solutions that contain sugar or a sugar substitute in combination
with different flavoring agents. The physicochemical stability of the drug can be
maintained by using stabilizers in syrups and solutions. Combinations of two or
more immiscible liquids are known as emulsions. Emulsions range from low-viscosity
lotions to ointments and creams, which fall under the category of semisolid
dosage forms. Fine insoluble solid particles of a drug substance form a suspension when
dispersed in a liquid medium [3]. Suspending agents are used to increase
the viscosity of the drug, which slows the dissolution of the drug.
1.1.3 Semisolid Dosage Forms
Ointments, creams, gels, etc. fall under the category of Semisolid (SS) dosage forms.
Greasy medications applied to the skin, rectum or nasal mucosa, which can dissolve into
the skin, are known as ointments. Creams are mixtures of oil and water. Oil-in-water
(O/W) creams are more effective and comfortable because they are less greasy. They are
easy to wash off and are therefore cosmetically acceptable. Another category is
water-in-oil (W/O) creams. They are the reverse of O/W creams: greasier and more
difficult to handle.
1.2 Substandard Medicines
In the pharmaceutical industry, each Solid Pharmaceutical Product (SPP) produced by a
manufacturer should conform to the defined quality metrics. In the market, along with
genuine drugs, customers can also find substandard SPPs. According to WHO, substandard
SPPs are genuine products that nevertheless do not fulfill the quality standards. They
are also known as Out of Specification (OOS) products. They are produced by licensed
manufacturers working under the National Pharmaceutical Regulatory Authority (NPRA) [4].
Some characteristics of such substandard SPPs are [5]:
They are sold after their expiration date.
They are affected by improper supply and storage.
They can contain too much or too little API.
They can contain contaminated ingredients.
They can sometimes have fake packaging or any other kind of quality negligence.
These SPPs may result from the carelessness of pharmacists, insufficient financial and
human resources, the use of obsolete or malfunctioning laboratory equipment, or
counterfeiting. They are harmful to patients’ health and sometimes even cause
death [4]. They can be further divided into three categories:
Counterfeit
Expired
Environment Affected
1.2.1 Counterfeits
Counterfeits are a subset of substandard SPPs, but they are not produced by authorized or
licensed manufacturers. According to WHO [6], they are also known as
Spurious/Falsely-labeled/Falsified/Counterfeit (SFFC) medicines. They are fake medicines
that look genuine. The difference between counterfeits and other substandard medicines
is that counterfeits are false medicines created illegally or intentionally [4].
Sometimes they have fake packaging, or the API may be absent or present in an
inappropriate amount [7]. Multiple factors contribute to the proliferation of
counterfeits. They should be identified accurately so that governments can take
appropriate action to eradicate them.
1.2.2 Expired
Another kind of substandard SPP is the expired one. Every SPP has an expiry date
set by its manufacturer. Expired SPPs are unpredictable in their effectiveness.
Over time, due to loss of potency, these medicines can become either completely
ineffective or sometimes even poisonous. Medicines are chemical compounds whose
composition changes over time; these changes can appear in their color, smell or
texture.
1.2.3 Environment Affected
Environment-affected medicines are those that conform to the standards at the time
of manufacturing but that different external factors turn into substandard medicines
over time. These factors include moisture, light (especially sunlight), extreme
temperature and oxygen. Oxidation and reduction occur when a drug formulation is exposed
to oxygen, which results in an unstable or substandard drug. Similarly, high temperature
and light may accelerate oxidation and reduction within a drug. Moisture and humidity
may also damage the stability of a drug. This instability may result in unpredictable
behavior of tablets (such as their disintegration and dissolution time) or in changes in
their physical appearance (such as hardness, shape and color). As discussed by Islam et
al. [8], an increase in the moisture present in a solid pharmaceutical tablet above its
intended level results in reactions of the API and excipients. They also state that
moisture accelerates hydrolysis and reacts with excipients, which affects the physical
and chemical stability of an SPP. In other research, Szakonyi and Zelkó [9] state that
water absorption at the surface of a tablet results in degradation of its Active
Pharmaceutical Ingredients (APIs). The use of defective tablets may cause minor
problems for the patient, such as allergies, or may even result in death. Therefore,
there is a pressing need for an approach that can identify environment-affected SPPs
after their manufacturing.
In this research, we deal with three environmental factors that affect SPPs: moisture,
humidity and temperature. Here, the term moisture refers to the liquid form of water.
An increase in moisture above the intended level can cause reactions of the APIs
and excipients, as discussed in [8]. Humidity refers to the gaseous state of water in the
air. The APIs of pharmaceutical tablets react with humidity if they are left
in the open air, which results in oxidation and reduction. Lastly, high
temperature can change the potency and efficacy of the APIs in SPPs. All
three factors have a great influence on the performance of an SPP, so there should
be an approach for identifying such defective SPPs.
The non-destructive analysis of an SPP is essential to verify its quality without
destroying the product. In the pharmaceutical industry, the existing approaches for the
quality assessment of newly manufactured medicines or for the detection of counterfeit
medicines are destructive and time-consuming. These approaches also require sample
preparation and a special laboratory environment and equipment. The focus of this
research is to formulate a nondestructive offline approach for classifying Defective
SPPs (DSPPs) and Non-defective SPPs (NSPPs). In this research, the analysis is performed
using various Signal Processing (SP) and Image Processing (IP) techniques in conjunction
with different Machine Learning (ML) techniques on image and multispectral data.
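As a rough illustration of how a DSPP/NSPP classifier can be evaluated with the leave-one-out (LOO) scheme mentioned in the abstract, the following minimal sketch holds out each sample once and classifies it from the remaining ones. It is only a sketch under stated assumptions: the function names, the 1-nearest-neighbor rule and the two-feature toy vectors are hypothetical and are not the feature sets or classifier settings used in this thesis.

```python
import math


def nearest_label(train, query):
    """Return the label of the training sample closest to query (1-NN, Euclidean distance)."""
    closest = min(train, key=lambda item: math.dist(item[0], query))
    return closest[1]


def leave_one_out_accuracy(samples):
    """Hold out each sample once, classify it from the rest, and return the fraction correct."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_label(rest, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical two-feature texture vectors; the values are invented for illustration only.
toy_samples = [
    ((0.10, 0.20), "NSPP"),
    ((0.15, 0.25), "NSPP"),
    ((0.20, 0.15), "NSPP"),
    ((0.80, 0.90), "DSPP"),
    ((0.85, 0.80), "DSPP"),
    ((0.90, 0.95), "DSPP"),
]

accuracy = leave_one_out_accuracy(toy_samples)
print(f"LOO accuracy on the toy data: {accuracy:.2f}")
```

Because every sample serves as the test case exactly once, LOO makes full use of a small dataset, at the cost of one classification run per sample.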
The thesis is organized as follows. Chapter 1 provides a brief introduction to the
proposed research. Chapter 2 provides a complete review of the literature related to the
techniques available for the analysis of SPPs. We have divided the proposed approach
into two parts: one analyzes SPPs using microscopic high-resolution imaging and the
other uses multispectral analysis. These two approaches are discussed in Chapter 3 and
Chapter 4, respectively. Chapter 5 is divided into three segments. The first two provide
analysis and discussion of the two proposed approaches, and the third explains the
results achieved by combining the two. Chapter 6 provides the conclusion and future
recommendations.
CHAPTER NO. 2
REVIEW OF LITERATURE
Different techniques are available in the literature for data acquisition from the sample
being analyzed, for feature extraction and for classification. In this chapter, we
discuss various techniques that are useful for the analysis of SPPs.
2.1 Data Acquisition
In the case of SPPs, the acquired data can be of two types: image data and spectral data.
Image data is defined in the trichromatic RGB (Red-Green-Blue) color space. Each
digital color in the image is a combination of these three colors. Electronic devices
such as digital cameras are used to capture such data. A spectral color space, on the
other hand, consists of tens or even hundreds of color components [10]. A spectrum
describes electromagnetic (EM) radiation as a function of wavelength or frequency.
The color of the sensed spectrum can be analyzed from its shape.
The EM spectrum, as shown in Figure 2.1, consists of different types of EM radiation,
e.g. gamma rays, X-rays, ultraviolet, visible, infrared, microwaves and radio waves
[11].
Figure 2.1: The electromagnetic spectrum
The visible part (380 nm to 780 nm) of the EM spectrum is the only part that can be
seen by the human eye. Analysis using spectral data allows researchers to investigate
other important features of the samples that cannot be seen by the human eye.
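The division of the EM spectrum into regions can be sketched as a simple lookup on wavelength. In the minimal example below, only the visible limits (380 nm to 780 nm) come from the text; the lower ultraviolet limit, the upper infrared limit and the function name em_region are assumptions chosen for illustration, since conventions for these boundaries vary between sources.

```python
# Visible limits are taken from the text; the other two limits are assumed conventions.
VISIBLE_MIN_NM = 380.0
VISIBLE_MAX_NM = 780.0
UV_MIN_NM = 10.0          # assumed lower edge of the ultraviolet region
IR_MAX_NM = 1_000_000.0   # assumed upper edge of the infrared region (1 mm)


def em_region(wavelength_nm: float) -> str:
    """Map a wavelength in nanometers to a coarse region of the EM spectrum."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    if wavelength_nm < UV_MIN_NM:
        return "shorter than ultraviolet"
    if wavelength_nm < VISIBLE_MIN_NM:
        return "ultraviolet"
    if wavelength_nm <= VISIBLE_MAX_NM:
        return "visible"
    if wavelength_nm <= IR_MAX_NM:
        return "infrared"
    return "longer than infrared"


for wavelength in (254.0, 550.0, 1500.0):
    print(wavelength, "nm ->", em_region(wavelength))
```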
Different devices are available to extract the desired raw data from the
sample being analyzed. Assessing the formulation, efficiency, correctness and
stability of a medicine is very important. In the pharmaceutical industry, the quality
assessment of drugs can be performed using different data acquisition methods. These
methods can give information about the active ingredients and structural
information about the surface of the drug. They can be divided into four major
categories:
Chromatographic Techniques
Spectroscopic Techniques
Imaging Techniques
Spectral Imaging
Chromatographic techniques include the tests that are most commonly used in the
pharmaceutical industry. These techniques are expensive, destructive and
time-consuming, and they require sample preparation. Spectroscopic techniques use
spectrometers that provide spectral information in the form of spectra for every sample
being studied. All of the spectroscopic methods except mass spectrometry are
non-destructive, less time-consuming and require little or no sample preparation.
Imaging techniques involve IP for the analysis. Spectral imaging techniques are newly
emerging techniques in pharmaceutics. They combine spectroscopic techniques and
traditional imaging, and they provide both spatial and spectral information about the
given sample [12]. The details of all these techniques are discussed below.
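To make the idea of combined spatial and spectral information concrete, a spectral image can be pictured as a data cube indexed by row, column and band. The sketch below is illustrative only: the nested-list layout, the helper names pixel_spectrum and band_image, and the 2 x 2 x 4 values are hypothetical and do not represent data from this research.

```python
from typing import List

# cube[row][col][band]: two spatial axes plus one spectral axis
Cube = List[List[List[float]]]


def pixel_spectrum(cube: Cube, row: int, col: int) -> List[float]:
    """Spectral view: the intensity of one pixel across all bands."""
    return list(cube[row][col])


def band_image(cube: Cube, band: int) -> List[List[float]]:
    """Spatial view: the image formed by a single band over all pixels."""
    return [[pixel[band] for pixel in row] for row in cube]


# Hypothetical 2 x 2 pixel image with 4 spectral bands (invented values).
toy_cube: Cube = [
    [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]],
    [[0.5, 0.6, 0.7, 0.8], [0.9, 0.8, 0.7, 0.6]],
]

print(pixel_spectrum(toy_cube, 1, 0))
print(band_image(toy_cube, 2))
```

Under this view, an RGB image is the special case of a cube with three bands, whereas spectral imaging extends the band axis to tens or hundreds of components.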
2.1.1 Chromatographic Techniques
Different techniques are available in the literature for the assessment and estimation of
the formulation, quality, correctness and stability of solid drugs. Some of these
techniques are used at the time of manufacturing to obtain information about the correct
amount of APIs. According to several studies [7, 13, 14], Thin Layer
Chromatography (TLC) and High Performance Liquid Chromatography (HPLC) are the most
common techniques used for drug quality testing. A brief description of these techniques
is given below.
2.1.1.1 Thin Layer Chromatography
TLC procedures can be used for the detection of counterfeit drugs and for the
identification and estimation of the APIs in a drug. Deisingh [15]
uses TLC for the estimation and identification of counterfeit medicines or of the APIs
in tablets. Impure drug substances can also be identified using TLC [7].
2.1.1.2 HPLC
Most of the manufacturers use HPLC in the pharmaceutical industry to test the products
(medicines) and their raw material or ingredients. Manufacturers assign skilled analysts
for this test. They pass raw materials or prepared medicines through the HPLC machine
and then analyze their results. These machines require sample preparation for testing,
which destroys the sample. Therefore, HPLC is a destructive, expensive and
time-consuming method [16].
2.1.2 Spectroscopic Techniques
The interaction of light with the molecules and atoms of a product (such as a drug)
can provide information about its structure. This interaction results in
spectra, which lie in different regions of the ES and provide information about
surface structure and ingredients. These spectra are further processed by
computers [12]. Along with traditional tests, manufacturers may use these techniques
for the analysis of drug composition and for the detection of counterfeit and
substandard medicines. These techniques provide only some characteristics of the
molecules and do not provide a three-dimensional image. As discussed in other
studies [12, 15, 17-24], solid drug assessment can also be performed using
spectroscopic techniques. The spectroscopic techniques most used in the literature for
the analysis of drugs include:
Mass Spectrometry (MS)
Nuclear Magnetic Resonance Spectroscopy (NMR)
X-ray Diffraction (XRD)
Scanning Electron Microscopy (SEM)
Vibrational Spectroscopy (VS)
VS includes Raman and Near Infrared Spectroscopy techniques. Several studies
[25-27] explain that all of these techniques except VS require some or full sample
preparation, so they are either destructive or semi-destructive.
2.1.2.1 Mass Spectrometry
MS is widely used to characterize pharmaceutical products. Drug profiling can be
done using time of flight (ToF) and electrospray ionization [15]. MS, primarily
LC-MS/MS, can be used in all stages of drug development. It can be used to elucidate
the structure of pharmaceutical drug mixtures through mass determination.
The pharmacokinetics of a newly created drug can be investigated through this technique
[19]. MS is a destructive and time-consuming method. Barnes et al. [28] use
Time-of-Flight Secondary-Ion Mass Spectrometry (ToF-SIMS) for the characterization of
bio and solid-state pharmaceuticals. This technique can identify chemicals in
pharmaceutical materials and their distribution by analyzing the surface. Culzoni et al.
[29] use ambient MS for the analysis and detection of falsified or substandard medicines. In
another study, MS with desorption electrospray ionization (DESI) is used by
Chen et al. [30] for the analysis of pharmaceuticals in an ambient environment.
2.1.2.2 NMR Spectroscopy
In the pharmaceutical industry, NMR is widely used for the confirmation and
elucidation of drug structures. Analysis of synthetic or natural products can be
performed through NMR spectroscopy. It can also be used to characterize the
composition and to find the impurity profile of drugs. NMR measurements can also
provide information about the conformations of drugs, especially in tablets [31].
The European Pharmacopoeia uses it for the identification of drugs and reagents.
Measuring NMR spectra of liquid drugs is easier than of solid drugs [17].
Several studies in the literature use NMR in the pharmaceutical
industry. Holzgrabe et al. [32] provide a review of the applications of NMR in
pharmacy. According to them, quantitative NMR supports the quality
evaluation of pharmaceuticals. Another review by the same author
illustrates that quantitative NMR and diffusion-ordered spectroscopy (DOSY)
NMR experiments are extremely useful for the identification of counterfeit medicines
by elucidating their ingredients [33]. Malet-Martino explains the use of NMR in the
pharmaceutical and biomedical fields [34].
2.1.2.3 X-ray Diffraction
XRD is a spectroscopic technique mainly used for the analysis and
identification of polymorphic and solvated forms. It is used to measure the degree of
crystallinity but has lower sensitivity than IR spectroscopy. This technique
can also be used to quantify the API in multicomponent
tablets. Croker [35] performs a comparative study of the performance of powder XRD,
RS and NIRS for the quantitative analysis of polymorphic mixtures of
piracetam. The study concluded that in this situation RS and NIRS are more suitable
techniques than XRD. In another study, Maurin et al. [36] describe the use of
XRD for the identification of counterfeit medicines. Original Viagra® tablets and
their counterfeit versions were analyzed in this study. The
study concluded that XRD is a fast and reliable method for determining
the absence or presence of active contents and excipients in counterfeit and
genuine medicines. The authors also stated that XRD is not well suited for trace
analysis. More accurate methods for trace analysis are HPLC, GC-MS and
HPLC-MS.
2.1.2.4 Scanning Electron Microscopy
In SEM, the surface of a sample is scanned with an electron beam in a raster
pattern, which produces three-dimensional black-and-white images. These images can be
further converted into color images using IP techniques. Surface fractures, contaminations,
chemical compositions and crystalline structures can be examined using SEM. It is a
destructive method as it requires sample preparation. Other drawbacks are its size and
cost. Trained personnel are required to prepare samples and to operate the SEM machine.
Scoutaris uses SEM with Energy Dispersive X-ray spectroscopy along with confocal
Raman microscopy for the chemical characterization of paracetamol tablets. The creation
of concentration maps helps in the chemical characterization of the solid tablets [37].
Klang et al. [38] provide a review of SEM techniques used in the domain of
pharmacy. Ruotsalainen et al. [39] use Confocal Laser Scanning Microscopy
(CLSM) in combination with SEM for imaging the film-core interface. The proposed
technique helps in identifying defects on the surface of film-coated tablets.
2.1.2.5 Vibrational Spectroscopic Techniques
Vibrational spectroscopic techniques such as IR, NIR and Raman Spectroscopy (RS)
are proving very beneficial for pharmaceutical quality analysis. They are more accurate,
less costly and more reliable than the traditional methods. No sample preparation is
required for these analyses [18]. These tests are nondestructive, so the sample product
can be packaged or used again for other tests. Gendrin et al. [40] provide a review of
VS and chemometric techniques for the analysis of pharmaceutical products.
Another study by De Beer et al. [41] states that RS and NIRS can be
used effectively as process analyzers under the Process Analytical Technology framework
in a real-time environment. These techniques facilitate nondestructive analysis
without sample preparation for extracting the physical and chemical composition of the
sample and for measuring critical processes and attributes of the sample.
NIR Spectroscopy
NIRS is a low-cost, non-destructive and fast method for the analysis of powder
ingredients as well as SPPs in the pharmaceutical industry. It can describe physical
properties of the sample along with other properties such as hardness, presence
of moisture, dissolution rate, particle size and compaction force. NIRS is widely
used in the pharmaceutical industry to replace traditional time-consuming, destructive
liquid chromatography techniques or wet-testing methods for the analysis of
medicines [42]. No sample preparation is required for NIRS. According to [16], NIRS
can be used to study the microstructure as well as the macro chemical properties of tablets.
Through macro chemical properties, analysts can determine the concentration of the
active ingredients of the tablet. Microstructure describes the distribution and size of the
components (APIs and excipients) within a tablet. NIRS is recognized for the analysis
of raw material, process monitoring and quality control of pharmaceutical products
[12].
Roggo et al. [27] review the NIRS and chemometric techniques used in the
pharmaceutical industry for the analysis of solid, liquid and biotechnological
pharmaceutical products. NIR spectra can provide information about various physical
parameters of a pharmaceutical product, such as hardness, compaction force, particle size
and dissolution rate. These physical parameters can be obtained from tablets as well as
powders. In another study, Morisseau and Rhodes [26] use different regression
models such as Partial Least Squares (PLS) and MLR along with NIRS to find the hardness
of tablets. The accuracy of NIR results depends strongly on the drug products and their
formulation.
Polymorphs, a key parameter of a product, can change the dissolution
properties of the final drug. Ensuring the correct polymorphic form of a drug is very
important, and the polymorphs of a drug can also help in the identification and detection
of counterfeits. This confirmation can be done through NIRS. Water is another key
component of pharmaceutical drugs that affects their stability. Moisture determination
in drugs was one of the first uses of NIRS. NIRS can be used to determine
the water content of gelatin capsules [25].
According to Jamrógiewicz [43], NIRS has some disadvantages along with its many
advantages. One of them is that NIRS is less suitable for direct quantity
determination in aqueous forms of pharmaceutical ingredients. In another
study, in-line NIR data were used to predict properties of SPPs that represent
their quality, such as hardness, particle size and absorbency [44]. Chalus et al. [45] use
Wavelet Transformations (WT) along with ANN on NIR spectra to determine the APIs
of solid tablets. Wavelet transforms are used to efficiently reduce the dimensionality
of large data and to extract relevant information from it. They compared their results
with PLS regression applied to the same raw data and concluded that wavelet coefficients
used with ANN are a better choice. Svensson et al. [46] use NIR-based chemical imaging
along with 2D wavelet filters for the estimation of texture-based differences between
pharmaceutical tablets. In another study, Dowell et al. [47] use NIRS along with PLS
to differentiate between counterfeit and genuine artesunate antimalarial tablets.
Bleye et al. [48] provide a review of the techniques used for NIRS method validation
in pharmaceutics. Shah et al. [49] also discuss the applications of NIRS for the
analysis of pharmaceuticals. Rodionova and Pomerantsev [50] discuss the use of
NIRS for the detection of counterfeit drugs. Soft Independent Modelling of Class
Analogy (SIMCA), Principal Component Analysis (PCA) and CA were used for
classification. They also review the work done by different authors in the same field. In
another study by Rodionova et al. [51], NIR spectrometry was used along with PCA
for the classification of drugs as genuine or counterfeit.
Storme-Paris et al. [52] concluded that the use of appropriate
chemometric methods along with NIR spectra is very helpful in the identification
of counterfeit drugs. They state that closely related spectra can be classified using
supervised classification algorithms, while samples with clearly different spectra can
be classified using unsupervised algorithms. Another study using NIRS was
performed for the screening of Viagra tablets and their counterfeit versions by
Vredenbregt et al. [53]. The proposed approach can perform four different tasks: it can
verify the homogeneity of batches, distinguish between counterfeit and genuine
versions of the product, screen the active contents of the product from its excipients and,
lastly, identify whether a similar sample has been analyzed previously.
Candolfi et al. [54] provide a comparison of different classification techniques
used with NIRS in the analysis of pharmaceutical products. The comparative analysis
was performed using tablet and capsule datasets. Three classification algorithms
were used in the analysis: K-Nearest Neighbors (KNN), Linear Discriminant
Analysis (LDA) and Quadratic Discriminant Analysis (QDA). The study concludes
that the LDA classification algorithm gives better results for the analysis of tablets
and capsules than the other two. PCA was used as a feature reduction algorithm in
combination with LDA for classification.
Clarke [55] explains the importance of the use of NIR microscopy for the analysis of
pharmaceutical products in his research. Using NIR microscopy, we can analyze the
samples chemically and make a judgment about how well the ingredients are mixed
with each other. PCA and PLS are used for the analysis purposes in this research. IR
diffuse reflectance spectroscopy with acoustic microscopy was used to analyze the
thickness of the coating of the tablet by Bikiaris et al. [56].
Boiret et al. [57] explain the use of NIRCI along with 3D visualization to monitor the
formulation during the development process of the tablet. Another study
reviewed the use of mid-IR spectroscopy in the field of pharmaceutics.
The authors illustrate that mid-IR spectroscopy can be used to extract a variety of useful
information from pharmaceutical samples. It can be used for the identification and
explanation of the structure of the sample. Another application of this technique is
the characterization of the polymorphs and amorphous forms of the sample [58].
Reich [59] discusses two non-destructive analytical techniques, NIRS and NIRS
imaging. They can be used for both quantitative and qualitative analysis. This
review focuses on five aspects of both techniques: basics of NIR and
chemometric-based data processing, qualification and identification of raw materials,
analysis based on intact solid dosage forms, process monitoring and control and lastly
the regulatory issues.
Blanco and Alcalá [60] proposed an NIRS-based approach for the analysis of intact
pharmaceutical products. This analysis can provide information about the hardness of
the tablet and about the API and its uniformity level in tablets. A PLS1 calibration
model was used for quantification. This model was created using laboratory calibration
samples of tablets. The NIRS-based method is simpler for the quantification of the APIs
as the calibration set provides variability in production samples. Sulub et al. [61] also
used five different NIR spectrometers for the analysis of pharmaceutical tablets in order
to validate their content uniformity. Robust multivariate calibration transfer algorithms
were used in this study.
Process analytical techniques can be used to monitor three different levels of the
tableting process namely blend homogeneity, content uniformity and coating thickness.
Moes et al. [62] proposed an NIRS-based analytical method for monitoring these three
levels. Blend homogeneity was estimated without calibration using a diode array
spectrometer. For the estimation of content uniformity and coating thickness, the authors
used a Fourier-transform spectrometer with calibration-based models.
Li et al. [63] measure blend and content uniformity using semi-quantitative reflectance
NIR. The authors state that three factors are most important for judging the applicability
of the proposed method: (1) identifying the API in the NIR spectrum, (2) spectrum
strength and (3) the relationship between the API and the NIR spectrum. The authors also
state that this approach can be used in early stages of the formulation process and is able
to analyze multiple batches at the same time. Another study by Li et al. [64] used
PCA for the prediction of content uniformity from the NIR spectra of
pharmaceutical tablets. Other studies that used NIR for the estimation of
content uniformity of pharmaceutical products are [65, 66].
Raman Spectroscopy
Raman is another advantageous technique used for the analysis of pharmaceutical
products. Analysis of the samples requires no sample preparation, so it is a
non-contact and non-destructive technique. Raman is well suited to the analysis of
samples ranging in size from microscopic (<1 µm) to centimeters and can also provide
spectra of small changes in chemical structure. It is equally suitable for solids and
for samples in aqueous media. Uniformity of the sample materials can be analyzed
using Raman spectra. In Raman Spectroscopy (RS), a monochromatic laser source of visible
or IR radiation is used for the analysis of the samples. It is a non-destructive
method, so it can be used for the analysis of bulk and final products directly in their
packaging. RS can also be used for online monitoring of the drug’s quality and requires
minimal trained personnel. RS can be helpful for the identification of raw material,
quantity determination of APIs and screening of polymorphs [67]. Vankeirsbilck et al.
[68] explain RS and its applications in the pharmaceutical industry in detail. According to
the researchers, RS can be considered one of the more influential techniques for the
analysis of pharmaceuticals.
Feng et al. [69] present a technique for the identification of counterfeit drugs using
portable Raman spectroscopy. The extracted spectra were analyzed using Local Straight
Line Screening (LSLS) and PCA, giving an accuracy of 96.35%. In another study,
Li et al. [70] use Raman spectroscopy for the classification of Azithromycin (AZM)
tablets manufactured by four different manufacturers. Classification was performed
using four different classifiers: Support Vector Machine (SVM), Bayes classifier,
K Nearest Neighbors (KNN) and Partial Least Squares Discriminant Analysis
(PLS-DA). Among these classifiers, PLS-DA provides 80% accuracy using the full spectra
and 100% using a partial spectrum. Romero-Torres et al. [71] proposed the use of RS to
examine the variability of tablet coating, and in a later study they use RS
to estimate the tablet coating thickness [72]. Muller et al. [73] use RS to monitor the
in-line active coating process.
Gao et al. [74] use RS for the analysis of expired medicines and compare the results
achieved using different classification and chemometric techniques. PLS-DA, KNN
and SVM are the three classifiers compared in this study. Data
preprocessing was performed using the Savitzky–Golay algorithm, first derivative (FD),
second derivative (SD) and max-min normalization (MN). The comparison shows that
an average accuracy of 96.80% is achieved using SVM.
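
The derivative and normalization steps named above can be illustrated with a minimal sketch. It is not the procedure used in [74]: the spectrum values are invented, the function names are hypothetical, and the derivatives are plain finite differences with unit spacing rather than Savitzky–Golay derivatives.

```python
def min_max_normalize(spectrum):
    """Scale intensities linearly into the range [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        # A flat spectrum carries no contrast; map it to zeros.
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def first_derivative(spectrum):
    """Finite-difference first derivative (unit spacing assumed)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_derivative(spectrum):
    """Finite-difference second derivative: the difference of differences."""
    return first_derivative(first_derivative(spectrum))


# Illustrative intensities only, not measured data.
raw = [2.0, 4.0, 10.0, 6.0, 2.0]
normalized = min_max_normalize(raw)   # [0.0, 0.25, 1.0, 0.5, 0.0]
fd = first_derivative(raw)            # [2.0, 6.0, -4.0, -4.0]
sd = second_derivative(raw)           # [4.0, -10.0, 0.0]
```

Each derivative shortens the spectrum by one point, which is one reason smoothing derivatives such as Savitzky–Golay are usually preferred on noisy data.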
RS along with PCA and Hierarchical Cluster Analysis (HCA) was used for the
classification of genuine and counterfeit Viagra® tablets by Veij et al. [75].
Another study by Eliasson and Matousek [76] presented a new and improved method
for the identification of counterfeit pharmaceutical products such as
capsules and tablets. The proposed approach, named spatially offset RS (SORS), can
replace conventional backscattering-based RS. This approach can be used for the
identification of these products even within their packaging. Conventional
backscattering RS normally fails to identify products that are still packed because
of the fluorescence of the packaging material. This packaging material can be a plastic
container, capsule shell or tablet coating.
Ricci et al. [77] proposed a new approach for discriminating between genuine and
counterfeit artesunate tablets. The proposed approach combines SORS
and Attenuated Total Reflection Fourier Transform Infrared (ATR-FTIR). The
chemical composition of the tablet can be extracted by VS techniques; however, the
combined approach can effectively analyze the composition of the overall tablet as
well as of the surface only.
Zhang et al. [78] performed a comparative study of different
multivariate analysis techniques applied to Raman imaging for the analysis of
pharmaceutical tablets. Direct classical least squares (DCLS), multivariate curve
resolution (MCR), PCA and cluster analysis (CA) are the four multivariate analysis
techniques used in this study. The comparative results are based on the multivariate
analysis of Raman data collected from a 400 µm × 400 µm area of the surface of a model
tablet. According to the authors, PCA, MCR and CA are suitable for analysis when
the chemical composition of the tablet is unknown, as these techniques do not require
any prior knowledge about the sample; the analysis is based entirely on the input
sample. DCLS, on the other hand, requires a reference
spectrum. Relative quantitative information about the spectra
can be extracted through DCLS based on the reference. The authors also state that
some cautions should be observed while performing quantitative analysis. The
use of preprocessing techniques is essential to reduce noise, but when
the effects of noise dominate the signal, qualitative analysis is
preferable to quantitative analysis.
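
The way DCLS depends on reference spectra can be sketched as an ordinary least-squares fit of a measured spectrum to known pure-component spectra. The example below is illustrative only and is not the implementation in [78]: it assumes exactly two components, noise-free data and invented spectra, and the function name is hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def dcls_two_components(mixture, ref1, ref2):
    """Least-squares coefficients (c1, c2) with mixture ~ c1*ref1 + c2*ref2.

    Solves the 2x2 normal equations of the classical least squares model.
    """
    a11, a12, a22 = dot(ref1, ref1), dot(ref1, ref2), dot(ref2, ref2)
    b1, b2 = dot(ref1, mixture), dot(ref2, mixture)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("reference spectra are linearly dependent")
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2


# Invented reference spectra for an API and an excipient.
api = [0.0, 1.0, 4.0, 1.0, 0.0]
excipient = [2.0, 2.0, 1.0, 0.0, 0.0]

# Synthetic pixel spectrum: 30% API signal plus 70% excipient signal.
mixture = [0.3 * a + 0.7 * e for a, e in zip(api, excipient)]

c_api, c_exc = dcls_two_components(mixture, api, excipient)
```

On this noise-free input the fit recovers coefficients of about 0.3 and 0.7. Without the two reference spectra the problem cannot be posed in this form, which is the contrast with PCA, MCR and CA noted above.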
Another review covers the use of RS through a
microscope for the extraction of depth and lateral chemical maps of samples. Bulk
RS data can be processed using different chemometric techniques for the analysis of
pharmaceutics. According to the authors, the mapping data extracted from tablets using
RS can effectively be used to determine the distribution of the APIs [79].
Hédoux et al. [80] discuss the contribution of low-frequency RS to
investigating and detecting small crystalline materials. It can also be used to expose
ambiguous polymorphic and non-polymorphic materials and helps in studying their
stability and characteristics.
In another study, noninvasive analysis of pharmaceutical capsules and
tablets is performed using transmission RS geometry. The proposed approach can
provide bulk information about the contents of the product by reducing surface
fluorescence signals. Transmission RS can also provide better results than
traditional backscattering RS, with high specificity, speed and ease of development [81].
Strachan et al. [82] state that RS is an analytical approach for both solids and materials
in an aqueous environment. The review also describes how different multivariate
analysis algorithms can be used to overcome the quantification problems caused by
poor peak resolution of the spectrum. According to the review, RS can also be used
for complex pharmaceutical products such as suspensions and microspheres.
O’Connell et al. [83] presented a method based on the discrimination between the target
substance and different excipients for the identification of illicit drugs. Raman spectra
were measured from the sample and then preprocessed using FD and normalization techniques.
The analysis was performed using PCA, SVM and Principal Component Regression (PCR). The
results show that SVM outperforms the other algorithms.
2.1.3 Imaging Techniques
Imaging is also used for the examination and classification of tablets.
High-resolution cameras can be used to capture images of the sample SPPs. More
detailed information can be captured from the samples by using microscopic cameras.
Imaging is a non-destructive, less expensive and simple approach based on different IP
techniques such as IE, segmentation, edge and contour detection and texture analysis.
Segmentation of grayscale tablet images using adaptive thresholding and
morphological operations is used for tablet identification, also known as pill
recognition. Andreas et al. [84, 85] performed classification using
Euclidean distance on a feature set based on size, shape and color; their results
show that the most dominant of these three features is ‘size’. Ramya et al. [86]
used template matching along with a series of IP techniques to detect broken tablets
in blister packaging.
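
Classification by Euclidean distance on a size, shape and color feature set can be sketched as a nearest-reference lookup. This is a minimal illustration, not the method of [84, 85]: the feature definitions, the reference values and the function names are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, references):
    """Return the label of the reference vector closest to `features`."""
    return min(references, key=lambda label: euclidean(features, references[label]))


# Hypothetical features: (diameter in mm, circularity, mean hue in [0, 1]).
references = {
    "tablet_A": (8.0, 0.95, 0.10),
    "tablet_B": (12.0, 0.60, 0.55),
}

label = classify((8.3, 0.90, 0.15), references)  # "tablet_A"
```

Because the features here are on different numeric scales, the feature with the largest range contributes most to the distance; in practice the features would normally be standardized before comparison.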
Špiclin et al. [87] performed inspection of imprinted tablets using image registration on
an image database of different defective and non-defective tablets. They used three
registration methods in this study: direct matching of pixel intensities, principal axis
matching and circular profile matching. Comparative analysis shows that circular
profile matching is the more powerful registration technique for visual inspection of
tablets. Bukovec et al. [88, 89] performed two studies comparing geometrical
and statistical methods for visual inspection of tablets using Receiver Operating
Characteristics analysis. Geometrical features are based on the imprinted shape, while
statistical features are based on tablet surface statistics. The proposed
inspection method can identify five types of defects: spot, deboss, emboss, crack and
dot. Results show that the features extracted by the statistical methods are better than
those from the geometrical methods for tablet inspection.
In another study, statistical textural features are used for the classification of
defective and non-defective solid tablets. These features are extracted from microscopic
images of the surface of the tablets [90]. Možina et al. [91] provide an automated
technique for visual inspection of the imprints of solid pharmaceuticals. Lee et al.
[92] also provide an imprint-based automated method for matching and retrieving illicit
pills. Edge localization and invariant moments were used as a feature vector for matching.
Yu et al. [93] use a content-based image retrieval technique to develop an online solution
for drug tablet retrieval. Signature features are used to describe the shape, and Gabor
features the imprint mark, from an image of a tablet. A study conducted by Jung et al. [94]
describes the use of image processing and statistical analysis for the detection of
counterfeit solid tablets. Image acquisition was performed using a high-resolution
VSC 5000. Different morphological operations segment the tablet image from its
background. RGB color components of the images are used to build a statistical model
for the detection purpose. Bhattacharyya distance measures are used for the
discrimination between genuine and counterfeit tablets.
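
As an illustration of how a Bhattacharyya distance separates two color distributions, the sketch below compares normalized histograms of a single color channel. It is not the statistical model of [94]: the discrete histogram form of the distance, the bin counts and the function names are assumptions made for the example.

```python
import math


def normalize(hist):
    """Convert raw bin counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    BC = sum_i sqrt(p_i * q_i); distance = -ln(BC).
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return -math.log(bc) if bc > 0 else math.inf


# Invented 4-bin histograms of one color channel.
genuine = normalize([10, 30, 40, 20])
suspect = normalize([40, 30, 20, 10])

d_same = bhattacharyya_distance(genuine, genuine)  # approximately 0
d_diff = bhattacharyya_distance(genuine, suspect)  # clearly larger
```

A detection rule would then compare the distance with a threshold chosen on known genuine samples; choosing that threshold is outside the scope of this sketch.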
2.1.4 Spectral Imaging Techniques
Spectral Imaging (SI) techniques are another type that can be used for the analysis of
solid dosage forms. These techniques provide detailed information about the
concentration and distribution of the drug ingredients. They combine
vibrational spectroscopic techniques with digital image processing. SI involves two major
techniques:
a) Hyperspectral Imaging (HSI)
b) Multispectral Imaging (MSI)
Chemical Imaging (CI) is also used for the analysis of SPPs. The combination of IP
with any of the VS techniques is known as CI. CI is used to capture spatial as
well as spectral information from an object. It was initially developed for remote
sensing, but recent studies have shown that it can be used for nondestructive analysis
of pharmaceutical products [95]. HSI is related to MSI; the basic difference between
them is the number of bands or the type of measurements.
De Juan et al. [96] illustrate the benefits of using spectroscopic imaging techniques
along with different chemometrics for the analysis of pharmaceutical products. The
combination of the two can be helpful in extracting local as well as global information
about the chemical components of the surface area. It can be used for estimating the
homogeneity of chemical components, detecting impurities in the sample and
monitoring processes.
2.1.4.1 Hyperspectral Imaging
Hyperspectral sensors collect information about each spatial position as a set of
images. Each image corresponds to one spectral band, which means that each pixel of the
hyperspectral image contains a spectrum for that specific position [95]. Chemical images
are three-dimensional blocks of data with one wavelength dimension and two spatial
dimensions. Chemical images can be formed by combining either NIR or Raman spectroscopy
with digital imaging. Both can be used for the analysis of pharmaceutical dosage forms.
Therefore, in general, CI can be NIR-CI or Raman-CI. Both are used for the
analysis of raw ingredients of drugs, drug development process monitoring and
quality control.
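
The data layout described above, two spatial dimensions plus one wavelength dimension, can be sketched with a small nested list. The dimensions, the synthetic intensities and the helper names are illustrative assumptions; a real chemical image would come from the imaging instrument.

```python
ROWS, COLS, BANDS = 2, 3, 4

# cube[r][c][b] is the intensity at pixel (r, c) in spectral band b.
# The values are synthetic so that each index is easy to recognize.
cube = [[[r * 100 + c * 10 + b for b in range(BANDS)]
         for c in range(COLS)]
        for r in range(ROWS)]


def pixel_spectrum(cube, r, c):
    """Spectrum (one value per band) at a single spatial position."""
    return list(cube[r][c])


def band_image(cube, b):
    """Two-dimensional image formed by a single spectral band."""
    return [[pixel[b] for pixel in row] for row in cube]


spectrum = pixel_spectrum(cube, 1, 2)  # [120, 121, 122, 123]
image = band_image(cube, 3)            # [[3, 13, 23], [103, 113, 123]]
```

Slicing along the wavelength axis gives the spectrum of one pixel, while slicing at a fixed band gives an ordinary grayscale image.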
NIR-CI
Near-infrared chemical imaging (NIR-CI) is an emerging technology in the pharmaceutical
industry compared to simple NIR spectroscopy. NIR-CI is used for the
prediction of API and excipient concentrations in solid pharmaceutical dosage
forms [97]. In another study, NIR-CI was used for the detection of counterfeit
pharmaceutical tablets, where no prior knowledge of the composition of the sample is
required [98]. NIR-CI is also used to assess content uniformity in a batch of tablets.
Content uniformity was evaluated by applying different quantitative algorithms to
a global hyperspectral image of ten tablets [99]. Another recent study demonstrates
the use of single-point NIRS along with NIR-CI and statistical variance analysis for the
detection of counterfeit tablets [100]. NIR-CI is also used for the quantification of
the coating thickness of tablets and the chemical structure of the tablet core and
coating [101]. High-throughput quality analysis is much needed in the pharmaceutical
industry. NIR-CI can also be applied to analyze multiple sample tablets at
a time, even if they are packed. This results in fast and nondestructive identification of
APIs and excipients. Hamilton and Lodder [102] use HSI for the analysis of
pharmaceutical medicines, compare the performance of HSI with that of HPLC and
conclude that HSI is more accurate.
In another study, Gowen et al. [95] performed non-destructive assessment of
pharmaceutical tablets using VS along with various Image Processing (IP) techniques.
The image created from the combination of digital imaging with either Raman
Spectroscopy or Near-Infrared Spectroscopy is known as a chemical image. Several
studies [99, 101, 103-105] found that chemical imaging can also be
used to monitor the development process and quality control of pharmaceutical
tablets. Puchert et al. [100] use Near Infrared Chemical Imaging (NIRCI) for the
identification of counterfeit medicines. Sacré et al. [106] present a detailed review of
VS-based hyperspectral imaging for the analysis of pharmaceuticals. This paper also
provides detailed information about the chemometric techniques used for pre- and
post-processing of the data. Amigo and Ravn [107] tried to avoid the
calibration required for the quantification of major and minor components of
pharmaceuticals. They tested various methods using NIRCI for this purpose and
concluded that Multivariate Curve Resolution (MCR) provides reliable results.
Franch-Lage et al. [108] use NIR-based HSI for the surface-based assessment of the
distribution of APIs and excipients in pharmaceutical products.
Carneiro and Poppi [109] use NIR imaging spectroscopy to study the distribution of
API and excipients in spironolactone tablets. Concentration maps for each
compound were obtained using the interval partial least squares model. These maps
were created by quantifying the API and excipients at each pixel. The results indicated
that the approach is helpful for the quantification of compounds at the pixel level. In
another study by Palou et al. [110], nondestructive analysis of pharmaceutical
products was performed to determine the distribution and concentration of the major
and minor components of the products. The calibration models were built using PLS.
In another study [111], super-resolution was used to improve the performance of
NIRCI in determining the quality of pharmaceutical solids.
Osorio et al. [112] used NIRCI to characterize pharmaceutical powder blends. A
Science-Based Calibration (SBC) chemometric method was used in this research. This
method creates a calibration model based on the pure spectra of the components. SBC
helps in characterizing blends by creating concentration maps. Its main benefit over
conventional methods such as PLS or PCA is that it does not require a large number of
samples to create a calibration model.
Lyon et al. [113] studied NIRS imaging for the assessment of quality, i.e. the blend
uniformity, of pharmaceutical products. High contrast NIR images of tablets were
acquired using array detector technology. The experimental tablets covered five
levels of API blending, from well blended to unblended. The results were compared
with those acquired from simple NIRS. The authors concluded that the spectral
imaging based approach could clearly differentiate between all five levels of
blending, qualitatively as well as quantitatively.
Lee et al. [114] proposed an approach for measuring the content uniformity of
multiple tablets at the same time. This approach is based on NIR-CI. A field of view
of 59.5 mm x 47.5 mm was used to extract data from a total of twenty tablets
simultaneously. Each tablet was covered by 3000 pixels, where each pixel corresponds
to 186 µm by 186 µm of the sample area. The results of the proposed approach were
compared with the conventional UV method. The authors concluded that both approaches
have the same accuracy and that the location of the tablets within the field of view
did not affect the performance of the proposed approach.
Westenberger et al. [115] compared traditional and non-traditional analytical
methods for estimating the quality of pharmaceutical products sold over the
internet. HPLC was used as the traditional method and was compared with NIRS, NIR
imaging and thermogravimetric methods. The comparison showed that the
non-traditional methods highlight more characteristics than HPLC.
Gendrin et al. [116] investigated the feasibility of using NIR-CI to quantify the
APIs and excipients in pharmaceutical products. The chemical images were captured at
two pixel sizes, 10 µm/pixel and 40 µm/pixel. Two preprocessing techniques, SNVC and
SDC, were applied. Concentrations were extracted using PLS2 and multivariate CLS.
The comparison of results indicated that 40 µm/pixel with SNVC preprocessing and PLS2
is the better combination for predicting API content. Li et al. [117] proposed that
NIR-based CI can be used to estimate API particles/domains. The proposed approach
can effectively be used to evaluate the blending behavior of APIs during
formulation.
Hilden et al. [118] estimated the effect of the particle size of extragranular
tartaric acid on the uniformity of BMS-561389 tablets. The relation between the two
was estimated using NIR-based chemical imaging.
Raman-CI
In the pharmaceutical industry, Raman-CI can be applied to particle size estimation,
minor component detection and tablet characterization. For analysis, the data at
each pixel of the sample are compared with a standard spectrum of a sample that
contains APIs and excipients at their correct levels [12]. Sasic [119] applied
Raman-CI to capture spectra from drugs for the detection of APIs in low-content
pharmaceutical formulations and reported that PCA is particularly helpful for this
kind of detection.
Another study by Doub et al. [103] focused on the application of Raman-CI to
ingredient-specific particle size characterization of nasal spray formulations. It
is suitable for the identification of APIs as well as placebo. Similar chemical
compositions in drugs can be described effectively using Raman-CI.
Bell et al. [120] used CA for the segmentation of images, which enables the
visualization of distinct regions for the characterization of solid dosage forms.
Vidal and Amigo [121] illustrated the essential preprocessing techniques required to
process hyperspectral images before the actual analysis starts. These preprocessing
techniques handle issues such as image compression, identification and removal of
background, spiked points and dead pixels.
Sasic [122] used a combination of NIR and Raman global illumination mapping devices
to capture chemical images of pharmaceutical granules. The main purpose of this
technique was to measure how well the APIs and excipients are mixed with each other.
Both devices were used to analyze 50 to 100 randomly distributed granules on a
microscopic slide of 3.5 mm x 3.5 mm. The spectra acquired from both instruments
easily characterized the granules. The comparison of results showed that Raman
global illumination provides more comprehensive information about the chemical
structure of the sample.
2.1.4.2 Multispectral Imaging
MSI systems use MSI sensors, which collect spectra from fewer than 20, generally
noncontiguous, spectral bands [123]. These bands can detect information in a specific
combination from the desired region of the spectrum. A unique combination of spectral
information can be achieved by varying the number and position of bands within the
MSI system [124]. One study [125] used MSI to determine moisture and salicylic acid
in a single packaged aspirin tablet. The authors concluded that MSI offers a speed
advantage of approximately 30000 times over HPLC and that MSI of a field of tablets
is almost 1000 times faster than spectrometry of a single tablet. Several other
studies [126, 127] in the literature also use MSI in the pharmaceutical industry.
2.2 Preprocessing Techniques
Preprocessing is an essential step before applying any analysis technique to the
acquired data. The spatial and spectral information gathered from the sample provides
knowledge about its physical structure, surface and chemical composition. However,
multiple external factors cause systematic variations between spectra or images. In
the case of SPPs, different nonchemical factors may enter the data during the
acquisition process. These factors include scattering effects due to surface
inhomogeneity, specular reflections, random noise and interference from external
light sources. Different preprocessing techniques are required to remove such
nonchemical biases from the spectral and spatial information, such as [128]:
Smoothing
Normalization
Standard Normal Variate Correction
Multiplicative Scatter Correction
Savitzky-Golay Derivative Conversion
Image Enhancement
2.2.1 Smoothing
Smoothing algorithms are useful in both IP and Signal Processing (SP) to prepare
images or signals for further processing by reducing noise. Different types of noise
may exist in images. One is salt and pepper noise, a sparse light and dark
disturbance in which the color of the noisy pixels has no relation to the pixels of
the original image; it appears as light and dark spots on the image. Another type is
Gaussian noise, in which each pixel of the image is slightly changed from its
original value. Histograms can be used to visualize the normal distribution of this
noise. In IP, various convolution or filtering based algorithms are available to
reduce noise in images. Low pass filtering is a technique used to make an image
smooth. A low pass filter retains the low frequencies in the image and attenuates
the high frequencies [129]. Other filters used in IP include weighted average
filters, binomial filters, mean filters and median filters.
In SP, the most common technique used to remove signal noise is the moving average.
This algorithm generates a smooth signal as output, which is based on equidistant
points. Each smoothed point (Yk)s of the output signal is the average of the raw
points within the filter width. The filter width is usually an odd number, 2n+1
(n = 1, 2, 3, ...), of consecutive points of the raw data, where the raw signal
consists of Y1, Y2, Y3, ..., YN. According to Efstathiou [130], the formula for
applying the moving average to smooth a point is given below.
\[ (Y_k)_s = \frac{\sum_{i=-n}^{n} Y_{k+i}}{2n+1} \]
Equation 2.1: Formula for Smoothing a signal
The smoothing level of the signal depends on the filter width: the greater the
filter width, the stronger the smoothing effect. The signal to noise ratio can also
be increased by applying the smoothing algorithm multiple times. However, this is a
lossy technique. Each application of the algorithm results in the loss of the first
and last n points of the data.
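As an illustration of Equation 2.1, the following minimal Python sketch applies a
(2n+1)-point moving average to a short signal. The function name moving_average and
the sample values are assumptions made for this example only; they are not taken
from [130].

```python
def moving_average(y, n):
    """Smooth y with a (2n+1)-point moving average (Equation 2.1).

    The first and last n points have no complete window and are dropped,
    which is the loss of data described in the text.
    """
    width = 2 * n + 1
    if n < 1 or len(y) < width:
        raise ValueError("need n >= 1 and at least 2n+1 points")
    return [sum(y[k - n:k + n + 1]) / width for k in range(n, len(y) - n)]


# Hypothetical raw signal with two spikes.
raw = [1.0, 2.0, 6.0, 2.0, 1.0, 2.0, 6.0]
smoothed = moving_average(raw, 1)   # 3-point window, two points are lost
```

With n = 1 the output has two points fewer than the input, and applying the
function again to its own output would remove two more.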
2.2.2 Normalization
In IP, the process of changing the range of the intensity values of the pixels to
enhance the contrast of the image is known as normalization. It is also known as
contrast stretching or histogram stretching. Histogram equalization is also used to
enhance contrast. In SP, normalization is also known as dynamic range expansion. It
is used to enhance short wave spectra. Dynamic range expansion converts an image or
signal into a more consistent form.
2.2.3 Standard Normal Variate Correction
Standard Normal Variate Correction (SNVC) is a well-known algorithm used for scatter
correction of spectral data (mostly NIR). Applying this algorithm to the input
spectrum reduces spectral noise and eliminates background effects. Baseline shifting
or tilting can occur in data due to the chemical composition of the sample and
variation in the length of the spectral path. These issues normally occur at longer
wavelengths. SNVC is used to either reduce or eliminate such scatter effects. The
basic working of SNVC is the same as that of MSC. The difference between the two is
that SNVC performs baseline and reference corrections consecutively. According to
Rinnan et al. [131], the formula for SNVC is:
\[ X_{cor} = \frac{X_{org} - a_0}{a_1} \]
Equation 2.2: Formula for SNVC
Here Xcor is the corrected spectrum, Xorg is the original sample spectrum, a0 is the
mean of Xorg and a1 is the standard deviation of Xorg.
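Equation 2.2 can be sketched in a few lines of Python. The function name snv, the
sample spectrum and the use of the sample (n-1) standard deviation are assumptions
made for this illustration. The second call shows why the correction is useful: a
spectrum distorted by a constant offset and a constant gain is mapped back onto the
same corrected spectrum.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate correction of one spectrum (Equation 2.2)."""
    a0 = statistics.fmean(spectrum)    # mean of the spectrum
    a1 = statistics.stdev(spectrum)    # sample standard deviation (assumption)
    if a1 == 0:
        raise ValueError("a flat spectrum cannot be SNV corrected")
    return [(x - a0) / a1 for x in spectrum]


# Hypothetical absorbance values at five wavelengths.
spectrum = [0.52, 0.55, 0.61, 0.70, 0.66]
corrected = snv(spectrum)

# The same spectrum with an additive offset and a multiplicative gain.
distorted = [2.0 * x + 0.3 for x in spectrum]
corrected_distorted = snv(distorted)
```

Each spectrum is processed on its own, so no reference spectrum is needed.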
2.2.4 Multiplicative Scatter Correction
Multiplicative Scatter Correction (MSC) is another commonly used technique to
remove imperfections or unwanted scatter effects from the sample spectrum. In
principle, MSC was originally developed to be applied only to the part of the
spectrum that contains no chemical information, but in practice the whole spectrum is
used. This technique is useful when the chemical difference between samples is
small. MSC is applied in two steps. In step one, the correction coefficients are
estimated using the formula [131]:
\[ X_{org} = b_0 + b_{ref,1} \cdot X_{ref} + e \]
Equation 2.3: Formula for the estimation of correction coefficients of MSC
In step two, the correction is performed on the spectrum using the formula [131]:
\[ X_{cor} = \frac{X_{org} - b_0}{b_{ref,1}} \]
Equation 2.4: Formula for the corrections in MSC
Here, Xorg is the sample spectrum, Xref is the reference spectrum that is used to
preprocess the whole dataset, b0 and bref,1 are scalar parameters, e is the unmodeled
part of the original spectrum and Xcor is the corrected spectrum. The scalar
parameters are different for each sample being preprocessed.
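The two MSC steps can be sketched as follows. This is a minimal illustration and not
the implementation of [131]: the function name msc and the example spectra are
hypothetical, and the reference spectrum is simply assumed to be given (in practice
the mean spectrum of the dataset is a common choice). Step one fits Xorg against
Xref by ordinary least squares, and step two applies the correction.

```python
def msc(x_org, x_ref):
    """Multiplicative Scatter Correction of one spectrum.

    Step 1: estimate b0 and b1 in x_org = b0 + b1 * x_ref + e by least squares.
    Step 2: return (x_org - b0) / b1 together with the two coefficients.
    """
    n = len(x_ref)
    mean_ref = sum(x_ref) / n
    mean_org = sum(x_org) / n
    sxx = sum((r - mean_ref) ** 2 for r in x_ref)
    sxy = sum((r - mean_ref) * (o - mean_org) for r, o in zip(x_ref, x_org))
    b1 = sxy / sxx
    b0 = mean_org - b1 * mean_ref
    return [(o - b0) / b1 for o in x_org], b0, b1


# Hypothetical reference spectrum and a sample with the same shape but an
# additive offset (0.05) and a multiplicative scatter factor (1.5).
reference = [0.10, 0.20, 0.40, 0.30, 0.15]
sample = [0.05 + 1.5 * r for r in reference]
corrected, b0, b1 = msc(sample, reference)
```

For this noise-free sample the estimated coefficients recover the offset and the
scatter factor, and the corrected spectrum coincides with the reference.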
2.2.5 Savitzky-Golay Derivative Conversion
Savitzky-Golay Derivative Conversion (SDC) is another algorithm used to implement
smoothing. SDC is better than the moving average algorithm as it is based on
polynomials [132]. An individual polynomial is fitted to the points within a filter
width (also called a window) around each data point in the spectrum. After the
polynomial has been fitted, its value at the central point of the window is taken as
the new smoothed point. However, this is also a lossy method. The SDC algorithm
strongly depends on two things: the order of the polynomial and the size of the
window. A lower polynomial order and a larger window result in a more strongly
smoothed signal [130].
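The difference between polynomial smoothing and a plain moving average can be seen
in a small sketch. It uses the well-known 5-point convolution weights of the
quadratic Savitzky-Golay smoother, (-3, 12, 17, 12, -3)/35, instead of a general
polynomial fit. The function names and the test peak are assumptions made for this
example.

```python
# Convolution weights of the 5-point quadratic Savitzky-Golay smoother.
SG_WEIGHTS = (-3, 12, 17, 12, -3)
SG_NORM = 35


def savgol_5pt(y):
    """Smooth y with a 5-point, second-order Savitzky-Golay filter.

    The two points at each end have no complete window and are dropped.
    """
    half = len(SG_WEIGHTS) // 2
    return [
        sum(w * y[k - half + i] for i, w in enumerate(SG_WEIGHTS)) / SG_NORM
        for k in range(half, len(y) - half)
    ]


def mean_5pt(y):
    """5-point moving average over the same windows, for comparison."""
    return [sum(y[k - 2:k + 3]) / 5 for k in range(2, len(y) - 2)]


# A noise-free parabolic peak with its maximum (36) in the middle.
peak = [float(36 - x * x) for x in range(-3, 4)]
sg = savgol_5pt(peak)     # keeps the peak height
avg = mean_5pt(peak)      # lowers the peak height
```

On this peak the Savitzky-Golay output keeps the maximum at 36, whereas the moving
average lowers it to 34, which illustrates why the polynomial based filter
preserves spectral features better.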
2.2.6 Image Enhancement
Image enhancement (IE) techniques are used to prepare images for better perception
by human viewers. Their basic purpose is to modify the attributes of the input image
so that it is better suited to a specific task [133]. IE can be applied to images in
two domains: spatial and frequency. Spatial domain enhancement techniques are
applied directly to the pixels of the image. In the frequency domain, the image is
first converted using the Fourier Transform (FT), all enhancements are applied to
the FT of the image, and the inverse FT is then applied to return to the image form.
Maini and Aggarwal [134] show that many transformation techniques are available in
the literature for enhancing images. These include logarithmic, power law, piecewise
linear and intensity transformations. Other techniques include gray level slicing,
image negation, histogram matching and equalization. Morphological operations such
as erosion, dilation, opening and closing can also be applied to images to improve
their visibility.
Color or grayscale mapping with intensity scaling can visualize compositional contrast
between pixels in an image. To enhance the contrast level between distinct regions of
the sample, Image Fusion can be implemented. Image Fusion combines two or more
images at different wavelengths to create a new one [135]. Some straightforward
mathematical operations can also be applied on images to combine them, such as
addition, subtraction, multiplication and division.
2.3 Feature Extraction Techniques
Extracting useful information from the data (both spatial and spectral) gathered
from the sample being analyzed requires different advanced IP and SP techniques.
Various methods exist for the extraction of the chemical, spatial and physical
information hidden in these spectra and images.
Spectral analysis helps to determine the components present in the sample, their
concentration and their distribution. This analysis can be performed by evaluating
the intensity at a single wavelength, ratios of intensities at different wavelengths
and the integrated intensity (area) under a spectral peak. After spectral analysis,
it is necessary to reduce the number of available variables by keeping those with
the maximum variation in their data and discarding all others. This can be
performed using multivariate chemometric methods [128].
Different types of feature extraction techniques are available in the literature to extract
useful information from images or spectral data. Some of them are described below.
2.3.1 Low Level Feature Extraction
Techniques that extract features from images without any information about shape
are known as low-level feature extraction techniques. Edge Detection (ED) is the
most common extraction method in this category. ED provides a line drawing of the
input image [136, 137]. Different ED operators that can extract refined edges from
an image are available in the literature, e.g. the Prewitt, Sobel, Canny, Laplacian
and Marr–Hildreth operators [138].
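As a small illustration of one of these operators, the sketch below computes the
Sobel gradient magnitude of a grayscale image stored as a list of rows. The function
name, the zero-valued border and the 5 x 5 test image are assumptions made for this
example.

```python
import math

# Standard 3 x 3 Sobel kernels for the horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical image: dark region on the left, bright region on the right.
image = [[0, 0, 0, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
```

The response is zero inside the uniform dark region and large along the vertical
boundary between the two regions, which is the line drawing effect described above.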
2.3.2 High Level Feature Extraction
Another group of techniques used for feature extraction from images is known as
high-level feature extraction techniques. This set of algorithms is based on the
extraction of shapes along with other information such as their position, size and
orientation. Basic geometric shapes (circles, rectangles and squares) are used for
the extraction of complex shapes from images. Different algorithms are available for
extracting shape features from images, e.g. Thresholding, Image Subtraction,
Template Matching, Hough Transform, Generalized Hough Transform and Snakes [138].
Fourier Descriptors are also available to extract shapes from an image. These
descriptors also help in extracting some other information about shape e.g. its area,
perimeter, centroid, shape layout etc.
2.3.3 Textural Feature Extraction
The texture of a surface can be defined using different types of features, which can be
extracted from the gray level distribution of the image intensity. Statistical feature
extraction methods are extensively used for the texture analysis [139-143].
2.3.3.1.1 Gray-level Co-occurrence Matrix
Gray-level Co-occurrence Matrix (GLCM) is one of the statistical feature extraction
methods that can be used to define the texture of a surface. It is based on the
spatial relationship between pixels. Texture is characterized by calculating how
often pairs of pixels with specific values and in a specified spatial relationship
occur in an image [144]. This method therefore uses second-order statistics of the
grayscale histogram [142].
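The counting described above can be sketched as follows. The function names, the
choice of the horizontal neighbor (offset of one pixel to the right), the
non-symmetric counting and the 4 x 4 test image are assumptions made for this
illustration. Contrast is shown as one example of a feature derived from the matrix.

```python
def glcm(img, levels, dr=0, dc=1):
    """Count how often gray level i has gray level j at offset (dr, dc)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[img[r][c]][img[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum of (i - j)^2 weighted by the normalized pair counts."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * count
               for i, row in enumerate(matrix)
               for j, count in enumerate(row)) / total


# Hypothetical image with four gray levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
contrast = glcm_contrast(matrix)
```

Other offsets (vertical or diagonal neighbors) are obtained by changing dr and dc,
which gives one matrix per spatial relationship.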
2.3.3.1.2 Histogram Features
Histogram features are first-order statistics based features, used to represent surface
texture. According to Srinivasan [145], histogram-based features represent intensity
concentration on all parts of the image.
2.3.3.1.3 Run Length Matrix
The Run Length Matrix (RLM) of an input gray-level image is built from runs, where a
run is a set of consecutive, collinear pixels having the same gray level. The
coarseness of a texture in a specific direction can be captured using RLM [146]. RLM
is a higher order statistic of the grayscale histogram.
2.3.3.1.4 Autoregressive Model
The Autoregressive (AR) model uses the local interactions between image pixels. The
intensity of each pixel is modeled as a weighted sum of the intensities of its
neighboring pixels. The AR model is considered a simple and efficient model for the
unsupervised segmentation of textures. The parameters of the AR model, when
estimated for image regions, help in texture discrimination [147].
2.3.3.1.5 Wavelet Transformations
In the field of texture classification and segmentation, Wavelet Transformation (WT)
is another feature extraction approach that can be used to characterize texture.
Wavelet Coefficients (WC) are extracted from the images and then used to compute
textural features. Different textures have different textural features if their
frequency spectrum is decomposed properly. These textural features include energy,
entropy and averaged l1-norm [144].
WC are also very helpful as features for the classification of other types of
signals, such as audio and EEG. WT of spectra provides two main benefits: one is
dimensionality reduction and the other is de-noising of the spectrum. Cai et al.
[148] discuss the use of Multiresolution Wavelet Transformation for noise reduction
and background subtraction in RS, specifically for the automated processing of large
spectral datasets and spectral imaging.
In another study, Li et al. [149] used near-infrared diffuse reflectance spectroscopy
along with PLS regression and PCA for the identification and quantification of
azithromycin tablets. Continuous Wavelet Transformation was used for baseline
elimination in this research.
The computation of WC involves decomposing the whole signal into multiple wavelets.
Many wavelet functions are available for this purpose, e.g. Daubechies, Coiflets,
Symlets, Discrete Meyer, Biorthogonal and Reverse Biorthogonal. The signal is then
projected onto the chosen wavelet function to calculate the WCs. This results in two
types of coefficients: detail and approximation. The selection of the decomposition
level and the wavelet function is therefore very important. The output of the WT is
a vector consisting of the final-level approximation coefficients and all detail
coefficients calculated up to that level. Figure 2.2 explains the basic approach for
the decomposition and calculation of WC at level two, where ‘ai’ represents the
approximation coefficients at level ‘i’ and ‘di’ the detail coefficients at level
‘i’.
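The two-level scheme can be sketched with the Haar wavelet, the simplest member of
the Daubechies family. The function names, the sign convention of the detail
coefficients and the sample signal are assumptions made for this illustration; a
wavelet library would normally be used in practice. The output vector is ordered as
the level-two approximation, the level-two detail and the level-one detail
coefficients.

```python
import math


def haar_step(signal):
    """One Haar decomposition level: (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [a_L, d_L, ..., d_1] for a signal whose length divides by 2**levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)   # only the approximation is split again
        details.append(detail)
    return approx + [c for d in reversed(details) for c in d]


# Hypothetical eight-point signal.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_decompose(signal, 2)   # a2 (2 values), d2 (2 values), d1 (4 values)
```

The coefficient vector has the same length as the input and, for this orthonormal
wavelet, the same energy (sum of squares), so no information is lost by the
decomposition itself.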
Figure 2.2: Two level decomposition for the computation of WC
[Figure 2.2 diagram: the input signal is split by the level 1 decomposition into a1 and d1; a1 is split by the level 2 decomposition into a2 and d2; the output is d1, a2, d2.]
2.4 Feature Reduction Techniques
The process of optimally reducing the original feature space according to some defined
criteria is known as feature reduction. Not all the features extracted in the feature
extraction phase have equal importance for a specific target concept. Feature
reduction techniques reduce the dimensionality of the feature set by removing
irrelevant, noisy or redundant features. The use of optimally reduced feature vectors
may enhance the efficiency, accuracy and processing speed of the classification or data
mining algorithms [150]. Multiple different algorithms are available in the literature for
this purpose. The details of some of these algorithms are given below.
2.4.1 Information Gain
Information Gain (IG) is a feature reduction algorithm based on the entropy measure.
Entropy is an information theory measure used to characterize the purity of examples.
It measures the unpredictability of a system. If Y is a random variable, then the
entropy of Y can be calculated using the formula given below [151].
\[ H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y) \]
Equation 2.5: Entropy calculation when Y is independent variable
Here, the marginal probability density function of Y is represented by p(y). A
relationship exists between two variables X and Y if they satisfy two conditions.
According to the first condition, the observed values of Y from the training dataset must
depend on the variable X. The other condition is based on their entropy measure i.e. the
entropy of Y according to the partitions based on X should be less than the entropy of
Y prior to the partitioning based on X. Therefore, the formula for entropy when Y
depends on X is:
\[ H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log_2 p(y|x) \]
Equation 2.6: Entropy calculation when Y is dependent on X
Here p(y|x) is the conditional probability of y given x. The variable X can also
provide additional information about Y, measured by the amount by which the entropy
of Y decreases. This is known as IG and is calculated using Equation 2.7.
𝐼𝐺 = 𝐻(𝑌) − 𝐻(𝑌|𝑋) = 𝐻(𝑋) − 𝐻(𝑋|𝑌)
Equation 2.7: Formula for IG
The IG calculated about Y by observing X is equal to the IG calculated about X by
observing Y. Therefore, IG is a symmetric measure. One major disadvantage of IG is
that it is biased toward features having more values, even when they are
uninformative.
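The entropy, conditional entropy and IG defined above can be sketched for discrete
features as follows. The function names and the toy data (a class label, an
informative feature, a noise feature and a unique identifier) are hypothetical and
serve only to illustrate the definitions, including the bias toward many-valued
features.

```python
import math
from collections import Counter


def entropy(values):
    """H(Y): entropy of a discrete variable given as a list of observations."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y|X): entropy of y within each partition of x, weighted by p(x)."""
    n = len(y)
    total = 0.0
    for x_value, count in Counter(x).items():
        subset = [yi for yi, xi in zip(y, x) if xi == x_value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(y, x):
    """IG = H(Y) - H(Y|X)."""
    return entropy(y) - conditional_entropy(y, x)


# Hypothetical toy data for four tablets.
label = ["pass", "pass", "fail", "fail"]
coating = ["smooth", "smooth", "rough", "rough"]   # determines the label
noise = ["a", "b", "a", "b"]                       # unrelated to the label
tablet_id = [1, 2, 3, 4]                           # unique value per sample

ig_coating = information_gain(label, coating)
ig_noise = information_gain(label, noise)
ig_id = information_gain(label, tablet_id)
```

The unique identifier obtains the same IG as the genuinely informative feature,
although it carries no knowledge that generalizes to new samples.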
2.4.2 Symmetrical Uncertainty
Symmetrical Uncertainty (SU) addresses this problem of IG, i.e. its bias toward
features having more values. It does so by dividing the IG by the sum of the entropy
measures of X and Y [152], as shown in Equation 2.8.
\[ SU = 2 \cdot \frac{IG}{H(X) + H(Y)} \]
Equation 2.8: Symmetrical Uncertainty formula
SU always provides normalized values in the range [0, 1] because of the correction
factor 2. SU = 0 means there is no correlation between X and Y, while SU = 1 means
that they are highly correlated. The problem of SU is that it is biased toward
features having fewer values.
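A minimal sketch of Equation 2.8 for discrete variables is given below. It computes
IG through the equivalent identity IG = H(X) + H(Y) - H(X, Y), where H(X, Y) is the
joint entropy. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy of a discrete variable given as a list of observations."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU = 2 * IG / (H(X) + H(Y)), with IG = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0                     # both variables are constant
    ig = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * ig / (hx + hy)


# Hypothetical toy data for four tablets.
label = ["pass", "pass", "fail", "fail"]
coating = ["smooth", "smooth", "rough", "rough"]   # two values, determines label
tablet_id = [1, 2, 3, 4]                           # four values, unique per sample

su_coating = symmetrical_uncertainty(coating, label)
su_id = symmetrical_uncertainty(tablet_id, label)
```

Both features have the same IG with respect to the label, but SU ranks the
two-valued feature above the many-valued identifier because the larger entropy of
the identifier increases the denominator.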
2.4.3 One-R
Holte [153] proposed a rule-based algorithm named OneR for feature reduction. An
individual rule is created for each attribute of the training dataset. The rule
with the minimum error is then selected for further processing. All features with
numerical values are treated as continuous and handled by a simple method that
divides the range of values into multiple disjoint intervals. Missing values are
handled by using ‘missing’ as a legitimate value.
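The rule construction can be sketched for categorical attributes as follows. This is
a simplified illustration and not Holte's full algorithm: the discretization of
numerical attributes is omitted, and the function name one_r and the toy data are
hypothetical. A missing value would simply be passed in as its own category.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, rule, training errors) of the best one-attribute rule.

    For each attribute, every attribute value is mapped to its most frequent
    class; the attribute whose rule makes the fewest training errors wins.
    """
    best = None
    for f in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[f]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    return best


# Hypothetical tablets described by (shape, color) and labeled by product.
rows = [
    ("round", "white"), ("round", "blue"), ("oval", "white"),
    ("oval", "blue"), ("round", "white"), ("oval", "blue"),
]
labels = ["A", "A", "B", "B", "A", "B"]
best_attribute, best_rule, best_errors = one_r(rows, labels)
```

Here the shape attribute classifies all training samples correctly, whereas the
rule built on color makes two errors, so shape is selected.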
2.4.4 Chi-Square
Chi-square (CS) feature selection algorithm performs ranking of features by calculating
chi-squared statistic for each class. CS calculates the degree of the dependency between
attributes and a specific class. According to Chatcharaporn et al. [154], the formula for
CS is:
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
Equation 2.9: Formula for Chi-Square
where Oij and Eij are the observed and expected frequencies, respectively.
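Equation 2.9 can be sketched for a contingency table of a discrete feature (rows)
against the class (columns). The expected frequency of each cell is taken as
(row total x column total) / grand total, the usual value under independence. The
function name and the two example tables are hypothetical, and all row and column
totals are assumed to be nonzero.

```python
def chi_square(observed):
    """Chi-squared statistic of an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected under independence
            statistic += (o - e) ** 2 / e
    return statistic


# Rows: two feature values; columns: two classes (hypothetical counts).
dependent = [[30, 10], [10, 30]]     # feature value and class are associated
independent = [[20, 20], [20, 20]]   # feature value tells nothing about the class

cs_dependent = chi_square(dependent)
cs_independent = chi_square(independent)
```

A feature with a larger statistic depends more strongly on the class and is ranked
higher.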
2.4.5 Gain Ratio
Gain Ratio (GR) ranks the attributes by compensating for the bias of Information Gain
(IG). According to Chatcharaporn et al. [154], GR can be measured by:
\[ GR = \frac{IG}{H(X)} \]
Equation 2.10: Formula for Gain Ratio
where H(X) is the entropy of X. The value of GR always lies in [0, 1]. GR = 1 means
that X can completely predict Y, where Y is the variable to be predicted, and GR = 0
indicates no relation between X and Y.
2.4.6 Relief-F
Another statistical attribute selection technique used in this research is Relief-F
(RF). RF ranks each feature by calculating a weight from the relationship between
the feature and a specific class [154]. This weight calculation is based on two
nearest neighbor probabilities. The first is the probability that two nearest
neighbors from different classes have different values of the feature, and the
second is the probability that two nearest neighbors of the same class have the same
value of the feature [151].
2.4.7 Principal Component Analysis
PCA is another algorithm used to select relevant features while retaining most of
the useful information. PCA can be used to avoid over-fitting. It calculates new
variables from a large number of original variables by using projections, so each
new variable is a linear combination of the actual measurements and contains
information based on the whole data [155]. Several studies [125, 149, 156, 157] use
PCA for the analysis of pharmaceuticals.
2.5 Classification Techniques
For analysis and comparison, the spectra and images of the samples are compared with
reference data from an external library. Different similarity measures, such as
Pearson’s correlation coefficient and Euclidean distance, can be used for this
purpose [158]. Clustering is required for the identification of regions having
similar spectral or image characteristics, which provide information about the
chemical and physical properties of the sample, their concentration and their
distribution. Clustering can be performed using unsupervised classification
techniques such as K-means clustering and Fuzzy clustering. These methods do not
require any prior knowledge about the sample being tested and are helpful for the
extraction of important features.
Other techniques can also be used for classification purposes. These are known as
Supervised Classification methods and require prior knowledge about the data. They
use separate training and testing datasets for classification. Some important
supervised classification algorithms are Partial Least Squares, Artificial Neural
Networks, Naïve Bayes, K-Nearest Neighbors and Support Vector Machines. The details
of all these algorithms are given below.
2.5.1 Pearson’s Correlation Coefficient
Pearson’s Correlation Coefficient (PCC) is a measure of similarity between two
objects. It calculates the linear association between two variables. It is also
known as the product moment correlation coefficient. Its value lies in [-1, +1] and
is represented by r [159]. A positive value of r means that a positive correlation
exists between the two variables, and r = +1 means that there are no fluctuations
between the two. The further r falls below +1, the greater the fluctuations between
the data [160].
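A minimal sketch of the coefficient, written from its standard definition
(covariance divided by the product of the standard deviations), is shown below. The
function name and the example spectra are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical library spectrum and two measured spectra.
reference = [0.10, 0.40, 0.35, 0.80, 0.60]
scaled = [2.0 * v + 0.1 for v in reference]    # same shape, different gain and offset
inverted = [-v for v in reference]             # opposite trend

r_scaled = pearson_r(reference, scaled)
r_inverted = pearson_r(reference, inverted)
```

Because r is insensitive to gain and offset, a measured spectrum that differs from
the library spectrum only in overall intensity still yields r = +1.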
2.5.2 Euclidean Distance
Euclidean Distance (ED) is a distance-based measure of the similarity between
objects [161]. Hierarchical trees called dendrograms can be used to visualize these
distances. New objects are formed by linking the objects having the smallest
distances. These newly formed objects are combined again in the same manner until
all the objects are linked. Wang et al. [162] present a modified version of the
original ED for application to images.
2.5.3 K-mean Clustering
K-mean clustering belongs to the field of SP for CA. It is used for vector
quantization. It partitions n observations into k clusters, where each observation
is added to the cluster with the nearest mean. The formula for k-mean clustering is
given in Equation 2.11 [163]. K-mean clustering aims to partition n observations
(x1, x2, …, xn) into k (≤ n) clusters S = {S1, S2, …, Sk}, where each observation
can be a d-dimensional vector. Observations are assigned to clusters so that the
within-cluster sum of squares is minimized.
\[ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2 \]
Equation 2.11: K-mean Clustering
where μi is the mean of the points in Si.
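The objective in Equation 2.11 is commonly minimized with an iterative
assign-and-update procedure (Lloyd's algorithm), sketched below. The function name,
the initialization with the first k observations and the sample points are
assumptions made for this illustration. The procedure only finds a local minimum,
so the result can depend on the initial means.

```python
def kmeans(points, k, iterations=100):
    """Partition points into k clusters; returns (centroids, clusters)."""
    centroids = [list(p) for p in points[:k]]   # assumption: first k points as seeds
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each mean becomes the average of its assigned points.
        updated = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:                # assignments no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical two-dimensional feature vectors forming two groups.
points = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.8),
    (5.2, 4.8), (0.8, 1.2), (4.8, 5.2),
]
centroids, clusters = kmeans(points, k=2)
```

For this data the two means settle near (1, 1) and (5, 5), with three points in each
cluster.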
2.5.4 Fuzzy Clustering
Fuzzy Clustering (FC) is based on fuzzy logic. In FC, a point does not belong
completely to one cluster but has a certain degree of belongingness to each cluster.
Points lying in the center of a cluster may belong to it to a higher degree than
points on its edge. For any point x, a set of coefficients represents the degree to
which that point is part of the kth cluster. This degree of belongingness is
represented by wk(x). For c-mean FC, the centroid of a cluster is defined as the
mean of all points weighted by wk(x), as shown in Equation 2.12 [164].
\[ C_k = \frac{\sum_{x} w_k(x)^m \, x}{\sum_{x} w_k(x)^m} \]
Equation 2.12: Centroid calculation for FC
wk(x) is the inverse of the distance between x and the cluster center calculated in
the previous pass. Another important parameter is m, which controls how much weight
is given to the closest center. The degree of belongingness also depends on the
parameter m. FC also tries to minimize the intra-cluster variance, but the results
strongly depend on the initial choice of weights. FC using c-mean can be used for the
clustering of objects in images.
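The alternation between weights and centroids can be sketched as follows. The
centroid update follows Equation 2.12. For the weights, the sketch assumes the
normalized inverse-distance form commonly used in fuzzy c-means, in which the
weights of each point sum to one; this form, the function name, the initial centers
and the sample points are assumptions made for this illustration and are not taken
from [164].

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Return (centers, weights); weights[i][k] is w_k(x_i)."""
    weights = []
    dims = len(points[0])
    for _ in range(iterations):
        # Weights from the distances to the centers of the previous pass.
        weights = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            weights.append(
                [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d]
            )
        # Centroid update (Equation 2.12): mean of all points weighted by w^m.
        centers = [
            tuple(
                sum(w[k] ** m * p[a] for w, p in zip(weights, points))
                / sum(w[k] ** m for w in weights)
                for a in range(dims)
            )
            for k in range(len(centers))
        ]
    return centers, weights


# Hypothetical one-dimensional feature values forming two groups.
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centers, weights = fuzzy_c_means(points, centers=[(0.0,), (6.0,)])
```

Every point keeps a small nonzero weight for the distant cluster, which is the
partial belongingness that distinguishes FC from k-mean clustering.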
2.5.5 Partial Least Square Discriminant Analysis
Partial Least Square Discriminant Analysis (PLS-DA) is a supervised classification
technique based on PLS. It can be used when dimensionality reduction is required.
Latent variables are calculated for the classification of samples into different
groups [74]. The first step in implementing PLS-DA is to apply a PLS regression model
to variables that act as indicators of the classification groups. In the next step,
each observation is classified according to its largest predicted indicator variable
[165].
2.5.6 Artificial Neural Networks
Artificial Neural Network (ANN) is an easy-to-implement classification algorithm
based on fewer parameters, but it is slower in learning [166]. The ANN is a specific neuron
architecture based on a specific number of layers. Each layer further consists of a
number of neurons. The training of the network relies on an iterative process for the
adjustment of the weights related to the input. This process results in an optimal prediction
of the sample data of the training set. Furthermore, this trained network can be used to
predict new unknown data [167]. Multiple algorithms exist for the implementation of
ANN. Perceptron learning algorithms are available to classify data that are linearly
separable. Linearly separable means that the instances of data can be categorized into their
correct classes by drawing a straight line or plane. When data are not linearly separable,
multi-layer neural networks (NN) can be used to categorize data into classes. A multilayer
NN consists of three types of layers: input, hidden and output. The input layer receives the
input, the output layer produces the result of classification and the hidden layers help in
generating the right output. Proper estimation of the size of the hidden layers is very
important, as underestimation can lead to poor approximation while excessive
nodes result in over-fitting [168]. Wu et al. [169] used ANN with back propagation for
the classification between drugs having different strengths. NIR spectra were used
as data in this research. The research also compares training set
selection methods to choose the one that produces the best result when used in
combination with ANN. The four selection methods are Kennard-Stone, D-optimal designs,
Kohonen self-organized mapping and random selection. The comparative analysis
shows that Kennard-Stone performs better than D-optimal designs, Kohonen
self-organized mapping ranks third and random selection ranks last.
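To make the notion of linear separability concrete, the following is a minimal sketch of the classical perceptron learning rule on a toy problem (logical AND), which a single straight line can separate. The function names, learning rate, epoch limit and data are assumptions chosen for illustration; this is not the back-propagation network used by Wu et al. [169].

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classical perceptron rule; labels must be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Move the separating line towards the misclassified sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:          # every sample is on the correct side
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Logical AND: only (1, 1) belongs to the positive class.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

For data that are not linearly separable (for example logical XOR) this loop never reaches zero errors, which is the situation in which a multilayer network with hidden layers is needed.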
2.5.7 Naïve Bayes
Naïve Bayes (NB) is a statistical learning algorithm that performs probabilistic classification
based on Bayesian networks [170]. NB performs training by estimating prior
and conditional probabilities from the dataset. The prior probability for a specific class is
calculated by dividing the count of training examples falling in that class by the total
number of examples. The conditional probabilities, on the other hand, are based on the
frequency distribution of feature xi over the training data that belong to that specific
class [151]. Some important studies related to drugs using NB as a classifier
are [171-174]. NB is also suitable when the training dataset is small.
It can easily estimate the classification parameters, i.e. mean and variance, from
the data [151]. According to Kotsiantis et al. [168], NB classification requires a short
computational time for training on the dataset.
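The training step described above (class priors plus per-class mean and variance) can be sketched as follows. This is a hedged illustration assuming Gaussian-distributed features; the function names, the small variance floor of 1e-9 and the toy feature values are assumptions and do not come from the cited studies.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels):
    """Estimate prior, per-feature mean and variance for every class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for cls, rows in grouped.items():
        prior = len(rows) / len(labels)              # class count / total count
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        variances = [
            sum((v - mu) ** 2 for v in col) / len(rows) + 1e-9  # assumed floor
            for col, mu in zip(columns, means)
        ]
        model[cls] = (prior, means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for value, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda cls: log_posterior(model[cls]))


# Toy two-feature data; the values are invented for illustration only.
features = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8)]
labels = ["NSPP", "NSPP", "NSPP", "DSPP", "DSPP"]
model = fit_gaussian_nb(features, labels)
```

Training is a single pass over the data to accumulate counts, means and variances, which is consistent with the short training time noted above.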
2.5.8 K-Nearest Neighbor
K-Nearest Neighbor (KNN) is a simple but robust algorithm that can efficiently deal
with complex classification problems. The classification of an object depends on the
majority vote of its neighbors [175]. This algorithm is based on two parameters: the
number of nearest neighbors to be considered during classification, denoted by
K, and the distance between feature vectors in the dataset, which determines which data
belong to which group. The value of K must be a small positive integer. KNN has been used
for the classification and analysis of pharmaceutical solid tablets by many researchers [70, 74,
90, 176].
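The two parameters of KNN, the neighbor count K and the distance measure, appear directly in the following sketch. Euclidean distance, K = 3 and the class names "A" and "B" are assumptions made for this example only.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Two well separated toy groups.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
near_a = knn_classify(train_x, train_y, (0.5, 0.5))
near_b = knn_classify(train_x, train_y, (5.5, 5.5))
```

An odd K avoids tied votes in a two-class problem; with an even K, ties have to be broken by some additional rule.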
2.5.9 Support Vector Machine
Support Vector Machine (SVM) uses a linear equation built from the training data for
partitioning the dataset. Two main steps are involved in classification using SVM:
mapping and similarity measurement. In the first step, the nonlinear data are mapped
from the input space to the feature space, and in the second step, the kernel function is
used to measure the similarity between feature vectors. It can handle large feature sets with
high accuracy [154]. Hou et al. [177] used SVM models for the recognition of SH3
domain-peptide. Li et al. [70] used SVM with a linear kernel function for the
classification of Raman spectra of Azithromycin tablets.
In another study, RS along with SVM was used to identify pharmaceutical
tablets. The identification process was performed in two steps: identification of the tablet
family, followed by identification of the formulation of the tablet [178]. Other
studies that use SVM for the analysis of pharmaceutics include [74, 156, 179].
SVM produces efficient results even when the training data is not linearly separable [166].
Table 2.1 describes the comparison between various techniques that can be used for
quality assessment of drugs. These techniques are compared against different features.
Some of these techniques provide only spectral information about the sample while
others provide only spatial information. Multispectral and Hyperspectral imaging
techniques provide both spectral and spatial data of the sample. Both of these provide
much more detailed information about the sample being studied than any other. Some
of the techniques require sample preparation that destroys the
sample, so they are destructive. Drugs used in this kind of analysis cannot be used again
for any other purpose. Techniques that do not destroy the sample are known as
nondestructive techniques and are more appropriate when we do not want to waste the
sample. The table also compares these techniques in terms of the time
required for the analysis, their processing complexity and the cost in terms of
machinery and work force.
According to the comparison, destructive techniques are more complex, time
consuming and costly, as they require sample preparation before analysis. XRD and
SEM are more suitable for SS drugs that are of crystalline form. NIRS and RS can be
used for solids and are nondestructive methods of analysis requiring no sample
preparation, but provide only spectral information of the sample.
On the other hand, MSI and HSI are also nondestructive methods of analysis but have
an advantage over the other techniques in providing both spectral and spatial data of the
sample. This allows the application of image processing techniques along with different
classification methods for more detailed analysis. MSI is preferable to HSI as it requires
less time for data processing, because HSI data contain more redundancy and are
therefore more complex to process than MSI data.
These chemical imaging techniques are also known as surface-based techniques. Each
measurement captured from the penetration of the rays into the sample material
provides information about the surface of the sample. Homogeneity of the data captured
from the surface of a tablet represents its correctness. Analysis of the surface area of
the tablets can provide information about the correct shape, size, color, hardness and
dissolution. The homogeneous nature of the surface can also be used to determine
oxidation reactions of the APIs and excipients.
Table 2.2 provides a quick review and comparison of different studies in the
literature on the quality assessment of drugs. NIR-CI is mostly used for the analysis
of solid medicines, especially tablets. Analysis of tablets using NIR-CI provides
information about content uniformity, composition determination, identification of
counterfeits and tablet classification. This is based on both spectral and spatial data of
the medicines and mostly requires no sample preparation. MSI can also be used for the
analysis of solid medicines, especially tablets. Analysis through MSI can be
beneficial even when the medicines are in packaged form. Contrast enhancement, histograms,
binarization, noise reduction and gray scaling are commonly used image analysis
techniques. Classification is mostly performed using PLS, SVM, NB and
KNN. PCA is the most common feature reduction technique. SDC, SNVC and MSC
are the main pre-processing techniques used.
MSI systems can be used to acquire multichannel images at wavelengths ranging from
the visible spectrum, known as RGB, to IR wavelengths. The IR region is
further classified into NIR, MIR and FIR.
Table 2.1: Comparison between various quality assessment techniques for drugs
| Technique | Dosage forms | Applications | Spectral Information | Spatial Information | Sample Preparation | Destructive (D) / Non-Destructive (ND) | Time Consumption | Complexity | Cost | Disadvantage |
|---|---|---|---|---|---|---|---|---|---|---|
| HPLC | Solids | Raw ingredients and final drug testing | No | No | Yes | D | High | Max | High | May lead to inaccurate compound categorization |
| MS | Solids | Detection of low quantities in compounds | Yes | No | Yes | D | High | Moderate | High | Inability to discriminate between enantiomers, most diastereomers, and salt forms of drugs |
| SEM | Solids, SS | Determine particle morphology and size distribution | No | Yes | Yes | D | Medium | Moderate | High | Characterize only small area of a tablet |
| RGB Imaging | Solids | IE | No | Yes | No | ND | Low | Min | Low | Sensitivity depends on detector device |
| XRD | SS | Crystallinity measurement, amount of API determination | Yes | No | Little / No | Semi-D | High | Max | High | Cannot examine solutions and non-crystalline drug forms |
| NMR | Liquids, Solid | Crystallinity measurement, API and excipients interaction investigation | Yes | No | Little / No | Semi-D | Medium | Moderate | High | More suitable for liquid drugs |
| NIRS | Solids | Monitoring final quality of drugs, identification of organic compounds and counterfeit drug, quantitative measurement of API | Yes | No | No | ND | Low | Min | Low | Low structural selectivity |
| RS | Solids | Crystallinity measurement, determination of multi-components, analysis of polymorphic forms | Yes | No | No | ND | Low | Min | Medium | Not suitable for moisture analysis |
| MSI | Solids, Liquids | Tablet identification / composition determination, surface analysis | Yes | Yes | No | ND | Low | Min | Low | -- |
| HSI | Solids, Liquids | Distribution and identification of counterfeit, contaminated and minor components of drugs, surface analysis | Yes | Yes | No | ND | High | Max | High | Require large data storage |
Table 2.2: Comparison between different researches for the analysis of medicines
| Reference | Technique | Dosage Form | Type of Processing | Sample Preparation | Features: Spectral | Features: Spatial | Preprocessing | Feature Reduction | Segmentation / IP | Classification | Software |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [180] | NIRS | Tablets | Whole tablet uniform content checking | Yes | Yes | No | Mean Centering | No | No | PLS-I | PLS-IQ |
| [181] | Monochromatography | Solids | Particle size characterization | Yes | No | Yes | No | No | Noise Reduction, Binarization, Gray scale Difference Matrix | PLS | - |
| [83] | RS | Solids | Target substance and excipients discrimination | No | Yes | No | FD, Normalization | PCA | No | SVM, K-NN, Ripper, NB, C4.5 | Unscrambler, MATLAB |
| [182] | NIR-CI | Tablets | Content uniformity checking | No | Yes | Yes | No | No | Histograms | MCR, Alternating Least Squares | TS Capture, MATLAB |
| [125] | MSI | Tablets | Multiple tablet simultaneous identification / composition determination | No | Yes | Yes | No | No | No | PCA | - |
| [100] | NIR-CI | Tablets | Counterfeit tablet identification | Yes | Yes | Yes | No | PCA | No | PLS | Unscrambler |
| [183] | NIR-CI | Tablets | Tablet classification / sourcing | No | Yes | Yes | MSC, SNVC, SDC | PCA | Histograms | k-Means Clustering | ISys, MATLAB |
| [97] | NIR-CI | Solids | Chemical image generation | No | Yes | Yes | SDC, SNVC, MSC | No | Noise Removal | PLS-I, Classical Least Squares | MATLAB, PLS Toolbox |
| [184] | CSLM | Solids | Coating thickness and pore distribution evaluation | No | No | Yes | No | No | Binarization, Image Contrast Enhancement | Fuzzy c-Means Cluster, ED | MATLAB |
| [185] | NIR-CI | Tablets | Composition determination | No | Yes | Yes | No | No | No | Classical Least Squares | - |
| [178] | RS | Tablets | Identification of tablets | No | Yes | No | PLS | No | No | SVM | - |
| [75] | RS | Tablets | Detection of counterfeit tablets | No | Yes | No | - | No | No | PCA, HCA | MATLAB |
| [53] | NIRS | Tablets | Screening of the suspected counterfeit tablets | No | Yes | No | No | No | No | PCA | MATLAB |
| [51] | NIRS | Tablets | Counterfeit detection | No | Yes | No | No | No | No | PCA | MATLAB, Unscrambler |
| [69] | Portable RS | Tablets | Counterfeit detection | No | Yes | No | WT | No | No | PCA, LSLS | MATLAB |
| [54] | NIRS | Tablets, Capsule | Identification of clinical study lots | No | Yes | No | SNVC | Fisher criterion, FT, PCA | No | LDA, QDA, KNN | MATLAB |
| [74] | RS | Tablets | Identification of expired drugs | No | Yes | No | SDC, FD, SD, MN | No | No | PLS-DA, KNN, SVM | MATLAB |
| [50] | NIRS | Tablets | Screening of counterfeit tablets | No | Yes | No | SNVC, MSC | No | No | PCA, CA, SIMCA | - |
| [107] | NIR-CI | Tablets | Distribution assessment of major and minor components and their quantification | No | Yes | Yes | SDC, SNVC | No | No | PLSR, MCR, CLS | - |
| [108] | NIR-HSI | Tablets | Assessment of APIs and excipients from the surface of the tablet | No | Yes | Yes | SDC, SNVC | No | No | MCR | MATLAB |
| [94] | RGB Imaging | Tablets | Identification of counterfeit tablets | No | No | Yes | Binarization | No | Morphological operations | Bhattacharyya distance | MATLAB |
| [92] | RGB Imaging | Tablets | Illicit tablet matching and retrieval system | No | No | Yes | Segmentation, smoothing | No | Edge detection, Boundary removal | L2-norm | MATLAB |
| [110] | NIR-CI | Tablets | Determination of excipients and coating distribution | No | Yes | Yes | SNVC | No | No | PLS | MATLAB |
CHAPTER NO. 3
PROPOSED APPROACH –
MICROSCOPIC IMAGING
In this chapter, our focus is to develop a methodology based on high-resolution
Microscopic Imaging (MI). The proposed approach will use a combination of IP and ML
techniques for the classification of SPPs into DSPP and NSPP. This part of the
research is aimed at formulating a new nondestructive method, which is based on the
surface analysis of SPPs for the above-mentioned classification.
3.1 Microscopic Imaging
In this part, our focus is on an analysis based on the surface morphology of the SPP using IP
and ML techniques. The surface of an SPP can effectively represent various characteristics.
In the proposed approach, we use high-resolution microscopic surface images of SPPs
for the classification between DSPP and NSPP. The proposed approach mainly consists of
five phases: image acquisition, preprocessing, feature extraction, feature reduction and
classification. The main flow of the proposed approach is shown in Figure 3.1.
Figure 3.1: Basic flow of the proposed MI approach
[Figure 3.1 flow: SPP → Image Acquisition → Color Image → Preprocessing → Enhanced Image → Feature Extraction → FV → Feature Reduction → Reduced FV → Classification → DSPP / NSPP]
The first phase is based on the acquisition of the surface images of the SPPs. This is
followed by the preprocessing phase, in which input images are prepared for further
analysis. The preprocessed images are then passed to the feature extraction phase to extract
different textural features, which are stored as a Feature Vector (FV). In the next phase,
feature reduction techniques are applied on the extracted FV to reduce its dimensionality.
Finally, the last phase classifies the images into two categories, i.e. DSPP and NSPP, based
on their selected features. The detailed proposed approach is shown in Figure 3.2.
3.1.1 Image Acquisition
We have created nine different datasets for the experimentation of the proposed approach.
Each dataset comprises the images of defective and non-defective versions of ten different
SPPs. These images are captured using a Labomed 5MP digital camera mounted on a Nikon
Eclipse LV100 microscope [186] with a resolution of 2580 x 1944. We have exposed the
surface of the SPPs to three major environmental factors, i.e. temperature, moisture and
humidity. We have created our own datasets for the analysis, as datasets of
environment-affected SPPs are not publicly available.
Three datasets are created for the SPPs affected by temperature and labeled as T1, T2 and
T3. T1 consists of images of the SPPs that are placed in an area having a temperature of
200 °C for five minutes, and of their non-defective versions. Similarly, T2 and T3 contain
images of defective and non-defective SPPs placed in 240 °C and 280 °C for five minutes
respectively. In the same way, three datasets are created for the humidity factor, labeled as H1,
H2 and H3. Defective SPPs in H1 are placed out of their packaging (in open air) for three
days; similarly, H2 and H3 contain images of the SPPs that remain out of their packaging
for two days and one day respectively. Another three datasets are created for the SPP images
affected by moisture. The images of moisture-affected SPPs were captured after exposing ten
SPPs on day 1, ten on day 2 and ten on day 3 to different levels of moisture (liquid water);
these datasets are referred to as M1, M2 and M3 respectively. A
brief description of the datasets is given in Table 3.1.
Figure 3.2: Detailed diagram of the proposed MI approach
[Figure 3.2 flow: SPP → Image Acquisition → Color Image → Preprocessing (Grayscale Conversion → Gray Image → Contrast Enhancement) → Enhanced Image → Feature Extraction (GLCM, RLM, Histogram, AR Model, HAAR Wavelet) → GLCM FV, RLM FV, Hist FV, ARM FV, HAAR FV → Combined FV → Feature Reduction → Reduced FV → Classification → DSPP / NSPP]
Figure 3.3 shows some of the images of the datasets used in this research. In each part of
this figure, the first four images are of environment-affected SPPs and the last four are of their
non-defective versions. Parts (a), (b) and (c) of Figure 3.3 show SPP images of datasets H1,
H2 and H3, which are affected by humidity. Similarly, parts (d), (e) and (f) display SPPs
affected by temperature, labeled as T1, T2 and T3. Parts (g), (h) and (i)
represent SPPs of datasets M1, M2 and M3 respectively. Each of these three datasets
belongs to moisture-affected SPPs.
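Table 3.1: Dataset description
| Environmental Factor | Dataset | No. of DSPP | No. of NSPP |
|---|---|---|---|
| Humidity | H1 | 11 | 17 |
| Humidity | H2 | 13 | 17 |
| Humidity | H3 | 13 | 17 |
| Temperature | T1 | 13 | 17 |
| Temperature | T2 | 15 | 17 |
| Temperature | T3 | 19 | 17 |
| Moisture | M1 | 14 | 17 |
| Moisture | M2 | 15 | 17 |
| Moisture | M3 | 16 | 17 |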
3.1.2 Preprocessing
Preprocessing consists of algorithms that can be used for IE and noise removal. After image
acquisition, preprocessing is an essential step to prepare the captured images for the feature
extraction. Preprocessing is performed in two steps i.e. Grayscale Conversion and IE.
3.1.2.1 Grayscale Conversion
Texture analysis is used in different machine vision problems such as surface inspection
and classification. We can define texture as the spatial distribution of different gray levels
in a neighborhood. To perform textural analysis, it is important to convert the color image
into a grayscale image.
Figure 3.3: The sample images of the DSPPs and NSPPs in each dataset (a) images
contained in dataset H1 (b) images contained in dataset H2 (c) images contained in
dataset H3 (d) images contained in dataset T1 (e) images contained in dataset T2 (f)
images contained in dataset T3 (g) images contained in dataset M1 (h) images contained
in dataset M2 (i) images contained in dataset M3
3.1.2.2 Contrast Enhancement
IE is very important to improve the quality of the input image. The enhancement technique
used in the proposed approach is contrast enhancement. The image contrast is increased
using the formula given in Equation 3.1 [187], which is based on saturating 1%
of the data at the high and low gray intensity values of the input image.
$$CE(i,j) = \begin{cases} 255, & \text{if } f(i,j) > h \\ 0, & \text{if } f(i,j) < l \\ \min\left(\dfrac{f(i,j) - l}{h - l},\ 255\right), & \text{otherwise} \end{cases}$$
Equation 3.1: Contrast enhancement formula
Where
CE(i, j) = contrast-enhanced intensity at pixel (i, j)
f(i, j) = image intensity at a particular index (i, j)
h = high intensity of the image
l = low intensity of the image
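The piecewise rule of the contrast enhancement formula can be sketched as below. Two points are assumptions of this sketch rather than statements of the cited method [187]: the limits l and h are taken as the 1st and 99th percentile gray levels, and the stretched ratio is multiplied by 255 so that the output spans the full 8-bit range. The function name contrast_stretch and the tiny test image are hypothetical.

```python
def contrast_stretch(image, saturation=0.01):
    """Linear contrast stretch with saturation at both ends of the gray range.

    image      : 2-D list of gray levels in 0..255
    saturation : fraction of pixels clipped at each end (1% assumed)
    """
    pixels = sorted(v for row in image for v in row)
    n = len(pixels)
    low = pixels[int(saturation * (n - 1))]           # l in the formula
    high = pixels[int((1 - saturation) * (n - 1))]    # h in the formula
    if high == low:                                   # flat image: nothing to stretch
        return [row[:] for row in image]
    out = []
    for row in image:
        new_row = []
        for f in row:
            if f > high:
                new_row.append(255)
            elif f < low:
                new_row.append(0)
            else:
                # Assumed scaling by 255 to map [l, h] onto [0, 255].
                new_row.append(min(round((f - low) / (high - low) * 255), 255))
        out.append(new_row)
    return out


image = [[20, 40], [80, 120]]
stretched = contrast_stretch(image)
```

On a real microscope image with millions of pixels the percentile limits clip only the extreme 1% tails; on this four-pixel example the upper limit falls on the third pixel, so the brightest pixel saturates at 255.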
3.1.3 Feature Extraction
After applying preprocessing on the input image, we need to perform feature extraction to
quantify the surface of the image through different parameters. Analysis of the surface of the
SPPs through its texture can provide great help in classifying them into defective and
non-defective SPPs. The textural features used in this study are Gray level Co-occurrence
Matrix (GLCM), Histogram, Run Length Matrix (RLM), Autoregressive (AR) Model and
HAAR Wavelet features. The details of these features are available in Chapter 2. A total of
281 textural features are extracted from each preprocessed image using MaZda
(Texture Analysis Software) designed by Szczypinski et al. [188]. The creation of the jth dataset
is shown in Equation 3.2; here the value of j is from 1 to 9.
$$dataset_j = \bigcup_{k=1}^{L} features(I_k)$$
Equation 3.2: Formula for dataset representation
Ik is the kth input image
features (Ik) is the feature set of the kth image
L is the total number of images for each dataset
The details of the features used in this research are given below.
3.1.3.1 Gray-level Co-occurrence Matrix
MaZda provides eleven features extracted from GLCM. These are angular second moment,
contrast, correlation, sum of squares, inverse difference moment, sum average, sum
variance, sum entropy, entropy, difference variance and difference entropy. In this research
we have computed GLCM features for five between-pixel distances (1, 2, 3, 4 and 5), each in
four directions, as reflected in the S(x, y) feature names in Table 3.2. In total,
11 x 5 x 4 = 220 GLCM features are extracted.
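As an illustration of how a GLCM and two of the listed features can be computed for one pixel offset, consider the sketch below. It is not MaZda's implementation: the symmetric counting, the normalization, the base-2 logarithm in the entropy and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def glcm(image, dx, dy):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1        # count each pair in both orders
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def angular_second_moment(p):
    """Sum of squared co-occurrence probabilities."""
    return sum(v * v for v in p.values())


def glcm_entropy(p):
    """Shannon entropy of the co-occurrence probabilities (base 2 assumed)."""
    return -sum(v * math.log2(v) for v in p.values())


# 2 x 2 toy image with two gray levels; horizontal offset of one pixel.
image = [[0, 0], [1, 1]]
p = glcm(image, dx=1, dy=0)
asm = angular_second_moment(p)
ent = glcm_entropy(p)
```

Repeating the calculation for each of the five distances and four directions, and for all eleven features, yields the 220 GLCM values per image described above.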
3.1.3.2 Histogram Features
MaZda provides nine histogram features in total from which we have chosen four: Mean
(histogram’s mean), Variance (histogram’s variance), Skewness (histogram’s skewness)
and Kurtosis (histogram’s kurtosis).
3.1.3.3 Run Length Matrix
Twenty features are extracted for RLM using MaZda in this research. These features are
run-length non-uniformity, grey level non-uniformity, long run emphasis, short run
emphasis and fraction of image in runs. Each feature is computed in four different
directions (horizontally, vertically, 45 degree and 135 degree).
3.1.3.4 Autoregressive Model
MaZda provides five different features based on AR model. These are Teta1 (parameter
θ1), Teta2 (parameter θ2), Teta3 (parameter θ3), Teta4 (parameter θ4) and Sigma
(parameter σ).
3.1.3.5 Wavelet Transformations
Wavelet energy features based on HAAR wavelet are measured at eight scales using four
bands of frequency (LL, LH, HL and HH) using MaZda. This provides a total number of
32 features.
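The per-family feature counts given in Sections 3.1.3.1 to 3.1.3.5 can be checked against the stated total of 281 with a short calculation. The factor of four directions for GLCM is inferred from the S(x, y) feature names in Table 3.2 and should be read as an assumption of this sketch; the dictionary name is hypothetical.

```python
# Number of textural features per family, as described in Section 3.1.3.
feature_counts = {
    "GLCM": 11 * 5 * 4,       # 11 features x 5 distances x 4 directions (assumed)
    "Histogram": 4,           # mean, variance, skewness, kurtosis
    "RLM": 5 * 4,             # 5 features x 4 directions
    "AR model": 5,            # Teta1..Teta4 and Sigma
    "HAAR wavelet": 8 * 4,    # 8 scales x 4 frequency bands
}
total_features = sum(feature_counts.values())
```

The sum reproduces the 281 features per image reported for the feature extraction phase.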
3.1.4 Feature Reduction
The feature extraction phase results in 281 different features, which are very hard to deal
with. Therefore, for better results it is important to reduce the dimensionality of the feature
sets. For this reason, two reduced feature sets were derived from the original 281 features.
The first set is based on the features selected using feature reduction algorithms. Three different
feature reduction algorithms are used in this research for extracting the most promising
features that can lead us towards the correct classification between DSPP and NSPP. These
three algorithms are Chi-Square (CS), Gain Ratio (GR) and Relief-F (RF). Feature
reduction is performed using each of the three techniques. The size of the reduced feature
sets is set to 15. As a result, the 15 top-ranked features for each feature
reduction algorithm are extracted from the complete feature set. Feature reduction is
performed using an ML-based software named WEKA developed by Hall et al. [189]. All
of these feature selection algorithms are used along with the Ranker search algorithm.
It is observed that the top 15 features extracted by GR and RF for our dataset are the
same. The names of these selected features are given in Table 3.2. It can be noticed from
this table that, according to CS, 14 of the fifteen features are related to angular second moment
(AngScMom) at multiple distances and one is inverse difference moment (InvDfMom)
at distance 4. All of these features belong to GLCM. On the
other hand, 14 features selected by GR are related to GLCM and one is Wavelet Energy
from the HAAR Wavelet features.
The second reduced feature set is based on the top two features from the overall 281 features.
These top two features were selected on the basis of multiple experiments. For the
experimentation, we have used the overall 281 features in multiple combinations of different
lengths as feature sets. These feature sets are then used as input to the classifiers, and the
resulting accuracies are compared. The analysis showed that the use of “S (5, 0)
Entropy” (entropy at distance 5) and “Horzl_GLevNonU” (horizontal grey level
non-uniformity) in combination is a better choice for the classification of the DSPPs and NSPPs.
The entropy measure is from GLCM and Horzl_GLevNonU is from RLM.
3.1.5 Classification
The evaluation of the features extracted from the images of the SPPs is performed using
four different types of classification algorithms, i.e. SVM, KNN, NB and Ensemble of
Classifiers (EC). In this research, we have performed a comparison between the accuracies
achieved from these classifiers. All experimental work for this research is performed using
MATLAB [190]. Classification is performed using three feature sets: all 281 features, the
reduced set of the top 15 selected features and the reduced set of the top 2 selected features.
The details of all these classifiers can be found in Chapter 2.
Table 3.2: List of top 15 selected features from CS, GR and RF
| Rank | Chi-Square | Gain Ratio / Relief-F |
|---|---|---|
| 1 | S(5,5)AngScMom | WavEnHH_s-8 |
| 2 | S(4,-4)AngScMom | S(3,0)SumOfSqs |
| 3 | S(0,4)AngScMom | S(3,0)Contrast |
| 4 | S(0,2)AngScMom | S(3,0)AngScMom |
| 5 | S(4,4)AngScMom | S(2,-2)DifEntrp |
| 6 | S(2,2)AngScMom | S(2,-2)DifVarnc |
| 7 | S(4,0)AngScMom | S(2,-2)Entropy |
| 8 | S(2,-2)AngScMom | S(3,0)Correlat |
| 9 | S(0,3)AngScMom | S(3,0)InvDfMom |
| 10 | S(3,-3)AngScMom | S(2,-2)SumVarnc |
| 11 | S(3,0)AngScMom | S(3,0)SumAverg |
| 12 | S(4,4)InvDfMom | S(3,0)DifEntrp |
| 13 | S(2,0)AngScMom | S(3,0)DifVarnc |
| 14 | S(3,3)AngScMom | S(3,0)Entropy |
| 15 | S(1,-1)AngScMom | S(3,0)SumEntrp |
In the proposed approach, we have used k = 2 for the implementation of
KNN. Therefore, two nearest neighbors (2NN) with ‘Cosine’ as the distance metric are used
for the classification between NSPP and DSPP. For SVM, training on the datasets is
performed using a linear kernel function with the Sequential Minimal Optimization (SMO)
method for finding the separating hyperplane. The implementation of NB is performed using
the Normal (Gaussian) distribution.
Ensemble of Classifiers (EC) is also used for the classification between DSPP and NSPP
in this research. The working of EC is voting based, i.e. the class having the maximum votes
from the set of base classifiers is assigned to the test instance. We have used SVM, KNN and
NB as base classifiers in this research, implemented using MATLAB. The
purpose of EC is to enhance the performance of the base classifiers. Equation 3.3 gives
the formula for EC.
$$EC = \begin{cases} 1, & \text{if } \sum_{1}^{n} Base\_Classifier\_Class > \dfrac{n}{2} \\ -1, & \text{otherwise} \end{cases}$$
Equation 3.3: Formula for EC
Where
‘1’ represents NSPP class
‘-1’ represents DSPP class
‘n’ is the total number of base classifiers
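The voting rule for EC can be sketched as follows. The sketch interprets the sum in the formula as the number of base classifiers that vote for the NSPP class, so that the condition "greater than n/2" is a simple majority; this interpretation and the function name ensemble_vote are assumptions of the example.

```python
def ensemble_vote(base_predictions):
    """Majority vote over base classifier outputs coded as 1 (NSPP) or -1 (DSPP)."""
    n = len(base_predictions)
    nspp_votes = sum(1 for p in base_predictions if p == 1)   # assumed reading of the sum
    return 1 if nspp_votes > n / 2 else -1


# Hypothetical outputs of the three base classifiers (SVM, KNN, NB) for two samples.
sample_a = ensemble_vote([1, 1, -1])    # two of three vote NSPP
sample_b = ensemble_vote([1, -1, -1])   # two of three vote DSPP
```

With three base classifiers a tie cannot occur, so the "otherwise" branch is reached only when DSPP holds the majority.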
CHAPTER NO. 4
PROPOSED APPROACH –
MULTISPECTRAL ANALYSIS
In this chapter, we have proposed another approach based on Multispectral Analysis (MA)
for the classification between DSPP and NSPP. The literature indicates that MA is a very
effective technique for the analysis of SPPs. The MA of the spectral data along with various
ML and SP techniques is used for the required classification.
4.1 Multispectral Analysis
The proposed approach is based on four main phases i.e. Spectrum Acquisition,
Preprocessing, Feature Extraction and Classification. The basic flow of the proposed
approach is shown in Figure 4.1.
Figure 4.1: Flow of the proposed MA approach
4.1.1 Spectrum Acquisition
Spectrum acquisition is performed using the µRamboss Raman Spectrometer (Dongwoo Optron,
South Korea) [191] for all of the nine datasets. All spectra are collected using a He-Cd
laser with a source excitation of 442 nm and a laser power of 40 mW. A complete spectrum
of a single SPP consists of 1024 data points over the range of 106 nm – 2805 nm. The
‘Andor Solis for Spectroscopy’ software was used for data acquisition.
[Figure 4.1 flow: SPP → Spectrum Acquisition → Raw Spectrum → Preprocessing → IR / UV / Visible Spectrum → Feature Extraction → Wavelet Coefficients → Classification → DSPP / NSPP]
For spectrum acquisition, we have again created the required datasets based on
environment-affected SPPs. Fourteen general-purpose SPPs that are commonly prescribed by
physicians to patients against different diseases were used in the experimentation of this
approach. We have again created nine datasets under the same conditions as used in the MI-based
approach. Each of these nine datasets consists of fourteen DSPPs and fourteen NSPPs,
giving twenty-eight SPPs in each dataset. DSPPs are again prepared using three
environmental factors, i.e. temperature, moisture and humidity. Temperature-affected
DSPPs are further divided into three groups labeled as T1, T2 and T3, by placing them in
a preheated environment with a temperature of 200 °C, 240 °C and 280 °C respectively for
five minutes. Similarly, three groups of moisture-affected DSPPs, namely M1, M2 and M3,
are created by exposing them to different amounts of liquid water on day 1, day 2 and day 3
respectively. The groups of humidity-affected SPPs are called H1, H2 and H3. H1 SPPs are
placed out of their packaging for three days, H2 for two days and H3 for one day.
4.1.2 Preprocessing
The extracted spectrum consists of 1024 points, which is quite a large number of
variables to process. For further processing, we have divided the spectra of each dataset
into three ranges, i.e. UV (10 nm – 380 nm), Visible (380 nm – 750 nm) and IR (above 750
nm). The UV range then consists of 91 data points, the Visible range of 127 and the IR range of 806.
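The division of a spectrum into the three wavelength ranges can be sketched as below. The handling of the boundary values (380 nm assigned to Visible, 750 nm assigned to Visible) is an assumption, since the text does not state it, and the function name and sample values are hypothetical.

```python
def split_spectrum(wavelengths, intensities):
    """Group spectral intensities into UV, Visible and IR by wavelength in nm."""
    bands = {"UV": [], "Visible": [], "IR": []}
    for wl, value in zip(wavelengths, intensities):
        if wl < 380:
            bands["UV"].append(value)
        elif wl <= 750:                 # assumed: both boundaries fall in Visible
            bands["Visible"].append(value)
        else:
            bands["IR"].append(value)
    return bands


# Six invented sample points around the band boundaries.
wavelengths = [106, 379.9, 380, 750, 750.1, 2805]
intensities = [1, 2, 3, 4, 5, 6]
bands = split_spectrum(wavelengths, intensities)

# The reported band sizes add up to the full spectrum length.
reported_total = 91 + 127 + 806
```

The last line only checks the arithmetic of the reported band sizes against the 1024 points per spectrum; it does not reproduce the spectrometer's actual wavelength grid.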
Figure 4.2 shows sample spectral data of Non-Defective and Defective SPPs within the UV
wavelength range. Similarly, Figure 4.3 represents the spectra of the same SPP within the
visible wavelength range. Parts (a), (b) and (c) of both figures represent the spectra of SPPs
affected by humidity, temperature and moisture respectively.
Figure 4.2: Multispectral data for NSPP and DSPP within UV wavelength. (a) Spectra of
NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and
temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture
affected DSPP datasets M1, M2 and M3.
In the same way, Figure 4.4 (a) shows multispectral data of the NSPP and humidity affected
DSPP within the IR wavelength range. Parts (b) and (c) of Figure 4.4 represent the spectra of
temperature and moisture affected DSPP along with their NSPP, respectively.
Figure 4.3: Multispectral data for NSPP and DSPP within Visible wavelength. (a) Spectra
of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and
temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture
affected DSPP
4.1.3 Feature Extraction
For further processing, we need to extract some features from multispectral data that can
help in a proper classification between DSPP and NSPP. In this research, we have used
Wavelet Coefficients (WC) of a signal as features for the classification purpose.
4.1.3.1 Wavelet Transformation
In this research, we use WT for both of its advantages: spectrum de-noising and
dimensionality reduction. We use only the detail coefficients because they represent a
good match between the spectra and the wavelet function. The MATLAB software package is
used to compute the WCs. We have performed multiple experiments using the Daubechies,
Coiflets, Symlets, Discrete Meyer, Biorthogonal and Reverse Biorthogonal wavelet functions
with different decomposition levels (2-5) to achieve the best results. The detail
coefficients are then used to train the classifiers for evaluating the performance of the
proposed approach.
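To illustrate how the detail coefficients of a given decomposition level can act as a reduced feature vector, a minimal Python sketch is given below. It is not the MATLAB implementation used in this research. It assumes the Haar wavelet (the simplest member of the Daubechies family), pads odd-length signals by repeating the last sample, keeps only the detail coefficients of the final level, and uses made-up spectrum values. All function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def detail_coefficients(signal, level):
    """Detail coefficients of the last of `level` decomposition steps."""
    approx = list(signal)
    detail = []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
    return detail


# Made-up reflectance values standing in for one measured spectrum.
spectrum = [0.10, 0.12, 0.15, 0.40, 0.42, 0.41, 0.20, 0.18]
features = detail_coefficients(spectrum, level=2)
```

Each decomposition level roughly halves the number of coefficients, which is where the dimensionality reduction comes from; other wavelet families would need their own filter coefficients.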
4.1.4 Classification
The last phase of the proposed approach is classification based on the feature sets extracted
in the previous phase. We have used four classification algorithms in this research, i.e.
SVM, KNN, NB and EC, and we also compare the accuracies achieved using all four
classifiers. The details of these algorithms are given in Chapter 2.
Figure 4.4: Multispectral data for NSPP and DSPP within IR wavelength. (a) Spectra of
NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and
temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture
affected DSPP datasets M1, M2 and M3.
CHAPTER NO. 5
ANALYSIS AND DISCUSSION
In this research, we have proposed two different approaches for the classification between
DSPP and NSPP. One is based on two-dimensional MI data and the other on one-dimensional
multispectral data. We have applied statistical textural feature extraction techniques to
the MI data, whereas for MA we have used the WCs obtained through WT as the feature
set. The results obtained using both approaches are given below.
5.1 Microscopic Imaging
In this research, we have evaluated the accuracy of the proposed approach using two
different experiments: one uses LOO cross-validation and the other uses Holdout
Validation (HV). LOO is a validation technique that, from a total of N samples, uses N-1
samples as the training set and the remaining one sample as the test set. The HV method,
in contrast, uses separate training and testing datasets. In a real-time setting, we do
not have any information about which environmental factor has affected the SPP under
test. For this reason, we have created three new combined datasets based on humidity-,
temperature- and moisture-affected SPPs. To avoid an element of bias, we have extracted
subsets of five SPPs from each of the datasets H1, H2 and H3 individually and placed them
into a new dataset named ‘H1,H2,H3’. The same is done for the temperature and moisture
datasets, and these are named ‘T1,T2,T3’ and ‘M1,M2,M3’ respectively. Each of these three
datasets now consists of fifteen NSPPs and fifteen DSPPs. A fourth dataset, named ‘HTM’,
was also created as a combination of thirty SPPs randomly selected from the original nine
datasets.
In Experiment I, the leave-one-out (LOO) cross-validation method is used to evaluate the
proposed approach. LOO cross-validation is first applied to each individual dataset, then
to the combined datasets of each environmental factor and finally to the combined dataset
of all environmental factors. Three classifiers are used in this experiment, namely SVM,
KNN and NB. In Experiment II, we have evaluated the accuracy of the proposed approach
using separate training and testing datasets (HV model). Each dataset is divided into two
equal halves, so 50% of the data is used for training and the remaining 50% is used for
testing. The classification accuracy of the proposed approach is measured using four
different types of classifiers (SVM, KNN, NB and EC). The feature vector is formed using
281 texture-based features extracted from the preprocessed images.
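The difference between the two validation schemes can be sketched in a few lines of Python. This is an illustration only, not the procedure implemented in this research: it assumes a 1-nearest-neighbour classifier as a stand-in for the actual classifiers, a hold-out split that simply takes the first half for training (how the halves were drawn is not specified here), and toy two-dimensional feature vectors. All names are hypothetical.

```python
def nearest_neighbour(train, sample):
    """Label of the training item closest to `sample` (1-NN, squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda item: dist(item[0], sample))[1]


def loo_accuracy(data):
    """Leave-one-out: each sample is tested once against the other N-1 samples."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_neighbour(train, features) == label
    return hits / len(data)


def holdout_accuracy(data):
    """Hold-out: the first half trains the model, the second half tests it."""
    half = len(data) // 2
    train, test = data[:half], data[half:]
    hits = sum(nearest_neighbour(train, f) == y for f, y in test)
    return hits / len(test)


# Toy data: (feature vector, label); 1 = defective, 0 = non-defective.
data = [([0.10, 0.20], 0), ([0.90, 0.80], 1), ([0.20, 0.10], 0),
        ([0.80, 0.90], 1), ([0.15, 0.25], 0), ([0.85, 0.75], 1)]

loo = loo_accuracy(data)
hv = holdout_accuracy(data)
```

LOO trains N models on N-1 samples each, whereas the hold-out scheme trains a single model on half of the data, so the two estimates need not agree on small datasets.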
In Experiment I, we have first used all 281 features as the feature vector and evaluated
the performance of the proposed approach with all three classifiers using LOO cross-
validation. We have applied the classification process to each SPP dataset individually
and then to the combined datasets. Table 5.1 contains the results of LOO cross-validation
using 281 features.
A graphical representation of the accuracy of each classifier is shown in Figure 5.1. The
results show that the maximum accuracy is achieved by the SVM classifier for most of the
datasets. Classification accuracies for moisture-affected SPPs are higher than those for
the other two factors. The results for the humidity-affected SPP datasets suggest that
humidity affects the surface of the SPPs very slowly, which is why they have a low
classification rate. The same trends are reflected in the accuracies of the combined
datasets.
Table 5.1: LOO results for all individual and combined datasets using 281 features
Datasets  No. of    SVM              KNN              NB
          Features  Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        281       57   65   45     68   65   73     50   47   55
H2        281       57   59   54     50   41   62     53   47   62
H3        281       63   65   62     37   29   46     47   35   62
T1        281       57   59   54     53   59   46     60   47   77
T2        281       72   71   73     78   82   73     66   65   67
T3        281       69   71   68     64   59   68     69   88   53
M1        281       81   82   79     68   59   79     84   82   86
M2        281       72   76   67     78   76   80     69   65   73
M3        281       88   82   94     88   88   88     85   94   75
H1,H2,H3  281       67   53   73     63   18   84     57   35   68
T1,T2,T3  281       69   59   72     75   41   87     61   71   57
M1,M2,M3  281       85   71   91     82   59   91     84   76   87
HTM       281       84   53   88     86   12   95     64   59   65
From Table 5.1 to Table 5.12, ‘Acc’ is for accuracy, ‘Sn’ for sensitivity and ‘Sp’ for
specificity.
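For reference, the three measures can be computed from predicted and true labels as in the following Python sketch. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and treats the defective class as the positive class; this class convention is an assumption, and the function name and labels are hypothetical.

```python
def acc_sn_sp(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = 100.0 * (tp + tn) / len(pairs)
    sn = 100.0 * tp / (tp + fn) if tp + fn else 0.0  # share of positives found
    sp = 100.0 * tn / (tn + fp) if tn + fp else 0.0  # share of negatives found
    return acc, sn, sp


# Made-up labels: 1 = defective (assumed positive class), 0 = non-defective.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
acc, sn, sp = acc_sn_sp(y_true, y_pred)
```

Reporting all three values matters for unbalanced test sets, where a high accuracy can coexist with a very low sensitivity.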
Figure 5.1: LOO results against all individual and combined datasets using 281 features
After that, LOO cross-validation is applied to the selected top 15 features. Classification
accuracies are calculated again using the three classifiers, and for most of the datasets
SVM provides the highest accuracies. The results show that the feature sets selected
through CS provide higher accuracies than those selected through GR. The comparison of
results using the top 15 features is shown in Table 5.2. Overall, SVM and KNN provide
higher accuracies using CS for the individual SPP datasets. SVM provides a maximum of
90.32% accuracy for the M1 dataset using CS, while KNN provides 90.91% accuracy for M3
using GR. The results again highlight that moisture-affected SPPs have a higher
classification rate.
For the combined datasets, a high rate of correct classification is achieved for
moisture-affected SPPs using CS and SVM. Temperature- and humidity-affected SPPs have
relatively lower classification accuracies than moisture-affected SPPs. For the completely
combined dataset, a maximum of 86.30% accuracy is achieved using the KNN classifier.
Figure 5.2 shows the accuracies of the individual and combined SPP datasets using the CS
and GR feature sets.
At the end of Experiment I, we have evaluated the accuracy of the proposed approach
using the top two features selected from the 281 features. As already mentioned, these
two features are selected by forming all combinations of two from the 281 features and
then selecting the pair that provides the maximum accuracy.
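The exhaustive pair search described above can be sketched in Python as follows. The sketch is illustrative only: the scoring function here is a toy placeholder, whereas in practice it would be the classification accuracy obtained with the two candidate features, and the function names are hypothetical.

```python
import math
from itertools import combinations


def best_pair(n_features, score):
    """Return the index pair (i, j), i < j, with the highest score.

    `score` maps a pair of feature indices to a number; ties keep the
    first pair encountered.
    """
    return max(combinations(range(n_features), 2), key=score)


def toy_score(pair):
    """Placeholder score that peaks at the pair (3, 7)."""
    return -abs(pair[0] - 3) - abs(pair[1] - 7)


winner = best_pair(10, toy_score)

# Number of candidate pairs for 281 features: 281 * 280 / 2.
n_pairs = math.comb(281, 2)
```

With 281 features the search evaluates 39,340 pairs, so each candidate needs a reasonably cheap accuracy estimate.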
Figure 5.2: LOO results against all individual and combined datasets using top 15 features
[Bar chart: accuracy per tablet dataset; series: CS-SVM, CS-KNN, CS-NB, GR-SVM, GR-KNN, GR-NB]
Table 5.2: LOO results for all individual and combined datasets using top 15 features
Datasets  No. of    FR-Algo  SVM              KNN              NB
          Features           Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        15        CS       54   47   64     71   82   55     57   41   82
H2        15        CS       53   47   62     57   59   54     53   41   69
H3        15        CS       50   47   54     53   65   38     60   47   77
T1        15        CS       70   82   54     63   71   54     53   35   77
T2        15        CS       72   94   47     72   76   67     53   41   67
T3        15        CS       78   94   63     75   71   79     67   53   79
M1        15        CS       90   88   93     71   76   64     58   53   64
M2        15        CS       78   94   60     72   76   67     56   35   80
M3        15        CS       88   94   81     73   71   75     67   47   88
H1,H2,H3  15        CS       67   41   78     74   53   84     63   35   76
T1,T2,T3  15        CS       66   94   55     80   53   89     67   41   77
M1,M2,M3  15        CS       85   94   82     77   59   84     69   29   84
HTM       15        CS       64   94   60     86   35   93     75   29   81
H1        15        GR       61   65   55     68   76   55     43   47   36
H2        15        GR       43   47   38     57   71   38     53   47   62
H3        15        GR       43   35   54     40   47   31     43   35   54
T1        15        GR       60   59   62     57   59   54     57   47   69
T2        15        GR       72   71   73     75   82   67     59   53   67
T3        15        GR       72   88   58     64   65   63     67   82   53
M1        15        GR       81   76   86     74   76   71     77   76   79
M2        15        GR       72   71   73     72   71   73     63   65   60
M3        15        GR       79   88   69     91   94   88     85   94   75
H1,H2,H3  15        GR       48   35   54     59   35   70     61   35   73
T1,T2,T3  15        GR       66   71   64     73   47   83     61   47   66
M1,M2,M3  15        GR       77   71   80     79   65   84     79   65   84
HTM       15        GR       61   65   60     83   24   91     77   47   81
Table 5.3 shows the accuracies of the individual and combined datasets using the top two
features. LOO cross-validation using the top two features again provides the maximum
classification rates for the moisture-affected datasets through SVM. For the overall
combined dataset (HTM), NB provides the maximum classification accuracy, i.e. 91%, but
with a low sensitivity of 29%. This is depicted in Figure 5.3.
Figure 5.3: LOO results against all individual and combined datasets using top 2 features
[Bar chart: accuracy per tablet dataset; series: SVM, KNN, NB]
Similarly, in Experiment II, we have evaluated the accuracies of the proposed approach
through the HV model using all 281 features, the selected top 15 features and the top 2
features. All accuracies in this experiment are calculated by providing the test datasets
to the already trained models.
Table 5.3: LOO results for all individual and combined datasets using top 2 features
Datasets  No. of    Feature Name   SVM              KNN              NB
          Features                 Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        2         A189 and A226  61   41   91     46   59   27     46   47   45
H2        2         A189 and A226  53   47   62     67   76   54     37   41   31
H3        2         A189 and A226  50   47   54     47   53   38     40   47   31
T1        2         A189 and A226  63   53   77     50   53   46     50   47   54
T2        2         A189 and A226  56   65   47     66   65   67     69   59   80
T3        2         A189 and A226  67   59   74     75   71   79     67   65   68
M1        2         A189 and A226  81   71   93     61   71   50     65   59   71
M2        2         A189 and A226  72   76   67     47   53   40     63   59   67
M3        2         A189 and A226  76   76   75     61   59   63     76   65   88
H1,H2,H3  2         A189 and A226  61   35   73     63   35   76     72   29   92
T1,T2,T3  2         A189 and A226  69   53   74     80   47   91     75   35   89
M1,M2,M3  2         A189 and A226  76   71   78     65   41   73     73   29   89
HTM       2         A189 and A226  73   71   73     83   24   91     91   29   99
Table 5.4 shows the test results using all 281 features. For the combined humidity
dataset, 69% accuracy is achieved through SVM and EC. Higher accuracies are achieved for
the combined datasets of temperature- and moisture-affected SPPs, i.e. 81% for both of
them using EC. However, the classification rate of the proposed approach drops when it is
applied to the overall combined dataset. For the individual humidity, temperature and
moisture datasets, EC provides accuracies at least as high as those of the other three
classifiers, except for T3. NB and EC both provide 94% accuracy for M1 and 88% for T2.
For H1, SVM and EC both provide the maximum accuracy, i.e. 79%. Figure 5.4 shows these
results in graphical form.
Figure 5.4: HV results against all individual and combined datasets using all 281 features
[Bar chart: accuracy per dataset; series: SVM, KNN, NB, EC]
Table 5.4: Accuracies for test datasets using 281 features
Datasets  No. of    SVM              KNN              NB               EC
          features  Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        281       79   80   43     57   60   29     57   40   57     79   80   75
H2        281       67   56   83     60   67   50     60   67   50     73   67   83
H3        281       60   67   50     40   33   50     67   78   50     67   67   67
T1        281       73   67   83     53   44   67     60   56   67     73   67   83
T2        281       81   78   86     69   67   71     88   78   100    88   89   86
T3        281       61   75   50     61   63   60     61   75   50     56   63   50
M1        281       88   89   86     75   56   100    94   100  86     94   89   100
M2        281       75   78   71     75   67   86     75   78   71     81   78   86
M3        281       88   100  75     82   78   88     88   100  75     88   100  75
H1,H2,H3  281       69   100  38     63   63   63     50   63   38     69   100  38
T1,T2,T3  281       75   63   88     75   88   63     56   88   25     81   88   75
M1,M2,M3  281       75   63   88     81   88   75     75   63   88     81   75   88
HTM       281       44   63   25     69   63   75     56   63   50     63   75   50
The test results using the selected top 15 features are shown in Table 5.5. In some
cases the features selected through CS perform better, while in others GR provides
higher accuracies. NB provides relatively lower accuracies than SVM and KNN. A
maximum of 94% accuracy is achieved for the T2 and M1 datasets using GR and
SVM. For M2 and M3, 88% accuracy is again achieved using GR; SVM and EC
perform best on both of these datasets. The correct classification rate of
humidity-affected SPPs is relatively lower than that of the others. When the trained
model is tested on the combined datasets, a maximum of 88% accuracy is achieved for
the moisture- and temperature-affected SPP datasets using both CS and GR. An accuracy
of 75% is achieved when the proposed approach is tested on the overall combined
dataset (HTM), using CS-EC and GR-KNN. The graphical representation of these
results is shown in Figure 5.5.
The accuracies for the top two selected features using the test datasets are given in
Table 5.6. The results show that SVM performs better for almost all of the datasets
except the humidity-affected ones, for which KNN provides better results. For M3,
SVM provides 88% accuracy with 89% sensitivity and 88% specificity. For the overall
combined dataset, SVM provides the maximum accuracy, i.e. 69%. The humidity- and
temperature-affected combined datasets reach 75% accuracy using NB and KNN. The
highest accuracy, 88%, is achieved for moisture-affected SPPs through SVM. These
results are also shown in Figure 5.6.
Table 5.5: Accuracies for test datasets using top 15 selected features
Datasets  No. of    FR-Algo  SVM              KNN              NB               EC
          features           Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        15        CS       50   30   57     71   80   29     50   30   57     50   30   100
H2        15        CS       47   33   67     60   44   83     60   44   83     53   44   67
H3        15        CS       40   33   50     60   67   50     60   44   83     47   44   50
T1        15        CS       80   67   100    60   67   50     53   33   83     67   56   83
T2        15        CS       69   44   100    69   78   57     63   44   86     69   44   100
T3        15        CS       56   75   40     78   88   70     56   38   70     67   63   70
M1        15        CS       88   100  71     81   78   86     69   78   57     88   100  71
M2        15        CS       81   100  57     69   67   71     56   44   71     75   78   71
M3        15        CS       88   100  75     76   78   75     71   44   100    82   89   75
H1,H2,H3  15        CS       56   50   63     69   63   75     69   63   75     63   50   75
T1,T2,T3  15        CS       75   88   63     75   100  50     75   88   63     75   88   63
M1,M2,M3  15        CS       88   75   100    56   75   38     63   63   63     75   75   75
HTM       15        CS       63   75   50     63   75   50     69   63   75     75   75   75
H1        15        GR       29   20   29     64   80   14     57   40   57     50   50   50
H2        15        GR       60   56   67     53   67   33     67   67   67     53   56   50
H3        15        GR       53   67   33     40   33   50     67   78   50     60   67   50
T1        15        GR       60   67   50     60   44   83     67   67   67     67   67   67
T2        15        GR       94   89   100    56   44   71     81   67   100    88   78   100
T3        15        GR       61   88   40     39   50   30     61   75   50     56   75   40
M1        15        GR       94   100  86     81   89   71     88   89   86     88   89   86
M2        15        GR       88   89   86     56   56   57     75   78   71     88   89   86
M3        15        GR       88   100  75     82   89   75     88   100  75     88   100  75
H1,H2,H3  15        GR       56   100  13     50   63   38     56   75   38     50   88   13
T1,T2,T3  15        GR       63   75   50     88   88   88     44   75   13     75   88   63
M1,M2,M3  15        GR       75   75   75     81   75   88     75   63   88     88   88   88
HTM       15        GR       44   88   0      75   75   75     50   63   38     50   63   38
Figure 5.5: HV results against all individual and combined datasets using top 15 features
[Bar chart: accuracy per dataset; series: CS-SVM, CS-KNN, CS-NB, CS-EC, GR-SVM, GR-KNN, GR-NB, GR-EC]
Table 5.6: Accuracies for test datasets using top 2 features
Datasets  No. of    Feature Name   SVM              KNN              NB               EC
          Features                 Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        2         A189 and A226  50   30   57     57   70   14     50   50   29     50   50   50
H2        2         A189 and A226  53   56   50     67   67   67     60   56   67     67   67   67
H3        2         A189 and A226  53   56   50     53   67   33     47   33   67     47   56   33
T1        2         A189 and A226  80   78   83     73   89   50     60   44   83     80   78   83
T2        2         A189 and A226  69   44   100    63   56   71     50   44   57     56   44   71
T3        2         A189 and A226  56   75   40     56   75   40     56   63   50     56   63   50
M1        2         A189 and A226  75   67   86     75   100  43     63   67   57     69   67   71
M2        2         A189 and A226  81   89   71     56   67   43     69   67   71     69   78   57
M3        2         A189 and A226  88   89   88     65   56   75     76   56   100    76   56   100
H1,H2,H3  2         A189 and A226  63   63   63     75   75   75     75   75   75     63   63   63
T1,T2,T3  2         A189 and A226  69   88   50     75   88   63     75   88   63     69   88   50
M1,M2,M3  2         A189 and A226  88   75   100    44   75   13     63   63   63     63   63   63
HTM       2         A189 and A226  69   75   63     63   88   38     69   63   75     69   75   63
Figure 5.6: HV results against all individual and combined datasets using top 2 features
[Bar chart: accuracy per dataset; series: SVM, KNN, NB, EC]
5.2 Multispectral Analysis
In this part of the research, three different experiments are performed to identify the
wavelet function that can most effectively classify environmentally affected SPPs as
defective or non-defective. The first experiment finds the best wavelet function, i.e. the
one providing maximum accuracy, for each individual dataset of the three environmental
factors. In the second experiment, we have identified three wavelet functions, one for
each of the three environmental factors. This identification is performed to achieve
maximum accuracies for the classification of SPPs affected by any of these factors. In the
third experiment, we have tested the selected wavelet functions against the combined
datasets.
All of these experiments are performed using the HV model (training and testing data).
Again, each dataset is divided into two halves, each containing 50% of the whole
dataset. Each of the four classifiers (SVM, NB, KNN and EC) is trained on the detail WCs
of the training dataset. After training, the performance of the proposed approach is
evaluated using unseen test datasets.
Table 5.7 gives the accuracy, sensitivity and specificity for each dataset in
experiment I. Each dataset, grouped by environmental factor, is tested in the UV, IR
and visible wavelength ranges using the specified wavelet functions and classifiers. For
humidity, the experiment gave the best accuracies in the UV range, with Coif1 as the
wavelet function and KNN as the classifier. Sensitivity and specificity followed a similar
pattern. For temperature, the experiment reached 100% accuracy in both the UV and IR
ranges. In the UV range, 100% accuracy was obtained for T1 with the Rbio1.3 wavelet
function and the KNN classifier. In the IR range, the maximum accuracy was obtained for
T3 with the Rbio3.5 wavelet and the KNN classifier. Figure 5.7 and Figure 5.8 summarize
the accuracies and sensitivities of the results shown in Table 5.7 respectively.
For moisture, the infrared and ultraviolet ranges again showed 100% accuracy for all
datasets. In the UV range, Rbio1.3 and EC were the wavelet function and classifier
respectively. For IR, the wavelet function and classifier pairs for M1, M2 and M3 were
Dmey and SVM, Rbio2.8 and KNN, and Rbio4.4 and EC respectively. The results for
sensitivity and specificity were in line with the accuracies. For every dataset, the
accuracy in the visible range was no higher than that in the infrared and ultraviolet
ranges.
Figure 5.7: Comparison of accuracies against UV, IR and visible for experiment I
[Bar chart: accuracy per dataset (H1, H2, H3, T1, T2, T3, M1, M2, M3); series: UV, IR, Visible]
Figure 5.8: Comparison of sensitivity against UV, IR and visible for experiment I
[Bar chart: sensitivity per dataset (H1, H2, H3, T1, T2, T3, M1, M2, M3); series: UV, IR, Visible]
Table 5.7: Results achieved by experiment I
Environmental  Dataset  Wavelength  No. of    Wavelet   Level  Classifier  Accuracy  Sensitivity  Specificity
Factor                              Features  Function
Humidity       H1       UV          26        Coif1     2      KNN         77        71           83
Humidity       H2       UV          26        Coif1     2      KNN         93        100          86
Humidity       H3       UV          26        Coif1     2      KNN         100       100          100
Humidity       H1       IR          116       Rbio3.9   3      KNN         77        71           83
Humidity       H2       IR          115       Rbio2.8   3      KNN         93        86           100
Humidity       H3       IR          115       Rbio2.8   3      KNN         86        86           86
Humidity       H1       Visible     35        Coif1     2      KNN         77        86           67
Humidity       H2       Visible     23        Bior2.4   3      KNN         93        86           100
Humidity       H3       Visible     23        Rbio4.4   3      KNN         71        86           57
Temperature    T1       UV          15        Rbio1.3   3      KNN         100       100          100
Temperature    T2       UV          25        Bior3.9   3      KNN         93        100          86
Temperature    T3       UV          26        Bior6.8   3      KNN         93        86           100
Temperature    T1       IR          58        Rbio2.4   4      EC          93        86           100
Temperature    T2       IR          116       Rbio3.9   3      NB          93        100          86
Temperature    T3       IR          109       Rbio3.5   3      KNN         100       100          100
Temperature    T1       Visible     42        Bior3.7   2      KNN         93        100          86
Temperature    T2       Visible     21        Rbio3.3   3      KNN         79        86           71
Temperature    T3       Visible     42        Bior2.4   4      EC          86        86           86
Moisture       M1       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M2       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M3       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M1       IR          277       Dmey      2      SVM         100       100          100
Moisture       M2       IR          115       Rbio2.8   3      KNN         100       100          100
Moisture       M3       IR          108       Rbio4.4   3      EC          100       100          100
Moisture       M1       Visible     41        Rbio2.6   2      EC          100       100          100
Moisture       M2       Visible     35        Coif1     2      EC          93        100          86
Moisture       M3       Visible     28        Rbio3.7   3      KNN         93        100          86
In experiment II, KNN and EC outperform the other two classifiers. The UV wavelength
range provides high accuracies compared with the IR and visible ranges for almost all of
the datasets except H1. For UV, Coif1, Sym5 and Rbio1.3 provided the maximum accuracies
for humidity-, temperature- and moisture-affected SPPs respectively. An accuracy of 100%
is achieved for all three datasets of moisture-affected SPPs. For IR, Rbio2.8, Rbio2.2
and Rbio3.5 resulted in high accuracies for humidity-, temperature- and moisture-affected
SPPs respectively. Similarly, Bior2.4, Rbio3.3 and Rbio3.7 provide the maximum accuracies
for the spectra in the visible wavelength range. The results for sensitivity and
specificity follow almost the same trend as the accuracies for all datasets. Table 5.8
gives the complete results of the second experiment. The accuracies and sensitivities
achieved in experiment II are compared in Figure 5.9 and Figure 5.10 respectively.
Figure 5.9: Comparison of accuracies against UV, IR and visible for experiment II
[Bar chart: accuracy per dataset (H1, H2, H3, T1, T2, T3, M1, M2, M3); series: UV, IR, Visible]
Figure 5.10: Comparison of sensitivity against UV, IR and visible for experiment II
[Bar chart: sensitivity per dataset (H1, H2, H3, T1, T2, T3, M1, M2, M3); series: UV, IR, Visible]
The results of experiment III are shown in Table 5.9. In experiment III, we have tested
the wavelet functions selected in experiment II on the combined datasets. The results
show that for UV data the Coif1 wavelet function with the EC classifier performs best.
For the overall combined dataset HTM, an accuracy of 94% is achieved using UV data. KNN
with Rbio3.5 provides the maximum accuracies for IR data, and KNN in combination with
Rbio3.3 for visible data. These results are also represented in Figure 5.11.
Figure 5.11: Comparison of accuracies against UV, IR and visible for experiment III
[Bar chart: accuracy per combined dataset (H1,H2,H3; T1,T2,T3; M1,M2,M3; HTM); series: UV, IR, Visible]
Table 5.8: Results achieved by experiment II
Environmental  Dataset  Wavelength  No. of    Wavelet   Level  Classifier  Accuracy  Sensitivity  Specificity
Factor                              Features  Function
Humidity       H1       UV          26        Coif1     2      KNN         77        71           83
Humidity       H2       UV          26        Coif1     2      KNN         93        100          86
Humidity       H3       UV          26        Coif1     2      KNN         100       100          100
Humidity       H1       IR          115       Rbio2.8   3      KNN         69        86           50
Humidity       H2       IR          115       Rbio2.8   3      KNN         93        86           100
Humidity       H3       IR          115       Rbio2.8   3      KNN         86        86           86
Humidity       H1       Visible     23        Bior2.4   3      KNN         69        71           67
Humidity       H2       Visible     23        Bior2.4   3      KNN         93        86           100
Humidity       H3       Visible     23        Bior2.4   3      KNN         57        86           29
Temperature    T1       UV          29        Sym5      2      EC          93        100          86
Temperature    T2       UV          29        Sym5      2      EC          86        100          71
Temperature    T3       UV          29        Sym5      2      EC          86        100          71
Temperature    T1       IR          105       Rbio2.2   3      KNN         86        100          71
Temperature    T2       IR          105       Rbio2.2   3      KNN         79        71           86
Temperature    T3       IR          105       Rbio2.2   3      KNN         79        71           86
Temperature    T1       Visible     21        Rbio3.3   3      KNN         86        100          71
Temperature    T2       Visible     21        Rbio3.3   3      KNN         79        86           71
Temperature    T3       Visible     21        Rbio3.3   3      KNN         64        86           43
Moisture       M1       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M2       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M3       UV          15        Rbio1.3   3      EC          100       100          100
Moisture       M1       IR          109       Rbio3.5   3      EC          86        86           86
Moisture       M2       IR          109       Rbio3.5   3      EC          93        100          86
Moisture       M3       IR          109       Rbio3.5   3      EC          93        100          86
Moisture       M1       Visible     28        Rbio3.7   3      KNN         64        71           57
Moisture       M2       Visible     28        Rbio3.7   3      KNN         86        86           86
Moisture       M3       Visible     28        Rbio3.7   3      KNN         93        100          86
Table 5.9: Results against combined datasets using experiment III
Dataset   Wavelength  No. of    Wavelet   Level  Classifier  Accuracy  Sensitivity  Specificity
                      Features  Function
H1,H2,H3  UV          26        Coif1     2      EC          81        75           88
T1,T2,T3  UV          26        Coif1     2      EC          81        75           88
M1,M2,M3  UV          26        Coif1     2      EC          94        88           100
HTM       UV          26        Coif1     2      EC          94        100          88
H1,H2,H3  IR          112       Rbio3.5   3      KNN         81        75           88
T1,T2,T3  IR          112       Rbio3.5   3      KNN         56        75           38
M1,M2,M3  IR          112       Rbio3.5   3      KNN         88        88           88
HTM       IR          112       Rbio3.5   3      KNN         75        75           75
H1,H2,H3  Visible     24        Rbio3.3   3      KNN         56        63           50
T1,T2,T3  Visible     24        Rbio3.3   3      KNN         69        88           50
M1,M2,M3  Visible     24        Rbio3.3   3      KNN         75        88           63
HTM       Visible     24        Rbio3.3   3      KNN         63        100          25
5.3 Hybrid
In this research, we have proposed two different methodologies for the classification of
defective and non-defective SPPs. One is based on MI and the other on MA. From the
results of the imaging-based approach (Chapter 3), we can conclude that statistical
textural features extracted using different feature extraction techniques are beneficial
for the classification of SPPs affected by different environmental factors. The top two
of the 281 features and the top 15 features selected through either CS or GR result in
better accuracies for different datasets. The results also show that the MI approach
provides promising results for the individual environment-affected datasets; however, the
results for the combined datasets are lower than those for the individual datasets. For
the MA-based approach (Chapter 4), the UV wavelength spectra are more suitable for the
analysis of SPPs than the other two wavelength ranges. WCs extracted from UV data prove
very useful for the classification of SPPs in the individual datasets. The results of the
MA approach for the combined datasets are also better than those of the MI approach. The
two proposed approaches perform differently for SPPs affected by different environmental
factors.
The main purpose of this research is to propose an approach that can produce accurate
results for individual datasets as well as for combined datasets. A combination of the two
proposed approaches (spatial and spectral) may produce more accurate results. For the
experiments with this hybrid approach, we have selected data of those SPPs that were
present in both MI and MA. We have therefore formed nine new datasets, covering all three
environmental factors, based on the SPPs that are present in both analyses. Each new
dataset consists of 30 SPPs, of which 15 are defective and 15 are non-defective.
Figure 5.12 shows the complete flow of the hybrid approach. The processing of the hybrid
approach consists of two parts. In the first part, the statistical-textural features are
extracted from the preprocessed microscopic image of the SPP, and the feature reduction
techniques applied to the extracted feature vector finalize the feature set. In the
second part, spectrum acquisition is performed for the same SPP. After that, the
preprocessing phase divides the raw input spectrum into the UV, IR and visible wavelength
ranges.
Figure 5.12: Basic flow of the hybrid approach
[Flowchart, image branch: SPP -> Image Acquisition -> SPP Image -> Grayscale Conversion -> Gray Image -> Contrast Enhancement -> Enhanced Image -> Feature Extraction -> Feature Vector -> Feature Reduction -> Reduced Feature Vector]
[Flowchart, spectral branch: SPP -> Spectrum Acquisition -> SPP Raw Spectra -> Preprocessing -> IR/UV/Visible Spectra -> Feature Extraction -> Feature Vector]
[Flowchart, merge: Combined Feature Vector -> Classification -> DSPP / NSPP]
The feature extraction phase results in a feature vector based on the WCs extracted from
each wavelength range. These two extracted feature vectors finally become the input to
the classification phase, which classifies the input SPP into one of the two defined
classes (DSPP and NSPP).
In this part of the research, we have combined the top two statistical textural features
extracted in the MI approach with the WCs extracted from the UV wavelength range.
Similarly, these WCs are also combined with the top 15 statistical textural features
selected through CS and through GR. We have selected the WCs extracted using coif1
(level 2), rbio1.3 (level 3) and sym5 (level 2). All of these wavelet functions and levels
are selected by analyzing the results of the second experiment of the MA approach. Hence,
each of the three feature vectors chosen from the MI approach is combined with each of the
three feature vectors selected from the MA approach. This provides nine feature vectors
for further processing.
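How three MI feature sets and three MA feature sets yield nine combined vectors can be sketched in Python as follows. The sketch assumes that combining means simple concatenation of the two vectors for the same SPP, which is not stated explicitly here; the dictionary keys, vector lengths and values are placeholders chosen for illustration.

```python
from itertools import product

# Placeholder MI feature vectors for one SPP (values are made up).
mi_sets = {
    "top2": [0.31, 0.77],
    "CS15": [0.1] * 15,
    "GR15": [0.2] * 15,
}

# Placeholder wavelet-coefficient vectors for the same SPP; the lengths are
# illustrative and do not reproduce the feature counts of the experiments.
ma_sets = {
    "coif1-L2": [0.5] * 4,
    "rbio1.3-L3": [0.6] * 3,
    "sym5-L2": [0.7] * 5,
}


def combine(mi_sets, ma_sets):
    """Concatenate every MI vector with every MA vector."""
    return {
        (mi_name, ma_name): mi_sets[mi_name] + ma_sets[ma_name]
        for mi_name, ma_name in product(mi_sets, ma_sets)
    }


combined = combine(mi_sets, ma_sets)
```

Each combined vector is then passed to the classifiers in the same way as a single-source feature vector.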
We have evaluated the accuracy of the proposed hybrid approach using all of these feature
vectors. The analysis of the results shows that the WCs calculated using rbio1.3 combined
with the CS top fifteen features provide higher accuracies for individual as well as
combined datasets. These results are shown in Table 5.10, which compares the accuracies,
sensitivities and specificities achieved by each of the four classifiers on the
individual datasets. The analysis shows that the proposed approach gives its maximum
accuracy with KNN as the classifier for all of the individual datasets except M2, for
which SVM provides the maximum accuracy of 100%. According to Table 5.10, all of these
maximum accuracies are greater than or equal to 86%. This can be seen in Figure 5.13.
Table 5.10: Test accuracies using hybrid approach for individual datasets
Datasets  No. of    SVM              KNN              NB               EC
          Features  Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1        33        67   100  0      92   88   100    67   63   75     92   100  75
H2        33        79   88   67     86   100  67     71   88   50     79   88   67
H3        33        79   88   67     93   100  83     64   63   67     79   88   67
T1        33        93   88   100    100  100  100    57   75   33     93   88   100
T2        33        87   88   86     93   100  86     73   75   71     87   88   86
T3        33        100  100  100    100  100  100    76   75   78     100  100  100
M1        33        100  100  100    100  100  100    79   75   83     100  100  100
M2        33        100  100  100    93   100  86     80   75   86     93   100  86
M3        33        87   88   86     93   88   100    73   75   71     87   88   86
Figure 5.13: HV results using hybrid approach for individual datasets
[Bar chart: accuracy per dataset (H1, H2, H3, T1, T2, T3, M1, M2, M3); series: SVM, KNN, NB, EC]
After evaluating the performance of the proposed hybrid approach on the individual
datasets, we have tested it on the combined datasets. The results achieved for these
combined datasets are also very promising. Once again, the feature vectors are the
combination of the CS top fifteen features and the WCs obtained with the wavelet function
rbio1.3. These results are shown in Table 5.11.
Here again, the accuracies achieved through KNN are generally higher than the others. For
the combined humidity dataset, a maximum of 81% accuracy with 88% sensitivity and 75%
specificity is achieved. Similarly, 100% accuracy is achieved for the combined temperature
and moisture datasets. For the completely combined dataset, i.e. the combination of SPPs
affected by all environmental factors, 100% accuracy is achieved through EC and 94%
through KNN. These results are also shown in Figure 5.14.
If we compare the results achieved by the two proposed approaches and their combination,
we can see that the hybrid approach provides more accurate results, with high sensitivity
and specificity.
Table 5.11: Test accuracies using hybrid approach for combined datasets
Datasets  No. of    SVM              KNN              NB               EC
          Features  Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp     Acc  Sn   Sp
H1,H2,H3  33        81   100  63     81   88   75     69   63   75     81   88   75
T1,T2,T3  33        81   88   75     100  100  100    69   88   50     81   88   75
M1,M2,M3  33        88   88   88     100  100  100    75   75   75     100  100  100
HTM       33        88   100  75     94   88   100    81   75   88     100  100  100
Figure 5.14: Test accuracies using hybrid approach for combined datasets
The focus of this research is to develop an approach that can accurately classify any
environmentally affected SPP. For this purpose, we proposed two approaches and then a
hybrid of the two. Table 5.12 compares the highest accuracies achieved by all three
approaches on the combined datasets. The analysis shows that, for the MI approach, the
features extracted using GR are more promising for the classification of SPPs than the
other feature sets used in that experiment. For the MA approach, the wavelet function coif1
provides the maximum accuracies when used with UV data, and the results are generally
better than those of the MI approach. Finally, the hybrid approach provides results that
are at least as accurate as those of the other two approaches on every combined dataset.
These results are also shown in Figure 5.15.
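The comparison in Table 5.12 can be checked mechanically. The sketch below copies only the Acc columns from the table (in the order SVM, KNN, NB, EC) and takes the best classifier accuracy per approach and dataset; the variable names are ours and nothing beyond the table values is assumed.

```python
# Accuracy values (%) copied from Table 5.12; list order is SVM, KNN, NB, EC.
accuracies = {
    "MI": {"H1,H2,H3": [56, 50, 56, 50], "T1,T2,T3": [63, 88, 44, 75],
           "M1,M2,M3": [75, 81, 75, 88], "HTM": [44, 75, 50, 50]},
    "MA": {"H1,H2,H3": [69, 69, 56, 81], "T1,T2,T3": [75, 88, 81, 81],
           "M1,M2,M3": [94, 100, 81, 94], "HTM": [81, 94, 81, 94]},
    "Hybrid": {"H1,H2,H3": [81, 81, 69, 81], "T1,T2,T3": [81, 100, 69, 81],
               "M1,M2,M3": [88, 100, 75, 100], "HTM": [88, 94, 81, 100]},
}

# Best accuracy over the four classifiers, per approach and dataset.
best = {
    approach: {dataset: max(values) for dataset, values in per_dataset.items()}
    for approach, per_dataset in accuracies.items()
}

for dataset in accuracies["Hybrid"]:
    row = ", ".join(f"{approach}={best[approach][dataset]}" for approach in accuracies)
    print(f"{dataset}: {row}")
```

Read this way, the hybrid approach matches MA on the humidity and moisture datasets and exceeds both MI and MA on the temperature and HTM datasets.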
[Bar chart for Figure 5.14: "HV accuracies using hybrid approach for combined datasets". X-axis: datasets (H1,H2,H3; T1,T2,T3; M1,M2,M3; HTM); y-axis: accuracies (0 to 120); series: SVM, KNN, NB, EC.]
Table 5.12: Comparison of highest accuracies achieved using MI, MA and hybrid approaches
(Acc = accuracy, Sn = sensitivity, Sp = specificity; all values in %)

Approach: MI; Features: GR; No. of Features: 15
Dataset    SVM            KNN            NB             EC
           Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp
H1,H2,H3   56   100  13   50   63   38   56   75   38   50   88   13
T1,T2,T3   63   75   50   88   88   88   44   75   13   75   88   63
M1,M2,M3   75   75   75   81   75   88   75   63   88   88   88   88
HTM        44   88   0    75   75   75   50   63   38   50   63   38

Approach: MA; Features: WC (Wavelet Function = coif1, Level = 2); No. of Features: 28
Dataset    SVM            KNN            NB             EC
           Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp
H1,H2,H3   69   75   63   69   88   50   56   13   100  81   75   88
T1,T2,T3   75   75   75   88   100  75   81   75   88   81   75   88
M1,M2,M3   94   88   100  100  100  100  81   75   88   94   88   100
HTM        81   88   75   94   100  88   81   88   75   94   100  88

Approach: Hybrid (H); Features: Chi + WC (Wavelet Function = rbio1.3, Level = 3); No. of Features: 33
Dataset    SVM            KNN            NB             EC
           Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp   Acc  Sn   Sp
H1,H2,H3   81   100  63   81   88   75   69   63   75   81   88   75
T1,T2,T3   81   88   75   100  100  100  69   88   50   81   88   75
M1,M2,M3   88   88   88   100  100  100  75   75   75   100  100  100
HTM        88   100  75   94   88   100  81   75   88   100  100  100
Figure 5.15: Comparison of accuracies achieved using all three approaches
CHAPTER NO. 6
CONCLUSION AND FUTURE RECOMMENDATIONS
In this research, we have proposed a new approach for classifying defective and
non-defective SPPs using image processing, signal processing and machine learning
techniques. Two approaches were proposed for this purpose, and their performance was tested
both independently and in a hybrid manner. We considered three environmental factors that
can affect the surface of SPPs: humidity, temperature and moisture. The proposed approaches
were tested on the individual datasets as well as on the combined datasets.
The first proposed approach is based on microscopic imaging and uses textural features
extracted from the surface of the preprocessed images. A comparative analysis is performed
using all 281 features, the top 15 features (extracted using CS, GR and RF) and the top 2
features. Classification is performed using the SVM, KNN, NB and EC classifiers. The
analysis shows that higher accuracies are achieved on moisture-affected SPPs, as moisture
reacts quickly with the APIs of the SPP. Across the different experiments, SVM performs
better than the other classifiers on most of the individual datasets. For the combined
datasets, GR provides the most accurate results, but its performance is still not promising.
In the second proposed approach, we used wavelet transformations along with machine
learning algorithms on multispectral data to classify environmentally affected solid
pharmaceutical products. A correct choice of wavelet function and decomposition level
provides promising results. In our case, a two- or three-level decomposition of the
multispectral data leads to better results. Again, four different classifiers are used to
test the effectiveness of the wavelet coefficients in the proposed approach. The comparison
tables show that KNN and EC are more suitable than NB and SVM. We also compared the WC
extracted from MA of UV, IR and visible wavelengths, and the results show that the UV
wavelength is more suitable for the classification of defective and non-defective SPPs. For
the combined datasets, UV data provides more accurate results than the microscopic imaging
based approach.
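To illustrate what a two-level wavelet decomposition of a spectrum produces, the sketch below uses the Haar wavelet. This is a deliberate substitution: the thesis uses coif1 and rbio1.3, whose filters are longer, and Haar is chosen here only because it can be written in a few lines. The 8-point input spectrum and the function names are made up for illustration.

```python
import math


def haar_step(signal):
    """One Haar DWT level: scaled pairwise sums (approximation) and differences (detail)."""
    s = 1 / math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def haar_decompose(signal, level):
    """Return [cA_level, cD_level, ..., cD_1]; len(signal) should be divisible by 2**level."""
    approx, details = list(signal), []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # the coarsest detail band ends up first
    return [approx] + details


# Made-up 8-point spectrum, decomposed to level 2.
spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_decompose(spectrum, level=2)
features = [c for band in coeffs for c in band]  # flattened coefficient vector
print([len(band) for band in coeffs], len(features))
```

Each additional level halves the approximation band, so the decomposition level controls how coarse the retained trend is and how the coefficients are split across detail bands.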
Finally, we tested a hybrid approach based on both imaging and spectral data. The
comparison of results shows that this hybrid approach provides the maximum accuracies on
the combined datasets. An accuracy of 81% is achieved on the combined dataset of SPPs
affected by humidity. Similarly, an accuracy of 100% is achieved for temperature-affected
and for moisture-affected SPPs. On the overall combined dataset, named HTM, the hybrid
approach provides 94% accuracy with KNN and 100% with EC (Table 5.11).
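One simple reading of the hybrid approach is feature-level fusion: the imaging features and the wavelet coefficients of a sample are concatenated into a single vector before classification. The sketch below assumes that reading and pairs it with a plain k-nearest-neighbour vote. All feature values are toy numbers, and the function names (fuse, knn_predict) are ours, not the thesis implementation.

```python
import math
from collections import Counter


def fuse(texture_features, wavelet_coeffs):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(texture_features) + list(wavelet_coeffs)


def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training samples: (fused feature vector, class label).
train = [
    (fuse([0.10, 0.20], [1.00, 0.90]), "non-defective"),
    (fuse([0.15, 0.25], [1.10, 1.00]), "non-defective"),
    (fuse([0.80, 0.70], [0.20, 0.10]), "defective"),
    (fuse([0.85, 0.75], [0.30, 0.20]), "defective"),
    (fuse([0.90, 0.65], [0.25, 0.15]), "defective"),
]
query = fuse([0.82, 0.72], [0.22, 0.12])
print(knn_predict(train, query, k=3))
```

Because k-nearest-neighbour relies on distances, the two feature groups would normally be brought to comparable scales before fusion; that step is omitted here for brevity.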
The proposed approach provides an accurate solution for the classification of defective and
non-defective solid pharmaceutical products. However, some aspects remain to be addressed
in future work. First, we currently deal with three environmental factors, i.e. humidity,
temperature and moisture; the analysis can be extended to further factors such as pressure
and light. Secondly, the spectral analysis can be extended to semisolid pharmaceutical
products. Thirdly, both of the proposed approaches analyse single-point data acquired from
the input product. In future, this can be extended to multi-point data, which may produce
more accurate results.
132
References
1 Sahoo, P. K., Pharmaceutical technology - tablets. 2007, Delhi Institute of
Pharmaceutical Sciences and Research: New Delhi.
2 Harbir, K. Processing technologies for pharmaceutical tablets- a review.
International Research Journal of Pharmacy: (2012).
3 Mahato, R. I., Dosage forms and drug delivery systems. In the apha complete
review for pharmacy, Castle Connolly Graduate Medical Publishing, New York.
2007, (2007)
4 Christian, L., Collins, L., Kiatgrajai, M., Merle, A., Mukherji, N.and Quade, A.,
The problem of substandard medicines in substandard countries. 2012, Workshop
in International Public Affairs.
5 Clift, C., Combating counterfeit, falsified and substandard medicines. 2010, Centre
on Global Health Security.
6 WHO. Medicines: Spurious/falsely-labelled/ falsified/counterfeit (SFFC)
medicines. Fact sheet n°275. World Health Organization 2012 May; Available
from: http://www.who.int/mediacentre/factsheets/fs275/en/.
7 WHO, Counterfeit drugs. 1999, Department of Essential Drugs and Other
Medicines: Geneva, Switzerland.
8 Islam, S. M. A., Hossain, M. A., Kabir, A. N. M. H., Kabir, S.and Hossain, M. K.
Study of moisture absorption by ranitidine hydrochloride: Effect of % rh,
excipients, dosage forms and packing materials. Dhaka University Journal of
Pharmaceutical Sciences, 7 (1): 59-64 (2008).
9 Szakonyi, G. and Zelkó, R. The effect of water on the solid state characteristics of
pharmaceutical excipients: Molecular mechanisms, measurement techniques, and
quality aspects of final dosage form. International Journal of Pharmaceutical
Investigation, 2 (1): 18 (2012).
10 Antikainen, J., New techniques for spectral image acquisition and analysis. 2012,
PhD diss., PhD thesis publications of the university of eastern finland dissertations
in forestry and natural sciences: Finland.
11 NASA, The electromagnetic spectrum. 2013, NASA.
133
12 Gowen, A. A., O'Donnel, C., Cullen, P.and Bell, S. E. J. Recent applications of
chemical imaging to pharmaceutical process monitoring and quality control.
European Journal of Pharmaceutics and Biopharmaceutics, 69 (1): 10-22 (2008).
13 Pathuri, R., Muthukumaran, M., Krishnamoorthy, B.and Nishat, A. A review on
analytical method development and validation of pharmaceutical technology.
Current Pharma Research, 3 (2): 855-870 (2013).
14 Ferrer, I. and Thurman, E. M. Analysis of 100 pharmaceuticals and their degradates
in water samples by liquid chromatography/quadrupole time-of-flight mass
spectrometry. Journal of Chromatography A, 1259 148-157 (2012).
15 Deisingh, A. K. Pharmaceutical counterfeiting. The Analyst, 130 (3): 271-279
(2005).
16 Perkinelmer, Using Near-IR spectroscopy to better understand tablet uniformity and
properties. 2003.
17 Holzgrabe, U., Diehl, B. W. K.and Wawer, I. NMR spectroscopy in pharmacy.
Journal of Pharmaceutical and Biomedical Analysis, 17 (4): 557-616 (1998).
18 ThermoScientific, Nir spectroscopy for pharmaceutical analysis. 2010.
19 Vogeser, M., Kobold, U.and Seidel, D. Mass spectrometry in medicine- the role of
molecular analysis. Dtsch Arztebl, 104 (31-32): 2194-200 (2007).
20 Görög, S. Identification in drug quality control and drug research. TrAC Trends in
Analytical Chemistry: (2015).
21 Nuhu, A. A. Recent analytical approaches to counterfeit drug detection. Journal of
Applied Pharmaceutical Science, 1 (5): 06-13 (2011).
22 Siddiqui, M. R., AlOthman, Z. A.and Rahman, N. Analytical techniques in
pharmaceutical analysis: A review. Arabian Journal of Chemistry: (3013).
23 Sacré, P.-Y., Deconinck, E., Beer, T. D., Courselle, P., Vancauwenberghe, R.,
Chiap, P., Crommen, J.and Beer, J. O. D. Comparison and combination of
spectroscopic techniques for the detection of counterfeit medicines. Journal of
pharmaceutical and biomedical analysis, 53 (3): 445-453 (2010).
24 Kazeminy, A., Hashemi, S., Williams, R. L., Ritchie, G. E., Rubinovitz, R.and Sen,
S. A comparison of near infrared method development approaches using a drug
134
product on different spectrophotometers and chemometric software algorithms.
Journal of Near Infrared Spectroscopy, 17 (5): 233 (2009).
25 Buice, J., G., R., Gold, T. B., Lodder, R. A.and Digenis, G. A. Determination of
moisture in intact gelatin capsules by near-infrared spectrophotometry.
Pharmaceutical Research, 12 (1): 161-163 (1995).
26 Morisseau, K. M. and Rhodes, C. T. Near-infrared spectroscopy as a nondestructive
alternative to conventional tablet hardness testing. Pharmaceutical Research 14 (1):
108-111 (1997).
27 Roggo, Y., Chalus, P., Maurer, L., Leme-Martinez, C., Edmond, A.and Jent, N. A
review of near infrared spectroscopy and chemometrics in pharmaceutical
technologies. Journal of Pharmaceutical and Biomedical Analysis, 44 (3): 683-700
(2007).
28 Barnes, T. J., Kempson, I. M.and Prestidge, C. A. Surface analysis for
compositional, chemical and structural imaging in pharmaceutics with mass
spectrometry: A tof-sims perspective. International journal of pharmaceutics 417
(1): 61-69 (2011).
29 Culzoni, M. J., Dwivedi, P., Green, M. D., Newton, P. N.and Fernández, F. M.
Ambient mass spectrometry technologies for the detection of falsified drugs.
MedChemComm, 5 (1): 9-19 (2014).
30 Chen, H., Talaty, N. N., Takáts, Z.and Cooks, R. G. Desorption electrospray
ionization mass spectrometry for high-throughput analysis of pharmaceutical
samples in the ambient environment. Analytical Chemistry, 77 (21): 6915-6927
(2005).
31 Holzgrabe, U. Quantitative nmr spectroscopy in pharmaceutical applications. Prog
NMR Spectrosc, 57: 229-240 (2010).
32 Holzgrabe, U., Deubner, R., Schollmayer, C.and Waibel, B. Quantitative nmr
spectroscopy—applications in drug analysis. Journal of pharmaceutical and
biomedical analysis, 38 (5): 806-812 (2005).
33 Holzgrabe, U. and Malet-Martino, M. Analytical challenges in drug counterfeiting
and falsification—the nmr approach. Journal of pharmaceutical and biomedical
analysis, 55 (4): 679-687 (2011).
135
34 Malet-Martino, M. and Holzgrabe, U. NMR techniques in biomedical and
pharmaceutical analysis. Journal of pharmaceutical and biomedical analysis, 55
(1): 1-15 (2011).
35 Croker, D. M., Hennigan, M. C., Maher, A., Hu, Y., Ryder, A. G.and Hodnett, B.
K. A comparative study of the use of powder x-ray diffraction, raman and near
infrared spectroscopy for quantification of binary polymorphic mixtures of
piracetam Journal of pharmaceutical and biomedical analysis 63: 80-86 (2012).
36 Maurin, J. K., Pluciński, F., Mazurek, A. P.and Fijałek, Z. The usefulness of simple
x-ray powder diffraction analysis for counterfeit control—the viagra® example.
Journal of pharmaceutical and biomedical analysis, 43 (4): 1514-1518 (2007).
37 Scoutaris, N., Vithani, K., Slipper, I., Chowdhry, B.and Douroumis, D. Sem/edx
and confocal raman microscopy as complementary tools for the characterization of
pharmaceutical tablets. International journal of pharmaceutics, 470 (1): 88-98
(2014).
38 Klang, V., Valenta, C.and Matsko, N. B. Electron microscopy of pharmaceutical
systems. Micron, 44 45-74 (2013).
39 Ruotsalainen, M., Heinämäki, J., Guo, H., Laitinen, N.and Yliruusi, J. A novel
technique for imaging film coating defects in the film-core interface and surface of
coated tablets. European journal of pharmaceutics and biopharmaceutics, 56. (3):
381-388 (2003).
40 Gendrin, C., Roggo, Y.and Collet, C. Pharmaceutical applications of vibrational
chemical imaging and chemometrics: A review. Journal of pharmaceutical and
biomedical analysis, 48 (3): 533-553 (2008).
41 De Beer, T., Anneleen Burggraeve, Margot Fonteyne, Saerens, L., Remon, J. P.and
Vervaet, C. Near infrared and raman spectroscopy for the in-process monitoring of
pharmaceutical production processes. International journal of pharmaceutics, 417
(1): 32-47 (2011).
42 Luypaert, J., Massart, D. L.and Heyden, Y. V. Near-infrared spectroscopy
applications in pharmaceutical analysis. Talanta, 72 (3): 865-883 (2007).
136
43 Jamrógiewicz, M. Application of the near-infrared spectroscopy in the
pharmaceutical technology. Journal of Pharmaceutical and Biomedical Analysis,
66: 1-10 (2012).
44 Luukkonen, P., Fransson, M., Björn, I. N., Hautala, J., Lagerholm, B.and Folestad,
S. Real-time assessment of granule and tablet properties using in-line data from a
high-shear granulation process. J. Pharm. Sci., 97: 950-959 (2008).
45 Chalus, P., Walter, S.and Ulmschneider, M. Combined wavelet transform–artificial
neural network use in tablet active content determination by near-infrared
spectroscopy. Analytica chimica acta, 591 (2): 219-224 (2007).
46 Svensson, O., Abrahamsson, K., Engelbrektsson, J., Nicholas, M., Wikström, H.and
Josefson, M. An evaluation of 2d-wavelet filters for estimation of differences in
textures of of pharmaceutical tablets. Chemometrics and Intelligent Laboratory
Systems, 84: 3–8 (2006 ).
47 Dowell, F. E., Maghirang, E. B., Fernandez, F. M., Newton, P. N.and Green, M. D.
Detecting counterfeit antimalarial tablets by near-infrared spectroscopy. Journal of
pharmaceutical and biomedical analysis, 48 (3): 1011-1014 (2008).
48 Bleye, C. D., Chavez, P.-F., Mantanus, J., Marini, R., Hubert, P., Rozet, E.and
Ziemons, E. Critical review of near-infrared spectroscopic methods validations in
pharmaceutical applications. Journal of pharmaceutical and biomedical analysis,
69: 125-132 (2012).
49 Shah, R. G., Patel, N. K.and Pancholi, S. S. Near infrared spectroscopy: An
advanced technique in spectroscopy. Int J Pharm Bioanal Res, 1 (1): 1-11 (2014).
50 Rodionova, O. Y. and Pomerantsev, A. L. NIR-based approach to counterfeit-drug
detection. TrAC Trends in Analytical Chemistry 29 (8): 795-803 (2010).
51 Rodionova, O. Y., Houmøller, L. P., Pomerantsev, A. L., Geladi, P., Burger, J.,
Dorofeyev, V. L.and Arzamastsev, A. P. NIR spectrometry for counterfeit drug
detection: A feasibility study. Analytica Chimica Acta, 549 (1): (2005).
52 Storme-Paris, I., Rebiere, H., Matoga, M., Civade, C., Bonnet, P.-A., Tissier, M.
H.and Chaminade, P. Challenging near infrared spectroscopy discriminating ability
for counterfeit pharmaceuticals detection. Analytica chimica acta 658 (2): 163-174
(2010).
137
53 Vredenbregt, M. J., Blok, L.-T., Hoogerbrugge, R., Barends, D. M.and Kaste, D. D.
Screening suspected counterfeit viagra® and imitations of viagra® with near-
infrared spectroscopy. Journal of Pharmaceutical and Biomedical analysis 40 (4):
840-849 (2006).
54 Candolfi, A., Wu, W., Massart, D. L.and Heuerding, S. Comparison of
classification approaches applied to nir-spectra of clinical study lots. Journal of
pharmaceutical and biomedical analysis, 16 (8): 1329-1347 (1998).
55 Clarke, F. Extracting process-related information from pharmaceutical dosage
forms using near infrared microscopy. Vibrational Spectroscopy 34 (1): 25-35
(2004).
56 Bikiaris, D., Koutri, I., Alexiadis, D., Damtsios, A.and Karagiannis, G. Real time
and non-destructive analysis of tablet coating thickness using acoustic microscopy
and infrared diffuse reflectance spectroscopy. International journal of
pharmaceutics, 43 (1): 33-44 (2012).
57 Boiret, M., Gut, Y., Duval, H.and Ginot, Y. M. 2013. Use of near infrared chemical
imaging and 3d visualisation of a pharmaceutical tablet for formulation selection
during drug product development, in NIR 2013 proceedings, Orléans.
58 Van Eerdenbrugh, B. and Taylor, L. S. Application of Mid-IR spectroscopy for the
characterization of pharmaceutical systems. International journal of
pharmaceutics, 417 (1): 3-16 (2011).
59 Reich, G. Near-infrared spectroscopy and imaging: Basic principles and
pharmaceutical applications Advanced drug delivery reviews, 57 (8): 1109-1143
(2005).
60 Blanco, M. and Alcalá, M. Content uniformity and tablet hardness testing of intact
pharmaceutical tablets by near infrared spectroscopy: A contribution to process
analytical technologies. Analytica chimica acta, 557 (1): 353-359 (2006).
61 Sulub, Y., LoBrutto, R., Vivilecchia, R.and Wabuyele, B. W. Content uniformity
determination of pharmaceutical tablets using five near-infrared reflectance
spectrometers: A process analytical technology (pat) approach using robust
multivariate calibration transfer algorithms. Analytica chimica acta, 611 (2): 143-
150 (2008).
138
62 Moes, J. J., Ruijken, M. M., Gout, E., Frijlink, H. W.and Ugwoke, M. I. Application
of process analytical technology in tablet process development using nir
spectroscopy: Blend uniformity, content uniformity and coating thickness
measurements. International journal of pharmaceutics, 357 (1): 108-118 (2008).
63 Li, W., Bashai-Woldu, A., Ballard, J., Johnson, M., Agresta, M., Rasmussen, H.,
Hu, S., Cunningham, J.and Winstead, D. Applications of NIR in early stage
formulation development: Part i. Semi-quantitative blend uniformity and content
uniformity analyses by reflectance nir without calibration models. International
journal of pharmaceutics, 340 (1): 97-103 (2007).
64 Li, W., Bagnol, L., Berman, M., Chiarella, R. A.and Gerber, M. Applications of nir
in early stage formulation development. Part ii. Content uniformity evaluation of
low dose tablets by principal component analysis. International journal of
pharmaceutics, 380 (1): 49-54 (2009).
65 Sulub, Y., LoBrutto, R., Vivilecchia, R.and Wabuyele, B. Near-infrared
multivariate calibration updating using placebo: A content uniformity
determination of pharmaceutical tablets. Vibrational Spectroscopy, 46 (2): 128-134
(2008).
66 Ely, D., Chamarthy, S.and Carvajal, M. T. An investigation into low dose blend
uniformity and segregation determination using nir spectroscopy. Colloids and
Surfaces A: Physicochemical and Engineering Aspects 288 (1): 71-76 (2006).
67 Vankeirsbilck, T., Vercauteren, A., Baeyens, W.and Weken, G. V. d. Applications
of raman spectroscopy in pharmaceutical analysis. trends in analytical chemistry,
21 (12): 869-877 (2002).
68 Vankeirsbilck, T., Vercauteren, A., Baeyens, W., Weken, G. V. d., Verpoort, F.,
Vergote, G.and Remon, J. P. Applications of raman spectroscopy in pharmaceutical
analysis. TrAC trends in analytical chemistry 21 (12): 869-877 (2002).
69 Feng, L., Xinxin, W., Yifeng, C., Yongjian, Y., Yinjia, Y.and Gengli, D. A novel
identification system for counterfeit drugs based on portable raman spectroscopy.
Chemometrics and Intelligent Laboratory Systems: (127) 63-69 (2013).
139
70 Li, Y., Du, G., Cai, W.and Shao, X. Classification and quantitative analysis of
azithromycin tablets by raman spectroscopy and chemometrics. American Journal
of Analytical Chemistry, 2: 135-141 (2011).
71 Romero-Torres, S., Pérez-Ramos, J. D., Morris, K. R.and Grant, E. R. Raman
spectroscopic measurement of tablet-to-tablet coating variability. Journal of
pharmaceutical and biomedical analysis, 38 (2): 270-274 (2005).
72 Romero-Torres, S., Pérez-Ramos, J. D., Morris, K. R.and Grant, E. R. Raman
spectroscopy for tablet coating thickness quantification and coating characterization
in the presence of strong fluorescent interference. Journal of pharmaceutical and
biomedical analysis, 41 (3): 811-819 (2006).
73 Müller, J., Knop, K., Thies, J., Uerpmann, C.and Kleinebudde, P. Feasability of
raman spectroscopy as pat tool in active coating. Drug development and industrial
pharmacy, 36 (2): 234-243 (2010).
74 Gao, Q., Liu, Y., Li, H., Chen, H., Chai, Y.and Lu, F. Comparison of several
chemometric methods of libraries and classifiers for the analysis of expired drugs
based on raman spectra. Journal of pharmaceutical and biomedical analysis 94:
58-64 (2014).
75 Veij, d. M., Deneckere, A., Vandenabeele, P., Kaste, D. d.and Moens, L. Detection
of counterfeit viagra® with raman spectroscopy. Journal of pharmaceutical and
biomedical analysis, 46 (2): 303-309 (2008).
76 Eliasson, C. and Matousek, P. Noninvasive authentication of pharmaceutical
products through packaging using spatially offset raman spectroscopy. Analytical
chemistry, 79 (4): 1696-1701 (2007).
77 Ricci, C., Eliasson, C., Macleod, N. A., Newton, P. N., Matousek, P.and Kazarian,
S. G. Characterization of genuine and fake artesunate anti-malarial tablets using
fourier transform infrared imaging and spatially offset raman spectroscopy through
blister packs. Analytical and bioanalytical chemistry, 389 (5): 1525-1532 (2007).
78 Zhang, L., Henson, M. J.and Sekulic, S. S. Multivariate data analysis for raman
imaging of a model pharmaceutical tablet. Analytica Chimica Acta, 545 (2): 262-
278 (2005).
140
79 Gordon, K. C. and McGoverin, C. M. Raman mapping of pharmaceuticals.
International journal of pharmaceutics, 417 (1): 151-162 (2011).
80 Hédoux, A., Guinet, Y.and Descamps, M. The contribution of raman spectroscopy
to the analysis of phase transformations in pharmaceutical compounds.
International journal of pharmaceutics 417 (1): 17-31 (2011).
81 Matousek, P. and Parker, A. W., Non-invasive bulk analysis of pharmaceutical
tablets and capsules using the transmission raman method. 2006/2007, Central
Laser Facility Annual Report: Didcot.
82 Strachan, C. J., Rades, T., Gordon, K. C.and Rantanen, J. Raman spectroscopy for
quantitative analysis of pharmaceutical solids. Journal of pharmacy and
pharmacology, 59: 179–192 (2007).
83 O'Connell, M. L., Howley, T., Ryder, A. G., Leger, M. N.and Madden, M. G.
Classification of a target analyte in solid mixtures using principal component
analysis, support vector machines, and raman spectroscopy. In Opto-Ireland: 340-
350 (2005).
84 Andreas, H. and Clemens, A. 2010. Computer-vision based pharmaceutical pill
recognition on mobile phones, in 14th Proceedings of CESCG.
85 Andreas, H., Clemens, A.and Dieter, S. 2010. Instant segmentation and feature
extraction for recognition of simple objects on mobile phones, in Computer Vision
and Pattern Recognition Workshops (CVPRW), IEEE Computer Society
Conference.
86 Ramya, S., Suchitra, J.and Nadesh, R. K. Detection of broken pharmaceutical drugs
using enhanced feature extraction technique. International Journal of Engineering
and Technology, 5 (2): 1407-1411 (2013).
87 Špiclin, Ž., Bukovec, M., Pernuš, F.and Likar, B. Image registration for visual
inspection of imprinted pharmaceutical tablets. Machine Vision and Applications,
22 (1): 197-206 (2011).
88 Bukovec, M., Špiclin, Ž., Pernuš, F.and Likar, B. Automated visual inspection of
imprinted pharmaceutical tablets. Measurement Science and Technology, 18 (9):
2921 (2007).
141
89 Bukovec, M., Spiclin, Z., Pernus, F.and Likar, B. 2007. Geometrical and statistical
visual inspection of imprinted tablets, in MVA2007 IAPR Conference on Machine
Vision Applications, Tokyo: p. 412-415.
90 Tahir, F. and Fahiem, M. A. A statistical-textural-features based approach for
classification of solid drugs using surface microscopic images. Computational and
mathematical methods in medicine, 2014 (2014 ).
91 Možina, M., Tomaževič, D., Pernuš, F.and Likar, B. Automated visual inspection
of imprint quality of pharmaceutical tablets. Machine vision and applications, 24
(1): 63-73 (2013).
92 Lee, Y.-B., Park, U.and Jain, A. K. 2010. Pill-id: Matching and retrieval of drug pill
imprint images, in International Conference on Pattern Recognition, Istanbul: p.
2632-2635.
93 Yu, C.-C., Wen, C.-Y., Lu, C.-P.and Chen, Y.-F. The drug tablet image retrieal
system based on content-based image retrieval. International journal of innovative
computing, information and control, 8 (7(A)): 4497-4508 (2012).
94 Jung, C. R., Ortiz, R. S., Limberger, R.and Mayorga, P. A new methodology for
detection of counterfeit viagra® and cialis® tablets by image processing and
statistical analysis. Forensic science international 216 (1): 92-96 (2012).
95 Gowen, A. A., O’Donnell, C. P., Cullenb, P. J., Downey, G.and Frias, J. M.
Hyperspectral imaging- an emerging process analytical tool for food quality and
safety control. Trends in Food Science & Technology, 18: 590-598 (2007).
96 de Juan, A., Tauler, R., Dyson, R., Marcolli, C., Rault, M.and Maeder, M.
Spectroscopic imaging and chemometrics: A powerful combination for global and
local sample analysis. TrAC Trends in Analytical Chemistry 23 (1): 70-79 (2004).
97 Ravn, C., Skibstedb, E.and Bro, R. Near-infrared chemical imaging (NIR-CI) on
pharmaceutical solid dosage forms - comparing common calibration approaches.
Journal of Pharmaceutical and Biomedical Analysis, 48: 554–561 (2008).
98 Dubois, J., Wolff, J.-C., Warrack, J. K., Schoppelrei, J.and Lewis, E. N. Nir
chemical imaging for counterfeit pharmaceutical products analysis. Spectroscopy,
22 (2): 40-50 (2007).
142
99 Cruz, J. and Blanco, M. Content uniformity studies in tablets by NIR-CI. Journal
of Pharmaceutical and Biomedical Analysis, 56 (2): 408– 412 (2011).
100 Puchert, T., Lochmann, D., Menezes, J. C.and Reich, G. Near-infrared chemical
imaging (NIR-CI) for counterfeit drug identification—a four-stage concept with a
novel approach of data processing (linear image signature). Journal of
Pharmaceutical and Biomedical Analysis 51: 138–145 (2010).
101 Lewis, E. N., Kidder, L. H.and Lee, E. Nir chemical imaging- near-infrared
spectroscopy on steroids. NIR News, 16 (5): (2005).
102 Hamilton, S. J. and Lodder, R. A. Hyperspectral imaging technology for
pharmaceutical analysis. International Society for Optics and Photonics., 4626:
136-147 (2002).
103 Doub, W. H., Adams, W. P., Spencer, J. A., Buhse, L. F.and Nelson, M. P. Raman
chemical imaging for ingredient-specific prticle size characterization of aqueous
suspension nasal spray formulations: A progress report. . Pharmaceutical research,
24 (5): 934-945 (2007).
104 Moor, J. Application of NIR imaging and multivariate data analysis for
pharmaceutical products. 2010.
105 Wu, Z., Tao, O., Dai, X., Du, M., Shi, X.and Qiao, Y. Monitoring of a
pharmaceutical blending process using near infrared. Vibrational Spectroscopy 63
371-379 (2012).
106 Sacré, P.-Y., Bleye, C. D., Chavez, P.-F., Netchacovitch, L., Hubert, P.and
Ziemons, E. Data processing of vibrational chemical imaging for pharmaceutical
applications. Journal of pharmaceutical and biomedical analysis 101: 123-140
(2014).
107 Amigo, J. M. and Ravn, C. Direct quantification and distribution assessment of
major and minor components in pharmaceutical tablets by nir-chemical imaging.
European Journal of Pharmaceutical Sciences, 37 (2): 76-82 (2009).
108 Franch-Lage, F., Amigo, J. M., Skibsted, E., Maspoch, S.and Coello, J. Fast
assessment of the surface distribution of api and excipients in tablets using NIR-
hyperspectral imaging. International journal of pharmaceutics 411 (1): 27-35
(2011).
143
109 Carneiro, R. L. and Poppi, R. J. A quantitative method using near infrared imaging
spectroscopy for determination of surface composition of tablet dosage forms: An
example of spirolactone tablets. Journal of the Brazilian Chemical Society, 23 (8):
1570-1576 (2012).
110 Palou, A., Cruz, J., Blanco, M., Tomàs, J., Ríos, J. d. l.and Alcalà, M. Determination
of drug, excipients and coating distribution in pharmaceutical tablets using nir-ci.
Journal of Pharmaceutical Analysis 2(2): 90-97 (2012).
111 Offroy, M., Roggo, Y.and Duponchel, L. Increasing the spatial resolution of near
infrared chemical images (NIR-CI): The super-resolution paradigm applied to
pharmaceutical products. Chemometrics and Intelligent Laboratory Systems, 117
183-188 (2012).
112 Osorio, J. G., Stuessy, G., Kemeny, G. J.and Muzzio, F. J. Characterization of
pharmaceutical powder blends using in situ near-infrared chemical imaging
Chemical Engineering Science, 108 244-257 (2014).
113 Lyon, R. C., Lester, D. S., Lewis, E. N., Lee, E., Lawrence, X. Y., Jefferson, E.
H.and Hussain, A. S. Near-infrared spectral imaging for quality assurance of
pharmaceutical products: Analysis of tablets to assess powder blend homogeneity.
AAPS PharmSciTech, 3 (3): 1-15 (2002).
114 Lee, E., Huang, W. X., Chen, P., Lewis, E. N.and Vivilecchia, R. V. High-
throughput analysis of pharmaceutical tablet content uniformity by near-infrared
chemical imaging. Spectroscopy, 21 (11): 24 (2006).
115 Westenberger, B. J., Ellison, C. D., Fussner, A. S., Jenney, S., Kolinski, R. E., Lipe,
T. G., Lyon, R. C., Moore, T. W., Revelle, L. K., Smith, A. P., Spencer, J. A.and
Story, K. D. Quality assessment of internet pharmaceutical products using
traditional and non-traditional analytical techniques. International journal of
pharmaceutics, 306 (1): 56-70 (2005).
116 Gendrin, C., Roggo, Y.and Collet, C. Content uniformity of pharmaceutical solid
dosage forms by near infrared hyperspectral imaging: A feasibility study. Talanta,
73 (4): 733-741 (2007).
117 Li, W., Woldu, A., Kelly, R., McCool, J., Bruce, R., Rasmussen, H., Cunningham,
J.and Winstead, D. Measurement of drug agglomerates in powder blending
144
simulation samples by near infrared chemical imaging. International journal of
pharmaceutics, 350 (1): 369-373 (2008).
118 Hilden, L. R., Pommier, C. J., Badawy, S. I. F.and Friedman, E. M. Nir chemical
imaging to guide/support bms-561389 tablet formulation development.
International journal of pharmaceutics, 353 (1): 283-290 (2008).
119 Sasic, S. Raman mapping of low-content api pharmaceutical formulations. I.
Mapping of alprazolam and alprazolam/xanax tablets. Pharmaceutical Research,
24 (1): 58-65 (2007).
120 Bell, S. E. J., Beattie, J. R., McGarvey, J. J., Peters, K. L., Sirimuthu, N. M. S.and
Speers, S. J. Development of sampling methods for raman analysis of solid forms
of therapeutic and illicit drugs. Journal of Raman Spectroscopy, 35 (5): 409–417
(2004).
121 Vidal, M. and Amigo, J. M. Pre-processing of hyperspectral images. Essential steps
before image analysis. Chemometrics and Intelligent Laboratory Systems, 117:
138-148 (2012).
122 Šašić, S. Chemical imaging of pharmaceutical granules by raman global
illumination and near-infrared mapping platforms. Analytica chimica acta, 611 (1):
73-79 (2008).
123 Shippert, P. Introduction to hyperspectral image analysis. Online Journal of Space
Communication, 3: (2003).
124 Vagni, F., Survey of hyperspectral and multispectral imaging technologies. 2007,
Research and Technology Organization.
125 Malik, I., Poonacha, M., Moses, J.and Lodder, R. A. Multispectral imaging of
tablets in blister packaging. AAPS PharmSciTech, 2 (2): 38-44 (2001).
126 Nippolainen, E., Fauch, L., Miridonov, S.and Kamshilin, A. A. Novel multispectral
imaging system for pharmaceutical applications. Pacific Science Review, 12 (2):
203~207 (2011).
127 Tahir, F., Fahiem, M. A., Tauseef, H.and Farhan, S. A survey of multispectral high
resolution imaging based drug surface morphology validation techniques. Life
Science Journal, 10 (7s): 1050-1059 (2013).
128 Cullen, P., Edelman, G. J., Van Leeuwen, T. G., Aalders, M. C. and Gaston, E.
Hyperspectral imaging for non-contact analysis of forensic traces.: (2012).
129 Ramchandra, A. Filters a image enhancement and smoothing techniques. Paripex -
Indian Journal Of Research, 2 (7): 31-33 (2013).
130 Efstathiou, C. E., Signal smoothing algorithms. chem.
131 Rinnan, Å., Berg, F. v. d. and Engelsen, S. B. Review of the most common
pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical
Chemistry, 28 (10): 1201-1222 (2009).
132 Luo, J., Ying, K., He, P. and Bai, J. Properties of Savitzky–Golay digital
differentiators. Digital Signal Processing, 15: 122–136 (2005).
133 Jain, A. K., Fundamentals of digital image processing, Englewood Cliffs:
Prentice-Hall. 1989, (1989)
134 Maini, R. and Aggarwal, H. A comprehensive review of image enhancement
techniques. Journal of Computing, 2 (3): 8-13 (2010).
135 Pohl, C. and Van Genderen, J. L. Review article multisensor image fusion in remote
sensing: Concepts, methods and applications. International journal of remote
sensing, 19 (5): 823-854 (1998).
136 Pal, N. R. and Pal, S. K. A review on image segmentation techniques. Pattern
recognition, 26 (9): 1277-1294 (1993).
137 Shakti, S. Comparative study of various image segmentation methods. International
Journal in Multidisciplinary and Academic Research (SSIJMAR), 2 (3): (2013).
138 Nixon, M. and Aguado, A. S., Feature extraction & image processing, Replika
Press Pvt Ltd, Delhi. 2002, (2002)
139 Wechsler, H. Texture analysis—a survey. Signal Processing, 2 (3): 271-282
(1980).
140 Zhang, J. and Tan, T. Brief review of invariant texture analysis methods. Pattern
recognition, 35 (3): 735-747 (2002).
141 Reed, T. R. and DuBuf, J. H. A review of recent texture segmentation and feature
extraction techniques. CVGIP: Image understanding, 57 (3): 359-372 (1993).
142 Xie, X. A review of recent advances in surface defect detection using texture
analysis techniques. Electronic Letters on Computer Vision and Image Analysis, 7
(3): 1-22 (2008).
143 Chen, Y. Q., Nixon, M. S. and Thomas, D. W. Statistical geometrical features for
texture classification. Pattern Recognition, 28 (4): 537-552 (1995).
144 Bharati, M. H., Liu, J. J. and MacGregor, J. F. Image texture analysis: Methods and
comparisons. Chemometrics and intelligent laboratory systems, 72 (1): 57-71
(2004).
145 Srinivasan, G. N. and G, S. Statistical texture analysis. Proceedings Of World
Academy Of Science, Engineering And Technology, 36: 1264-1269 (2008).
146 Aioanei, S., Kurani, A., Xu, D.-H. and Undergraduates, C. T. I., Texture analysis for
computed tomography studies. 2002, DePaul University.
147 Materka, A. and Strzelecki, M., Texture analysis methods–a review. 1998,
Technical University of Lodz, Institute of Electronics, COST B11 report: Brussels.
p. 9-11.
148 Cai, T. T., Zhang, D. and Ben-Amotz, D. Enhanced chemical classification of Raman
images using multiresolution wavelet transformation. Applied spectroscopy, 55 (9):
1124-1130 (2001).
149 Li, P., Du, G., Cai, W. and Shao, X. Rapid and nondestructive analysis of
pharmaceutical products using near-infrared diffuse reflectance spectroscopy.
Journal of pharmaceutical and biomedical analysis, 70: 288-294 (2012).
150 Saeys, Y., Inza, I. and Larrañaga, P. A review of feature selection techniques in
bioinformatics. Bioinformatics, 23 (19): 2507-2517 (2007).
151 Novaković, J., Strbac, P. and Bulatović, D. Toward optimal feature selection using
ranking methods and classification algorithms. The Yugoslav Journal of Operations
Research, 21 (1): 2334-6043 (2011).
152 Hall, M. A. and Smith, L. A. 1998. Practical feature subset selection for machine
learning, in Proceedings of the 21st Australian Computer Science Conference.
153 Holte, R. C. Very simple classification rules perform well on most commonly used
datasets. Machine learning, 11 (1): 63-90 (1993).
154 Chatcharaporn, K., Kittidachanupap, N., Kerdprasop, K. and Kerdprasop, N.,
Comparison of feature selection and classification algorithms for restaurant
dataset classification.
155 Jorgensen, A., Clustering excipient near infrared spectra using different
chemometric methods. 2000, Pharmaceutical Technology Division, Department of
Pharmacy, University of Helsinki: Helsinki.
156 O'Connell, M.-L., Howley, T., Ryder, A. G., Leger, M. N. and Madden, M. G.
Classification of a target analyte in solid mixtures using principal component
analysis, support vector machines, and Raman spectroscopy. International Society
for Optics and Photonics: 340-350 (2005).
157 Rajalahti, T. and Kvalheim, O. M. Multivariate data analysis in pharmaceutics: A
tutorial review. International journal of pharmaceutics, 417 (1): 280-290 (2011).
158 Van der Meer, F. The effectiveness of spectral similarity measures for the analysis
of hyperspectral imagery. International journal of applied earth observation and
geoinformation, 8 (1): 3-17 (2006).
159 Sedgwick, P. Pearson’s correlation coefficient. BMJ, 345: (2012).
160 Hall, G. Pearson’s correlation coefficient. 2015. Available from:
http://www.hep.ph.ic.ac.uk/~hallg/UG_2015/Pearsons.pdf.
161 Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y. and Kaufman,
L., Chemometrics: A textbook, Elsevier Science Publishers B.V, Netherlands.
1988, (1988)
162 Wang, L., Zhang, Y. and Feng, J. On the Euclidean distance of images. Pattern
Analysis and Machine Intelligence, IEEE Transactions, 27 (8): 1334-1339 (2005).
163 Sugiyama, M. Advanced data analysis: K-mean clustering. Available from:
http://www.google.com.tr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&ved=
0CC8QFjAD&url=http%3A%2F%2Fwww.cis.upenn.edu%2F~cis519%2Ffall201
4%2Flectures%2F13_Unsupervised%2520Learning.pdf&ei=T3JYVY2pIubIsQSll
4DoBA&usg=AFQjCNE1EMrs5__RPisNIwZhwIdkEG6MuQ&bvm=bv.9.
164 Mirkin, B., Core concepts in data analysis: Summarization, correlation and
visualization, Springer Science & Business Media. 2011, (2011)
165 Chevallier, S., Bertrand, D., Kohler, A. and Courcoux, P. Application of PLS-DA in
multivariate image analysis. Journal of Chemometrics, 20 (5): 221-229 (2006).
166 Nikam, S. S. A comparative study of classification techniques in data mining
algorithms. Orient.J. Comp. Sci. and Technol, 8 (1):
167 Gupta, M. and Aggarwal, N. 2010. Classification techniques analysis, in
NCCI2010-National Conference on Computational Instrumentation, CSIO,
Chandigarh, India.
168 Kotsiantis, S. B., Zaharakis, I. D. and Pintelas, P. E. Machine learning: A review of
classification and combining techniques. Artificial Intelligence Review, 26 (3):
159-190 (2006).
169 Wu, W., Walczak, B., Massart, D. L., Erni, S. H., F., Last, I. R. and Prebble, K. A.
Artificial neural networks in classification of NIR spectral data: Design of the
training set. Chemometrics and intelligent laboratory systems, 33 (1): 35-46
(1996).
170 Kotsiantis, S. B. Supervised machine learning: A review of classification
techniques. Informatica (03505596), 31 (3): 249-268 (2007).
171 Li, D., Chen, L., Li, Y., Tian, S., Sun, H. and Hou, T. ADMET evaluation in drug
discovery. 13. Development of in silico prediction models for P-glycoprotein
substrates. Molecular pharmaceutics, 11 (3): 716-726 (2014).
172 Sheng, T., Wang, J., Li, Y. and Xu, X. Drug-likeness analysis of traditional Chinese
medicines: Prediction of drug-likeness using machine learning approaches.
Molecular pharmaceutics, 9 (10): 2875-2886 (2012).
173 Lei, C., Li, Y., Zhao, Q., Peng, H. and Hou, T. ADME evaluation in drug discovery.
10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive
Bayesian classification techniques. Molecular pharmaceutics, 8 (3): 889-900 (2011).
174 Wang, S., Li, Y., Wang, J., Chen, L., Zhang, L., Yu, H. and Hou, T. ADMET
evaluation in drug discovery. 12. Development of binary classification models for
prediction of hERG potassium channel blockage. Molecular pharmaceutics, 9 (4):
996-1010 (2012).
175 Phyu, T. N. 2009. Survey of classification techniques in data mining, in
Proceedings of the International MultiConference of Engineers and Computer
Scientists, Hong Kong: p. 18-20.
176 Anzanello, M. J., Ortiz, R. S., Limberger, R. P. and Mayorga, P. A
multivariate-based wavenumber selection method for classifying medicines into
authentic or counterfeit classes. Journal of pharmaceutical and biomedical analysis,
83: 209-214 (2013).
177 Hou, T., Li, N., Li, Y. and Wang, W. Characterization of domain–peptide
interaction interface: Prediction of SH3 domain-mediated protein–protein
interaction network in yeast by generic structure-based models. Journal of
proteome research, 11 (5): 2982-2995 (2012).
178 Roggo, Y., Degardin, K. and Margot, P. Identification of pharmaceutical tablets by
Raman spectroscopy and chemometrics. Talanta, 81 (3): 988-995 (2010).
179 Dégardin, K., Roggo, Y., Been, F. and Margot, P. Detection and chemical profiling
of medicine counterfeits by Raman spectroscopy and chemometrics. Analytica
chimica acta, 705 (1): 334-341 (2011).
180 Ramirez, J. L., Bellamy, M. K. and Romañach, R. J. A novel method for analyzing
thick tablets by near infrared spectroscopy. AAPS PharmSciTech, 2 (3):
15-24 (2001).
181 Laitinen, N., Antikainen, O. and Yliruusi, J. Characterization of particle sizes in bulk
pharmaceutical solids using digital image information. AAPS PharmSciTech, 4 (4):
383-391 (2003).
182 Cruz, J. and Blanco, M. Content uniformity studies in tablets by NIR-CI. Journal of
Pharmaceutical and Biomedical Analysis, 56: 408-412 (2011).
183 Lopes, M. B. and Wolff, J. C. Investigation into classification/sourcing of suspect
counterfeit Heptodin™ tablets by near infrared chemical imaging. Analytica chimica
acta, 633 (1): 149-155 (2009).
184 Laksmana, F. L., Van Vliet, L. J., Kok, P. H., Vromans, H., Frijlink, H. W. and Van
der Voort Maarschalk, K. Quantitative image analysis for evaluating the coating
thickness and pore distribution in coated small particles. Pharmaceutical research,
26 (4): 965-976 (2009).
185 Lopes, M. B., Wolff, J. C., Bioucas-Dias, J. M. and Figueiredo, M. A. Determination
of the composition of counterfeit Heptodin™ tablets by near infrared chemical
imaging and classical least squares estimation. Analytica chimica acta, 641 (1): 46-
51 (2009).
186 Nikon, Nikoninstruments. Nikon.
187 Fichera, L. An implementation of imadjust in c#. 2012 October 11; Available from:
http://lorisfichera.github.io/blog/2012/10/11/an-implementation-of-imadjust-in-c-
number/.
188 Szczypinski, P. M., Strzelecki, M., Materka, A. and Klep, A. Mazda—a software
package for image texture analysis. Computer methods and programs in
biomedicine, 94 (1): 66-76 (2009).
189 Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. The
WEKA data mining software: An update. SIGKDD Explorations, 11 (1): 11-18
(2009).
190 MATLAB, Matlab and statistics toolbox release r2013a. The MathWorks, Inc.,
Natick, Massachusetts: United States.
191 Dongwoo Optron Co., L. Dongwoo optron. 2012. Available from:
http://www.dwoptron.com/lib/download.asp?aDir=datas&file=2012+DW+catalog
.pdf.
Plagiarism Report
List of Publications and Reprints
1. Tahir, F., and Fahiem, M. A. A Statistical-Textural-Features Based Approach for
Classification of Solid Drugs Using Surface Microscopic Images. Computational
and mathematical methods in medicine, 2014: (2014).
2. Tahir, F., Fahiem, M. A., Tauseef, H. and Farhan, S. A Survey of Multispectral High
Resolution Imaging Based Drug Surface Morphology Validation Techniques. Life
Science Journal - Acta Zhengzhou University Overseas Edition, 10 (7s): 1050-
1059 (2013).
3. Farhan, S., Fahiem, M. A., Tahir, F. and Tauseef, H. A Comparative Study of
Neuroimaging and Pattern Recognition Techniques for Estimation of Alzheimer’s
Disease. Life Science Journal - Acta Zhengzhou University Overseas Edition,
10 (7s): 1030-1039 (2013).
4. Tauseef, H., Fahiem, M. A., Farhan, S. and Tahir, F. A Review of Image and
Phylogenetic Analysis Based Techniques for Ischemic Stroke Risk Estimation.
Life Science Journal - Acta Zhengzhou University Overseas Edition, 10 (7s):
1040-1049 (2013).