FORMULATING OFFLINE NONDESTRUCTIVE

VALIDATION OF SOLID DRUG SURFACE MORPHOLOGY

USING MICROSCOPIC MULTISPECTRAL HIGH

RESOLUTION IMAGING

___________________________________________________________________________

FAHIMA TAHIR

___________________________________________________________________________

DEPARTMENT OF COMPUTER SCIENCE

LAHORE COLLEGE FOR WOMEN UNIVERSITY, LAHORE-

PAKISTAN

2015

FORMULATING OFFLINE NONDESTRUCTIVE

VALIDATION OF SOLID DRUG SURFACE

MORPHOLOGY USING MICROSCOPIC

MULTISPECTRAL HIGH RESOLUTION IMAGING

____________________________________________________________________

A THESIS SUBMITTED TO LAHORE COLLEGE FOR WOMEN

UNIVERSITY IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR

THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE

By

FAHIMA TAHIR

03-B/LCWU-5703

_____________________________________________________________________

DEPARTMENT OF COMPUTER SCIENCE

LAHORE COLLEGE FOR WOMEN UNIVERSITY, LAHORE-

PAKISTAN

2015

CERTIFICATE

This is to certify that the research work described in this thesis, submitted by Ms. Fahima Tahir to the Department of Computer Science, Lahore College for Women University, has been carried out under my direct supervision. I have personally gone through the raw data and certify the correctness and authenticity of all results reported herein. I further certify that the thesis data have not been used, in part or in full, in a manuscript already submitted or in the process of submission in partial or complete fulfilment of the award of any other degree from any other institution, at home or abroad. I also certify that the enclosed manuscript has been prepared under my supervision, and I endorse its evaluation for the award of the Ph.D. degree through the official procedure of the University.

________________

Dr. Muhammad Abuzar Fahiem

Supervisor

Date:

Verified By

________________

Name

Chairperson

Department of _______

Stamp

_________________

Controller of Examination

Stamp

Date: ___________

Dedicated to my parents

ACKNOWLEDGMENTS

All gratitude is to Almighty Allah and peace be upon the Holy Prophet Muhammad (SAW). I want to express my heartfelt gratitude to everyone who has supported and helped me to complete my Ph.D. dissertation.

I am deeply grateful to my supervisor, Dr. Muhammad Abuzar Fahiem, for extending his dedicated support and advice throughout the research. He not only guided me continuously with his precious knowledge but also remained a source of motivation and encouragement for me.

I am especially thankful to my institute, Lahore College for Women University, for facilitating me during this research.

I would also like to extend thanks to all of my friends and colleagues, who were honest critics and helped me with their fruitful comments and suggestions.

Last but not least, I am most indebted to my family. The prayers of my parents remain constantly with me and have led me towards the successful completion of this research. I am sincerely thankful for the cooperation of my siblings, who have always been there for me.

CONTENTS

List of Tables i

List of Figures ii

List of Equations iv

List of Abbreviations v

Abstract ix

Chapter no. 1 Introduction 1

1.1 Drug Dosage Forms 1

1.1.1 Solid Dosage Forms 2

1.1.1.1 Tablets 2

1.1.1.2 Capsules 3

1.1.2 Liquid Dosage Forms 4

1.1.3 Semisolid Dosage Forms 5

1.2 Substandard Medicines 5

1.2.1 Counterfeits 6

1.2.2 Expired 6

1.2.3 Environment Affected 7

Chapter no. 2 Review of Literature 10

2.1 Data Acquisition 10

2.1.1 Chromatographic Techniques 11

2.1.1.1 Thin Layer Chromatography 12

2.1.1.2 HPLC 12

2.1.2 Spectroscopic Techniques 12

2.1.2.1 Mass Spectrometry 13

2.1.2.2 NMR Spectroscopy 14

2.1.2.3 X-Ray Diffraction 14

2.1.2.4 Scanning Electron Microscopy 15

2.1.2.5 Vibrational Spectroscopic Techniques 16

2.1.3 Imaging Techniques 24

2.1.4 Spectral Imaging Techniques 26

2.1.4.1 Hyperspectral Imaging 27

2.1.4.2 Multispectral Imaging 32

2.2 Preprocessing Techniques 32

2.2.1 Smoothing 33

2.2.2 Normalization 34

2.2.3 Standard Normal Variate Correction 34

2.2.4 Multiplicative Scatter Correction 35

2.2.5 Savitzky-Golay Derivative Conversion 36

2.2.6 Image Enhancement 36

2.3 Feature Extraction Techniques 37

2.3.1 Low Level Feature Extraction 38

2.3.2 High Level Feature Extraction 38

2.3.3 Textural Feature Extraction 39

2.4 Feature Reduction Techniques 41

2.4.1 Information Gain 42

2.4.2 Symmetrical Uncertainty 43

2.4.3 One-R 44

2.4.4 Chi-Square 44

2.4.5 Gain Ratio 44

2.4.6 Relief-F 45

2.4.7 Principal Component Analysis 45

2.5 Classification Techniques 45

2.5.1 Pearson’s Correlation Coefficient 46

2.5.2 Euclidean Distance 46

2.5.3 K-Mean Clustering 47

2.5.4 Fuzzy Clustering 47

2.5.5 Partial Least Square Discriminant Analysis 48

2.5.6 Artificial Neural Networks 48

2.5.7 Naïve Bayes 49

2.5.8 K-Nearest Neighbor 50

2.5.9 Support Vector Machine 50

Chapter no. 3 Proposed Approach – Microscopic Imaging 64

3.1 Microscopic Imaging 64

3.1.1 Image Acquisition 65

3.1.2 Preprocessing 67

3.1.2.1 Grayscale Conversion 68

3.1.2.2 Contrast Enhancement 68

3.1.3 Feature Extraction 69

3.1.3.1 Gray-Level Co-Occurrence Matrix 70

3.1.3.2 Histogram Features 70

3.1.3.3 Run Length Matrix 70

3.1.3.4 Autoregressive Model 71

3.1.3.5 Wavelet Transformations 71

3.1.4 Feature Reduction 71

3.1.5 Classification 72

Chapter no. 4 Proposed Approach – Multispectral Analysis 75

4.1 Multispectral Analysis 75

4.1.1 Spectrum Acquisition 75

4.1.2 Preprocessing 76

4.1.3 Feature Extraction 78

4.1.3.1 Wavelet Transformation 79

4.1.4 Classification 79

Chapter no. 5 Analysis and Discussion 81

5.1 Microscopic Imaging 81

5.2 Multispectral Analysis 105

5.3 Hybrid 120

Chapter no. 6 Conclusion and Future Recommendations 130

References 132

Plagiarism Report xi

List of Publications and Reprints xii

LIST OF TABLES

Table No. Title Page No.

2.1 Comparison between various quality assessment techniques for drugs 54

2.2 Comparison between different researches for the analysis of medicines 58

3.1 Dataset description 67

3.2 List of top 15 selected features from CS, GR and RF 73

5.1 LOO results for all individual and combined datasets using 281 features 83

5.2 LOO results for all individual and combined datasets using top 15 features 86



91

5.4 Accuracies for test datasets using 281 features 94

5.5 Accuracies for test datasets using top 15 selected features 97

5.6 Accuracies for test datasets using top 2 features 102

5.7 Results achieved by experiment I 108

5.8 Results achieved by experiment II 115

5.9 Results against combined datasets using experiment III 118

5.10 Test accuracies using hybrid approach for individual datasets 123

5.11 Test accuracies using hybrid approach for combined datasets 125

5.12 Comparison of highest accuracies achieved using MI, MA and hybrid approaches 127

LIST OF FIGURES

Figure No. Title Page No.

2.1 The electromagnetic spectrum 10

2.2 Two level decomposition for the computation of WC 41

3.1 Basic flow of the proposed MI approach 64

3.2 Detailed diagram of the proposed MI approach 66

3.3 The sample images of the DSPPs and NSPPs in each dataset (a) images contained in dataset H1 (b) images contained in dataset H2 (c) images contained in dataset H3 (d) images contained in dataset T1 (e) images contained in dataset T2 (f) images contained in dataset T3 (g) images contained in dataset M1 (h) images contained in dataset M2 (i) images contained in dataset M3 68

4.1 Flow of the proposed MA approach 75

4.2 Multispectral data for NSPP and DSPP within UV wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3. 77

4.3 Multispectral data for NSPP and DSPP within Visible wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3. 78

4.4 Multispectral data for NSPP and DSPP within IR wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3. 80

5.1 LOO results against all individual and combined datasets using 281 features 84

85

90

5.4 HV results against all individual and combined datasets using all 281 features 93

101

104

5.7 Comparison of accuracies against UV, IR and Visible for experiment I 106

5.8 Comparison of sensitivity against UV, IR and Visible for experiment I 107

experiment II 112

5.10 Comparison of sensitivity against UV, IR and Visible for experiment II 113

experiment III 114

5.12 Basic flow of the hybrid approach 121

5.13 HV results using hybrid approach for individual datasets 124

5.14 Test accuracies using hybrid approach for combined datasets 126

5.15 Comparison of accuracies achieved using all three approaches 129

LIST OF EQUATIONS

Equation No Title Page No.

2.1 Formula for smoothing a signal 34

2.2 Formula for SNVC 35

2.3 Formula for the estimation of correlation coefficients of MSC 35

2.4 Formula for the corrections in MSC 36

2.5 Entropy calculation when y is independent variable 42

2.6 Entropy calculation when y is dependent on x 43

2.7 Formula for IG 43

2.8 Symmetrical uncertainty formula 43

2.9 Formula for chi-square 44

2.10 Formula for gain ratio 45

2.11 K-mean clustering 47

2.12 Centroid calculation for FC 48

3.1 Contrast enhancement formula 69

3.2 Formula for dataset representation 70

3.3 Formula for EC 74

LIST OF ABBREVIATIONS

ANN Artificial Neural Networks

APIs Active Pharmaceutical Ingredients

AR Autoregressive

BA Bioavailability

CA Cluster Analysis

CI Chemical Imaging

CLSM Confocal Laser Scanning Microscopy

CS Chi-Square

DESI Desorption Electrospray Ionization

DSPP Defective Solid Pharmaceutical Product

ED Euclidean Distance

EM Electromagnetic

ER Electromagnetic Radiation

ES Electromagnetic Spectrum

FC Fuzzy Clustering

FD First Derivative

FDA Food and Drug Administration

FIR Far Infrared

GLCM Gray-level Co-occurrence Matrix

GR Gain Ratio

HCA Hierarchical Cluster Analysis

HIS Hyperspectral Imaging

HPLC High Performance Liquid Chromatography

HV Holdout Validation

IE Image Enhancement

IG Information Gain

IP Image Processing

IR Infrared

KNN K-Nearest Neighbor

LDA Linear Discriminant Analysis

LOO Leave-one-out

LSLS Local Straight Line Screening

MI Microscopic Imaging

MIR Middle Infrared

ML Machine Learning

MLR Multiple Linear Regression

MN Max-min Normalization

MS Mass Spectrometry

MSC Multiplicative Scatter Correction

MSI Multispectral Imaging

NB Naïve Bayes

NDRA National Drug Regulatory Authority

NIR Near Infrared

NIRCI Near Infrared Chemical Imaging

NIRS Near Infrared Spectroscopy

NMR Nuclear Magnetic Resonance

NPRA National Pharmaceutical Regulatory Authority

NSPP Non-defective Solid Pharmaceutical Product

O/W Oil-in-water

OOS Out of Specification

PCA Principal Component Analysis

PLS Partial Least Square

PLS-DA Partial Least Square Discriminant Analysis

QDA Quadratic Discriminant Analysis

RF Relief-F

RGB Red-Green-Blue

RLM Run Length Matrix

RS Raman Spectroscopy

SD Second Derivative

SDC Savitzky-Golay Derivative Conversion

SEM Scanning Electron Microscopy

SFFC Spurious/Falsely-labeled/Falsified/Counterfeit

SIMCA Soft Independent Modelling of Class Analogy

SNVC Standard Normal Variate Correction

SP Signal Processing

SPP Solid Pharmaceutical Product

SS Semisolid

SU Symmetrical Uncertainty

SVM Support Vector Machine

TLC Thin Layer Chromatography

TOF Time of Flight

ToF-SIMS Time-of-Flight Secondary-Ion Mass Spectrometry

VS Vibrational Spectroscopy

W/O Water-in-oil

WC Wavelet Coefficients

WHO World Health Organization

WT Wavelet Transformation

XRD X-ray Diffraction

ABSTRACT

The non-destructive analysis of a Solid Pharmaceutical Product (SPP) is essential to verify its quality without destroying the product. This analysis may be performed by applying image processing and signal processing techniques to images and multispectral data. Based on this analysis, an SPP may be classified as defective or non-defective. The SPPs categorized as defective are exposed to three environmental factors (humidity, temperature and moisture) over different time periods, and the variations in the data are analyzed to judge the effect of these factors on the classification of an SPP. In this research, we propose two non-destructive methods to identify defective and non-defective SPPs using their surface morphology. In the first approach, multiple textural features are extracted from microscopic images of the surface of defective and non-defective SPPs. These features are derived from the Gray-Level Co-occurrence Matrix, Run Length Matrix, Histogram, Autoregressive Model and Haar Wavelet. In total, 281 textural features are extracted from the microscopic images. The features are reduced using three feature reduction techniques: Chi-Square, Gain Ratio and Relief-F. Through experimentation, we formulated three feature sets with 281, 15 and 2 features. We used four classifiers, namely Support Vector Machine, K-Nearest Neighbors, Naïve Bayes and Ensemble of Classifiers, to calculate the accuracy of the proposed approach. The classifiers were evaluated using leave-one-out cross-validation and holdout validation. We tested each classifier against all feature sets and compared the results. The results showed that, in most cases, Support Vector Machine performed better than the other classifiers.

In the second approach, we used multispectral data and applied wavelet transformations in conjunction with various machine learning techniques for classification. The results showed that the spectrum extracted from the ultraviolet wavelength range is more suitable for distinguishing defective from non-defective SPPs. Furthermore, the results indicated that the K-Nearest Neighbors classifier or the Ensemble of Classifiers is the more appropriate classifier.

Finally, a hybrid of both approaches was tested. The analysis of the results showed that the hybrid approach is better than the individual ones. An accuracy of 94% is achieved using K-Nearest Neighbors when a combined dataset of SPPs affected by all three environmental factors is used.

CHAPTER NO. 1

INTRODUCTION

In pharmacology, drugs are chemical substances used to change the physical condition of patients in order to treat diseases. A physician prescribes them either for a short period, in the case of acute diseases, or on a regular basis for chronic disorders. According to the World Health Organization (WHO), the use of substandard drugs, such as low-quality, expired and counterfeit products, is a real threat to the health of patients. The identification of such substandard drugs is a common problem in developing as well as developed countries. According to the US Food and Drug Administration (FDA), up to 25% of all drugs consumed in poor countries are thought to be counterfeit or substandard. Many reports indicate that drugs used to treat serious diseases such as malaria, tuberculosis, AIDS or other infections are more often the target of counterfeiting.

1.1 Drug Dosage Forms

Choosing the drug delivery system and the accurate amount of drug required for a particular disease is very important. The physical type and amount of a medication is known as its dosage form. The dosage form also describes the route of administration, that is, the path through which a drug is delivered to its site of action in the body. Dosage forms are required to give the patient an accurate dose. Over time, the evolution of medical therapies and drugs has also resulted in new dosage forms. Drugs are available in different dosage forms, such as:

Solid Dosage Forms

Liquid Dosage Forms

Semisolid Dosage Forms

1.1.1 Solid Dosage Forms

Solid dosage forms are medicines that contain an accurate dose and can be given to the patient as a single unit (dose). They are administered orally in the form of tablets, capsules or powders [1]. Solid medicines are mixtures of Active Pharmaceutical Ingredients (APIs) with a combination of diluents, binders, lubricants, glidants and many other excipients. Manufacturing solid medicines requires machines that are complicated and costly. Capsules and tablets are the most common forms; they are widely used in the industry and have similar manufacturing procedures. Solid dosage forms are easier to ship and more stable than liquid drugs, and they mostly have longer expiration dates.

1.1.1.1 Tablets

Tablets are the most popular dosage form. In the pharmaceutical industry, 70% of medicines are manufactured in the form of tablets [1]. Tablets are available in different shapes and sizes so that they are easy for patients to swallow. To identify and differentiate between tablets, they are stamped with symbols, numbers and letters. Along with the medicinal substances, tablets may also contain adjuncts, also known as excipients. For example, to ensure efficient tableting, manufacturers may use binders, glidants, lubricants, pigments and coatings. Generally, tablets are manufactured either by molding or by compression [2]. The two types of tablets are therefore:

i. Compressed Tablets

ii. Molded Tablets

Compressed tablets are prepared by a single compression using tablet machines. A specific quantity of powdered or granulated tablet material is placed into special dies, and the material is then compressed by the upper and lower punches of the machine under high pressure (on the order of tons/in²). It is the least complex, shortest and most effective method of tablet production. After the APIs are compressed, manufacturers can use excipients and lubricants. Compressed tablets can be further divided into three categories: multiple compressed tablets, chewable tablets and tablet triturates. Multiple compressed (multi-layered) tablets are prepared by compressing more than once; they are also known as a tablet within a tablet (with cores and shells). Chewable tablets are prepared by wet granulation and compression. They disintegrate when chewed, so they dissolve rapidly in the mouth. Chewable tablets are commonly used for multivitamin tablets and for tablet formulations for children. Tablet triturates are small, usually cylindrical tablets containing a small amount of a potent drug. These tablets are completely soluble in water, which is why any water-insoluble material is avoided in the formulation process.

Tablets can also be made by molding instead of compression. Molds of different shapes can be used for this kind of tablet preparation, and the tablets can be prepared either by tablet machinery or manually. The dampened tablet material is forced into the mold; the formed tablet is then ejected from the mold and left to dry. Molding is normally reserved for small-scale production.

1.1.1.2 Capsules

Pharmaceutical ingredients often have a bitter taste or an unpleasant odor, or can be reactive to oxygen. Such drugs may require some kind of coating or encapsulation, and capsules are the best and cheapest solution. In a capsule, the drug substance is enclosed in a container. These containers can be hard or soft, water-soluble gelatin shells [3]. Coating the drug substance or enclosing it in a capsule may affect the bioavailability (BA) of the drug. Capsules can be of two types:

i. Hard Gelatin Capsules

ii. Soft Gelatin Capsules

Powdered or dry ingredients enclosed in hard shells are known as hard gelatin capsules. They are more versatile for controlled drug delivery than soft gelatin capsules. These capsules consist of two pieces: a cap and a body. The gelatin shell starts to soften after ingestion and begins to dissolve in the gastrointestinal tract. Once the capsule body has dissolved, the encapsulated drug can disperse rapidly and easily, which increases its BA. These capsules are supplied in a variety of sizes, ranging from ‘000’ for the largest to ‘5’ for the smallest [3].

Soft gelatin capsules are used to encapsulate oils or APIs that can be dissolved or suspended in oil. Plasticized gelatin is used to prepare soft gelatin capsules. Drugs such as Vitamin A, Vitamin E, Chlorotrianisene, Declomycin, Digoxin and Chloral hydrate are usually prepared in soft gelatin capsules.

1.1.2 Liquid Dosage Forms

Liquid dosage forms include solutions, syrups, emulsions, suspensions and many more. A solution is a homogeneous mixture of a solute dispersed in a solvent. Syrups are aqueous solutions containing sugar, or a sugar substitute, combined with flavoring agents. The physicochemical stability of the drug can be maintained by using stabilizers in syrups and solutions. Emulsions are combinations of two or more immiscible liquids. They range from low-viscosity lotions to ointments and creams, which fall under the category of semisolid dosage forms. A suspension is formed when insoluble fine solid particles of a drug substance are dispersed in a liquid medium [3]. Suspending agents are used to increase the viscosity of the drug, which slows its dissolution.

1.1.3 Semisolid Dosage Forms

Ointments, creams, gels and similar preparations fall under the category of Semisolid (SS) dosage forms. Ointments are greasy medications applied to the skin, rectum or nasal mucosa, which can dissolve into the skin. Creams are mixtures of oil and water. Oil-in-water (O/W) creams are more effective and comfortable because they are less greasy. They are easy to wash off and are therefore cosmetically acceptable. The other category is water-in-oil (W/O) creams, which are the reverse of O/W creams: they are greasier and more difficult to handle.

1.2 Substandard Medicines

In the pharmaceutical industry, each Solid Pharmaceutical Product (SPP) produced by a manufacturer should conform to defined quality metrics. In the market, along with genuine drugs, customers can also find substandard SPPs. According to WHO, substandard SPPs are genuine products that ultimately do not meet the quality standards. They are also known as Out of Specification (OOS) products and are produced by licensed manufacturers working under a National Pharmaceutical Regulatory Authority (NPRA) [4]. Some characteristics of such substandard SPPs are [5]:

They are sold after their expiration date.

They are affected by improper supply and storage.

They can contain too much or too little API.

They can contain contaminated ingredients.

Sometimes they have fake packaging or show some other kind of quality negligence.

These SPPs may result from the carelessness of pharmacists, from insufficient financial and human resources, from obsolete or malfunctioning laboratory equipment, or from counterfeiting. They are harmful to the patient’s health and sometimes even cause death [4]. They can be further divided into three categories:

Counterfeit

Expired

Environment Affected

1.2.1 Counterfeits

Counterfeits are a subset of substandard SPPs, but they are not produced by authorized or licensed manufacturers. According to WHO [6], they are also known as Spurious/Falsely-labeled/Falsified/Counterfeit (SFFC) medicines. They are fake medicines that look genuine. The difference between counterfeits and other substandard medicines is that counterfeits are created illegally or intentionally [4]. Sometimes they have fake packaging, or the API may be absent or present in an inappropriate amount [7]. Multiple factors contribute to the proliferation of counterfeits. They should be identified accurately so that governments can take appropriate action to eradicate them.

1.2.2 Expired

Another kind of substandard SPP is the expired product. Every SPP has an expiry date prescribed by its manufacturer. Expired SPPs are unpredictable in their effectiveness. Over time, because of the loss of potency, these medicines can become completely ineffective or sometimes even poisonous. Medicines are chemical compounds whose composition changes with the passage of time; these changes can appear in their color, smell or texture.

1.2.3 Environment Affected

Environment-affected medicines are those that conform to the standards at the time of manufacturing but, with the passage of time, are turned into substandard medicines by external factors. These factors include moisture, light (especially sunlight), extreme temperature and oxygen. Exposure of a drug formulation to oxygen causes oxidation and reduction, which results in an unstable or substandard drug. Similarly, high temperature and light may accelerate oxidation and reduction within a drug. Moisture and humidity may also damage the stability of a drug. This instability of tablets may result in unpredictable behavior (for example, in their disintegration and dissolution time) or a change in their physical appearance (such as hardness, shape or color). As discussed by Islam and others [8], an increase in the moisture present in a solid pharmaceutical tablet above its actual level results in reactions of the API and excipients. They also state that moisture accelerates hydrolysis and reacts with excipients, which affects the physical and chemical stability of an SPP. In another study, Szakonyi and Zelkó [9] state that water absorption at the surface of a tablet results in degradation of its APIs. The use of defective tablets may cause minor problems for the patient, such as allergies, or may even result in death. Therefore, there is a pressing need for an approach that can identify environment-affected SPPs after they have been manufactured.

In this research, we deal with three environmental factors that affect SPPs: moisture, humidity and temperature. Here, the term moisture is used to represent water in its liquid form. An increase in moisture above the required level can cause reactions of the APIs and excipients, as discussed in [8]. Humidity refers to water in its gaseous state in the air. The APIs of pharmaceutical tablets react with humidity if the tablets are left in the open air, which results in oxidation and reduction. Lastly, high temperature can change the potency and efficacy of the APIs in SPPs. All three factors have a great influence on the performance of an SPP, so there should be an approach for identifying such defective SPPs.

The non-destructive analysis of an SPP is essential to verify its quality without destroying the product. In the pharmaceutical industry, the existing approaches for the quality assessment of newly produced medicines or for the detection of counterfeit medicines are destructive and time-consuming. These approaches also require sample preparation and a special laboratory environment and equipment. The focus of this research is to formulate a nondestructive offline approach for classifying SPPs as Defective SPPs (DSPP) or Non-defective SPPs (NSPP). The analysis is performed using various signal processing and Image Processing (IP) techniques in conjunction with different Machine Learning (ML) techniques on image and multispectral data.
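
To make the classification task concrete, the following minimal Python sketch runs leave-one-out (LOO) validation of a nearest-neighbor classifier on a handful of feature vectors labelled DSPP or NSPP. It is an illustration only: the two-dimensional feature values are invented, and the helper names `nearest_label` and `leave_one_out_accuracy` are hypothetical. The features and classifiers actually used in this work are those described in Chapter 3 and Chapter 4.

```python
import math


def nearest_label(train, query):
    """Return the label of the training sample closest to `query`
    (1-nearest-neighbor with Euclidean distance)."""
    best = min(train, key=lambda item: math.dist(item[0], query))
    return best[1]


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, classify it using all the others,
    and return the fraction classified correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_label(train, features) == label:
            correct += 1
    return correct / len(samples)


# Invented toy data: (feature vector, class label).
samples = [
    ((0.10, 0.20), "NSPP"),
    ((0.15, 0.25), "NSPP"),
    ((0.20, 0.15), "NSPP"),
    ((0.80, 0.90), "DSPP"),
    ((0.85, 0.80), "DSPP"),
    ((0.90, 0.95), "DSPP"),
]

print(leave_one_out_accuracy(samples))
```

Because every sample is used for testing exactly once, LOO makes full use of a small dataset; holdout validation instead sets aside a fixed test portion that is never used for training.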

The thesis is organized as follows. Chapter 1 provides a brief introduction to the proposed research. Chapter 2 provides a complete review of the literature related to the techniques available for the analysis of SPPs. We have divided the proposed approach into two parts: one analyzes SPPs using microscopic high-resolution imaging and the other uses multispectral analysis. These two approaches are discussed in Chapter 3 and Chapter 4, respectively. Chapter 5 is divided into three parts: the first two provide analysis and discussion of the two proposed approaches, and the third explains the results achieved by combining them. Chapter 6 provides the conclusion and future recommendations.

CHAPTER NO. 2

REVIEW OF LITERATURE

Different techniques are available in the literature for acquiring data from the sample being analyzed, for feature extraction and for classification. In this chapter, we discuss the techniques that are useful for the analysis of SPPs.

2.1 Data Acquisition

In the case of SPPs, the extracted data can be of two types: image data and spectral data. Image data are defined in a trichromatic color space, i.e. RGB (Red-Green-Blue); each digital color in the image is a combination of these three components. Electronic devices such as digital cameras are used to capture such data. A spectral color space, on the other hand, consists of tens or even hundreds of color components [10]. A spectrum describes electromagnetic (EM) radiation as a function of wavelength or frequency. The color of the sensed spectrum can be analyzed from its shape. The EM spectrum, shown in Figure 2.1, consists of different types of EM radiation, e.g. gamma rays, X-rays, ultraviolet, visible, infrared, microwaves and radio waves [11].

Figure 2.1: The electromagnetic spectrum

The visible part of the EM spectrum (380 nm to 780 nm) is the only part that can be seen by the human eye. Analysis using spectral data allows researchers to investigate other important features of the samples that cannot be seen by the human eye.
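
As a minimal sketch of the distinction drawn above, the following Python fragment contrasts a three-component RGB pixel with a many-component spectral pixel and labels a wavelength by region. Only the 380 nm to 780 nm visible range comes from the text; the function name `em_region`, the coarse treatment of everything shorter as ultraviolet and everything longer as infrared, and the sample values are illustrative assumptions.

```python
# Visible limits as stated in the text (in nanometres).
VISIBLE_MIN_NM = 380
VISIBLE_MAX_NM = 780


def em_region(wavelength_nm):
    """Coarsely label a wavelength as ultraviolet, visible or infrared.

    Assumption: anything shorter than the visible range is called
    'ultraviolet' and anything longer 'infrared'; X-rays, microwaves
    and the other regions of the EM spectrum are deliberately ignored.
    """
    if wavelength_nm < VISIBLE_MIN_NM:
        return "ultraviolet"
    if wavelength_nm <= VISIBLE_MAX_NM:
        return "visible"
    return "infrared"


# An RGB pixel carries exactly three color components ...
rgb_pixel = (182, 140, 97)  # hypothetical values

# ... whereas a spectral pixel carries one value per sampled wavelength.
# Hypothetical sampling: every 10 nm from 200 nm to 1000 nm.
wavelengths_nm = list(range(200, 1001, 10))
spectral_pixel = [0.0] * len(wavelengths_nm)  # placeholder intensities

print(len(rgb_pixel), len(spectral_pixel))
print(em_region(250), em_region(550), em_region(900))
```

With this hypothetical sampling, the spectral pixel holds 81 values against the three of the RGB pixel, and most of them lie outside the visible range.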

Different devices can be used to extract the desired raw data from the sample being analyzed. Assessing the formulation, efficiency, correctness and stability of a medicine is very important. In the pharmaceutical industry, the quality assessment of drugs can be performed using different data acquisition methods. These methods can give information about the active ingredients and about the structure of the surface of the drug. They can be divided into four major categories:

Chromatographic Techniques

Spectroscopic Techniques

Imaging Techniques

Spectral Imaging

Chromatographic techniques include the tests most commonly used in the pharmaceutical industry. These techniques are expensive, destructive and time-consuming, and they require sample preparation. Spectroscopic techniques use spectrometers, which provide spectral information in the form of spectra for every sample being studied. With the exception of mass spectrometry, the spectral methods are non-destructive, less time-consuming and require little or no sample preparation. Imaging techniques involve IP for the analysis. Spectral imaging techniques are newly emerging in pharmaceutics; they combine spectroscopic techniques with traditional imaging and so provide both spatial and spectral information about the given sample [12]. The details of all these techniques are discussed below.
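
The combination of spatial and spectral information offered by spectral imaging can be pictured as a three-dimensional data cube. The sketch below is a hypothetical pure-Python illustration; the cube dimensions, the values and the helper names `pixel_spectrum` and `band_image` are assumptions and are not taken from any instrument used in this work. Indexing by position yields a spectrum, and indexing by band yields an ordinary grayscale image.

```python
# Hypothetical cube: 2 rows x 3 columns x 4 spectral bands.
# cube[r][c][b] is the intensity at pixel (r, c) in band b.
ROWS, COLS, BANDS = 2, 3, 4
cube = [
    [[r * 100 + c * 10 + b for b in range(BANDS)] for c in range(COLS)]
    for r in range(ROWS)
]


def pixel_spectrum(cube, row, col):
    """Spectral view: all band intensities recorded at one location."""
    return list(cube[row][col])


def band_image(cube, band):
    """Spatial view: the 2-D image formed by a single spectral band."""
    return [[pixel[band] for pixel in row] for row in cube]


print(pixel_spectrum(cube, 1, 2))  # one spectrum with BANDS entries
print(band_image(cube, 0))         # one ROWS x COLS image
```

A conventional spectrometer corresponds to a single call of `pixel_spectrum`, and a conventional camera to a few calls of `band_image`; spectral imaging records the whole cube.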

2.1.1 Chromatographic Techniques

Different techniques are available in the literature for assessing and estimating the formulation, quality, correctness and stability of solid drugs. Some of these techniques are used at the time of manufacturing to obtain information about the correct amount of APIs. According to several studies [7, 13, 14], Thin Layer Chromatography (TLC) and High Performance Liquid Chromatography (HPLC) are the most common techniques used for drug quality testing. A brief description of each is given below.

2.1.1.1 Thin Layer Chromatography

TLC procedures can be used to detect counterfeit drugs and to identify and estimate the APIs in a drug. Deisingh [15] uses TLC to identify counterfeit medicines and to estimate the APIs in tablets. Impure drug substances can also be identified using TLC [7].

2.1.1.2 HPLC

Most of the manufacturers use HPLC in the pharmaceutical industry to test the products

(medicines) and their raw material or ingredients. Manufacturers assign skilled analysts

for this test. They pass raw materials or prepared medicines through the HPLC machine

and then analyze their results. These machines require sample preparation for testing,

which destroys the sample. HPLC is therefore a destructive, expensive and

time-consuming method [16].

2.1.2 Spectroscopic Techniques

The interaction of light with molecules and atoms of the natural product (like drugs)

can provide information about their structures. This interaction may result in

spectra, which lie in different regions of the ES and provide information about

surface structure and ingredients. These spectra are further processed by

computers [12]. Along with traditional tests, manufacturers may use these techniques

for the analysis of drugs composition and for the detection of counterfeit and

substandard medicines. These techniques provide only some characteristics of the

molecules and do not provide a three-dimensional image of them. As discussed in other

studies [12, 15, 17-24], solid drug assessment can also be performed using

spectroscopic techniques. The spectral techniques most commonly used in the literature for

the analysis of drugs include:

Mass Spectrometry (MS)

Nuclear Magnetic Resonance Spectroscopy (NMR)

X-ray Diffraction (XRD)

Scanning Electron Microscopy (SEM)

Vibrational Spectroscopy (VS)

VS includes Raman and Near Infrared Spectroscopy techniques. Several studies

[25-27] explain that all of these techniques except VS require partial or full sample

preparation, so they are either destructive or semi-destructive.

2.1.2.1 Mass Spectrometry

MS can be widely used to characterize pharmaceutical products. Drug profiling can be

done using time of flight (ToF) and electrospray ionization [15]. MS, primarily

LC-MS/MS, can be used in all stages of drug development. It can be used to elucidate

the structure of pharmaceutical drug mixtures through mass determination.

Pharmacokinetics of the newly created drug can be investigated through this technique

[19]. MS is a destructive and time-consuming method. Barnes et al. [28] use

Time-of-Flight Secondary-Ion Mass Spectrometry (ToF-SIMS) for the characterization of

biopharmaceuticals and solid-state pharmaceuticals. This technique can identify chemicals in

pharmaceutical materials and their distribution by analyzing their surface. Culzoni et al. [29] use

ambient MS for the analysis and detection of falsified or substandard medicines. In

another research, MS along with Desorption electrospray ionization (DESI) is used by

Chen et al. [30] for the analysis of pharmaceuticals in an ambient environment.

2.1.2.2 NMR Spectroscopy

In the pharmaceutical industry, NMR is widely used for the confirmation and

elucidation of drug structures. Analysis of synthetic or natural products can be

performed through NMR spectroscopy. It can also be used to characterize the

composition and to determine the impurity profile of drugs. NMR measurements can also

provide information about the conformations of drugs, especially in tablets [31].

The European Pharmacopoeia uses it for the identification of drugs and reagents.

Measurement of NMR spectra from liquid drugs is easier than from the solid drugs [17].

Several studies in the literature use NMR in the pharmaceutical

industry. Holzgrabe et al. [32] provide a review of the applications of NMR in

pharmacy. According to the authors, quantitative NMR provides an evaluation based on

quality estimation of pharmaceuticals. Another review by the same author

illustrates that quantitative NMR and diffusion-ordered spectroscopy (DOSY)

NMR experiments are extremely useful for the identification of counterfeit medicines

by elucidating their ingredients [33]. Malet-Martino explains the use of NMR in the

pharmaceutical and biomedical fields [34].

2.1.2.3 X-ray Diffraction

XRD is one of the spectroscopic techniques, mainly used for the analysis and

identification of polymorphic and solvated forms. It is used to measure the degree of

crystallinity but has lower sensitivity than IR spectroscopy. This technique

can also be used to quantify the API in multicomponent

tablets. Croker [35] performs a comparative study on the performance of powder XRD,

RS and NIRS for the quantitative analysis of polymorphic mixtures of

Piracetam. The research concluded that in this situation RS and NIRS are more suitable

techniques than XRD. In another study, Maurin et al. [36] describe the use of

XRD for the identification of counterfeit medicines. The original Viagra® tablets and

their counterfeit versions were used for the analysis purpose in this research. The

research concluded that the use of XRD is a fast and reliable method for the prediction

of the absence or presence of active contents and excipients from the counterfeit and

genuine medicines. The authors also stated that the XRD is not well suited for trace

analysis. More accurate methods for trace analysis are HPLC, GC-MS and

HPLC-MS.

2.1.2.4 Scanning Electron Microscopy

In SEM, the surface of a sample is scanned with an electron beam in a raster

pattern, which produces three-dimensional black and white images. These images can be further

converted into color images using IP techniques. Surface fractures, contaminations,

chemical compositions, and crystalline structures can be examined using SEM. It is a

destructive method as it requires sample preparation. Another drawback is its size and

cost. Trained personnel are required to prepare samples and to operate the SEM machine.

Scoutaris uses SEM-based Energy Dispersive X-ray spectroscopy along with confocal

Raman microscopy for the chemical characterization of Paracetamol tablets. The creation

of concentration maps helps in this chemical characterization of the solid tablets [37].

Klang et al. [38] provide a review of SEM techniques used in the domain of

pharmacy. Ruotsalainen et al. [39] use Confocal Laser Scanning Microscopy

(CLSM) in combination with SEM for imaging the film-core interface. The proposed

technique helps in identifying defects on the surface of film-coated tablets.

2.1.2.5 Vibrational Spectroscopic Techniques

Vibrational spectroscopic techniques such as IR, NIR and Raman Spectroscopy (RS)

are proving very beneficial for pharmaceutical quality analysis. They are more accurate, less

costly and more reliable than the traditional methods. No sample preparation is required for

these analyses [18]. These tests are nondestructive, so the sample product can be

packaged or used again for other tests. Gendrin et al. [40] provide a review of

VS and chemometric techniques for the analysis of pharmaceutical products.

Another study by De Beer et al. [41] states that RS and NIRS can be

used effectively as process analyzers under the Process Analytical Technology framework

in a real-time environment. These techniques enable nondestructive analysis

without sample preparation for extracting the physical and chemical composition of the

sample and for measuring critical processes and attributes of the sample.

NIR Spectroscopy

NIRS is a low cost, non-destructive and fast method for the analysis of powder

ingredients as well as SPPs in the pharmaceutical industry. It can describe physical

properties of the sample along with some other properties such as hardness, presence

of moisture, dissolution rate, particle size and compaction force. NIRS is widely

used in the pharmaceutical industry to replace traditional time-consuming, destructive

liquid chromatography techniques or wet-testing methods used for the analysis of

medicines [42]. No sample preparation is required for NIRS. According to [16], NIRS

can be used to assess the microstructure as well as the macro chemical properties of tablets.

Through macro chemical properties, analysts can determine the active concentration of

the tablet ingredients. Microstructure defines the distribution and size of the

components (APIs and excipients) within a tablet. NIRS is recognized for the analysis

of raw material, process monitoring and quality control of pharmaceutical products

[12].

Roggo et al. [27] review the NIRS and chemometric techniques used in the

pharmaceutical industry for the analysis of solid, liquid and biotechnological

pharmaceutical products. NIR Spectra can provide information about various physical

parameters of pharmaceutical products, such as hardness, compaction force, particle size,

dissolution rate etc. These physical parameters can be obtained from tablets as well as

powders. In another study, Morisseau and Rhodes [26] use different regression

models such as Partial Least Squares (PLS) and MLR along with NIRS to determine the hardness

of tablets. The accuracy of NIR results depends strongly on the drug products and their

formulation.

Polymorphs, which are key parameters of a product, can change the dissolution

properties of the final drug. Ensuring the correct polymorphic form of a drug is very

important, as the polymorphs of a drug can be helpful for the identification and detection

of counterfeits. This confirmation can also be done through NIRS. Water is another key

component of pharmaceutical drugs that influences their stability. Moisture determination

in drugs is one of the earliest applications of NIRS. NIRS can be used to determine

the water content of gelatin capsules [25].

According to Jamrógiewicz [43], NIRS has some disadvantages along with its many

advantages. One of them is that NIRS is less suitable for direct quantity

determination in aqueous forms of pharmaceutical ingredients. In another

study, in-line NIR data were used to predict properties of SPPs that represent

their quality, such as hardness, particle size and absorbency [44]. Chalus et al. [45] use

Wavelet Transformations (WT) along with ANN on NIR spectra to determine the APIs

of the solid tablets. Wavelet transforms are efficiently used to reduce the dimensionality

of the large data and to extract relevant information from it. They compared their results

with PLS regression applied to the same raw data and concluded that wavelet coefficients

used with ANN are a better choice. Svensson et al. [46] use NIR-based chemical imaging

along with 2D wavelet filters for the estimation of texture-based differences between

pharmaceutical tablets. In another study, Dowell et al. [47] use NIRS along with PLS

to differentiate between counterfeit and genuine Artesunate antimalarial tablets.
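
The dimensionality reduction by wavelet transforms mentioned above can be illustrated with a minimal sketch. It assumes the Haar wavelet, which is the simplest wavelet family and is not necessarily the one used in [45]; the function names and the spectrum values are hypothetical. Only the approximation coefficients are kept at each level, so every level halves the number of features.

```python
import math


def haar_step(signal):
    """One level of the Haar wavelet transform (signal length must be even)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Pairwise sums give the smooth (approximation) part,
    # pairwise differences give the detail part.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reduce(signal, levels):
    """Keep only the approximation coefficients after `levels` Haar steps."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


# Hypothetical 8-point absorbance spectrum (made-up values).
spectrum = [0.10, 0.12, 0.30, 0.32, 0.50, 0.48, 0.20, 0.22]
features = haar_reduce(spectrum, levels=2)  # 8 values reduced to 2
print(features)
```

The reduced coefficients could then serve as inputs to a classifier or regression model in place of the full spectrum.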

Bleye et al. [48] provide a review of the techniques used for NIRS method validation

in pharmaceutics. Shah et al. [49] also discuss the applications of NIRS for the

analysis of pharmaceuticals. Rodionova and Pomerantsev [50] discuss the use of

NIRS for the detection of counterfeit drugs. Soft Independent Modelling of Class

Analogy (SIMCA), Principal Component Analysis (PCA) and CA were used for

classification. They also review the work done by different authors in the same field. In

another study by Rodionova et al. [51], NIR spectrometry was used along with PCA

for the classification of genuine and counterfeit drugs.

Storme-Paris et al. [52] concluded that the use of appropriate

chemometric methods along with NIR spectra is very helpful in the identification

of counterfeit drugs. They state that closely related spectra can be classified using

supervised classification algorithms, while samples with different spectra can

be classified using unsupervised algorithms. Another NIRS study was

performed for the screening of Viagra tablets and their counterfeit versions by

Vredenbregt et al. [53]. The proposed approach can perform four different tasks. It can

verify the homogeneity of batches, distinguish between counterfeit and genuine

versions of the product, screen the active contents of the product from its excipients and,

lastly, identify whether a similar sample has been analyzed previously.

Candolfi et al. [54] provide a comparison between different classification techniques

used with NIRS in the analysis of pharmaceutical products. The comparative analysis

was performed by using tablet and capsule datasets. Three classification algorithms

were used in the analysis i.e. K-Nearest Neighbors (KNN), Linear Discriminant

Analysis (LDA) and Quadratic Discriminant Analysis (QDA). The research concludes

that the LDA classification algorithm performs better than the other two for the analysis

of tablets and capsules. PCA was used as a feature reduction algorithm in

combination with LDA for classification.

Clarke [55] explains the importance of the use of NIR microscopy for the analysis of

pharmaceutical products in his research. Using NIR microscopy, we can analyze the

samples chemically and make a judgment about how well the ingredients are mixed

with each other. PCA and PLS are used for the analysis purposes in this research. IR

diffuse reflectance spectroscopy with acoustic microscopy was used to analyze the

thickness of the coating of the tablet by Bikiaris et al. [56].

Boiret et al. [57] explain the use of NIRCI along with 3D visualization to monitor the

formulation during the development process of the tablet. Another research was

conducted for reviewing the use of mid-IR spectroscopy in the field of pharmaceutics.

The authors illustrate that mid-IR spectroscopy can be used to extract a variety of useful

information from the pharmaceutical samples. It can be used for the identification and

explanation of the structure of the sample. Another application of this technique can be

the characterization of the polymorphs and amorphous forms of the sample [58].

Reich [59] discusses two non-destructive analytical techniques, i.e. NIRS and NIRS

imaging. They can be used for both quantitative and qualitative analysis. This

review focuses on five aspects of both techniques: the basics of NIR and

chemometric-based data processing, qualification and identification of raw materials,

analysis of intact solid dosage forms, process monitoring and control and, lastly,

the regulatory issues.

Blanco and Alcalá [60] proposed an NIRS-based approach for the analysis of intact

pharmaceutical products. This analysis can provide information about the hardness of

the tablet and about the API and its uniformity level in tablets. A PLS1 calibration

model was used for quantification. This model was created using laboratory calibration

samples of tablets. The NIRS-based method is simpler for the quantification of the APIs

as the calibration set provides variability in production samples. Sulub et al. [61] also

used five different NIR spectrometers for the analysis of pharmaceutical tablets in order to validate

their content uniformity. Robust multivariate calibration transfer algorithms were used

in this research.

Process analytical techniques can be used to monitor three different levels of the

tableting process namely blend homogeneity, content uniformity and coating thickness.

Moes et al. [62] proposed an NIRS based analytical method for monitoring these three

levels. Blend homogeneity was estimated without calibration using a diode array

spectrometer. For the estimation of content uniformity and coating thickness, the authors

used a Fourier-transform spectrometer with calibration-based models.

Li et al. [63] measure blend and content uniformity using semi-quantitative reflectance

NIR. The authors state that three factors are most important in judging the applicability

of the proposed method: (1) identification of the API from the NIR spectrum, (2) spectrum

strength and (3) the relationship between the API and the NIR spectrum. The authors also state

that this approach can be used in early stages of the formulation process and is able to

analyze multiple batches at the same time. Another study by Li et al. [64] used

PCA for the prediction of content uniformity from the NIR spectra of

pharmaceutical tablets. Other studies that used NIR for the estimation of

the content uniformity of pharmaceutical products include [65, 66].

Raman Spectroscopy

Raman spectroscopy is another advantageous technique for the analysis of pharmaceutical

products. Analysis requires no sample preparation, so it is a

non-contact and non-destructive technique. Raman is well suited to the analysis of

samples ranging from microscopic amounts (<1 µm) to centimeters and can also provide

spectra of small changes in chemical structure. It is equally suitable for solids and

for samples in aqueous media. Uniformity of the sample materials can be analyzed

using Raman spectra. In Raman Spectroscopy (RS), a laser source of monochromatic visible or IR

radiation is used for the analysis of the samples. It is a non-destructive

method, so it can be used for the analysis of bulk and final products directly in their

packaging. RS can also be used for online monitoring of drug quality and requires

minimal trained personnel. RS can be helpful for the identification of raw material,

quantity determination of APIs and screening of polymorphs [67]. Vankeirsbilck et al.

[68] explain RS and its applications in the pharmaceutical industry in detail. According to

the authors, RS can be considered one of the more influential techniques for the

analysis of pharmaceuticals.

Feng et al. [69] present a technique for the identification of counterfeit drugs using

portable Raman spectroscopy. Extracted spectra were analyzed using Local Straight

Line Screening (LSLS) and PCA, giving an accuracy of 96.35%. In another study,

Li et al. [70] use Raman Spectroscopy for the classification of Azithromycin (AZM)

tablets manufactured by four different manufacturers. Classification was performed

using four different classifiers named Support Vector Machine (SVM), Bayes classifier,

K Nearest Neighbors (KNN), and Partial Least Squares Discriminant Analysis (PLS-

DA). Among these classifiers, PLS-DA provides 80% accuracy using full spectra and

100% using partial spectra. Romero-Torres et al. [71] proposed the use of RS to

examine the variability of tablet coatings and, in later research, used RS

to estimate the tablet coating thickness [72]. Muller et al. [73] use RS to monitor the in-

line active coating process.

Gao et al. [74] use RS for the analysis of expired medicines and compare the results

achieved using different classification and chemometric techniques. PLS-DA, KNN

and SVM are the three classifiers compared in this research. Data

preprocessing was performed using the Savitzky–Golay algorithm, first derivative (FD),

second derivative (SD) and max-min normalization (MN). The comparison shows that

an average accuracy of 96.80% is achieved using SVM.
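
The preprocessing steps named above can be sketched as follows. This is only an illustration under stated assumptions: a 5-point quadratic Savitzky–Golay window and central-difference derivatives are chosen here for simplicity, whereas the window size and derivative scheme used in [74] are not given in this review. All function names and data values are hypothetical.

```python
def savgol_smooth5(y):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are left unchanged."""
    coeffs = (-3, 12, 17, 12, -3)  # standard 5-point quadratic weights, divided by 35
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / 35.0
    return out


def first_derivative(y, step=1.0):
    """Central-difference first derivative (one-sided at the two ends); needs len(y) >= 2."""
    n = len(y)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2.0 * step)
    d[0] = (y[1] - y[0]) / step
    d[-1] = (y[-1] - y[-2]) / step
    return d


def second_derivative(y, step=1.0):
    """Second derivative obtained by differentiating twice (least accurate near the ends)."""
    return first_derivative(first_derivative(y, step), step)


def min_max_normalize(y):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return [0.0] * len(y)
    return [(v - lo) / (hi - lo) for v in y]


# Hypothetical Raman intensities (made-up values).
raw = [5.0, 5.4, 6.1, 9.8, 15.2, 9.9, 6.0, 5.5, 5.1]
prepared = min_max_normalize(first_derivative(savgol_smooth5(raw)))
print(prepared)
```

The order of the steps (smoothing, then differentiation, then scaling) is one common choice; other orders are possible depending on the data.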

RS along with PCA and Hierarchical Cluster Analysis (HCA) was used for the

classification between genuine and counterfeit Viagra® tablets by Veij et al. [75].

Another study by Eliasson and Matousek [76] presented a new and improved method

that can be used for the identification of counterfeit pharmaceutical products such as

capsules and tablets. The proposed approach, named spatially offset RS (SORS), can

replace conventional backscattering-based RS. This approach can be used for the

identification of these products even within their packaging. Conventional

backscattering RS normally fails to identify products that are still packed because

of the fluorescence of the packaging material. This packaging material can be the plastic

container, capsule shell or tablet coating.

Ricci et al. [77] proposed a new approach for the discrimination between genuine and

counterfeit Artesunate tablets. The proposed approach is a combined version of SORS

and Attenuated Total Reflection Fourier Transform Infrared (ATR-FTIR). The

chemical composition of the tablet can be extracted by VS techniques; however, the

combined approach can effectively analyze the composition of the overall tablet as

well as of the surface only.

Zhang et al. [78] performed a comparative study based on the use of different

multivariate analysis techniques on Raman imaging for the analysis of pharmaceutical

tablets. Direct classical least squares (DCLS), multivariate curve resolution (MCR),

PCA and cluster analysis (CA), are the four different multivariate analysis techniques

that are used in this study. The comparative results are based on the multivariate analysis

of the Raman data collected from the 400µm × 400µm area of the surface of a model

tablet. According to the authors the PCA, MCR and CA are suitable for analysis when

the chemical composition of the tablet is unknown, as these techniques do not require

any prior knowledge about the sample. The analysis is completely based on the input

sample in these techniques. DCLS, on the other hand, requires a reference

spectrum for the analysis. Relative quantitative information about the spectra

can be extracted through DCLS based on the reference. The authors also state that

some caution is needed when performing quantitative analysis. The

use of preprocessing techniques is essential to reduce noise, but in situations where

the effects of noise dominate the signal, qualitative analysis is

preferable to quantitative analysis.
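
The role of the reference spectra in DCLS can be illustrated with a minimal least-squares sketch. It assumes a tablet with only two components and a measured spectrum that is a linear combination of their pure reference spectra; the function names and all spectral values are hypothetical and are not taken from [78].

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def dcls_two_components(mixture, ref_a, ref_b):
    """Least-squares weights (ca, cb) such that mixture ~ ca * ref_a + cb * ref_b."""
    aa, ab, bb = dot(ref_a, ref_a), dot(ref_a, ref_b), dot(ref_b, ref_b)
    am, bm = dot(ref_a, mixture), dot(ref_b, mixture)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("reference spectra are (nearly) collinear")
    # Solve the 2x2 normal equations with Cramer's rule.
    ca = (am * bb - bm * ab) / det
    cb = (bm * aa - am * ab) / det
    return ca, cb


# Hypothetical pure-component reference spectra (made-up values).
ref_api = [1.0, 0.2, 0.0, 0.1]
ref_excipient = [0.0, 0.3, 1.0, 0.2]

# Simulated pixel spectrum: 30% API and 70% excipient, without noise.
pixel = [0.3 * a + 0.7 * e for a, e in zip(ref_api, ref_excipient)]

ca, cb = dcls_two_components(pixel, ref_api, ref_excipient)
print(ca, cb)
```

Applying the same fit to every pixel of a Raman map would give one weight image per component, which is the relative quantitative information referred to above.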

In another research, authors provide a review based on the use of RS through a

microscope for the extraction of depth and lateral chemical maps of the samples. Bulk

RS data can be processed using different chemometric techniques for the analysis of

the pharmaceutics. According to them, the mapping data extracted from tablets using

RS can effectively be used to determine the distribution of the APIs [79].

Hédoux et al. [80] discuss the contribution of low-frequency RS to the

investigation and detection of small crystalline materials. It can also be used to reveal

ambiguous polymorphic and non-polymorphic materials and helps in studying their

stability and characteristics.

In another study, noninvasive analysis of pharmaceutical capsules and

tablets is performed using transmission RS geometry. The proposed approach can

provide bulk information about the contents of the product by reducing surface

fluorescence signals. Transmission RS can also provide better results than

traditional backscattering RS, with high specificity, speed and ease of development [81].

Strachan et al. [82] state that RS is an analytical approach for both solids and materials

in an aqueous environment. The review also describes how different multivariate

analysis algorithms can be used to overcome the quantification problems caused by

poor peak resolution of the spectrum. According to the authors, RS can also be used

for complex pharmaceutical products such as suspensions and microspheres.

O’Connell et al. [83] presented an approach based on the discrimination between the target

substance and different excipients for the identification of illicit drugs. Raman spectra were

measured from the sample and then preprocessed using FD and normalization techniques. The analysis

was performed using PCA, SVM and Principal Component Regression (PCR). The

results show that SVM outperforms the other algorithms.

2.1.3 Imaging Techniques

Imaging is also used for the examination and classification of tablets.

High-resolution cameras can be used to capture images of the sample SPPs. More detailed

information can also be captured from the samples by using microscopic cameras.

Imaging is a non-destructive, less expensive and simple approach based on different IP

techniques such as IE, segmentation, edge and contour detection and texture analysis.

Segmentation of grayscale tablet images using adaptive thresholding and

morphological operations is used for tablet identification, which is also known as pill

recognition. Andreas et al. [84, 85] performed classification using

Euclidean distance on a feature set based on size, shape and color, and the results

show that the most dominant of these three features is ‘size’. Ramya et al. [86]

used template matching along with a series of IP techniques to detect broken tablets

in blister packaging.
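
Classification by Euclidean distance on a small feature vector, as described above, can be sketched as a nearest-reference rule. The feature encoding (three values scaled to [0, 1]), the labels and all numbers below are hypothetical and only illustrate the principle; they are not taken from [84, 85].

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, references):
    """Return the label of the reference vector closest to `features`."""
    return min(references, key=lambda label: euclidean(features, references[label]))


# Hypothetical reference vectors: (size, shape descriptor, mean color), each scaled to [0, 1].
references = {
    "tablet_A": (0.80, 0.10, 0.95),
    "tablet_B": (0.45, 0.60, 0.30),
}

label = classify((0.78, 0.15, 0.90), references)
print(label)
```

Because the distance treats all features equally, the features should be brought to a comparable scale before classification; otherwise the feature with the largest numeric range dominates the result.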

Špiclin et al. [87] performed inspection of imprinted tablets using image registration on

an image database of different defective and non-defective tablets. They used three

registration methods in this research: direct matching of pixel intensities, principal axis

matching and circular profile matching. Comparative analysis shows that circular

profile matching is a more powerful registration technique for visual inspection of the

tablets. Bukovec et al. [88, 89] performed two studies on the comparison of geometrical

and statistical methods for visual inspection of tablets using Receiver Operating

Characteristics Analysis. Geometrical features are based on the imprinted shape, while

statistical features are based on tablet surface statistics. The proposed

inspection method can identify five types of defects: spot, deboss, emboss, crack and

dot. Results show that the features extracted by the statistical methods are better than

those of the geometrical methods for tablet inspection.

In another research, statistical textural features are used for the classification between

defective and non-defective solid tablets. These features are extracted from microscopic

images of the surface of the tablets [90]. Možina et al. [91] provide an automated

technique for visual inspection of the imprints of solid pharmaceuticals. Lee et al.

[92] also provide an imprint-based automated method for matching and retrieving illicit

pills. Edge localization and invariant moments were used as a feature vector for matching.

Yu et al. [93] use a content-based image retrieval technique to develop an online solution

for drug tablet retrieval. Signature features are used to describe the shape and Gabor features

to describe the imprint mark in an image of a tablet. A study conducted by Jung et al. [94]

describes the use of image processing and statistical analysis for the detection of

counterfeit solid tablets. Image acquisition was performed using a high-resolution

VSC 5000. Different morphological operations segment the tablet image from its

background. RGB color components of the images are used to build a statistical model

for detection. Bhattacharyya distance measures are used for the

discrimination between genuine and counterfeit tablets.
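
The Bhattacharyya distance mentioned above can be illustrated for discrete distributions. This sketch assumes that each tablet is summarized by a histogram of one color channel, which is an illustrative choice and not necessarily the statistical model built in [94]; the function names and histogram values are hypothetical.

```python
import math


def normalize(hist):
    """Convert non-negative bin counts to probabilities that sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [h / total for h in hist]


def bhattacharyya_distance(hist_p, hist_q):
    """D = -ln(sum_i sqrt(p_i * q_i)) for two histograms with the same bins."""
    p, q = normalize(hist_p), normalize(hist_q)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    if bc <= 0.0:
        return math.inf  # no overlap between the two distributions
    return -math.log(min(bc, 1.0))  # clamp guards against rounding above 1


# Hypothetical 4-bin red-channel histograms (made-up counts).
genuine = [10, 40, 40, 10]
suspect = [35, 35, 20, 10]

print(bhattacharyya_distance(genuine, genuine))
print(bhattacharyya_distance(genuine, suspect))
```

A distance of zero means identical distributions, and larger values mean less overlap. A decision threshold would have to be chosen from reference data.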

2.1.4 Spectral Imaging Techniques

Spectral Imaging (SI) techniques are another class of techniques that can be used for the

analysis of solid dosage forms. These techniques provide detailed information about the

concentration and distribution of the drug ingredients. They combine

vibrational spectroscopic techniques and digital image processing. SI involves two major

techniques:

a) Hyperspectral Imaging (HSI)

b) Multispectral Imaging (MSI)

Chemical Imaging (CI) is also used for the analysis of SPPs. The combination of IP

along with any of the VS techniques is known as CI. CI is used to capture spatial as

well as spectral information from an object. It was initially developed for remote

sensing, but recent studies have shown that it can be used for nondestructive analysis

of pharmaceutical products [95]. HSI is related to MSI; the basic difference between

them is the number of bands or the type of measurements.

De Juan et al. [96] illustrate the benefits of using spectroscopic imaging techniques

along with different chemometric methods for the analysis of pharmaceutical products. The

combination of the two can be helpful in extracting local as well as global information

about the chemical components of the surface area. It can be used to estimate the

homogeneity of chemical components, detect impurities in the sample and

monitor processes.

2.1.4.1 Hyperspectral Imaging

Hyperspectral sensors are used to collect information at each spatial position as a set of

images. Each image corresponds to one spectral band, which means that each pixel in the

image set contains a spectrum of that specific position [95]. Chemical images are

three-dimensional blocks of data with one wavelength dimension and two spatial dimensions.

Chemical images can be formed by combining either NIR or Raman Spectroscopy with

digital imaging. Both can be used for the analysis of pharmaceutical dosage forms.

Therefore, in general, CI can be NIR-CI or Raman-CI. Both are used for the

analysis of raw ingredients of drugs, drug development process monitoring and

quality control.
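
The three-dimensional data block described above can be pictured with a small sketch. It assumes a nested-list layout indexed as cube[row][column][band]; the layout, the function names and the values are hypothetical and chosen only to show the two ways of reading such a block: one spectrum per pixel, or one image per spectral band.

```python
def make_cube(rows, cols, bands, fill=0.0):
    """Create a rows x cols x bands data block filled with a constant value."""
    return [[[fill] * bands for _ in range(cols)] for _ in range(rows)]


def pixel_spectrum(cube, row, col):
    """Spectrum (one value per band) measured at a single spatial position."""
    return list(cube[row][col])


def band_image(cube, band):
    """Two-dimensional image of the whole scene at a single spectral band."""
    return [[pixel[band] for pixel in row] for row in cube]


# Hypothetical 2 x 3 pixel scene with 4 spectral bands (made-up values).
cube = make_cube(rows=2, cols=3, bands=4)
cube[1][2] = [0.1, 0.4, 0.9, 0.3]

print(pixel_spectrum(cube, 1, 2))
print(band_image(cube, 2))
```

In practice, array libraries store the same structure far more compactly, but the indexing idea is the same.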

NIR-CI

Near-infrared chemical imaging (NIR-CI) is an emerging technology as compared to

simple NIR spectroscopic technique in pharmaceutical industry. NIR-CI is used for the

prediction of APIs and excipients concentrations from the solid pharmaceutical dosage

forms [97]. In another research NIR-CI was used for the detection of counterfeit

pharmaceutical tablets, where no prior knowledge of the composition of sample is

required [98]. NIR-CI is also used to assess content uniformity from the batch of tablets.

Content uniformity was evaluated by applying different quantitative algorithms to

global hyperspectral image of ten tablets [99]. Another recent research demonstrates

the use of single point NIRS along with NIR-CI and statistical variance analysis for the

detection of counterfeit tablets [100]. NIR-CI is also used for the quantification of the

coating thickness of tablets and the chemical structure of the tablet core and

coating [101]. High-throughput quality analysis is highly required in the pharmaceutical

industry. NIR-CI can also be applied to perform analysis on multiple sample tablets at

a time even if they are packed. This results in fast and nondestructive identification of

APIs and excipients. Hamilton and Lodder [102] use HSI for the analysis of

pharmaceutical medicines, compare the performance of HSI with that of HPLC and

conclude that HSI is more accurate.

In another research, Gowen et al. [95] performed non-destructive assessment of the

pharmaceutical tablets using VS along with various Image Processing (IP) techniques.

The images created from the combination of digital imaging with either Raman

Spectroscopy or Near-Infrared Spectroscopy are known as chemical images.

Several studies [99, 101, 103-105] have found that chemical imaging can also be

used to monitor the development process and quality control of pharmaceutical

tablets. Puchert et al. [100] use Near Infrared Chemical Imaging (NIRCI) for the

identification of counterfeit medicines. Sacré et al. [106] present a detailed review of

VS-based hyperspectral imaging for the analysis of pharmaceuticals. This paper also

provides detailed information about the chemometric techniques used for pre- and

post-processing of the data. Amigo and Ravn [107] tried to avoid the

calibration required for the quantification of major and minor components of

pharmaceuticals. Various methods were tested using NIRCI for this purpose, and the authors

concluded that Multivariate Curve Resolution (MCR) provides reliable results.

Franch-Lage et al. [108] use NIR-based HSI for the surface-based assessment of the

distribution of APIs and excipients in pharmaceutical products.

Carneiro and Poppi [109] used NIR imaging spectroscopy to study the distribution of the API and excipients in spironolactone tablets. A concentration map for each compound was obtained using an interval partial least squares model, which quantifies the API and excipients at each pixel. The results indicated that the approach is useful for quantifying compounds at the pixel level. Palou et al. [110] performed a nondestructive analysis of pharmaceutical products to determine the distribution and concentration of their major and minor components; the calibration models were built using PLS. In another study [111], super-resolution was used to improve the performance of NIRCI in determining the quality of pharmaceutical solids.

Osorio et al. [112] used NIRCI to characterize pharmaceutical powder blends. A Science-Based Calibration (CBS) chemometric method was used in this research. This method creates a calibration model based on the pure spectra of the components and characterizes blends by creating concentration maps. CBS does not require a large number of samples to create a calibration model, which is its main benefit over conventional methods such as PLS or PCA.

Lyon et al. [113] studied NIRS imaging for assessing the quality, i.e. the blend uniformity, of pharmaceutical products. High-contrast NIR images of tablets were acquired using array detector technology. The experimental tablets covered five levels of API blending, from well blended to unblended. The results were compared with those acquired from conventional NIRS. The authors concluded that the spectral imaging approach could clearly differentiate between all five levels of blending, qualitatively as well as quantitatively.

Lee et al. [114] proposed an approach, based on NIR-CI, for measuring the content uniformity of multiple tablets at the same time. A field of view of 59.5 mm x 47.5 mm was used to extract data from a total of twenty tablets simultaneously. Each tablet was represented by 3000 pixels, where each pixel corresponds to 186 µm by 186 µm of the sample area. The results of the proposed approach were compared with the conventional UV method. The authors concluded that both approaches are of the same accuracy. They also stated that variation in the location of the tablets within the field of view did not affect the performance of the proposed approach.

Westenberger et al. [115] compared traditional and non-traditional analytical methods for estimating the quality of pharmaceutical products that are sold over the internet. HPLC was used as the traditional method and was compared with NIRS, NIR imaging and thermogravimetric methods. The comparison showed that the non-traditional methods reveal more characteristics of the products than HPLC.

Gendrin et al. [116] investigated the feasibility of using NIR-CI to quantify the APIs and excipients in pharmaceutical products. The chemical images were captured at two pixel sizes, 10 µm/pixel and 40 µm/pixel. Two preprocessing techniques, SNVC and SDC, were applied, and concentrations were extracted using PLS2 and multivariate CLS. The comparison of results indicated that 40 µm/pixel with SNVC preprocessing and PLS2 is the better combination for predicting API content. Li et al. [117] proposed that NIR-based CI can be used to estimate API particles/domains. The proposed approach can be used to evaluate the blending behavior of APIs during formulation.

Hilden et al. [118] estimated the effect of the particle size of extragranular tartaric acid on the uniformity of BMS-561389 tablets. The relation between the two was estimated using NIR-based chemical imaging.

Raman-CI

In the pharmaceutical industry, Raman-CI can be applied to particle size estimation, minor component detection and tablet characterization. For analysis, the data at each pixel of the sample are compared with a standard spectrum of a sample that contains the APIs and excipients at their correct levels [12]. Sasic applied Raman-CI to capture spectra from drugs in order to detect the API in low-content pharmaceutical formulations, and reported that PCA is particularly helpful for this kind of detection [119].

Doub et al. [103] focused on the application of Raman-CI to ingredient-specific particle size characterization of nasal spray formulations. The technique is suitable for identifying APIs as well as placebo. Similar chemical compositions in drugs can also be described effectively using Raman-CI.

Bell et al. [120] used CA to segment images, which enables the visualization of distinct regions for the characterization of solid dosage forms. Vidal and Amigo [121] illustrated the essential preprocessing techniques required for hyperspectral images before the actual analysis begins. These techniques handle issues such as image compression, identification and removal of the background, spiked points and dead pixels.

Sasic [122] used a combination of NIR and Raman global illumination mapping devices to capture chemical images of pharmaceutical granules. The main purpose of this technique was to measure how well the APIs and excipients are mixed with each other. Both devices were used to analyze 50 – 100 randomly distributed granules on a microscope slide over an area of 3.5 mm x 3.5 mm. The spectra acquired from both instruments readily characterized the granules. The comparison of the results showed that Raman global illumination provides more comprehensive information about the chemical structure of the sample.

2.1.4.2 Multispectral Imaging

MSI systems use MSI sensors, which collect spectra from fewer than 20, generally noncontiguous, spectral bands [123]. These bands detect information in a specific combination from the desired region of the spectrum. A unique combination of spectral information can be achieved by varying the number and position of the bands within the MSI system [124]. Another study used MSI to determine moisture and salicylic acid in a single packaged Aspirin tablet. Its authors concluded that MSI offers a speed advantage of approximately 30000 times over HPLC, and that MSI of a field of tablets is almost 1000 times faster than spectrometry of a single tablet [125]. Several other studies [126, 127] in the literature also use MSI in the pharmaceutical industry.

2.2 Preprocessing Techniques

Preprocessing is an essential step before applying any kind of analysis technique to the acquired data. The spatial and spectral information gathered from the sample provides knowledge about its physical structure, surface and chemical composition. However, multiple external factors cause systematic variations between spectra or images. In the case of SPPs, different nonchemical factors may be introduced into the data during the acquisition process. These factors include scattering effects due to surface inhomogeneity, specular reflections, random noise and interference from external light sources. Different preprocessing techniques are required to remove such nonchemical biases from the spectral and spatial information, such as [128]:

Smoothing

Normalization

Standard Normal Variate Correction

Multiplicative Scatter Correction

Savitzky-Golay Derivative Conversion

Image Enhancement

2.2.1 Smoothing

Smoothing algorithms are useful in both IP and Signal Processing (SP) to prepare images or signals for further processing by reducing noise. Different types of noise may exist in images. One is salt-and-pepper noise, a sparse light and dark disturbance in which the color of the noisy pixels has no relation to the pixels of the original image; it appears as light and dark spots on the image. Another type is Gaussian noise, in which each pixel of the image is slightly changed from its original value. Histograms can be used to visualize the normal distribution of this noise. In IP, most noise reduction algorithms are based on convolution or filtering. Low pass filtering is a technique used to make an image smooth: a low pass filter retains the low frequencies in the image and attenuates the high frequencies [129]. Other filters used in IP include weighted average filters, binomial filters, mean filters and median filters.

In SP, the most common technique used to remove signal noise is the moving average. This algorithm generates a smooth output signal from equidistant points. Each smoothed point (Yk)s of the output signal is the average of the raw points that fall within the filter width. The filter width is usually an odd number, 2n+1 (n = 1, 2, 3, ...), of consecutive points of the raw data, where the raw signal consists of Y1, Y2, Y3 … YN. According to Efstathiou [130], the formula for applying the moving average to smooth a point is given below.

$$(Y_k)_s = \frac{\sum_{i=-n}^{n} Y_{k+i}}{2n+1}$$

Equation 2.1: Formula for Smoothing a signal

The smoothing level of the signal depends on the filter width: the greater the filter width, the stronger the smoothing effect. The signal-to-noise ratio can be increased further by applying the smoothing algorithm multiple times. However, this is a lossy technique: each application of the algorithm may result in the loss of the first and last n points of the data.
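The moving average of Equation 2.1 can be illustrated with a minimal Python sketch. The function name and the sample signal are hypothetical and serve only as an illustration; the sketch drops the first and last n points, which is the loss described above.

```python
def moving_average(signal, n):
    """Smooth `signal` with a centred window of width 2n + 1 (Equation 2.1).

    The first and last n points have no complete window, so they are dropped.
    """
    width = 2 * n + 1
    if n < 1 or len(signal) < width:
        raise ValueError("need n >= 1 and at least 2n + 1 points")
    return [sum(signal[k - n:k + n + 1]) / width
            for k in range(n, len(signal) - n)]


# Hypothetical raw signal with a noise spike at index 2.
raw = [1.0, 2.0, 9.0, 4.0, 5.0, 6.0, 7.0]
smoothed = moving_average(raw, n=1)
print(smoothed)  # two points shorter than `raw`
```

Applying the function again to `smoothed` strengthens the smoothing at the cost of another 2n points.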

2.2.2 Normalization

In IP, the process of changing the range of pixel intensity values to enhance the contrast of the image is known as normalization. It is also known as contrast stretching or histogram stretching. Histogram equalization can also be used to enhance contrast. In SP, normalization is also known as dynamic range expansion and is used to enhance short wave spectra. Dynamic range expansion converts an image or signal into a more consistent form.
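As an illustration of contrast stretching, the following sketch linearly maps the observed intensity range of a row of pixels onto a target range. It assumes a simple min-max mapping to 0..255; the function name and the pixel values are hypothetical.

```python
def stretch_contrast(pixels, new_min=0, new_max=255):
    """Linearly map the observed intensity range onto [new_min, new_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return [new_min for _ in pixels]
    scale = (new_max - new_min) / (hi - lo)
    return [round((p - lo) * scale + new_min) for p in pixels]


# Hypothetical low-contrast row of grey levels.
row = [100, 117, 134, 151]
stretched = stretch_contrast(row)
print(stretched)
```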

2.2.3 Standard Normal Variate Correction

Standard Normal Variate Correction (SNVC) is a well-known algorithm used for scatter correction of spectral data (mostly NIR). Applying this algorithm to the input spectrum reduces spectral noise and eliminates background effects. Baseline shifting or tilting can occur in the data due to the chemical composition of the sample and variation in the length of the spectral path. These issues normally occur at longer wavelengths. SNVC is used to either reduce or eliminate such scatter effects. The basic working of SNVC is the same as that of MSC; the difference between the two is that SNVC performs baseline and reference corrections consecutively. According to Rinnan et al. [131], the formula for SNVC is:

$$X_{cor} = \frac{X_{org} - a_0}{a_1}$$

Equation 2.2: Formula for SNVC

Here Xcor is the corrected spectrum, Xorg is the original sample spectrum, a0 is the

average of the Xorg and a1 is the standard deviation calculated from Xorg.
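Equation 2.2 can be sketched in a few lines of Python. The sketch assumes that a1 is the sample standard deviation (n − 1 in the denominator); the function name and the spectrum values are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate correction of one spectrum (Equation 2.2)."""
    a0 = statistics.mean(spectrum)    # a0: mean of the spectrum
    a1 = statistics.stdev(spectrum)   # a1: sample standard deviation (assumption)
    if a1 == 0:
        raise ValueError("a constant spectrum cannot be SNV corrected")
    return [(x - a0) / a1 for x in spectrum]


# Hypothetical absorbance values of a single spectrum.
spectrum = [0.42, 0.47, 0.55, 0.61, 0.50]
corrected = snv(spectrum)
print(corrected)
```

After correction the spectrum has zero mean and unit standard deviation, so spectra that differ only by an offset and a scale become directly comparable.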

2.2.4 Multiplicative Scatter Correction

Multiplicative Scatter Correction (MSC) is another commonly used technique to remove imperfections or unwanted scatter effects from the sample spectrum. In principle, MSC was originally developed to be applied only to the part of the spectrum that contains no chemical information, but in practice the whole spectrum is used. This technique is useful when the chemical difference between samples is small. MSC is applied in two steps. In step one, the correction coefficients are estimated using the formula [131]:

$$X_{org} = b_0 + b_{ref,1} \cdot X_{ref} + e$$

Equation 2.3: Formula for the estimation of correction coefficients of MSC

In step two, the correction is performed on the spectrum using the formula [131]:

$$X_{cor} = \frac{X_{org} - b_0}{b_{ref,1}}$$

Equation 2.4: Formula for the corrections in MSC

Here, Xorg is the sample spectrum, Xref is the reference spectrum that is used to preprocess the whole dataset, b0 and bref,1 are scalar parameters, e is the unmodeled part of the original spectrum and Xcor is the corrected spectrum. The scalar parameters differ for each sample being preprocessed.
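The two MSC steps can be sketched as follows. The sketch assumes that the reference spectrum Xref is the mean spectrum of the data set, which is a common choice but is not prescribed by the equations above, and that b0 and bref,1 are obtained by ordinary least squares. The function name and the data are hypothetical.

```python
def msc(spectra):
    """Multiplicative Scatter Correction of a list of equally long spectra."""
    n_points = len(spectra[0])
    # Assumption: the reference spectrum X_ref is the mean of the data set.
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(ref) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Step 1: least-squares fit of X_org = b0 + b1 * X_ref + e.
        b1 = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        b0 = s_mean - b1 * ref_mean
        # Step 2: X_cor = (X_org - b0) / b1.
        corrected.append([(x - b0) / b1 for x in s])
    return corrected


# Hypothetical spectra: one shape with different offsets and scale factors.
base = [0.2, 0.5, 0.9, 0.4]
spectra = [base,
           [2.0 * x + 0.1 for x in base],
           [0.5 * x - 0.05 for x in base]]
corrected = msc(spectra)
print(corrected)
```

Because the three hypothetical spectra differ only by an offset and a scale factor, they coincide after correction.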

2.2.5 Savitzky-Golay Derivative Conversion

Savitzky-Golay Derivative Conversion (SDC) is an algorithm that is also used for smoothing. SDC is better than the moving average algorithm because it is based on polynomials [132]. An individual polynomial is fitted to the points inside a filter width (also called a window) around each data point in the spectrum. After the polynomial curve has been fitted, its value at the central point of the window is taken as the new smoothed point. However, this is also a lossy method. The SDC algorithm depends strongly on two things: the order of the polynomial and the size of the window. A lower polynomial order and a larger window result in a more strongly smoothed signal [130].
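For a fixed window and polynomial order, the polynomial fit reduces to a weighted sum of the points in the window. The sketch below covers smoothing only, and only the special case of a 5-point window with a quadratic polynomial, for which the tabulated weights are (-3, 12, 17, 12, -3)/35. The function name is hypothetical.

```python
# Tabulated Savitzky-Golay smoothing weights: 5-point window, quadratic fit.
SG_WEIGHTS = (-3, 12, 17, 12, -3)
SG_NORM = 35


def savgol_smooth_5pt(signal):
    """Replace each point by the fitted quadratic evaluated at the window centre.

    The first and last two points have no complete window and are dropped.
    """
    half = len(SG_WEIGHTS) // 2
    return [sum(w * signal[k + i]
                for w, i in zip(SG_WEIGHTS, range(-half, half + 1))) / SG_NORM
            for k in range(half, len(signal) - half)]


# A noise-free quadratic signal is reproduced exactly by a quadratic fit.
quadratic = [float(x * x) for x in range(7)]
print(savgol_smooth_5pt(quadratic))
```

Unlike the moving average, the filter preserves a signal that already follows the chosen polynomial, so narrow peaks are flattened less.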

2.2.6 Image Enhancement

Image enhancement (IE) techniques are used to prepare images for better perception by human viewers. Their basic purpose is to modify the attributes of the input image so that it is better suited to a specific task [133]. IE can be applied to images in two domains: spatial and frequency. Spatial domain enhancement techniques are applied directly to the pixels of the image. In the frequency domain, on the other hand, the image is first converted with the Fourier Transform (FT), all enhancements are applied to the FT of the image, and the inverse FT is then applied to get back the image form.

Maini and Aggarwal [134] show that many transformation techniques are available in the literature for enhancing images. These include logarithmic, power law, piecewise linear and intensity transformations, as well as gray level slicing, image negation, histogram matching and histogram equalization. Morphological operations such as erosion, dilation, opening and closing can also be applied to images to improve their visibility.

Color or grayscale mapping with intensity scaling can visualize compositional contrast

between pixels in an image. To enhance the contrast level between distinct regions of

the sample, Image Fusion can be implemented. Image Fusion combines two or more

images at different wavelengths to create a new one [135]. Some straightforward

mathematical operations can also be applied on images to combine them, such as

addition, subtraction, multiplication and division.

2.3 Feature Extraction Techniques

Extraction of useful information from the data (both spatial and spectral) gathered from the sample being analyzed requires different advanced IP and SP techniques. Various methods exist for extracting the chemical, spatial and physical information hidden in these spectra and images.

Spectral analysis helps to determine the components present in the sample, their concentration and their distribution. This analysis can be performed by evaluating the intensity at a single wavelength, the ratios of intensities at different wavelengths, or the integrated intensity (area) under a spectral peak. After spectral analysis, it is necessary to reduce the number of variables by keeping those that show maximum variation in their data and discarding all others. This can be done using multivariate chemometric methods [128].

Different types of feature extraction techniques are available in the literature to extract

useful information from images or spectral data. Some of them are described below.

2.3.1 Low Level Feature Extraction

Techniques that extract features from images without any information about shape are known as low-level feature extraction techniques. Edge Detection (ED) is the most common extraction method in this category. ED provides a line drawing of the input image [136, 137]. Different ED operators that can extract refined edges from an image are available in the literature, e.g. the Prewitt, Sobel, Canny, Laplacian and Marr–Hildreth operators [138].
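As an illustration of one of these operators, the sketch below applies the two 3 x 3 Sobel kernels to a small grey-level image and returns the gradient magnitude. Border pixels are left at zero for simplicity, which is an assumption of the sketch; the function name and the test image are hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # responds to horizontal edges


def sobel_magnitude(image):
    """Gradient magnitude of a 2-D list of grey levels (borders left at 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    p = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * p
                    gy += SOBEL_Y[dr + 1][dc + 1] * p
            out[r][c] = math.hypot(gx, gy)
    return out


# Hypothetical image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges[1])
```

The response is zero in the flat region and large in the two columns adjacent to the step, which is the "line drawing" behaviour described above.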

2.3.2 High Level Feature Extraction

Another group of techniques used for extracting features from images is known as high-level feature extraction. This set of algorithms is based on extracting shapes along with other information such as their position, size and orientation. Basic geometric shapes (circles, rectangles and squares) are used for the extraction of complex shapes from images. Different algorithms are available for extracting shape features from images, e.g. Thresholding, Image Subtraction, Template Matching, Hough Transform, Generalized Hough Transform and Snakes [138].

Fourier Descriptors are also available to extract shapes from an image. These

descriptors also help in extracting some other information about shape e.g. its area,

perimeter, centroid, shape layout etc.

2.3.3 Textural Feature Extraction

The texture of a surface can be defined using different types of features, which can be

extracted from the gray level distribution of the image intensity. Statistical feature

extraction methods are extensively used for the texture analysis [139-143].

2.3.3.1.1 Gray-level Co-occurrence Matrix

Gray-level Co-occurrence Matrix (GLCM) is one of the statistical feature extraction methods that can be used to define the texture of a surface. It is based on the spatial relationship between pixels. Texture is characterized by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image [144]. This method therefore uses second-order statistics of the grayscale histogram [142].
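The counting of pixel pairs can be sketched as below. The sketch assumes a single spatial relationship (each pixel and its right-hand neighbour) and derives one example feature, contrast, from the counts. The function names and the 4 x 4 test image are hypothetical.

```python
def glcm(image, levels, dr=0, dc=1):
    """Count how often grey level i occurs with grey level j at offset (dr, dc)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast feature: squared grey-level difference weighted by pair frequency."""
    total = sum(map(sum, matrix))
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))


# Hypothetical image with four grey levels (0..3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
counts = glcm(image, levels=4)
print(counts, glcm_contrast(counts))
```

Other offsets (vertical or diagonal neighbours) are obtained by changing `dr` and `dc`.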

2.3.3.1.2 Histogram Features

Histogram features are first-order statistics based features, used to represent surface

texture. According to Srinivasan [145], histogram-based features represent intensity

concentration on all parts of the image.

2.3.3.1.3 Run Length Matrix

The Run Length Matrix (RLM) of an input gray-level image is built from gray-level runs, where a run is a set of consecutive, collinear pixels having the same gray level. The coarseness of a texture in a specific direction can be captured using the RLM [146]. The RLM is a higher-order statistic of the grayscale histogram.

2.3.3.1.4 Autoregressive Model

The Autoregressive (AR) model uses the local interactions between image pixels: the intensity of each pixel is modeled as a weighted sum of the intensities of its neighboring pixels. The AR model is considered a simple and efficient model for the unsupervised segmentation of textures. The parameters estimated by fitting the AR model to image regions help in texture discrimination [147].

2.3.3.1.5 Wavelet Transformations

In the field of texture classification and segmentation, Wavelet Transformation (WT) is another feature extraction approach that can be used to characterize texture. Wavelet Coefficients (WC) are extracted from the images and then used to compute textural features. Different textures have different textural features if their frequency spectrum is decomposed properly. These textural features include energy, entropy and the averaged l1-norm [144].

WC are also very helpful as features for the classification of other types of signals, such as audio and EEG. The WT of spectra provides two main benefits: dimensionality reduction and de-noising of the spectrum. Cai et al. [148] discuss the use of the Multiresolution Wavelet Transformation for noise reduction and background subtraction in RS, specifically for the automated processing of large spectral datasets and spectral imaging.

In another study, Li et al. [149] used near-infrared diffuse reflectance spectroscopy along with PLS regression and PCA for the identification and quantification of azithromycin tablets. The Continuous Wavelet Transformation was used for baseline elimination in this research.

The computation of WC involves decomposing the whole signal into multiple wavelets. Many wavelet functions are available for this purpose, e.g. Daubechies, Coiflets, Symlets, Discrete Meyer, Biorthogonal and Reverse Biorthogonal. The signal is projected onto the chosen wavelet function to calculate the WCs. This results in two types of coefficients: detail and approximation. The selection of the decomposition level and the wavelet function is therefore very important. The output of the WT is a vector consisting of the approximation coefficients of the final level and all detail coefficients calculated up to that level.

Figure 2.2 explains the basic approach for the decomposition and calculation of WC at level two, where ‘ai’ represents the approximation coefficients at level ‘i’ and ‘di’ the detail coefficients at level ‘i’.

[Figure 2.2 diagram: Input Signal → Level 1 decomposition (a1, d1) → Level 2 decomposition (a2, d2)]

Figure 2.2: Two level decomposition for the computation of WC
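The two-level decomposition can be sketched in Python for the simplest wavelet function. The sketch assumes the Haar wavelet, which is not among the wavelet families listed above and is chosen only because it needs no filter tables; the function names and the signal are hypothetical.

```python
import math


def haar_step(signal):
    """One decomposition level: pairwise sums (approximation) and differences (detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels=2):
    """Return [a_L, d_L, ..., d_1]: final approximation plus all detail coefficients."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)   # only the approximation is split again
        details.insert(0, detail)
    return [approx] + details


# Hypothetical signal of eight samples.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_decompose(signal, levels=2)
print(coeffs)  # [a2, d2, d1]
```

The eight input samples are represented by two a2, two d2 and four d1 coefficients, so the feature vector has the same length as the signal unless small detail coefficients are discarded.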

2.4 Feature Reduction Techniques

The process of optimally reducing the original feature space according to some defined

criteria is known as feature reduction. Not all the features obtained in the feature extraction phase have equal importance with respect to a specific target concept. Feature

reduction techniques reduce the dimensionality of the feature set by removing

irrelevant, noisy or redundant features. The use of optimally reduced feature vectors

may enhance the efficiency, accuracy and processing speed of the classification or data

mining algorithms [150]. Multiple different algorithms are available in the literature for

this purpose. The details of some of these algorithms are given below.

2.4.1 Information Gain

Information Gain (IG) is a feature reduction algorithm based on an entropy measure. Entropy is an information theory measure used to characterize the purity of a set of examples, i.e. the unpredictability of a system. If Y is a random variable, then the entropy of Y can be calculated using the formula given below [151].

$$H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y)$$

Equation 2.5: Entropy calculation when Y is independent variable

Here, the marginal probability density function of Y is represented by p(y). A

relationship exists between two variables X and Y if they satisfy two conditions.

According to the first condition, the observed values of Y from the training dataset must

depend on the variable X. The other condition is based on their entropy measure i.e. the

entropy of Y according to the partitions based on X should be less than the entropy of

Y prior to the partitioning based on X. Therefore, the formula for entropy when Y

depends on X is:

$$H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log_2 p(y|x)$$

Equation 2.6: Entropy calculation when Y is dependent on X

Here 𝑝(𝑦|𝑥) is the conditional probability of y given x. The variable X can also provide

some additional information about Y based on its entropy i.e. the amount by which the

entropy of Y decreases. It is known as IG and calculated by using equation 2.7.

𝐼𝐺 = 𝐻(𝑌) − 𝐻(𝑌|𝑋) = 𝐻(𝑋) − 𝐻(𝑋|𝑌)

Equation 2.7: Formula for IG

The IG about Y obtained by observing X is equal to the IG about X obtained by observing Y; IG is therefore a symmetric measure. One major disadvantage of IG is that it is biased towards features having more values, even when they are not more informative.
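The entropy, the conditional entropy and the resulting IG can be computed from observed frequencies as sketched below. The sketch uses the standard negative-sum form of the entropy and estimates all probabilities as relative frequencies; the feature names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """H(Y) estimated from the relative frequencies of the observed values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y|X): entropy of Y inside each partition of X, weighted by p(x)."""
    n = len(y)
    total = 0.0
    for x_value, count in Counter(x).items():
        subset = [yi for yi, xi in zip(y, x) if xi == x_value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(y, x):
    """IG = H(Y) - H(Y|X)."""
    return entropy(y) - conditional_entropy(y, x)


# Hypothetical data: class label and two candidate features of four tablets.
quality = ["pass", "pass", "fail", "fail"]
coating = ["smooth", "smooth", "rough", "rough"]   # determines the class
batch = ["a", "b", "a", "b"]                       # unrelated to the class
print(information_gain(quality, coating), information_gain(quality, batch))
```

In this hypothetical data set the coating feature removes all uncertainty about the class (IG of 1 bit), while the batch feature removes none.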

2.4.2 Symmetrical Uncertainty

Symmetrical Uncertainty (SU) addresses the problem of IG, i.e. its bias towards features having more values. It does so by dividing the IG by the sum of the entropies of X and Y [152], as shown in Equation 2.8.

$$SU = 2\,\frac{IG}{H(X) + H(Y)}$$

Equation 2.8: Symmetrical Uncertainty formula

Because of the correction factor 2, SU always yields values normalized to the range [0, 1]. SU = 0 means that there is no correlation between X and Y, whereas SU = 1 means that they are highly correlated. The problem with SU is that it is biased towards features having fewer values.

2.4.3 One-R

Holte [153] proposed a rule-based algorithm named OneR, which can be used for feature reduction. An individual rule is created for each attribute of the training dataset, and the rule with the minimum error is then selected for further processing. Features with numerical values are treated as continuous and are discretized by a simple method that divides the range of values into multiple disjoint intervals. Missing values are handled by using ‘missing’ as a legitimate value.
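The rule construction can be sketched for categorical attributes as follows. The sketch omits the discretization of numerical attributes and is not Holte's full procedure; the function name, attribute names and data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute, rule, errors) for the single-attribute rule with fewest errors."""
    best = None
    for attribute in rows[0]:
        # Count the class labels observed for every value of this attribute.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attribute]][label] += 1
        # The rule predicts the majority class of each attribute value.
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in by_value.items()}
        errors = sum(sum(counts.values()) - max(counts.values())
                     for counts in by_value.values())
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical training set; 'missing' is treated as an ordinary value.
rows = [{"coating": "smooth", "shape": "round"},
        {"coating": "smooth", "shape": "oval"},
        {"coating": "rough", "shape": "round"},
        {"coating": "missing", "shape": "oval"}]
labels = ["genuine", "genuine", "counterfeit", "counterfeit"]
attribute, rule, errors = one_r(rows, labels)
print(attribute, rule, errors)
```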

2.4.4 Chi-Square

The Chi-square (CS) feature selection algorithm ranks features by calculating the chi-squared statistic with respect to the class. CS measures the degree of dependency between an attribute and a specific class. According to Chatcharaporn et al. [154], the formula for CS is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Equation 2.9: Formula for Chi-Square

Where Oij and Eij are the observed and expected frequencies respectively, r is the number of rows and c is the number of columns of the contingency table.
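Equation 2.9 can be evaluated from a contingency table of feature values against classes, as sketched below. The sketch assumes that the expected frequencies are computed from the row and column totals and that no row or column is empty; the function name and the tables are hypothetical.

```python
def chi_square(observed):
    """Chi-squared statistic of an r x c contingency table (Equation 2.9)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected frequency under independence of feature and class.
            e_ij = row_totals[i] * col_totals[j] / grand_total
            statistic += (o_ij - e_ij) ** 2 / e_ij
    return statistic


# Hypothetical counts: rows are feature values, columns are classes.
dependent = [[30, 10],
             [10, 30]]
independent = [[10, 10],
               [10, 10]]
print(chi_square(dependent), chi_square(independent))
```

A larger statistic indicates a stronger dependency between the feature and the class, so features can be ranked by this value.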

2.4.5 Gain Ratio

Gain Ratio (GR) ranks the attributes by compensating for the bias of Information Gain (IG). According to Chatcharaporn et al. [154], GR can be measured by:

$$GR = \frac{IG}{H(X)}$$

Equation 2.10: Formula for Gain Ratio

Where H(X) is the entropy of X. The value of GR always lies in [0, 1]. GR = 1 means that X completely predicts Y, where Y is the variable to be predicted, and GR = 0 indicates no relation between X and Y.
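The two normalized measures, SU (Equation 2.8) and GR (Equation 2.10), can be sketched together. The sketch computes IG through the joint entropy, IG = H(X) + H(Y) − H(X, Y), which is equivalent to H(Y) − H(Y|X), and assumes that neither variable is constant so that the denominators are non-zero. The function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy estimated from the relative frequencies of the observed values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG = H(X) + H(Y) - H(X, Y); equal to H(Y) - H(Y|X)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetrical_uncertainty(x, y):
    return 2.0 * information_gain(x, y) / (entropy(x) + entropy(y))


def gain_ratio(x, y):
    return information_gain(x, y) / entropy(x)


# Hypothetical ID-like feature with a distinct value for every sample.
sample_id = [1, 2, 3, 4]
label = ["a", "a", "b", "b"]
ig = information_gain(sample_id, label)
su = symmetrical_uncertainty(sample_id, label)
gr = gain_ratio(sample_id, label)
print(ig, su, gr)
```

The ID-like feature reaches the maximum possible IG for this label (1 bit) simply because it has many values; both SU and GR scale that score down by the entropy of the feature itself.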

2.4.6 Relief-F

Another statistical attribute selection technique used in this research is Relief-F (RF). RF ranks each feature by a weight calculated from the relationship between the feature and a specific class [154]. The weight calculation is based on two nearest-neighbor probabilities: the probability that two nearest neighbors from different classes have different values of the feature, and the probability that two nearest neighbors of the same class have the same value of the feature [151].

2.4.7 Principal Component Analysis

PCA is another algorithm used to reduce the feature set while retaining most of the useful information. PCA can be used to avoid issues of over-fitting. It calculates new variables from a large number of original variables by using projections, so each new variable is a linear combination of the actual measurements and contains information from the whole data [155]. Several studies [125, 149, 156, 157] use PCA for the analysis of pharmaceuticals.

2.5 Classification Techniques

For analysis and comparison, the spectra and images of the samples are compared with reference data from an external library. Different similarity measures, such as Pearson’s correlation coefficient and Euclidean distance, can be used for this purpose [158]. Clustering is required to identify regions having similar spectral or image characteristics, which provide information about the chemical and physical properties of the sample and their concentration and distribution. Clustering can be performed using unsupervised classification techniques such as K-means clustering and Fuzzy clustering. These methods do not require any prior knowledge about the sample being tested and are helpful for the extraction of important features.

Other techniques can also be used for classification. These are known as supervised classification methods and require prior knowledge about the data. They use separate training and testing datasets for classification. Some important supervised classification algorithms are Partial Least Squares, Artificial Neural Networks, Naïve Bayes, K-Nearest Neighbors and Support Vector Machines. The details of all these algorithms are given below.

2.5.1 Pearson’s Correlation Coefficient

Pearson’s Correlation Coefficient (PCC) is a measure of similarity between two objects. It quantifies the linear association between two variables and is also known as the product moment correlation coefficient. It is represented by r and takes values in the range [-1, +1] [159]. A positive value of r means that a positive correlation exists between the two variables, and r = +1 means that there are no fluctuations between the two. The further r falls below +1, the greater the fluctuations between the data [160].

2.5.2 Euclidean Distance

Euclidean Distance (ED) is a distance-based measure of the similarity between objects [161]. Hierarchical trees called dendrograms can be used to visualize these distances. New objects are formed by linking the objects having the smallest distances. These newly formed objects are combined again in the same manner until all the objects are linked. Wang et al. [162] presented a modified version of the original ED for application to images.
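The two similarity measures of Sections 2.5.1 and 2.5.2 can be compared with a short sketch. It assumes two spectra sampled at the same wavelengths; the function names and the spectra are hypothetical.

```python
import math


def pearson(a, b):
    """Product moment correlation coefficient r of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def euclidean(a, b):
    """Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical library spectrum and a sample with the same shape at twice the intensity.
reference = [0.10, 0.40, 0.90, 0.30]
sample = [0.20, 0.80, 1.80, 0.60]
print(pearson(reference, sample), euclidean(reference, sample))
```

The correlation coefficient ignores the intensity scaling and reports a perfect match of shape, whereas the Euclidean distance is non-zero, so the choice of measure determines whether overall intensity counts as a difference.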

2.5.3 K-mean Clustering

K-mean clustering is a CA method that originates from the field of SP, where it is used for vector quantization. It aims to partition n observations (x1, x2, …, xn), each of which can be a d-dimensional vector, into k (≤ n) clusters S = { S1, S2,…,Sk }. Each observation is assigned to the cluster with the nearest mean, so that the within-cluster sum of squares is minimized, as given in Equation 2.11 [163].

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

Equation 2.11: K-mean Clustering

Where, the mean of points in Si is represented by μi.
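The objective of Equation 2.11 is usually minimized iteratively. The sketch below assumes the common alternating scheme (assign every observation to the nearest mean, then recompute the means) with caller-supplied initial centroids; it is an illustration, not necessarily the procedure of [163]. The function name and the points are hypothetical.

```python
def kmeans(points, centroids, iterations=100):
    """Alternate nearest-mean assignment and mean update until the means stop moving."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign p to the cluster with the smallest squared distance.
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Recompute each mean; an empty cluster keeps its previous centroid.
        updated = [tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                   if cluster else centroid
                   for cluster, centroid in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical two-dimensional observations forming two groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
```

The result depends on the initial centroids, so in practice the procedure is often repeated from several starting points.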

2.5.4 Fuzzy Clustering

Fuzzy Clustering (FC) is based on fuzzy logic. In FC, a point does not belong completely to one cluster; instead, it has a certain degree of belongingness to each cluster. Points lying near the center of a cluster may belong to it to a higher degree than points on its edge. For any point x, a set of coefficients gives the degree to which that point belongs to the kth cluster; this degree of belongingness is represented by wk(x). For c-mean FC, the centroid of a cluster is the mean of all points weighted by wk(x), as shown in Equation 2.12 [164].

$$C_k = \frac{\sum_x w_k(x)^m \, x}{\sum_x w_k(x)^m}$$

Equation 2.12: Centroid calculation for FC

wk(x) is the inverse of the distance between x and the cluster center calculated in the previous pass. Another important parameter is m, which controls how much weight is given to the closest center; the degree of belongingness also depends on m. FC also tries to minimize the intra-cluster variance, but the results depend strongly on the initial choice of weights. FC using c-mean can be used for clustering objects in images.
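The centroid update of Equation 2.12 can be sketched for a single cluster. The sketch covers only this step, not the full iterative algorithm, and assumes that the degrees of belongingness wk(x) are already available; the function name, the points and the weights are hypothetical.

```python
def fuzzy_centroid(points, weights, m=2.0):
    """Centroid of one cluster: mean of all points weighted by w_k(x) ** m."""
    powered = [w ** m for w in weights]
    total = sum(powered)
    return tuple(sum(pw * p[d] for pw, p in zip(powered, points)) / total
                 for d in range(len(points[0])))


# Hypothetical points and their degrees of belongingness to cluster k.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
weights = [0.9, 0.9, 0.1]
centroid = fuzzy_centroid(points, weights, m=2.0)
print(centroid)
```

The distant point still contributes to the centroid, but only slightly, because its weight is small and is reduced further by the exponent m.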

2.5.5 Partial Least Square Discriminant Analysis

Partial Least Square Discriminant Analysis (PLS-DA) is a supervised classification technique based on PLS. It can be used when dimensionality reduction is required. Latent variables are calculated for classifying samples into different groups [74]. The first step in implementing PLS-DA is to fit a PLS regression model to indicator variables that encode the classification groups. In the next step, each observation is classified according to its largest predicted indicator variable [165].

2.5.6 Artificial Neural Networks

Artificial Neural Network (ANN) is an easy-to-implement classification algorithm that is based on few parameters but is slow in learning [166]. An ANN is an architecture of neurons arranged in a specific number of layers, each of which consists of a number of neurons. Training the network relies on an iterative process that adjusts the weights associated with the inputs until the network predicts the sample data of the training set as well as possible. The trained network can then be used to predict new, unknown data [167]. Multiple algorithms exist for the implementation of ANN. Perceptron learning algorithms can classify data that are linearly separable, i.e. data whose instances can be assigned to their correct classes by drawing a straight line or plane. When the data are not linearly separable, multi-layer neural networks (NN) can be used to categorize them into classes. A multilayer NN consists of three types of layers: input, hidden and output. The input layer receives the input, the output layer produces the result of the classification and the hidden layers help in generating the right output. A proper estimate of the size of the hidden layers is very important, as underestimation can lead to poor approximation, whereas excessive nodes result in over-fitting [168]. Wu et al. [169] used ANN with back propagation to classify drugs of different strengths, using NIR spectra as data. Their research also compared four training set selection methods, Kennard-Stone, D-optimal designs, Kohonen self-organized mapping and random selection, to find the one that produces the best result in combination with ANN. The comparative analysis showed that Kennard-Stone performs better than D-optimal designs, Kohonen self-organized mapping ranks third and random selection ranks last.

2.5.7 Naïve Bayes

Naïve Bayes (NB) is a statistical learning algorithm that performs probabilistic classification based on Bayesian networks [170]. NB is trained by estimating prior and conditional probabilities from the dataset. The prior probability of a class is calculated by dividing the number of training examples that fall in that class by the total number of examples. The conditional probabilities are based on the frequency distribution of feature xi among the training examples that belong to that class [151]. Some important drug-related studies that use NB as a classifier are [171-174]. NB is also suitable when the training dataset is small, because it can easily estimate the classification parameters, i.e. mean and variance, from the data [151]. According to Kotsiantis et al. [168], NB classification requires a short computational time for training.
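The training step described above (class priors plus a per-feature mean and variance) can be sketched as follows. This is a minimal illustration assuming Gaussian class-conditional densities; the function names, the tiny dataset and the small variance floor are hypothetical and are not taken from the cited studies.

```python
import math
from collections import defaultdict


def train_gaussian_nb(X, y):
    """Estimate the prior and per-feature (mean, variance) for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)               # class count / total count
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small floor (assumed) avoids division by zero for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log-posterior for one feature row."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical one-feature dataset with two classes.
nb_X = [[1.0], [1.2], [0.8], [3.0], [3.2]]
nb_y = ["NSPP", "NSPP", "NSPP", "DSPP", "DSPP"]
nb_model = train_gaussian_nb(nb_X, nb_y)
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.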

2.5.8 K-Nearest Neighbor

K-Nearest Neighbor (KNN) is a simple but robust algorithm that can deal efficiently with complex classification problems. An object is classified by the majority vote of its neighbors [175]. The algorithm is based on two parameters: the number of nearest neighbors considered during classification, denoted by K, and the distance between the features within a dataset, which determines which data belong to which group. The value of K must be a small positive integer. Many researchers have used KNN for the classification and analysis of pharmaceutical solid tablets [70, 74, 90, 176].
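The majority-vote rule can be illustrated with a short sketch. The function name `knn_predict`, the Euclidean distance and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data in two dimensions.
knn_X = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
knn_y = ["A", "A", "A", "B", "B"]
```

An odd K avoids tied votes in two-class problems; with an even K, ties have to be broken by some additional rule.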

2.5.9 Support Vector Machine

Support Vector Machine (SVM) uses a linear equation built from the training data to partition the dataset. Classification using SVM involves two main steps: mapping and similarity measurement. In the first step, nonlinear data are mapped from the input space to a feature space; in the second step, a kernel function is used to measure the similarity between feature vectors. SVM can handle large feature sets with high accuracy [154]. Hou et al. [177] used SVM models for the recognition of SH3 domain-peptide. Li et al. [70] used SVM with a linear kernel function for the classification of Raman spectra of Azithromycin tablets.

In another study, RS along with SVM was used to identify pharmaceutical tablets. The identification was performed in two steps: identification of the tablet family, followed by identification of the formulation of the tablet [178]. Other studies that use SVM for the analysis of pharmaceutics include [74, 156, 179]. SVM produces efficient results even when the training data are not linearly separable [166].

Table 2.1 compares various techniques that can be used for the quality assessment of drugs against different features. Some of these techniques provide only spectral information about the sample, while others provide only spatial information. Multispectral and hyperspectral imaging techniques provide both spectral and spatial data and therefore give much more detailed information about the sample than any other technique. Some techniques require sample preparation that destroys the sample before analysis, so they are destructive; drugs used in such analysis cannot be used again for any other purpose. Techniques that do not destroy the sample are known as nondestructive techniques and are more appropriate when the sample must not be wasted. The table also compares the techniques by the time required for the analysis, their processing complexity and their cost in terms of machinery and work force.

According to the comparison, destructive techniques are more complex, time consuming and costly, as they require sample preparation before analysis. XRD and SEM are more suitable for SS drugs that are in crystalline form. NIRS and RS can be used for solids and are nondestructive methods that require no sample preparation, but they provide only spectral information about the sample.

MSI and HSI are also nondestructive methods of analysis, but they have an advantage over the other techniques because they provide both spectral and spatial data of the sample. This allows image processing techniques to be applied along with different classification methods for more detailed analysis. MSI is preferable to HSI because it requires less time for data processing: HSI data are more redundant and therefore more complex to process.

These chemical imaging techniques are also known as surface-based techniques. Each measurement, captured from the penetration of the rays into the sample material, provides information about the surface of the sample. The homogeneity of the data captured from the surface of a tablet indicates its correctness. Analysis of the surface area of a tablet can provide information about its shape, size, color, hardness and dissolution. The homogeneity of the surface can also be used to detect oxidation reactions of the APIs and excipients.

Table 2.2 provides a quick review and comparison of different studies in the literature on the quality assessment of drugs. NIR-CI is mostly used for the analysis of solid medicines, especially tablets. Analysis of tablets using NIR-CI provides information about content uniformity, composition, counterfeits and tablet classification. It is based on both spectral and spatial data of the medicines and mostly requires no sample preparation. MSI can also be used for the analysis of solid medicines, especially tablets, and can be beneficial even when the medicines are in packaged form. Contrast enhancement, histograms, binarization, noise reduction and gray scaling are the commonly used image analysis techniques. Classification is mostly performed using PLS, SVM, NB and KNN. PCA is the most common feature reduction technique. SDC, SNVC and MSC are the main pre-processing techniques.

MSI systems can acquire multichannel images at wavelengths ranging from the visible spectrum, known as RGB, to IR wavelengths. The IR region is further divided into NIR, MIR and FIR.

Table 2.1: Comparison between various quality assessment techniques for drugs

| Technique | Dosage forms | Applications | Spectral Information | Spatial Information | Sample Preparation | Destructive (D) / Non-Destructive (ND) | Time Consumption | Complexity | Cost | Disadvantage |
|---|---|---|---|---|---|---|---|---|---|---|
| HPLC | Solids | Raw ingredients and final drug testing | No | No | Yes | D | High | Max | High | May lead to inaccurate compound categorization |
| MS | Solids | Detection of low quantities in compounds | Yes | No | Yes | D | High | Moderate | High | Inability to discriminate between enantiomers, most diastereomers, and salt forms of drugs |
| SEM | Solids, SS | Determine particle morphology and size distribution | No | Yes | Yes | D | Medium | Moderate | High | Characterizes only a small area of a tablet |
| RGB Imaging | Solids | IE | No | Yes | No | ND | Low | Min | Low | Sensitivity depends on detector device |
| XRD | SS | Crystallinity measurement, amount of API determination | Yes | No | Little / No | Semi-D | High | Max | High | Cannot examine solutions and non-crystalline drug forms |
| NMR | Liquids, Solid | Crystallinity measurement, API and excipients interaction investigation | Yes | No | Little / No | Semi-D | Medium | Moderate | High | More suitable for liquid drugs |
| NIRS | Solids | Monitoring final quality of drugs, identification of organic compounds and counterfeit drugs, quantitative measurement of API | Yes | No | No | ND | Low | Min | Low | Low structural selectivity |
| RS | Solids | Crystallinity measurement, determination of multi-components, analysis of polymorphic forms | Yes | No | No | ND | Low | Min | Medium | Not suitable for moisture analysis |
| MSI | Solids, Liquids | Tablet identification / composition determination, surface analysis | Yes | Yes | No | ND | Low | Min | Low | -- |
| HSI | Solids, Liquids | Distribution and identification of counterfeit, contaminated and minor components of drugs, surface analysis | Yes | Yes | No | ND | High | Max | High | Requires large data storage |

Table 2.2: Comparison between different researches for the analysis of medicines

| Reference | Technique | Dosage Form | Type of Processing | Sample Preparation | Features: Spectral | Features: Spatial | Preprocessing | Feature Reduction | Segmentation / IP | Classification | Software |
|---|---|---|---|---|---|---|---|---|---|---|---|
| [180] | NIRS | Tablets | Whole tablet uniform content checking | Yes | Yes | No | Mean Centering | No | No | PLS-I | PLS-IQ |
| [181] | Mono-chromatography | Solids | Particle size characterization | Yes | No | Yes | No | No | Noise Reduction, Binarization, Gray scale Difference Matrix | PLS | - |
| [83] | RS | Solids | Target substance and excipients discrimination | No | Yes | No | FD, Normalization | PCA | No | SVM, K-NN, Ripper, NB, C4.5 | Unscrambler, MATLAB |
| [182] | NIR-CI | Tablets | Content uniformity checking | No | Yes | Yes | No | No | Histograms | MCR, Alternating Least Squares | TS Capture, MATLAB |
| [125] | MSI | Tablets | Multiple tablet simultaneous identification / composition determination | No | Yes | Yes | No | No | No | PCA | - |
| [100] | NIR-CI | Tablets | Counterfeit tablet identification | Yes | Yes | Yes | No | PCA | No | PLS | Unscrambler |
| [183] | NIR-CI | Tablets | Tablet classification / sourcing | No | Yes | Yes | MSC, SNVC, SDC | PCA | Histograms | k-Means Clustering | ISys, MATLAB |
| [97] | NIR-CI | Solids | Chemical image generation | No | Yes | Yes | SDC, SNVC, MSC | No | Noise Removal | PLS-I, Classical Least Squares | MATLAB, PLS Toolbox |
| [184] | CSLM | Solids | Coating thickness and pore distribution evaluation | No | No | Yes | No | No | Binarization, Image Contrast Enhancement | Fuzzy c-Means Cluster, ED | MATLAB |
| [185] | NIR-CI | Tablets | Composition determination | No | Yes | Yes | No | No | No | Classical Least Squares | - |
| [178] | RS | Tablets | Identification of tablets | No | Yes | No | PLS | No | No | SVM | - |
| [75] | RS | Tablets | Detection of counterfeit tablets | No | Yes | No | - | No | No | PCA, HCA | MATLAB |
| [53] | NIRS | Tablets | Screening of the suspected counterfeit tablets | No | Yes | No | No | No | No | PCA | MATLAB |
| [51] | NIRS | Tablets | Counterfeit detection | No | Yes | No | No | No | No | PCA | MATLAB, Unscrambler |
| [69] | Portable RS | Tablets | Counterfeit detection | No | Yes | No | WT | No | No | PCA, LSLS | MATLAB |
| [54] | NIRS | Tablets, Capsule | Identification of clinical study lots | No | Yes | No | SNVC | Fisher criterion, FT, PCA | No | LDA, QDA, KNN | MATLAB |
| [74] | RS | Tablets | Identification of expired drugs | No | Yes | No | SDC, FD, SD, MN | No | No | PLS-DA, KNN, SVM | MATLAB |
| [50] | NIRS | Tablets | Screening of counterfeit tablets | No | Yes | No | SNVC, MSC | No | No | PCA, CA, SIMCA | - |
| [107] | NIR-CI | Tablets | Distribution assessment of major and minor components and their quantification | No | Yes | Yes | SDC, SNVC | No | No | PLSR, MCR, CLS | - |
| [108] | NIR-HSI | Tablets | Assessment of APIs and excipients from the surface of the tablet | No | Yes | Yes | SDC, SNVC | No | No | MCR | MATLAB |
| [94] | RGB Imaging | Tablets | Identification of counterfeit tablets | No | No | Yes | Binarization | No | Morphological operations | Bhattacharyya distance | MATLAB |
| [92] | RGB Imaging | Tablets | Illicit tablet matching and retrieval system | No | No | Yes | Segmentation, smoothing | No | Edge detection, Boundary removal | L2-norm | MATLAB |
| [110] | NIR-CI | Tablets | Determination of excipients and coating distribution | No | Yes | Yes | SNVC | No | No | PLS | MATLAB |

CHAPTER NO. 3

PROPOSED APPROACH –

MICROSCOPIC IMAGING


In this chapter, our focus is to develop a methodology based on high-resolution Microscopic Imaging (MI). The proposed approach uses a combination of IP and ML techniques to classify SPPs into DSPP and NSPP. This part of the research is aimed at formulating a new nondestructive method for this classification, based on the surface analysis of SPPs.

3.1 Microscopic Imaging

In this part, our focus is on an analysis of the surface morphology of the SPP using IP and ML techniques. The surface of an SPP can effectively represent various characteristics. In the proposed approach, we use high-resolution microscopic surface images of SPPs to classify them as DSPP or NSPP. The proposed approach consists of five main phases: image acquisition, preprocessing, feature extraction, feature reduction and classification. The main flow of the proposed approach is shown in Figure 3.1.

Figure 3.1: Basic flow of the proposed MI approach

The first phase is the acquisition of the surface images of the SPPs. This is followed by the preprocessing phase, in which the input images are prepared for further analysis. The preprocessed images are then passed to the feature extraction phase to extract different textural features, which are stored as a Feature Vector (FV). In the next phase, feature reduction techniques are applied to the extracted FV to reduce its dimensionality. Finally, the last phase classifies the images into two categories, i.e. DSPP and NSPP, based on their selected features. The detailed proposed approach is shown in Figure 3.2.

3.1.1 Image Acquisition

We have created nine different datasets for the experimentation of the proposed approach. Each dataset comprises images of defective and non-defective versions of ten different SPPs. These images are captured using a Labomed 5MP digital camera mounted on a Nikon Eclipse LV100 microscope [186] with a resolution of 2580 x 1944. We have considered three major environmental factors, i.e. temperature, moisture and humidity, to which the surface of the SPPs is exposed. We have created our own datasets for the analysis because datasets of environment-affected SPPs are not publicly available.

Three datasets are created for the SPPs affected by temperature and labeled as T1, T2 and T3. T1 consists of images of SPPs that are placed in an area at 200 °C for five minutes, together with their non-defective versions. Similarly, T2 and T3 contain images of defective and non-defective SPPs placed at 240 °C and 280 °C respectively for five minutes. In the same way, three datasets labeled H1, H2 and H3 are created for the humidity factor. Defective SPPs in H1 are placed out of their packaging (in open air) for three days; H2 and H3 contain images of SPPs that remain out of their packaging for two days and one day respectively. Another three datasets are created for SPP images affected by moisture. These images were captured after exposing ten SPPs on day 1, ten on day 2 and ten on day 3 to different levels of moisture (liquid water); the resulting datasets are referred to as M1, M2 and M3 respectively. A brief description of the datasets is given in Table 3.1.

Figure 3.2: Detailed diagram of the proposed MI approach

Figure 3.3 shows some of the images of the datasets used in this research. In each part of this figure, the first four images are of environment-affected SPPs and the last four are of their non-defective versions. Parts (a), (b) and (c) of Figure 3.3 show SPP images of datasets H1, H2 and H3, which are affected by humidity. Similarly, parts (d), (e) and (f) display SPPs affected by temperature, labeled as T1, T2 and T3. Parts (g), (h) and (i) represent SPPs of datasets M1, M2 and M3 respectively, each of which contains moisture-affected SPPs.

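Table 3.1: Dataset description

| Environmental Factor | Dataset | No. of DSPP | No. of NSPP |
|---|---|---|---|
| Humidity | H1 | 11 | 17 |
| Humidity | H2 | 13 | 17 |
| Humidity | H3 | 13 | 17 |
| Temperature | T1 | 13 | 17 |
| Temperature | T2 | 15 | 17 |
| Temperature | T3 | 19 | 17 |
| Moisture | M1 | 14 | 17 |
| Moisture | M2 | 15 | 17 |
| Moisture | M3 | 16 | 17 |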

3.1.2 Preprocessing

Preprocessing consists of algorithms that can be used for IE and noise removal. After image acquisition, preprocessing is an essential step that prepares the captured images for feature extraction. Preprocessing is performed in two steps, i.e. Grayscale Conversion and IE.

3.1.2.1 Grayscale Conversion

Texture analysis is used in different machine vision problems such as surface inspection and classification. Texture can be defined as the spatial distribution of different gray levels in a neighborhood. To perform textural analysis, the color image must therefore first be converted into a grayscale image.
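A grayscale conversion can be sketched as a weighted sum of the color channels. The source does not state which conversion was used, so the ITU-R BT.601 luminance weights below are an assumption, and `to_grayscale` is a hypothetical helper name.

```python
def to_grayscale(rgb_image):
    """Convert a nested list of (R, G, B) tuples (0-255) to gray levels (0-255).

    The weights 0.299 / 0.587 / 0.114 are the common BT.601 luminance weights;
    they are assumed here, not taken from the thesis.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 2 x 2 color image: white, black, pure red, pure green.
tiny_rgb = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
tiny_gray = to_grayscale(tiny_rgb)
```

The green channel receives the largest weight because the human eye is most sensitive to it, so pure green maps to a brighter gray level than pure red.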

Figure 3.3: The sample images of the DSPPs and NSPPs in each dataset (a) images

contained in dataset H1 (b) images contained in dataset H2 (c) images contained in

dataset H3 (d) images contained in dataset T1 (e) images contained in dataset T2 (f)

images contained in dataset T3 (g) images contained in dataset M1 (h) images contained

in dataset M2 (i) images contained in dataset M3

3.1.2.2 Contrast Enhancement

IE is very important for improving the quality of the input image. The enhancement technique used in the proposed approach is contrast enhancement. The image contrast is increased using the formula given in Equation 3.1 [187], which is based on saturating 1% of the data at the high and low gray intensity values of the input image.

$$CE(i,j) = \begin{cases} 255, & \text{if } f(i,j) > h \\ 0, & \text{if } f(i,j) < l \\ \min\left(\dfrac{f(i,j) - 1}{h - l},\ 255\right), & \text{otherwise} \end{cases}$$

Equation 3.1: Contrast enhancement formula

Where

CE(i , j) = Contrast enhancement at pixel i , j

f(i , j) = image intensity at a particular index i , j

h = high intensity of the image

l = low intensity of the image
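The idea of saturating a small fraction of pixels at both ends of the gray range can be sketched as a percentile-based linear stretch. This is a standard formulation that maps the interval [l, h] onto [0, 255]; it is an illustration under that assumption and may differ in detail from the printed form of Equation 3.1. The function name and the way l and h are picked from the sorted pixel values are hypothetical.

```python
def stretch_contrast(gray, saturate=0.01):
    """Linearly stretch gray levels so that [low, high] maps onto [0, 255].

    `low` and `high` are taken at the `saturate` and `1 - saturate` positions
    of the sorted pixel values (assumed way of choosing l and h).
    """
    flat = sorted(v for row in gray for v in row)
    n = len(flat)
    low = flat[int(saturate * (n - 1))]
    high = flat[int((1 - saturate) * (n - 1))]
    if high == low:                      # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [
        [min(255, max(0, round((v - low) * 255 / (high - low)))) for v in row]
        for row in gray
    ]


# Hypothetical low-contrast 2 x 2 image.
dull = [[50, 100], [150, 200]]
stretched = stretch_contrast(dull)
```

Pixels below l are clipped to 0 and pixels above h are clipped to 255, which corresponds to the first two cases of the piecewise definition.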

3.1.3 Feature Extraction

After preprocessing the input image, we perform feature extraction to quantify the surface in the image through different parameters. Analyzing the surface of the SPPs through its texture can greatly help in classifying them into defective and non-defective SPPs. The textural features used in this study are Gray-level Co-occurrence Matrix (GLCM), Histogram, Run Length Matrix (RLM), Autoregressive (AR) Model and HAAR Wavelet features. The details of these features are available in Chapter 2. A total of 281 textural features are extracted from each of the preprocessed images using MaZda (Texture Analysis Software) designed by Szczypinski et al. [188]. The creation of the jth dataset is shown in Equation 3.2, where j ranges from 1 to 9.

$$dataset_j = \bigcup_{k=1}^{L} features(I_k)$$

Equation 3.2: Formula for dataset representation

Where

Ik is the kth input image

features(Ik) is the feature set of the kth image

L is the total number of images in each dataset

The details of the features used in this research are given below.

3.1.3.1 Gray-level Co-occurrence Matrix

MaZda provides eleven features extracted from the GLCM: angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance and difference entropy. In this research, we have computed the GLCM features for five between-pixel distances (1, 2, 3, 4 and 5), each in four directions, as the offsets listed in Table 3.2 indicate. In total, 11 x 5 x 4 = 220 features are extracted.
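To make the GLCM features concrete, the sketch below builds a normalized co-occurrence matrix for one pixel offset and derives two of the listed measures from it. It is a simplified illustration, not MaZda's implementation: the symmetric counting, the base-2 logarithm and the function names are assumptions.

```python
import math


def glcm(gray, dx, dy, levels):
    """Normalized, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = gray[r][c], gray[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1        # count each pair in both orders (assumed)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def angular_second_moment(p):
    """Sum of squared matrix entries; equals 1 for a perfectly uniform texture."""
    return sum(v * v for row in p for v in row)


def glcm_entropy(p):
    """Shannon entropy of the matrix entries (base 2 is an assumption)."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


# Hypothetical 2 x 2 image with two gray levels, horizontal offset of 1 pixel.
stripes = [[0, 0], [1, 1]]
p_stripes = glcm(stripes, dx=1, dy=0, levels=2)
```

Repeating the computation for each distance and direction yields one value of every measure per offset, which is how a feature name such as S(3,0)Entropy arises.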

3.1.3.2 Histogram Features

MaZda provides nine histogram features in total from which we have chosen four: Mean

(histogram’s mean), Variance (histogram’s variance), Skewness (histogram’s skewness)

and Kurtosis (histogram’s kurtosis).

3.1.3.3 Run Length Matrix

Twenty RLM features are extracted using MaZda in this research. These features are run-length non-uniformity, grey level non-uniformity, long run emphasis, short run emphasis and fraction of image in runs. Each of the five features is computed in four different directions (horizontal, vertical, 45 degrees and 135 degrees).

3.1.3.4 Autoregressive Model

MaZda provides five different features based on AR model. These are Teta1 (parameter

θ1), Teta2 (parameter θ2), Teta3 (parameter θ3), Teta4 (parameter θ4) and Sigma

(parameter σ).

3.1.3.5 Wavelet Transformations

Wavelet energy features based on HAAR wavelet are measured at eight scales using four

bands of frequency (LL, LH, HL and HH) using MaZda. This provides a total number of

32 features.
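The per-family counts given in Sections 3.1.3.1 to 3.1.3.5 can be checked against the stated total of 281 with a few lines of arithmetic. The factor of four directions for the GLCM is inferred from the offsets that appear in the feature names of Table 3.2; the variable names are illustrative only.

```python
# Feature counts per family, as described in Sections 3.1.3.1 to 3.1.3.5.
glcm_count = 11 * 5 * 4        # 11 measures x 5 distances x 4 directions (inferred)
histogram_count = 4            # mean, variance, skewness, kurtosis
rlm_count = 5 * 4              # 5 measures x 4 directions
ar_count = 5                   # Teta1..Teta4 and Sigma
wavelet_count = 8 * 4          # 8 scales x 4 frequency bands (LL, LH, HL, HH)

total_features = glcm_count + histogram_count + rlm_count + ar_count + wavelet_count
print(total_features)
```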

3.1.4 Feature Reduction

The feature extraction phase results in 281 different features, which are very hard to deal with. For better results, it is therefore important to reduce the dimensionality of the feature sets. For this reason, two reduced feature sets were derived from the original 281 features. The first set is based on features selected using feature reduction algorithms. Three different feature reduction algorithms are used in this research to extract the most promising features for the correct classification between DSPP and NSPP: Chi-Square (CS), Gain Ratio (GR) and Relief-F (RF). Feature reduction is performed using each of the three techniques, and the size of the reduced feature sets is set to 15. As a result, the 15 top-ranked features for each feature reduction algorithm are extracted from the complete feature set. Feature reduction is performed using WEKA, an ML software package developed by Hall et al. [189]. All of these feature selection algorithms are used along with the Ranker search algorithm.

It is observed that the top 15 features extracted by GR and RF for our dataset are the same. The names of the selected features are given in Table 3.2. The table shows that, of the fifteen features selected by CS, 14 are angular second moment (AngScMom) at multiple distances and one is inverse difference moment (InvDfMom) at distance 4. All of these features belong to the GLCM. On the other hand, 14 of the features selected by GR are GLCM features and one is a Wavelet Energy feature from the HAAR Wavelet features.

The second reduced feature set is based on the top two features of the overall 281 features. These two features were identified through multiple experiments. For the experimentation, we used the 281 features in multiple combinations of different lengths as feature sets. These feature sets were then used as input to the classifiers and the resulting accuracies were compared. The analysis showed that using “S (5, 0) Entropy” (entropy at distance 5) and “Horzl_GLevNonU” (horizontal grey level non-uniformity) in combination is a better choice for the classification of DSPPs and NSPPs. The entropy measure is from the GLCM and Horzl_GLevNonU is from the RLM.

3.1.5 Classification

The features extracted from the images of the SPPs are evaluated using four different types of classification algorithms, i.e. SVM, KNN, NB and Ensemble of Classifiers (EC). In this research, we have compared the accuracies achieved by these classifiers. All experimental work for this research is performed using MATLAB [190]. Classification is performed using all 281 features and the two reduced feature sets, which contain the top 15 and the top 2 selected features respectively. The details of all these classifiers can be found in Chapter 2.

Table 3.2: List of top 15 selected features from CS, GR and RF

| Rank | Chi-Square | Gain Ratio / Relief-F |
|---|---|---|
| 1 | S(5,5)AngScMom | WavEnHH_s-8 |
| 2 | S(4,-4)AngScMom | S(3,0)SumOfSqs |
| 3 | S(0,4)AngScMom | S(3,0)Contrast |
| 4 | S(0,2)AngScMom | S(3,0)AngScMom |
| 5 | S(4,4)AngScMom | S(2,-2)DifEntrp |
| 6 | S(2,2)AngScMom | S(2,-2)DifVarnc |
| 7 | S(4,0)AngScMom | S(2,-2)Entropy |
| 8 | S(2,-2)AngScMom | S(3,0)Correlat |
| 9 | S(0,3)AngScMom | S(3,0)InvDfMom |
| 10 | S(3,-3)AngScMom | S(2,-2)SumVarnc |
| 11 | S(3,0)AngScMom | S(3,0)SumAverg |
| 12 | S(4,4)InvDfMom | S(3,0)DifEntrp |
| 13 | S(2,0)AngScMom | S(3,0)DifVarnc |
| 14 | S(3,3)AngScMom | S(3,0)Entropy |
| 15 | S(1,-1)AngScMom | S(3,0)SumEntrp |

In the proposed approach, we have used k = 2 for the implementation of KNN. Therefore, two nearest neighbors (2NN) with ‘Cosine’ as the distance metric are used for the classification between NSPP and DSPP. For SVM, the training is performed using a linear kernel function with the Sequential Minimal Optimization (SMO) method for finding the separating hyperplane. NB is implemented using the Normal (Gaussian) distribution.

Ensemble of Classifiers (EC) is also used for the classification between DSPP and NSPP in this research. EC works by voting, i.e. the class that receives the maximum number of votes from the set of base classifiers is assigned to the test instance. We have used SVM, KNN and NB as base classifiers in this research and implemented the ensemble using MATLAB. The purpose of EC is to enhance the performance of the base classifiers. Equation 3.3 gives the formula for EC.

$$EC = \begin{cases} 1, & \text{if } \sum\limits_{1}^{n} \text{Base\_Classifier\_Class} > \dfrac{n}{2} \\ -1, & \text{otherwise} \end{cases}$$

Equation 3.3: Formula for EC

Where

‘1’ represents NSPP class

‘-1’ represents DSPP class

‘n’ is the total number of base classifiers
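The voting rule can be sketched as follows. The sketch reads the sum in Equation 3.3 as the number of base classifiers that vote for class 1 (NSPP), which matches the description of majority voting in the text; that reading, and the function name `ensemble_vote`, are assumptions of this example.

```python
def ensemble_vote(base_predictions):
    """Return 1 (NSPP) if more than half of the base classifiers vote 1, else -1 (DSPP).

    base_predictions : list of labels in {1, -1}, one per base classifier
    """
    n = len(base_predictions)
    nspp_votes = sum(1 for label in base_predictions if label == 1)
    return 1 if nspp_votes > n / 2 else -1


# Hypothetical outputs of three base classifiers (e.g. SVM, KNN, NB) for one sample.
decision = ensemble_vote([1, 1, -1])
```

With three base classifiers a tie cannot occur, so the ensemble always returns the class chosen by at least two of them.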

CHAPTER NO. 4

PROPOSED APPROACH –

MULTISPECTRAL ANALYSIS


In this chapter, we have proposed another approach based on Multispectral Analysis (MA)

for the classification between DSPP and NSPP. The literature indicates that MA is a very

effective technique for the analysis of SPPs. The MA of the spectral data along with various

ML and SP techniques is used for the required classification.

4.1 Multispectral Analysis

The proposed approach is based on four main phases i.e. Spectrum Acquisition,

Preprocessing, Feature Extraction and Classification. The basic flow of the proposed

approach is shown in Figure 4.1.

Figure 4.1: Flow of the proposed MA approach

4.1.1 Spectrum Acquisition

Spectra are acquired using a µRamboss Raman Spectrometer (Dongwoo Optron, South Korea) [191] for all nine datasets. All spectra are collected using a He-Cd laser with a source excitation of 442 nm and a laser power of 40 mW. A complete spectrum of a single SPP consists of 1024 data points over the range of 106 nm – 2805 nm. The ‘Andor Solis for Spectroscopy’ software was used for data acquisition.


For spectrum acquisition, we have again created the required datasets based on environment-affected SPPs. Fourteen general-purpose SPPs that are commonly prescribed by physicians for different diseases were used in the experimentation of this approach. We have created nine datasets under the same conditions as in the MI based approach. Each of these nine datasets consists of fourteen DSPPs and fourteen NSPPs, i.e. twenty-eight SPPs per dataset. The DSPPs are again prepared using three environmental factors, i.e. temperature, moisture and humidity. Temperature-affected DSPPs are divided into three groups labeled T1, T2 and T3, which are placed in a preheated environment at 200 °C, 240 °C and 280 °C respectively for five minutes. Similarly, three groups of moisture-affected DSPPs, namely M1, M2 and M3, are created by exposing them to different amounts of liquid water on day 1, day 2 and day 3 respectively. The groups of humidity-affected SPPs are called H1, H2 and H3. H1 SPPs are placed out of their packaging for three days, H2 for two days and H3 for one day.

4.1.2 Preprocessing

The extracted spectrum consists of 1024 points, which is quite a large number of variables to process. For further processing, we have divided the spectra of each dataset into three ranges, i.e. UV (10 nm – 380 nm), Visible (380 nm – 750 nm) and IR (above 750 nm). The UV range then consists of 91 data points, the Visible range of 127 and the IR range of 806.
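Splitting a spectrum into the three wavelength ranges can be sketched as below. The function name `split_bands` and the sample values are hypothetical, and assigning the boundary wavelengths (exactly 380 nm to Visible, exactly 750 nm to IR) is an assumption, since the source does not say how boundary points were handled.

```python
def split_bands(wavelengths_nm, intensities):
    """Group intensity values into UV, Visible and IR by their wavelength in nm."""
    bands = {"UV": [], "Visible": [], "IR": []}
    for wavelength, value in zip(wavelengths_nm, intensities):
        if wavelength < 380:
            bands["UV"].append(value)
        elif wavelength < 750:           # 380 nm itself is counted as Visible (assumed)
            bands["Visible"].append(value)
        else:                            # 750 nm and above is counted as IR (assumed)
            bands["IR"].append(value)
    return bands


# Hypothetical six-point spectrum covering all three ranges.
demo_wavelengths = [106.0, 379.9, 380.0, 749.0, 750.0, 2805.0]
demo_intensities = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
demo_bands = split_bands(demo_wavelengths, demo_intensities)
```

Every data point falls into exactly one range, so the three band sizes always add up to the length of the original spectrum, as 91 + 127 + 806 = 1024 does here.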

Figure 4.2 shows sample spectral data of Non-Defective and Defective SPPs within the UV

wavelength range. Similarly, Figure 4.3 represents the spectra of the same SPP within the

visible wavelength. (a), (b) and (c) parts of both of the figures, represent spectra of SPP

affected by each of the three environmental factors respectively.


Figure 4.2: Multispectral data for NSPP and DSPP within UV wavelength. (a) Spectra of

NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and

temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture

affected DSPP datasets M1, M2 and M3.

In the same way, Figure 4.4 (a) shows multispectral data of the NSPP and humidity-affected DSPP within the IR wavelength range. Parts (b) and (c) of Figure 4.4 show the spectra of temperature- and moisture-affected DSPPs, respectively, along with their corresponding NSPPs.


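Figure 4.3: Multispectral data for NSPP and DSPP within Visible wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3.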

4.1.3 Feature Extraction

For further processing, we need to extract features from the multispectral data that help to discriminate between DSPP and NSPP. In this research, we use the Wavelet Coefficients (WC) of a signal as the features for classification.


4.1.3.1 Wavelet Transformation

In this research, we use WT to gain both of its advantages: spectrum de-noising and dimensionality reduction. We use only the detail coefficients because they represent a good match between the spectra and the wavelet function. The MATLAB software package is used to compute the WCs. We performed multiple experiments using the Daubechies, Coiflets, Symlets, Discrete Meyer, Biorthogonal and Reverse Biorthogonal wavelet functions at different decomposition levels (2 to 5) to achieve the best results. The detail coefficients are then used to train the classifiers and to evaluate the performance of the proposed approach.
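To make the idea of using detail coefficients as features concrete, the following is a minimal sketch of a multi-level discrete wavelet decomposition. It is not the MATLAB computation used in this work. It uses the Haar wavelet (the simplest Daubechies wavelet) implemented by hand, pads odd-length signals by repeating the last sample, and keeps only the detail coefficients of the deepest level. The function names `haar_dwt_step` and `detail_features` are hypothetical. Keeping only the deepest level is an assumption, although it appears consistent with the feature counts reported in Table 5.7 (for example, 26 features for Coif1 at level 2 on the 91-point UV range).

```python
import math


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2 == 1:
        values.append(values[-1])  # repeat the last sample to get an even length
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def detail_features(signal, level):
    """Detail coefficients of the deepest decomposition level, used as features."""
    if level < 1:
        raise ValueError("level must be >= 1")
    approx = list(signal)
    detail = []
    for _ in range(level):
        # Each level decomposes the previous approximation again.
        approx, detail = haar_dwt_step(approx)
    return detail
```

Each level roughly halves the number of coefficients, which is where the dimensionality reduction comes from. Longer filters such as Coif1 or Rbio1.3 produce slightly more coefficients per level than Haar because of boundary extension.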

4.1.4 Classification

The last phase of the proposed approach is classification based on the feature sets extracted in the previous phase. We use four classification algorithms in this research: SVM, KNN, NB and EC. We also compare the accuracies achieved by all four classifiers. Details of these algorithms are given in Chapter 2.


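Figure 4.4: Multispectral data for NSPP and DSPP within IR wavelength. (a) Spectra of NSPP and humidity affected DSPP datasets H1, H2 and H3. (b) Spectra of NSPP and temperature affected DSPP datasets T1, T2 and T3. (c) Spectra of NSPP and moisture affected DSPP datasets M1, M2 and M3.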

CHAPTER NO. 5

ANALYSIS AND DISCUSSION


In this research, we have proposed two different approaches for classifying DSPP and NSPP. One is based on two-dimensional MI data and the other on one-dimensional multispectral data. We applied statistical textural feature extraction techniques to the MI data, whereas for MA we used WT to obtain the feature set. The results obtained using both approaches are given below.

5.1 Microscopic Imaging

We evaluated the accuracy of the proposed approach in two experiments: one using LOO cross-validation and the other using Holdout Validation (HV). LOO is a validation technique that, from a total of N samples, uses N-1 samples as the training set and the remaining one sample as the test set. The HV method instead uses separate training and testing datasets. In real-time use, we have no information about which environmental factor has affected the SPP under test. For this reason, we created three new combined datasets based on humidity-, moisture- and temperature-affected SPPs. To avoid bias, we extracted subsets of five SPPs from each of the datasets H1, H2 and H3 individually and placed them into a new dataset named ‘H1,H2,H3’. The same was done for the temperature and moisture datasets, and these are named ‘T1,T2,T3’ and ‘M1,M2,M3’ respectively. Each of these three datasets therefore consists of fifteen NSPPs and fifteen DSPPs. A fourth dataset, named ‘HTM’, was also created as a combination of thirty SPPs randomly selected from the original nine datasets.
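The LOO procedure described above can be sketched as follows. This is an illustration only: a plain 1-nearest-neighbour rule stands in for the SVM, KNN and NB classifiers actually used, and the function names are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, sample):
    """Label of the training sample closest to `sample` (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for features, label in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(features, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(samples, labels):
    """Train on N-1 samples, test on the held-out one, and repeat for all N samples."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

With N samples the classifier is trained N times, which is affordable for datasets of about thirty SPPs such as those used here.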

In Experiment I, the Leave-one-out (LOO) cross-validation method is used to evaluate the proposed approach. LOO cross-validation is applied first to each individual dataset, then to the combined dataset of each environmental factor and finally to the combined dataset of all environmental factors. Three classifiers are used in this experiment, namely SVM, KNN and NB. In Experiment II, we evaluated the accuracy of the proposed approach using separate training and testing datasets (HV model). Each dataset is divided into two equal halves: 50% of the data is used for training and the remaining 50% for testing. The classification accuracy is measured using four classifiers (SVM, KNN, NB and EC). The feature vector is formed from 281 texture-based features extracted from the preprocessed images.
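The 50/50 holdout split can be sketched as below. The text does not state how samples were assigned to the two halves, so the seeded random shuffle, the function name `holdout_split` and its parameters are assumptions made for illustration.

```python
import random


def holdout_split(samples, labels, train_fraction=0.5, seed=0):
    """Shuffle the sample indices and split them into training and testing parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # assumption: random assignment to halves
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test
```

A classifier is then fitted on the training half only and scored once on the testing half, in contrast to the N repeated fits of LOO.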

In Experiment I, we first used all 281 features as the feature vector and evaluated the performance of the proposed approach with each of the three classifiers using LOO cross-validation. The classification process was applied to each SPP dataset individually and then to the combined datasets. Table 5.1 contains the results of LOO cross-validation using 281 features.

A graphical representation of the accuracy of each classifier is shown in Figure 5.1. The results show that the SVM classifier achieves the maximum accuracy for most of the datasets. Classification accuracies for moisture-affected SPPs are higher than for the other two factors. The results for the humidity-affected datasets indicate that humidity affects the surface of the SPPs very slowly, which is why they have a low classification rate. The accuracies of the combined datasets reflect the same trends.


Table 5.1: LOO results for all individual and combined datasets using 281 features

Datasets  No. of Features  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp

H1 281 57 65 45 68 65 73 50 47 55

H2 281 57 59 54 50 41 62 53 47 62

H3 281 63 65 62 37 29 46 47 35 62

T1 281 57 59 54 53 59 46 60 47 77

T2 281 72 71 73 78 82 73 66 65 67

T3 281 69 71 68 64 59 68 69 88 53

M1 281 81 82 79 68 59 79 84 82 86

M2 281 72 76 67 78 76 80 69 65 73

M3 281 88 82 94 88 88 88 85 94 75

H1, H2, H3 281 67 53 73 63 18 84 57 35 68

T1, T2, T3 281 69 59 72 75 41 87 61 71 57

M1,M2, M3 281 85 71 91 82 59 91 84 76 87

HTM 281 84 53 88 86 12 95 64 59 65

From Table 5.1 to Table 5.12, ‘Acc’ is for accuracy, ‘Sn’ for sensitivity and ‘Sp’ for

specificity.
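For reference, the three measures can be computed from the predictions as in the sketch below. It uses the standard definitions (sensitivity is the share of positive samples classified correctly, specificity the share of negative samples classified correctly). Treating DSPP as the positive class is an assumption, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(true_labels, predicted_labels, positive="DSPP"):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:          # assumption: DSPP is the positive class
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity
```

Reporting all three values matters because a high accuracy can hide a low sensitivity when one class dominates the correct predictions.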


Figure 5.1: LOO results against all individual and combined datasets using 281 features

After that, LOO cross-validation is applied to the top 15 selected features. Classification accuracies are again calculated using the three classifiers, and SVM provides the highest accuracy for most of the datasets. The results show that the feature sets selected using CS provide higher accuracies than those selected using GR. The comparison of results using the top 15 features is shown in Table 5.2. Overall, SVM and KNN provide higher accuracies using CS for all individual SPP datasets. SVM provides a maximum accuracy of 90.32% for the M1 dataset using CS, while KNN provides 90.91% accuracy for M3 using GR. These results highlight that moisture-affected SPPs have a higher classification rate.

For the combined datasets, a high rate of correct classification is achieved for moisture-affected SPPs using CS and SVM. Temperature- and humidity-affected SPPs have relatively lower classification accuracies than moisture-affected SPPs. For the completely combined dataset, a maximum accuracy of 86.30% is achieved using the KNN classifier. Figure 5.2 shows the accuracies of the individual and combined SPP datasets using the CS and GR feature sets.

In the last part of Experiment I, we evaluated the accuracy of the proposed approach using the top two features selected from the 281 features. As already mentioned, these two features are selected by forming all combinations of two features from the 281 and then choosing the pair that provides the maximum accuracy.
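The exhaustive pair search can be sketched as follows. The scoring function is left as a parameter because the text does not specify which classifier or validation scheme was used to score each pair. The names `best_feature_pair` and `score_pair` are hypothetical.

```python
from itertools import combinations
from math import comb


def best_feature_pair(num_features, score_pair):
    """Return (best_pair, best_score) over all unordered pairs of feature indices."""
    best_pair, best_score = None, float("-inf")
    for pair in combinations(range(num_features), 2):
        score = score_pair(pair)  # e.g. the accuracy obtained with these two features
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair, best_score


# Number of candidate pairs that must be scored for 281 features.
PAIRS_FOR_281_FEATURES = comb(281, 2)  # 39340
```

With 281 features there are 39,340 candidate pairs, so an exhaustive search is still feasible, whereas the same strategy for larger subsets grows combinatorially.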

Figure 5.2: LOO results against all individual and combined datasets using top 15 features

[Bar chart "LOO results using top 15 features through CS and GR": accuracies (0 to 100) per tablet dataset; series CS-SVM, CS-KNN, CS-NB, GR-SVM, GR-KNN, GR-NB]

Table 5.2: LOO results for all individual and combined datasets using top 15 features

Datasets  No. of Features  FR-Algo  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp


H1 15 CS 54 47 64 71 82 55 57 41 82

H2 15 CS 53 47 62 57 59 54 53 41 69

H3 15 CS 50 47 54 53 65 38 60 47 77

T1 15 CS 70 82 54 63 71 54 53 35 77

T2 15 CS 72 94 47 72 76 67 53 41 67

T3 15 CS 78 94 63 75 71 79 67 53 79

M1 15 CS 90 88 93 71 76 64 58 53 64


M2 15 CS 78 94 60 72 76 67 56 35 80

M3 15 CS 88 94 81 73 71 75 67 47 88

H1, H2, H3 15 CS 67 41 78 74 53 84 63 35 76

T1, T2, T3 15 CS 66 94 55 80 53 89 67 41 77

M1,M2, M3 15 CS 85 94 82 77 59 84 69 29 84

HTM 15 CS 64 94 60 86 35 93 75 29 81

H1 15 GR 61 65 55 68 76 55 43 47 36

H2 15 GR 43 47 38 57 71 38 53 47 62


H3 15 GR 43 35 54 40 47 31 43 35 54

T1 15 GR 60 59 62 57 59 54 57 47 69

T2 15 GR 72 71 73 75 82 67 59 53 67

T3 15 GR 72 88 58 64 65 63 67 82 53

M1 15 GR 81 76 86 74 76 71 77 76 79

M2 15 GR 72 71 73 72 71 73 63 65 60

M3 15 GR 79 88 69 91 94 88 85 94 75

H1, H2, H3 15 GR 48 35 54 59 35 70 61 35 73


T1, T2, T3 15 GR 66 71 64 73 47 83 61 47 66

M1,M2, M3 15 GR 77 71 80 79 65 84 79 65 84

HTM 15 GR 61 65 60 83 24 91 77 47 81


Table 5.3 shows the accuracies of the individual and combined datasets using the top two features. LOO cross-validation with the top two features again provides the maximum classification rates for the moisture-affected datasets through SVM. For the completely combined dataset (HTM), NB provides the maximum classification accuracy of 91%, but with a low sensitivity of 29%. This is depicted in Figure 5.3.

Figure 5.3: LOO results against all individual and combined datasets using top 2 features

Similarly in Experiment II, we have evaluated the accuracies of the proposed approach

through the HV model against all 281, selected top 15 and top 2 features. All accuracies in

this experiment are calculated by providing the test datasets to the already trained models.

[Bar chart for Figure 5.3, "LOO results using top two features": accuracies (0 to 100) per tablet dataset; series SVM, KNN, NB]

Table 5.3: LOO results for all individual and combined datasets using top 2 features

Datasets  No. of Features  Feature Name  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp


H1 2 A189 and A226 61 41 91 46 59 27 46 47 45

H2 2 A189 and A226 53 47 62 67 76 54 37 41 31

H3 2 A189 and A226 50 47 54 47 53 38 40 47 31

T1 2 A189 and A226 63 53 77 50 53 46 50 47 54

T2 2 A189 and A226 56 65 47 66 65 67 69 59 80

T3 2 A189 and A226 67 59 74 75 71 79 67 65 68

M1 2 A189 and A226 81 71 93 61 71 50 65 59 71

M2 2 A189 and A226 72 76 67 47 53 40 63 59 67

M3 2 A189 and A226 76 76 75 61 59 63 76 65 88

H1,H2,H3 2 A189 and A226 61 35 73 63 35 76 72 29 92


T1,T2,T3 2 A189 and A226 69 53 74 80 47 91 75 35 89

M1,M2,M3 2 A189 and A226 76 71 78 65 41 73 73 29 89

HTM 2 A189 and A226 73 71 73 83 24 91 91 29 99


Table 5.4 shows the test results using all 281 features. For the combined humidity dataset, 69% accuracy is achieved through SVM and EC. Higher accuracies are achieved for the combined datasets of temperature- and moisture-affected SPPs, namely 81% for both using EC. However, the classification rate of the proposed approach drops when it is applied to the overall combined dataset. For the individual humidity, temperature and moisture datasets, EC provides accuracies at least as high as the other three classifiers, except for T3. NB and EC both provide 94% accuracy for M1 and 88% for T2. For H1, SVM and EC both provide the maximum accuracy of 79%. Figure 5.4 shows the results in graphical form.

Figure 5.4: HV results against all individual and combined datasets using all 281 features

[Bar chart "HV results using all 281 features": accuracies (0 to 100) per dataset; series SVM, KNN, NB, EC]

Table 5.4: Accuracies for test datasets using 281 features

Datasets  No. of Features  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp  EC: Acc Sn Sp

H1 281 79 80 43 57 60 29 57 40 57 79 80 75

H2 281 67 56 83 60 67 50 60 67 50 73 67 83

H3 281 60 67 50 40 33 50 67 78 50 67 67 67

T1 281 73 67 83 53 44 67 60 56 67 73 67 83

T2 281 81 78 86 69 67 71 88 78 100 88 89 86

T3 281 61 75 50 61 63 60 61 75 50 56 63 50

M1 281 88 89 86 75 56 100 94 100 86 94 89 100


M2 281 75 78 71 75 67 86 75 78 71 81 78 86

M3 281 88 100 75 82 78 88 88 100 75 88 100 75

H1,H2,H3 281 69 100 38 63 63 63 50 63 38 69 100 38

T1,T2,T3 281 75 63 88 75 88 63 56 88 25 81 88 75

M1,M2,M3 281 75 63 88 81 88 75 75 63 88 81 75 88

HTM 281 44 63 25 69 63 75 56 63 50 63 75 50


The test results using the top 15 selected features are shown in Table 5.5. In some cases the features selected using CS perform better, while in others GR provides higher accuracies. NB provides relatively lower accuracies than SVM and KNN. A maximum accuracy of 94% is achieved for the T2 and M1 datasets using GR and SVM. For M2 and M3, 88% accuracy is achieved, again using GR; SVM and EC outperform the other classifiers on both of these datasets. The correct classification rate for humidity-affected SPPs is relatively lower than for the others. When the trained model is tested on the combined datasets, a maximum accuracy of 88% is achieved for the moisture- and temperature-affected SPP datasets using both CS and GR. An accuracy of 75% is achieved when the proposed approach is tested on the overall combined dataset (HTM), using CS-EC and GR-KNN. The graphical representation of these results is shown in Figure 5.5.

The accuracies using the top two selected features on the test datasets are provided in Table 5.6. The results show that SVM performs better for almost all of the datasets except humidity; for the humidity-affected datasets, KNN provides better results. For M3, SVM provides 88% accuracy with 89% sensitivity and 88% specificity. For the overall combined dataset, SVM provides the maximum accuracy of 69%. The combined humidity- and temperature-affected datasets reach 75% accuracy using NB and KNN. The highest accuracy, 88%, is achieved for moisture-affected SPPs through SVM. These results are also shown in Figure 5.6.


Table 5.5: Accuracies for test datasets using top 15 selected features

Datasets  No. of Features  FR-Algo  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp  EC: Acc Sn Sp


H1 15 CS 50 30 57 71 80 29 50 30 57 50 30 100

H2 15 CS 47 33 67 60 44 83 60 44 83 53 44 67

H3 15 CS 40 33 50 60 67 50 60 44 83 47 44 50

T1 15 CS 80 67 100 60 67 50 53 33 83 67 56 83

T2 15 CS 69 44 100 69 78 57 63 44 86 69 44 100

T3 15 CS 56 75 40 78 88 70 56 38 70 67 63 70

M1 15 CS 88 100 71 81 78 86 69 78 57 88 100 71


M2 15 CS 81 100 57 69 67 71 56 44 71 75 78 71

M3 15 CS 88 100 75 76 78 75 71 44 100 82 89 75

H1, H2, H3 15 CS 56 50 63 69 63 75 69 63 75 63 50 75

T1, T2, T3 15 CS 75 88 63 75 100 50 75 88 63 75 88 63

M1,M2, M3 15 CS 88 75 100 56 75 38 63 63 63 75 75 75

HTM 15 CS 63 75 50 63 75 50 69 63 75 75 75 75

H1 15 GR 29 20 29 64 80 14 57 40 57 50 50 50

H2 15 GR 60 56 67 53 67 33 67 67 67 53 56 50


H3 15 GR 53 67 33 40 33 50 67 78 50 60 67 50

T1 15 GR 60 67 50 60 44 83 67 67 67 67 67 67

T2 15 GR 94 89 100 56 44 71 81 67 100 88 78 100

T3 15 GR 61 88 40 39 50 30 61 75 50 56 75 40

M1 15 GR 94 100 86 81 89 71 88 89 86 88 89 86

M2 15 GR 88 89 86 56 56 57 75 78 71 88 89 86

M3 15 GR 88 100 75 82 89 75 88 100 75 88 100 75

H1, H2, H3 15 GR 56 100 13 50 63 38 56 75 38 50 88 13


T1, T2, T3 15 GR 63 75 50 88 88 88 44 75 13 75 88 63

M1,M2, M3 15 GR 75 75 75 81 75 88 75 63 88 88 88 88

HTM 15 GR 44 88 0 75 75 75 50 63 38 50 63 38


Figure 5.5: HV results against all individual and combined datasets using top 15 features

[Bar chart "HV results using top 15 features through CS and GR": accuracies (0 to 100) per dataset; series CS-SVM, CS-KNN, CS-NB, CS-EC, GR-SVM, GR-KNN, GR-NB, GR-EC]

Table 5.6: Accuracies for test datasets using top 2 features

Datasets  No. of Features  Feature Name  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp  EC: Acc Sn Sp


H1 2 A189 and A226 50 30 57 57 70 14 50 50 29 50 50 50

H2 2 A189 and A226 53 56 50 67 67 67 60 56 67 67 67 67

H3 2 A189 and A226 53 56 50 53 67 33 47 33 67 47 56 33

T1 2 A189 and A226 80 78 83 73 89 50 60 44 83 80 78 83

T2 2 A189 and A226 69 44 100 63 56 71 50 44 57 56 44 71

T3 2 A189 and A226 56 75 40 56 75 40 56 63 50 56 63 50

M1 2 A189 and A226 75 67 86 75 100 43 63 67 57 69 67 71

M2 2 A189 and A226 81 89 71 56 67 43 69 67 71 69 78 57

M3 2 A189 and A226 88 89 88 65 56 75 76 56 100 76 56 100


H1,H2,H3 2 A189 and A226 63 63 63 75 75 75 75 75 75 63 63 63

T1,T2,T3 2 A189 and A226 69 88 50 75 88 63 75 88 63 69 88 50

M1,M2,M3 2 A189 and A226 88 75 100 44 75 13 63 63 63 63 63 63

HTM 2 A189 and A226 69 75 63 63 88 38 69 63 75 69 75 63


Figure 5.6: HV results against all individual and combined datasets using top 2 features

[Bar chart "HV results using top two features": accuracies (0 to 100) per dataset; series SVM, KNN, NB, EC]

5.2 Multispectral Analysis

In this part of the research, three experiments are performed to find the wavelet function that most effectively classifies environmentally affected SPPs as defective or non-defective. The first experiment finds the wavelet function that provides the maximum accuracy for each individual dataset of the three environmental factors. In the second experiment, we identify three wavelet functions, one for each of the three environmental factors, in order to achieve maximum accuracies for classifying SPPs affected by any of these factors. In the third experiment, we test the selected wavelet functions on the combined datasets.

All of these experiments are performed using the HV model (training and testing data). Again, each dataset is divided into two halves, each containing 50% of the whole dataset. Each of the four classifiers is trained on the training dataset using the detail WCs. After training, the performance of the proposed approach is evaluated on unseen test datasets. Four classification techniques (SVM, NB, KNN and EC) are applied to the extracted feature vector to evaluate the accuracy of the proposed approach.

Table 5.7 gives the accuracy, sensitivity and specificity for each dataset in experiment I. Each dataset, grouped by environmental factor, is tested in the UV, IR and visible wavelength ranges using the specified wavelet functions and classifiers. For humidity, the best accuracies are obtained in the UV range with Coif1 as the wavelet function and KNN as the classifier; the sensitivity and specificity follow a similar pattern. For temperature, accuracies of up to 100% are obtained in both the UV and IR ranges. In the UV range, 100% accuracy is achieved for T1 with the Rbio1.3 wavelet function and the KNN classifier. In the IR range, the maximum accuracy is achieved for T3 with the Rbio3.5 wavelet and the KNN classifier. Figure 5.7 and Figure 5.8 summarize the accuracies and sensitivities, respectively, of the results shown in Table 5.7.

For moisture, the infrared and ultraviolet ranges again show 100% accuracy for all datasets. In the UV range, Rbio1.3 and EC are the wavelet function and classifier respectively. For IR, the wavelet function and classifier pairs for M1, M2 and M3 are Dmey and SVM, Rbio2.8 and KNN, and Rbio4.4 and EC respectively. The sensitivity and specificity results are in line with these accuracies. In the visible range, the accuracy for every dataset is lower than or equal to that in the infrared and ultraviolet ranges.

Figure 5.7: Comparison of accuracies against UV, IR and visible for experiment I

[Bar chart: accuracies (axis 0 to 120) for datasets H1, H2, H3, T1, T2, T3, M1, M2, M3; series UV, IR, Visible]

Figure 5.8: Comparison of sensitivity against UV, IR and visible for experiment I

[Bar chart: sensitivities (axis 0 to 120) for datasets H1, H2, H3, T1, T2, T3, M1, M2, M3; series UV, IR, Visible]

Table 5.7: Results achieved by experiment I

Factor       Dataset  Wavelength  Features  Wavelet  Level  Classifier  Acc  Sn   Sp
Humidity     H1       UV          26        Coif1    2      KNN         77   71   83
Humidity     H2       UV          26        Coif1    2      KNN         93   100  86
Humidity     H3       UV          26        Coif1    2      KNN         100  100  100
Humidity     H1       IR          116       Rbio3.9  3      KNN         77   71   83
Humidity     H2       IR          115       Rbio2.8  3      KNN         93   86   100
Humidity     H3       IR          115       Rbio2.8  3      KNN         86   86   86
Humidity     H1       Visible     35        Coif1    2      KNN         77   86   67
Humidity     H2       Visible     23        Bior2.4  3      KNN         93   86   100
Humidity     H3       Visible     23        Rbio4.4  3      KNN         71   86   57
Temperature  T1       UV          15        Rbio1.3  3      KNN         100  100  100
Temperature  T2       UV          25        Bior3.9  3      KNN         93   100  86
Temperature  T3       UV          26        Bior6.8  3      KNN         93   86   100
Temperature  T1       IR          58        Rbio2.4  4      EC          93   86   100
Temperature  T2       IR          116       Rbio3.9  3      NB          93   100  86
Temperature  T3       IR          109       Rbio3.5  3      KNN         100  100  100
Temperature  T1       Visible     42        Bior3.7  2      KNN         93   100  86
Temperature  T2       Visible     21        Rbio3.3  3      KNN         79   86   71
Temperature  T3       Visible     42        Bior2.4  4      EC          86   86   86
Moisture     M1       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M2       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M3       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M1       IR          277       Dmey     2      SVM         100  100  100
Moisture     M2       IR          115       Rbio2.8  3      KNN         100  100  100
Moisture     M3       IR          108       Rbio4.4  3      EC          100  100  100
Moisture     M1       Visible     41        Rbio2.6  2      EC          100  100  100
Moisture     M2       Visible     35        Coif1    2      EC          93   100  86
Moisture     M3       Visible     28        Rbio3.7  3      KNN         93   100  86


In experiment II, KNN and EC outperform the other two classifiers. The UV wavelength range provides higher accuracies than IR and visible for almost all of the datasets except H1. In the UV range, Coif1, Sym5 and Rbio1.3 provide the maximum accuracies for humidity-, temperature- and moisture-affected SPPs respectively, and 100% accuracy is achieved for all three moisture-affected datasets. For IR, Rbio2.8, Rbio2.2 and Rbio3.5 give high accuracies for humidity-, temperature- and moisture-affected SPPs respectively. Similarly, Bior2.4, Rbio3.3 and Rbio3.7 provide the maximum accuracies for spectra in the visible wavelength range. The sensitivity and specificity results follow almost the same trend as the accuracy for all datasets. Table 5.8 gives the complete results of the second experiment. The accuracies and sensitivities achieved in experiment II are compared in Figure 5.9 and Figure 5.10 respectively.

Figure 5.9: Comparison of accuracies against UV, IR and visible for experiment II

[Bar chart: accuracies (axis 0 to 120) for datasets H1, H2, H3, T1, T2, T3, M1, M2, M3; series UV, IR, Visible]

Figure 5.10: Comparison of sensitivity against UV, IR and visible for experiment II

The results of experiment III are shown in Table 5.9. In experiment III, we tested the wavelet functions selected in experiment II on the combined datasets. The results show that, for UV wavelengths, the Coif1 wavelet function with the EC classifier performs best. For the overall combined dataset HTM, an accuracy of 94% is achieved using UV data. KNN with Rbio3.5 provides the maximum accuracies for IR data, and KNN with Rbio3.3 for visible data. These results are also shown in Figure 5.11.

[Bar chart for Figure 5.10: sensitivities (axis 0 to 120) for datasets H1, H2, H3, T1, T2, T3, M1, M2, M3; series UV, IR, Visible]

Figure 5.11: Comparison of accuracies against UV, IR and visible for experiment III

[Bar chart "Accuracies for combined datasets": accuracies (0 to 100) for datasets H1,H2,H3, T1,T2,T3, M1,M2,M3 and HTM; series UV, IR, Visible]

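Table 5.8: Results achieved by experiment II

Factor       Dataset  Wavelength  Features  Wavelet  Level  Classifier  Acc  Sn   Sp
Humidity     H1       UV          26        Coif1    2      KNN         77   71   83
Humidity     H2       UV          26        Coif1    2      KNN         93   100  86
Humidity     H3       UV          26        Coif1    2      KNN         100  100  100
Humidity     H1       IR          115       Rbio2.8  3      KNN         69   86   50
Humidity     H2       IR          115       Rbio2.8  3      KNN         93   86   100
Humidity     H3       IR          115       Rbio2.8  3      KNN         86   86   86
Humidity     H1       Visible     23        Bior2.4  3      KNN         69   71   67
Humidity     H2       Visible     23        Bior2.4  3      KNN         93   86   100
Humidity     H3       Visible     23        Bior2.4  3      KNN         57   86   29
Temperature  T1       UV          29        Sym5     2      EC          93   100  86
Temperature  T2       UV          29        Sym5     2      EC          86   100  71
Temperature  T3       UV          29        Sym5     2      EC          86   100  71
Temperature  T1       IR          -         Rbio2.2  -      -           86   100  71
Temperature  T2       IR          -         Rbio2.2  -      -           79   71   86
Temperature  T3       IR          -         Rbio2.2  -      -           79   71   86
Temperature  T1       Visible     21        Rbio3.3  3      KNN         86   100  71
Temperature  T2       Visible     21        Rbio3.3  3      KNN         79   86   71
Temperature  T3       Visible     21        Rbio3.3  3      KNN         64   86   43
Moisture     M1       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M2       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M3       UV          15        Rbio1.3  3      EC          100  100  100
Moisture     M1       IR          109       Rbio3.5  3      EC          86   86   86
Moisture     M2       IR          109       Rbio3.5  3      EC          93   100  86
Moisture     M3       IR          109       Rbio3.5  3      EC          93   100  86
Moisture     M1       Visible     -         Rbio3.7  -      -           64   71   57
Moisture     M2       Visible     -         Rbio3.7  -      -           86   86   86
Moisture     M3       Visible     -         Rbio3.7  -      -           93   100  86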


Table 5.9: Results against combined datasets using experiment III

Dataset   Wavelength  Features  Wavelet  Level  Classifier  Acc  Sn   Sp
H1,H2,H3  UV          26        Coif1    2      EC          81   75   88
T1,T2,T3  UV          26        Coif1    2      EC          81   75   88
M1,M2,M3  UV          26        Coif1    2      EC          94   88   100
HTM       UV          26        Coif1    2      EC          94   100  88
H1,H2,H3  IR          -         Rbio3.5  -      KNN         81   75   88
T1,T2,T3  IR          -         Rbio3.5  -      KNN         56   75   38
M1,M2,M3  IR          -         Rbio3.5  -      KNN         88   88   88
HTM       IR          -         Rbio3.5  -      KNN         75   75   75
H1,H2,H3  Visible     -         Rbio3.3  -      KNN         56   63   50
T1,T2,T3  Visible     -         Rbio3.3  -      KNN         69   88   50
M1,M2,M3  Visible     -         Rbio3.3  -      KNN         75   88   63
HTM       Visible     -         Rbio3.3  -      KNN         63   100  25


5.3 Hybrid

In this research, we have proposed two different methodologies for classifying defective and non-defective SPPs: one based on MI and the other on MA. From the results of the imaging-based approach (Chapter 3), we can conclude that statistical textural features extracted using different feature extraction techniques are beneficial for classifying SPPs affected by different environmental factors. The top two of the 281 features, and the top 15 features selected using either CS or GR, give better accuracies for different datasets. The results also show that the MI approach is promising for the individual environment-affected datasets; however, its results for the combined datasets are lower than for the individual datasets. For the MA-based approach (Chapter 4), UV wavelength spectra are more suitable for the analysis of SPPs than the other two wavelength ranges. WCs extracted from UV data prove very useful for classifying SPPs in the individual datasets. The MA results for the combined datasets are also better than those of the MI approach. The two proposed approaches perform differently for SPPs affected by different environmental factors.

The main purpose of this research is to propose an approach that produces accurate results for individual as well as combined datasets. Combining the two proposed approaches (spatial and spectral) may produce more accurate results. To experiment with this hybrid approach, we selected the data of those SPPs that were present in both the MI and MA datasets. We therefore formed nine new datasets, covering all three environmental factors, from the SPPs common to both analyses. Each new dataset consists of 30 SPPs, of which 15 are defective and 15 are non-defective.


Figure 5.12 shows the complete flow of the hybrid approach. The processing consists of two parts. In the first part, the statistical-textural features are extracted from the preprocessed microscopic image of the SPP, and the feature reduction techniques applied to the extracted feature vector finalize the feature set. In the second part, spectrum acquisition is performed for the same SPP. The preprocessing phase then divides the raw input spectrum into the UV, IR and visible wavelength ranges.

Figure 5.12: Basic flow of the hybrid approach

[Flow diagram. Image branch: SPP → Image Acquisition → SPP Image → Grayscale Conversion → Gray Image → Contrast Enhancement → Enhanced Image → Feature Extraction → Feature Vector → Feature Reduction → Reduced Feature Vector. Spectral branch: SPP → Spectrum Acquisition → SPP Raw Spectra → Preprocessing → IR/UV/Visible Spectra → Feature Extraction → Feature Vector. Both branches: Combined Feature Vector → Classification → DSPP / NSPP]

The feature extraction phase produces a feature vector based on the WCs extracted from each wavelength range. The two extracted feature vectors finally become the input to the classification phase, which classifies the input SPP into one of the two defined classes (DSPP and NSPP).

In this part of the research, we combined the top two statistical textural features from the MI approach with the WCs extracted from the UV wavelength range. Similarly, these WCs are also combined with the top 15 statistical textural features selected using CS and GR. We selected the WCs extracted using coif1 (level 2), rbio1.3 (level 3) and sym5 (level 2). These wavelet functions and levels were selected by analyzing the results of the second experiment of the MA approach. Hence, each of the three feature vectors chosen from the MI approach is combined with each of the three feature vectors selected from the MA approach, which provides nine feature vectors for further processing.
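The pairing of the three MI feature vectors with the three MA feature vectors can be sketched as below. Plain concatenation is assumed as the combination rule, and the function name, the dictionary keys and the numeric values are hypothetical placeholders rather than features from the thesis.

```python
from itertools import product


def combine_feature_vectors(mi_vectors, ma_vectors):
    """Concatenate every MI feature vector with every MA feature vector."""
    combined = {}
    for mi_name, ma_name in product(mi_vectors, ma_vectors):
        # Assumption: the hybrid vector is the MI features followed by the WCs.
        combined[(mi_name, ma_name)] = list(mi_vectors[mi_name]) + list(ma_vectors[ma_name])
    return combined


# Placeholder vectors for one SPP; real vectors hold the selected textural
# features and the UV wavelet coefficients respectively.
demo_mi = {"top-2": [0.1, 0.2], "CS top-15": [0.3] * 15, "GR top-15": [0.4] * 15}
demo_ma = {"coif1 level 2": [1.0] * 4, "rbio1.3 level 3": [2.0] * 3, "sym5 level 2": [3.0] * 5}
demo_combined = combine_feature_vectors(demo_mi, demo_ma)
```

Three MI choices times three MA choices give the nine hybrid feature vectors that are evaluated in the following paragraphs.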

We evaluated the accuracy of the proposed hybrid approach using all of these feature vectors. The analysis shows that the WCs calculated using rbio1.3, combined with the CS top fifteen features, provide higher accuracies for individual as well as combined datasets. These results are shown in Table 5.10, which compares the accuracies, sensitivities and specificities achieved by each of the four classifiers on the individual datasets. The proposed approach gives its best results for all of the individual datasets using KNN as the classifier, except for M2, where SVM provides the maximum accuracy of 100%. The KNN accuracies are at least 92% for every dataset except H2 (86%). This can be seen in Figure 5.13.


Table 5.10: Test accuracies using hybrid approach for individual datasets

Datasets  No. of Features  SVM: Acc Sn Sp  KNN: Acc Sn Sp  NB: Acc Sn Sp  EC: Acc Sn Sp


H1 33 67 100 0 92 88 100 67 63 75 92 100 75

H2 33 79 88 67 86 100 67 71 88 50 79 88 67

H3 33 79 88 67 93 100 83 64 63 67 79 88 67

T1 33 93 88 100 100 100 100 57 75 33 93 88 100

T2 33 87 88 86 93 100 86 73 75 71 87 88 86

T3 33 100 100 100 100 100 100 76 75 78 100 100 100

M1 33 100 100 100 100 100 100 79 75 83 100 100 100

M2 33 100 100 100 93 100 86 80 75 86 93 100 86

M3 33 87 88 86 93 88 100 73 75 71 87 88 86


Figure 5.13: HV results using hybrid approach for individual datasets

After evaluating the performance of the proposed hybrid approach on individual datasets,
we tested it on the combined datasets. The results achieved on these combined datasets
are also very promising. Once again, the features are the combination of the CS top
fifteen features and WC computed with the wavelet function rbio1.3. These results are
shown in Table 5.11.

Here again, KNN achieves the highest or joint-highest accuracy on the combined humidity,
temperature and moisture datasets. For the combined humidity dataset, a maximum accuracy
of 81% is achieved, with 88% sensitivity and 75% specificity. Similarly, 100% accuracy is
achieved for the combined temperature and moisture datasets. For the completely combined
dataset, which combines SPPs affected by all environmental factors, 100% accuracy is
achieved through EC and 94% through KNN. These results are also shown in Figure 5.14.

Comparing the results of the two proposed approaches with those of their combination
shows that the hybrid approach provides more accurate results, with high sensitivity and
specificity.
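A minimal sketch of the hybrid idea follows. It assumes that the hybrid vector is a plain concatenation of the image-based features and the wavelet coefficients (15 + 18 = 33 values, where 18 is inferred from the totals), and that KNN uses Euclidean distance with majority voting. The function names, the value of k and the toy data are hypothetical, and the toy vectors are shortened for readability.

```python
import math
from collections import Counter


def hybrid_vector(image_features, wavelet_coeffs):
    """Concatenate the two feature groups into one vector (assumed scheme)."""
    return list(image_features) + list(wavelet_coeffs)


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy data: 2 image features + 2 wavelet coefficients per sample.
train = [
    hybrid_vector([0.9, 0.8], [0.7, 0.9]),
    hybrid_vector([0.8, 0.9], [0.8, 0.7]),
    hybrid_vector([0.1, 0.2], [0.2, 0.1]),
    hybrid_vector([0.2, 0.1], [0.1, 0.3]),
]
labels = ["defective", "defective", "non-defective", "non-defective"]

query = hybrid_vector([0.85, 0.8], [0.75, 0.8])
print(knn_predict(train, labels, query, k=3))  # prints: defective
```

Because the two feature groups may live on different scales, a real pipeline would probably normalise each feature before computing distances; that step is omitted here.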

Figure 5.13 (bar chart): HV results using hybrid approach for individual datasets. X-axis: datasets (H1 to M3); y-axis: accuracies (0 to 120); series: SVM, KNN, NB, EC.

Table 5.11: Test accuracies using hybrid approach for combined datasets

Values are percentages; Acc = accuracy, Sen = sensitivity, Spe = specificity.

Datasets  No. of    SVM           KNN           NB            EC
          Features  Acc Sen Spe   Acc Sen Spe   Acc Sen Spe   Acc Sen Spe
H1,H2,H3  33         81 100  63    81  88  75    69  63  75    81  88  75
T1,T2,T3  33         81  88  75   100 100 100    69  88  50    81  88  75
M1,M2,M3  33         88  88  88   100 100 100    75  75  75   100 100 100
HTM       33         88 100  75    94  88 100    81  75  88   100 100 100


Figure 5.14: Test accuracies using hybrid approach for combined datasets

Our focus in this research is to develop an approach that can accurately classify any
environmentally affected SPPs. For this purpose, we proposed two approaches and then a
hybrid of the two. Table 5.12 compares the highest accuracies achieved by all three
approaches on the combined datasets. In the MI approach, the features extracted using GR
are more promising for the classification of SPPs than the other feature sets used in
that experiment. In the MA approach, the wavelet function coif1 provides the maximum
accuracies when used with UV data, and the results are better than those of the MI
approach. Finally, the hybrid approach provides results that are at least as accurate as
those of the other two approaches on all combined datasets. These results are also shown
in Figure 5.15.
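The comparison described above can be reproduced with a short sketch that picks the best accuracy per approach and dataset. The accuracies are copied from Table 5.12, assuming the first value of each triple is the accuracy; the dictionary layout and the function name are hypothetical.

```python
# Accuracy (%) per classifier on the combined datasets, copied from
# Table 5.12 (assumption: first value of each triple is the accuracy).
TABLE_5_12 = {
    "MI": {
        "H1,H2,H3": {"SVM": 56, "KNN": 50, "NB": 56, "EC": 50},
        "T1,T2,T3": {"SVM": 63, "KNN": 88, "NB": 44, "EC": 75},
        "M1,M2,M3": {"SVM": 75, "KNN": 81, "NB": 75, "EC": 88},
        "HTM": {"SVM": 44, "KNN": 75, "NB": 50, "EC": 50},
    },
    "MA": {
        "H1,H2,H3": {"SVM": 69, "KNN": 69, "NB": 56, "EC": 81},
        "T1,T2,T3": {"SVM": 75, "KNN": 88, "NB": 81, "EC": 81},
        "M1,M2,M3": {"SVM": 94, "KNN": 100, "NB": 81, "EC": 94},
        "HTM": {"SVM": 81, "KNN": 94, "NB": 81, "EC": 94},
    },
    "Hybrid": {
        "H1,H2,H3": {"SVM": 81, "KNN": 81, "NB": 69, "EC": 81},
        "T1,T2,T3": {"SVM": 81, "KNN": 100, "NB": 69, "EC": 81},
        "M1,M2,M3": {"SVM": 88, "KNN": 100, "NB": 75, "EC": 100},
        "HTM": {"SVM": 88, "KNN": 94, "NB": 81, "EC": 100},
    },
}


def best_per_dataset(table):
    """Map approach -> dataset -> (best accuracy, classifiers reaching it)."""
    summary = {}
    for approach, datasets in table.items():
        summary[approach] = {}
        for dataset, scores in datasets.items():
            top = max(scores.values())
            winners = [name for name, acc in scores.items() if acc == top]
            summary[approach][dataset] = (top, winners)
    return summary


summary = best_per_dataset(TABLE_5_12)
for dataset in TABLE_5_12["Hybrid"]:
    row = ", ".join(f"{a}: {summary[a][dataset][0]}%" for a in TABLE_5_12)
    print(f"{dataset:9s} {row}")
```

Read this way, the best hybrid accuracy is never below the best MA accuracy, which in turn is never below the best MI accuracy, on any of the four combined datasets.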

Figure 5.14 (bar chart): HV accuracies using hybrid approach for combined datasets. X-axis: datasets (H1,H2,H3; T1,T2,T3; M1,M2,M3; HTM); y-axis: accuracies (0 to 120); series: SVM, KNN, NB, EC.

Table 5.12: Comparison of highest accuracies achieved using MI, MA and hybrid approaches

Values are percentages; Acc = accuracy, Sen = sensitivity, Spe = specificity.

Approach: MI. Features: GR. No. of features: 15.

Dataset    SVM           KNN           NB            EC
           Acc Sen Spe   Acc Sen Spe   Acc Sen Spe   Acc Sen Spe
H1,H2,H3    56 100  13    50  63  38    56  75  38    50  88  13
T1,T2,T3    63  75  50    88  88  88    44  75  13    75  88  63
M1,M2,M3    75  75  75    81  75  88    75  63  88    88  88  88
HTM         44  88   0    75  75  75    50  63  38    50  63  38

Approach: MA. Features: WC (wavelet function = coif1, level = 2). No. of features: 28.

Dataset    SVM           KNN           NB            EC
           Acc Sen Spe   Acc Sen Spe   Acc Sen Spe   Acc Sen Spe
H1,H2,H3    69  75  63    69  88  50    56  13 100    81  75  88
T1,T2,T3    75  75  75    88 100  75    81  75  88    81  75  88
M1,M2,M3    94  88 100   100 100 100    81  75  88    94  88 100
HTM         81  88  75    94 100  88    81  88  75    94 100  88

Approach: Hybrid (H). Features: Chi + WC (wavelet function = rbio1.3, level = 3). No. of features: 33.

Dataset    SVM           KNN           NB            EC
           Acc Sen Spe   Acc Sen Spe   Acc Sen Spe   Acc Sen Spe
H1,H2,H3    81 100  63    81  88  75    69  63  75    81  88  75
T1,T2,T3    81  88  75   100 100 100    69  88  50    81  88  75
M1,M2,M3    88  88  88   100 100 100    75  75  75   100 100 100
HTM         88 100  75    94  88 100    81  75  88   100 100 100


Figure 5.15: Comparison of accuracies achieved using all three approaches

CHAPTER NO. 6

CONCLUSION AND FUTURE RECOMMENDATIONS

In this research, we proposed a new approach for classifying defective and non-defective
SPPs using image processing, signal processing and machine learning techniques. We
proposed two approaches for this purpose and tested their performance both independently
and in a hybrid manner. We considered three environmental factors that can affect the
surface of SPPs: humidity, temperature and moisture. The performance of the proposed
approaches was tested on independent datasets individually, as well as in a combined
manner.

The first proposed approach is based on microscopic imaging and uses textural features
extracted from the surface of the preprocessed images. A comparative analysis is
performed using all 281 features, the top 15 features (obtained using CS, GR and RF) and
the top 2 features. Classification is performed using SVM, KNN, NB and EC classifiers.
The analysis shows that higher accuracies are achieved on moisture-affected SPPs, as
moisture reacts quickly with the APIs of the SPP. Across the different experiments, SVM
performs better than the other classifiers for most of the individual datasets. For the
combined datasets, GR provides the most accurate results, but its performance is still
not promising.

In the second proposed approach, we used wavelet transformations along with machine
learning algorithms on multispectral data to classify environmentally affected solid
pharmaceutical products. A correct choice of wavelet function and decomposition level
provides promising results. In our case, a two- or three-level decomposition of the
multispectral data leads to better results. Again, four classifiers are used to test the
effectiveness of the wavelet coefficients in the proposed approach. The comparison tables
show that KNN and EC are more suitable than NB and SVM. We also compared the WC extracted
from MA of UV, IR and visible wavelengths; the results show that the UV wavelength is
more suitable for classifying defective and non-defective SPPs. For the combined
datasets, UV data provides more accurate results than the microscopic imaging based
approach.
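To illustrate what the decomposition level controls, here is a minimal pure-Python sketch of a multi-level discrete wavelet transform. It deliberately swaps in the Haar wavelet, which is not the coif1 or rbio1.3 function used in this work, because Haar can be written in a few lines without a library. The function names and the sample signal are hypothetical. In practice a library routine such as PyWavelets' `pywt.wavedec(data, 'rbio1.3', level=3)` would likely be used instead.

```python
import math


def haar_step(signal):
    """One Haar DWT level on an even-length signal: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def wavelet_decompose(signal, level):
    """Return [cA_level, cD_level, ..., cD_1] for the Haar wavelet."""
    approx = list(signal)
    coeffs = []
    for _ in range(level):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested level")
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


# Hypothetical 8-sample spectral signal.
spectrum = [4, 6, 10, 12, 8, 6, 5, 5]
print([len(c) for c in wavelet_decompose(spectrum, 2)])  # prints: [2, 2, 4]
print([len(c) for c in wavelet_decompose(spectrum, 3)])  # prints: [1, 1, 2, 4]
```

Each extra level halves the approximation band again, so the level determines how coarse a summary of the spectrum the classifier sees.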

Finally, we tested a hybrid approach based on both imaging and spectral data. The
comparison of results shows that this hybrid approach provides the maximum accuracies on
the combined datasets. An accuracy of 81% is achieved on the combined dataset of SPPs
affected by humidity. Similarly, an accuracy of 100% is achieved for temperature- and
moisture-affected SPPs. On the overall combined dataset, named HTM, the proposed hybrid
approach provides 94% accuracy with KNN and 100% with EC.

The proposed approach provides an accurate solution for classifying defective and
non-defective solid pharmaceutical products. However, some aspects remain to be addressed
in future work. First, we currently consider three environmental factors: humidity,
temperature and moisture. The analysis can be extended to further factors such as
pressure and light. Secondly, the spectral analysis can be extended to semisolid
pharmaceutical products. Thirdly, both of the proposed approaches are based on the
analysis of single-point data acquired from the input product. In future, this can be
extended to multi-point data, which may produce more accurate results.


References

1 Sahoo, P. K., Pharmaceutical technology - tablets. 2007, Delhi Institute of

Pharmaceutical Sciences and Research: New Delhi.

2 Harbir, K. Processing technologies for pharmaceutical tablets- a review.

International Research Journal of Pharmacy: (2012).

3 Mahato, R. I., Dosage forms and drug delivery systems. In the apha complete

review for pharmacy, Castle Connolly Graduate Medical Publishing, New York.

2007, (2007)

4 Christian, L., Collins, L., Kiatgrajai, M., Merle, A., Mukherji, N.and Quade, A.,

The problem of substandard medicines in substandard countries. 2012, Workshop

in International Public Affairs.

5 Clift, C., Combating counterfeit, falsified and substandard medicines. 2010, Centre

on Global Health Security.

6 WHO. Medicines: Spurious/falsely-labelled/ falsified/counterfeit (SFFC)

medicines. Fact sheet n°275. World Health Organization 2012 May; Available

from: http://www.who.int/mediacentre/factsheets/fs275/en/.

7 WHO, Counterfeit drugs. 1999, Department of Essential Drugs and Other

Medicines: Geneva, Switzerland.

8 Islam, S. M. A., Hossain, M. A., Kabir, A. N. M. H., Kabir, S.and Hossain, M. K.

Study of moisture absorption by ranitidine hydrochloride: Effect of % rh,

excipients, dosage forms and packing materials. Dhaka University Journal of

Pharmaceutical Sciences, 7 (1): 59-64 (2008).

9 Szakonyi, G. and Zelkó, R. The effect of water on the solid state characteristics of

pharmaceutical excipients: Molecular mechanisms, measurement techniques, and

quality aspects of final dosage form. International Journal of Pharmaceutical

Investigation, 2 (1): 18 (2012).

10 Antikainen, J., New techniques for spectral image acquisition and analysis. 2012,

PhD diss., PhD thesis publications of the university of eastern finland dissertations

in forestry and natural sciences: Finland.

11 NASA, The electromagnetic spectrum. 2013, NASA.

http://www.who.int/mediacentre/factsheets/fs275/en/

133

12 Gowen, A. A., O'Donnel, C., Cullen, P.and Bell, S. E. J. Recent applications of

chemical imaging to pharmaceutical process monitoring and quality control.

European Journal of Pharmaceutics and Biopharmaceutics, 69 (1): 10-22 (2008).

13 Pathuri, R., Muthukumaran, M., Krishnamoorthy, B.and Nishat, A. A review on

analytical method development and validation of pharmaceutical technology.

Current Pharma Research, 3 (2): 855-870 (2013).

14 Ferrer, I. and Thurman, E. M. Analysis of 100 pharmaceuticals and their degradates

in water samples by liquid chromatography/quadrupole time-of-flight mass

spectrometry. Journal of Chromatography A, 1259 148-157 (2012).

15 Deisingh, A. K. Pharmaceutical counterfeiting. The Analyst, 130 (3): 271-279

(2005).

16 Perkinelmer, Using Near-IR spectroscopy to better understand tablet uniformity and

properties. 2003.

17 Holzgrabe, U., Diehl, B. W. K.and Wawer, I. NMR spectroscopy in pharmacy.

Journal of Pharmaceutical and Biomedical Analysis, 17 (4): 557-616 (1998).

18 ThermoScientific, Nir spectroscopy for pharmaceutical analysis. 2010.

19 Vogeser, M., Kobold, U.and Seidel, D. Mass spectrometry in medicine- the role of

molecular analysis. Dtsch Arztebl, 104 (31-32): 2194-200 (2007).

20 Görög, S. Identification in drug quality control and drug research. TrAC Trends in

Analytical Chemistry: (2015).

21 Nuhu, A. A. Recent analytical approaches to counterfeit drug detection. Journal of

Applied Pharmaceutical Science, 1 (5): 06-13 (2011).

22 Siddiqui, M. R., AlOthman, Z. A.and Rahman, N. Analytical techniques in

pharmaceutical analysis: A review. Arabian Journal of Chemistry: (3013).

23 Sacré, P.-Y., Deconinck, E., Beer, T. D., Courselle, P., Vancauwenberghe, R.,

Chiap, P., Crommen, J.and Beer, J. O. D. Comparison and combination of

spectroscopic techniques for the detection of counterfeit medicines. Journal of

pharmaceutical and biomedical analysis, 53 (3): 445-453 (2010).

24 Kazeminy, A., Hashemi, S., Williams, R. L., Ritchie, G. E., Rubinovitz, R.and Sen,

S. A comparison of near infrared method development approaches using a drug

134

product on different spectrophotometers and chemometric software algorithms.

Journal of Near Infrared Spectroscopy, 17 (5): 233 (2009).

25 Buice, J., G., R., Gold, T. B., Lodder, R. A.and Digenis, G. A. Determination of

moisture in intact gelatin capsules by near-infrared spectrophotometry.

Pharmaceutical Research, 12 (1): 161-163 (1995).

26 Morisseau, K. M. and Rhodes, C. T. Near-infrared spectroscopy as a nondestructive

alternative to conventional tablet hardness testing. Pharmaceutical Research 14 (1):

108-111 (1997).

27 Roggo, Y., Chalus, P., Maurer, L., Leme-Martinez, C., Edmond, A.and Jent, N. A

review of near infrared spectroscopy and chemometrics in pharmaceutical

technologies. Journal of Pharmaceutical and Biomedical Analysis, 44 (3): 683-700

(2007).

28 Barnes, T. J., Kempson, I. M.and Prestidge, C. A. Surface analysis for

compositional, chemical and structural imaging in pharmaceutics with mass

spectrometry: A tof-sims perspective. International journal of pharmaceutics 417

(1): 61-69 (2011).

29 Culzoni, M. J., Dwivedi, P., Green, M. D., Newton, P. N.and Fernández, F. M.

Ambient mass spectrometry technologies for the detection of falsified drugs.

MedChemComm, 5 (1): 9-19 (2014).

30 Chen, H., Talaty, N. N., Takáts, Z.and Cooks, R. G. Desorption electrospray

ionization mass spectrometry for high-throughput analysis of pharmaceutical

samples in the ambient environment. Analytical Chemistry, 77 (21): 6915-6927

(2005).

31 Holzgrabe, U. Quantitative nmr spectroscopy in pharmaceutical applications. Prog

NMR Spectrosc, 57: 229-240 (2010).

32 Holzgrabe, U., Deubner, R., Schollmayer, C.and Waibel, B. Quantitative nmr

spectroscopy—applications in drug analysis. Journal of pharmaceutical and

biomedical analysis, 38 (5): 806-812 (2005).

33 Holzgrabe, U. and Malet-Martino, M. Analytical challenges in drug counterfeiting

and falsification—the nmr approach. Journal of pharmaceutical and biomedical

analysis, 55 (4): 679-687 (2011).

135

34 Malet-Martino, M. and Holzgrabe, U. NMR techniques in biomedical and

pharmaceutical analysis. Journal of pharmaceutical and biomedical analysis, 55

(1): 1-15 (2011).

35 Croker, D. M., Hennigan, M. C., Maher, A., Hu, Y., Ryder, A. G.and Hodnett, B.

K. A comparative study of the use of powder x-ray diffraction, raman and near

infrared spectroscopy for quantification of binary polymorphic mixtures of

piracetam Journal of pharmaceutical and biomedical analysis 63: 80-86 (2012).

36 Maurin, J. K., Pluciński, F., Mazurek, A. P.and Fijałek, Z. The usefulness of simple

x-ray powder diffraction analysis for counterfeit control—the viagra® example.

Journal of pharmaceutical and biomedical analysis, 43 (4): 1514-1518 (2007).

37 Scoutaris, N., Vithani, K., Slipper, I., Chowdhry, B.and Douroumis, D. Sem/edx

and confocal raman microscopy as complementary tools for the characterization of

pharmaceutical tablets. International journal of pharmaceutics, 470 (1): 88-98

(2014).

38 Klang, V., Valenta, C.and Matsko, N. B. Electron microscopy of pharmaceutical

systems. Micron, 44 45-74 (2013).

39 Ruotsalainen, M., Heinämäki, J., Guo, H., Laitinen, N.and Yliruusi, J. A novel

technique for imaging film coating defects in the film-core interface and surface of

coated tablets. European journal of pharmaceutics and biopharmaceutics, 56. (3):

381-388 (2003).

40 Gendrin, C., Roggo, Y.and Collet, C. Pharmaceutical applications of vibrational

chemical imaging and chemometrics: A review. Journal of pharmaceutical and


41 De Beer, T., Anneleen Burggraeve, Margot Fonteyne, Saerens, L., Remon, J. P.and

Vervaet, C. Near infrared and raman spectroscopy for the in-process monitoring of

pharmaceutical production processes. International journal of pharmaceutics, 417

(1): 32-47 (2011).

42 Luypaert, J., Massart, D. L.and Heyden, Y. V. Near-infrared spectroscopy

applications in pharmaceutical analysis. Talanta, 72 (3): 865-883 (2007).

136

43 Jamrógiewicz, M. Application of the near-infrared spectroscopy in the

pharmaceutical technology. Journal of Pharmaceutical and Biomedical Analysis,

66: 1-10 (2012).

44 Luukkonen, P., Fransson, M., Björn, I. N., Hautala, J., Lagerholm, B.and Folestad,

S. Real-time assessment of granule and tablet properties using in-line data from a

high-shear granulation process. J. Pharm. Sci., 97: 950-959 (2008).

45 Chalus, P., Walter, S.and Ulmschneider, M. Combined wavelet transform–artificial

neural network use in tablet active content determination by near-infrared

spectroscopy. Analytica chimica acta, 591 (2): 219-224 (2007).

46 Svensson, O., Abrahamsson, K., Engelbrektsson, J., Nicholas, M., Wikström, H.and

Josefson, M. An evaluation of 2d-wavelet filters for estimation of differences in

textures of of pharmaceutical tablets. Chemometrics and Intelligent Laboratory

Systems, 84: 3–8 (2006 ).

47 Dowell, F. E., Maghirang, E. B., Fernandez, F. M., Newton, P. N.and Green, M. D.

Detecting counterfeit antimalarial tablets by near-infrared spectroscopy. Journal of


48 Bleye, C. D., Chavez, P.-F., Mantanus, J., Marini, R., Hubert, P., Rozet, E.and

Ziemons, E. Critical review of near-infrared spectroscopic methods validations in

pharmaceutical applications. Journal of pharmaceutical and biomedical analysis,

69: 125-132 (2012).

49 Shah, R. G., Patel, N. K.and Pancholi, S. S. Near infrared spectroscopy: An

advanced technique in spectroscopy. Int J Pharm Bioanal Res, 1 (1): 1-11 (2014).

50 Rodionova, O. Y. and Pomerantsev, A. L. NIR-based approach to counterfeit-drug

detection. TrAC Trends in Analytical Chemistry 29 (8): 795-803 (2010).

51 Rodionova, O. Y., Houmøller, L. P., Pomerantsev, A. L., Geladi, P., Burger, J.,

Dorofeyev, V. L.and Arzamastsev, A. P. NIR spectrometry for counterfeit drug

detection: A feasibility study. Analytica Chimica Acta, 549 (1): (2005).

52 Storme-Paris, I., Rebiere, H., Matoga, M., Civade, C., Bonnet, P.-A., Tissier, M.

H.and Chaminade, P. Challenging near infrared spectroscopy discriminating ability

for counterfeit pharmaceuticals detection. Analytica chimica acta 658 (2): 163-174

(2010).

137

53 Vredenbregt, M. J., Blok, L.-T., Hoogerbrugge, R., Barends, D. M.and Kaste, D. D.

Screening suspected counterfeit viagra® and imitations of viagra® with near-

infrared spectroscopy. Journal of Pharmaceutical and Biomedical analysis 40 (4):

840-849 (2006).

54 Candolfi, A., Wu, W., Massart, D. L.and Heuerding, S. Comparison of

classification approaches applied to nir-spectra of clinical study lots. Journal of


55 Clarke, F. Extracting process-related information from pharmaceutical dosage

forms using near infrared microscopy. Vibrational Spectroscopy 34 (1): 25-35

(2004).

56 Bikiaris, D., Koutri, I., Alexiadis, D., Damtsios, A.and Karagiannis, G. Real time

and non-destructive analysis of tablet coating thickness using acoustic microscopy

and infrared diffuse reflectance spectroscopy. International journal of

pharmaceutics, 43 (1): 33-44 (2012).

57 Boiret, M., Gut, Y., Duval, H.and Ginot, Y. M. 2013. Use of near infrared chemical

imaging and 3d visualisation of a pharmaceutical tablet for formulation selection

during drug product development, in NIR 2013 proceedings, Orléans.

58 Van Eerdenbrugh, B. and Taylor, L. S. Application of Mid-IR spectroscopy for the

characterization of pharmaceutical systems. International journal of

pharmaceutics, 417 (1): 3-16 (2011).

59 Reich, G. Near-infrared spectroscopy and imaging: Basic principles and

pharmaceutical applications Advanced drug delivery reviews, 57 (8): 1109-1143

(2005).

60 Blanco, M. and Alcalá, M. Content uniformity and tablet hardness testing of intact

pharmaceutical tablets by near infrared spectroscopy: A contribution to process

analytical technologies. Analytica chimica acta, 557 (1): 353-359 (2006).

61 Sulub, Y., LoBrutto, R., Vivilecchia, R.and Wabuyele, B. W. Content uniformity

determination of pharmaceutical tablets using five near-infrared reflectance

spectrometers: A process analytical technology (pat) approach using robust

multivariate calibration transfer algorithms. Analytica chimica acta, 611 (2): 143-

150 (2008).

138

62 Moes, J. J., Ruijken, M. M., Gout, E., Frijlink, H. W.and Ugwoke, M. I. Application

of process analytical technology in tablet process development using nir

spectroscopy: Blend uniformity, content uniformity and coating thickness

measurements. International journal of pharmaceutics, 357 (1): 108-118 (2008).

63 Li, W., Bashai-Woldu, A., Ballard, J., Johnson, M., Agresta, M., Rasmussen, H.,

Hu, S., Cunningham, J.and Winstead, D. Applications of NIR in early stage

formulation development: Part i. Semi-quantitative blend uniformity and content

uniformity analyses by reflectance nir without calibration models. International

journal of pharmaceutics, 340 (1): 97-103 (2007).

64 Li, W., Bagnol, L., Berman, M., Chiarella, R. A.and Gerber, M. Applications of nir

in early stage formulation development. Part ii. Content uniformity evaluation of

low dose tablets by principal component analysis. International journal of

pharmaceutics, 380 (1): 49-54 (2009).

65 Sulub, Y., LoBrutto, R., Vivilecchia, R.and Wabuyele, B. Near-infrared

multivariate calibration updating using placebo: A content uniformity

determination of pharmaceutical tablets. Vibrational Spectroscopy, 46 (2): 128-134

(2008).

66 Ely, D., Chamarthy, S.and Carvajal, M. T. An investigation into low dose blend

uniformity and segregation determination using nir spectroscopy. Colloids and

Surfaces A: Physicochemical and Engineering Aspects 288 (1): 71-76 (2006).

67 Vankeirsbilck, T., Vercauteren, A., Baeyens, W.and Weken, G. V. d. Applications

of raman spectroscopy in pharmaceutical analysis. trends in analytical chemistry,

21 (12): 869-877 (2002).

68 Vankeirsbilck, T., Vercauteren, A., Baeyens, W., Weken, G. V. d., Verpoort, F.,

Vergote, G.and Remon, J. P. Applications of raman spectroscopy in pharmaceutical

analysis. TrAC trends in analytical chemistry 21 (12): 869-877 (2002).

69 Feng, L., Xinxin, W., Yifeng, C., Yongjian, Y., Yinjia, Y.and Gengli, D. A novel

identification system for counterfeit drugs based on portable raman spectroscopy.

Chemometrics and Intelligent Laboratory Systems: (127) 63-69 (2013).

139

70 Li, Y., Du, G., Cai, W.and Shao, X. Classification and quantitative analysis of

azithromycin tablets by raman spectroscopy and chemometrics. American Journal

of Analytical Chemistry, 2: 135-141 (2011).

71 Romero-Torres, S., Pérez-Ramos, J. D., Morris, K. R.and Grant, E. R. Raman

spectroscopic measurement of tablet-to-tablet coating variability. Journal of


72 Romero-Torres, S., Pérez-Ramos, J. D., Morris, K. R.and Grant, E. R. Raman

spectroscopy for tablet coating thickness quantification and coating characterization

in the presence of strong fluorescent interference. Journal of pharmaceutical and


73 Müller, J., Knop, K., Thies, J., Uerpmann, C.and Kleinebudde, P. Feasability of

raman spectroscopy as pat tool in active coating. Drug development and industrial

pharmacy, 36 (2): 234-243 (2010).

74 Gao, Q., Liu, Y., Li, H., Chen, H., Chai, Y.and Lu, F. Comparison of several

chemometric methods of libraries and classifiers for the analysis of expired drugs

based on raman spectra. Journal of pharmaceutical and biomedical analysis 94:

58-64 (2014).

75 Veij, d. M., Deneckere, A., Vandenabeele, P., Kaste, D. d.and Moens, L. Detection

of counterfeit viagra® with raman spectroscopy. Journal of pharmaceutical and


76 Eliasson, C. and Matousek, P. Noninvasive authentication of pharmaceutical

products through packaging using spatially offset raman spectroscopy. Analytical

chemistry, 79 (4): 1696-1701 (2007).

77 Ricci, C., Eliasson, C., Macleod, N. A., Newton, P. N., Matousek, P.and Kazarian,

S. G. Characterization of genuine and fake artesunate anti-malarial tablets using

fourier transform infrared imaging and spatially offset raman spectroscopy through

blister packs. Analytical and bioanalytical chemistry, 389 (5): 1525-1532 (2007).

78 Zhang, L., Henson, M. J.and Sekulic, S. S. Multivariate data analysis for raman

imaging of a model pharmaceutical tablet. Analytica Chimica Acta, 545 (2): 262-

278 (2005).

140

79 Gordon, K. C. and McGoverin, C. M. Raman mapping of pharmaceuticals.

International journal of pharmaceutics, 417 (1): 151-162 (2011).

80 Hédoux, A., Guinet, Y.and Descamps, M. The contribution of raman spectroscopy

to the analysis of phase transformations in pharmaceutical compounds.

International journal of pharmaceutics 417 (1): 17-31 (2011).

81 Matousek, P. and Parker, A. W., Non-invasive bulk analysis of pharmaceutical

tablets and capsules using the transmission raman method. 2006/2007, Central

Laser Facility Annual Report: Didcot.

82 Strachan, C. J., Rades, T., Gordon, K. C.and Rantanen, J. Raman spectroscopy for

quantitative analysis of pharmaceutical solids. Journal of pharmacy and

pharmacology, 59: 179–192 (2007).

83 O'Connell, M. L., Howley, T., Ryder, A. G., Leger, M. N.and Madden, M. G.

Classification of a target analyte in solid mixtures using principal component

analysis, support vector machines, and raman spectroscopy. In Opto-Ireland: 340-

350 (2005).

84 Andreas, H. and Clemens, A. 2010. Computer-vision based pharmaceutical pill

recognition on mobile phones, in 14th Proceedings of CESCG.

85 Andreas, H., Clemens, A.and Dieter, S. 2010. Instant segmentation and feature

extraction for recognition of simple objects on mobile phones, in Computer Vision

and Pattern Recognition Workshops (CVPRW), IEEE Computer Society

Conference.

86 Ramya, S., Suchitra, J.and Nadesh, R. K. Detection of broken pharmaceutical drugs

using enhanced feature extraction technique. International Journal of Engineering

and Technology, 5 (2): 1407-1411 (2013).

87 Špiclin, Ž., Bukovec, M., Pernuš, F.and Likar, B. Image registration for visual

inspection of imprinted pharmaceutical tablets. Machine Vision and Applications,

22 (1): 197-206 (2011).

88 Bukovec, M., Špiclin, Ž., Pernuš, F.and Likar, B. Automated visual inspection of

imprinted pharmaceutical tablets. Measurement Science and Technology, 18 (9):

2921 (2007).

141

89 Bukovec, M., Spiclin, Z., Pernus, F.and Likar, B. 2007. Geometrical and statistical

visual inspection of imprinted tablets, in MVA2007 IAPR Conference on Machine

Vision Applications, Tokyo: p. 412-415.

90 Tahir, F. and Fahiem, M. A. A statistical-textural-features based approach for

classification of solid drugs using surface microscopic images. Computational and

mathematical methods in medicine, 2014 (2014 ).

91 Možina, M., Tomaževič, D., Pernuš, F.and Likar, B. Automated visual inspection

of imprint quality of pharmaceutical tablets. Machine vision and applications, 24

(1): 63-73 (2013).

92 Lee, Y.-B., Park, U.and Jain, A. K. 2010. Pill-id: Matching and retrieval of drug pill

imprint images, in International Conference on Pattern Recognition, Istanbul: p.

2632-2635.

93 Yu, C.-C., Wen, C.-Y., Lu, C.-P.and Chen, Y.-F. The drug tablet image retrieal

system based on content-based image retrieval. International journal of innovative

computing, information and control, 8 (7(A)): 4497-4508 (2012).

94 Jung, C. R., Ortiz, R. S., Limberger, R.and Mayorga, P. A new methodology for

detection of counterfeit viagra® and cialis® tablets by image processing and

statistical analysis. Forensic science international 216 (1): 92-96 (2012).

95 Gowen, A. A., O’Donnell, C. P., Cullenb, P. J., Downey, G.and Frias, J. M.

Hyperspectral imaging- an emerging process analytical tool for food quality and

safety control. Trends in Food Science & Technology, 18: 590-598 (2007).

96 de Juan, A., Tauler, R., Dyson, R., Marcolli, C., Rault, M.and Maeder, M.

Spectroscopic imaging and chemometrics: A powerful combination for global and

local sample analysis. TrAC Trends in Analytical Chemistry 23 (1): 70-79 (2004).

97 Ravn, C., Skibstedb, E.and Bro, R. Near-infrared chemical imaging (NIR-CI) on

pharmaceutical solid dosage forms - comparing common calibration approaches.

Journal of Pharmaceutical and Biomedical Analysis, 48: 554–561 (2008).

98 Dubois, J., Wolff, J.-C., Warrack, J. K., Schoppelrei, J.and Lewis, E. N. Nir

chemical imaging for counterfeit pharmaceutical products analysis. Spectroscopy,

22 (2): 40-50 (2007).

142

99 Cruz, J. and Blanco, M. Content uniformity studies in tablets by NIR-CI. Journal

of Pharmaceutical and Biomedical Analysis, 56 (2): 408– 412 (2011).

100 Puchert, T., Lochmann, D., Menezes, J. C.and Reich, G. Near-infrared chemical

imaging (NIR-CI) for counterfeit drug identification—a four-stage concept with a

novel approach of data processing (linear image signature). Journal of

Pharmaceutical and Biomedical Analysis 51: 138–145 (2010).

101 Lewis, E. N., Kidder, L. H.and Lee, E. Nir chemical imaging- near-infrared

spectroscopy on steroids. NIR News, 16 (5): (2005).

102 Hamilton, S. J. and Lodder, R. A. Hyperspectral imaging technology for

pharmaceutical analysis. International Society for Optics and Photonics., 4626:

136-147 (2002).

103 Doub, W. H., Adams, W. P., Spencer, J. A., Buhse, L. F.and Nelson, M. P. Raman

chemical imaging for ingredient-specific prticle size characterization of aqueous

suspension nasal spray formulations: A progress report. . Pharmaceutical research,

24 (5): 934-945 (2007).

104 Moor, J. Application of NIR imaging and multivariate data analysis for

pharmaceutical products. 2010.

105 Wu, Z., Tao, O., Dai, X., Du, M., Shi, X.and Qiao, Y. Monitoring of a

pharmaceutical blending process using near infrared. Vibrational Spectroscopy 63

371-379 (2012).

106 Sacré, P.-Y., Bleye, C. D., Chavez, P.-F., Netchacovitch, L., Hubert, P.and

Ziemons, E. Data processing of vibrational chemical imaging for pharmaceutical

applications. Journal of pharmaceutical and biomedical analysis 101: 123-140

(2014).

107 Amigo, J. M. and Ravn, C. Direct quantification and distribution assessment of

major and minor components in pharmaceutical tablets by nir-chemical imaging.

European Journal of Pharmaceutical Sciences, 37 (2): 76-82 (2009).

108 Franch-Lage, F., Amigo, J. M., Skibsted, E., Maspoch, S.and Coello, J. Fast

assessment of the surface distribution of api and excipients in tablets using NIR-

hyperspectral imaging. International journal of pharmaceutics 411 (1): 27-35

(2011).

143

109 Carneiro, R. L. and Poppi, R. J. A quantitative method using near infrared imaging

spectroscopy for determination of surface composition of tablet dosage forms: An

example of spirolactone tablets. Journal of the Brazilian Chemical Society, 23 (8):

1570-1576 (2012).

110 Palou, A., Cruz, J., Blanco, M., Tomàs, J., Ríos, J. d. l.and Alcalà, M. Determination

of drug, excipients and coating distribution in pharmaceutical tablets using nir-ci.

Journal of Pharmaceutical Analysis 2(2): 90-97 (2012).

111 Offroy, M., Roggo, Y.and Duponchel, L. Increasing the spatial resolution of near

infrared chemical images (NIR-CI): The super-resolution paradigm applied to

pharmaceutical products. Chemometrics and Intelligent Laboratory Systems, 117

183-188 (2012).

112 Osorio, J. G., Stuessy, G., Kemeny, G. J.and Muzzio, F. J. Characterization of

pharmaceutical powder blends using in situ near-infrared chemical imaging

Chemical Engineering Science, 108 244-257 (2014).

113 Lyon, R. C., Lester, D. S., Lewis, E. N., Lee, E., Lawrence, X. Y., Jefferson, E.

H.and Hussain, A. S. Near-infrared spectral imaging for quality assurance of

pharmaceutical products: Analysis of tablets to assess powder blend homogeneity.

AAPS PharmSciTech, 3 (3): 1-15 (2002).

114 Lee, E., Huang, W. X., Chen, P., Lewis, E. N.and Vivilecchia, R. V. High-

throughput analysis of pharmaceutical tablet content uniformity by near-infrared

chemical imaging. Spectroscopy, 21 (11): 24 (2006).

115 Westenberger, B. J., Ellison, C. D., Fussner, A. S., Jenney, S., Kolinski, R. E., Lipe,

T. G., Lyon, R. C., Moore, T. W., Revelle, L. K., Smith, A. P., Spencer, J. A.and

Story, K. D. Quality assessment of internet pharmaceutical products using

traditional and non-traditional analytical techniques. International journal of

pharmaceutics, 306 (1): 56-70 (2005).

116 Gendrin, C., Roggo, Y.and Collet, C. Content uniformity of pharmaceutical solid

dosage forms by near infrared hyperspectral imaging: A feasibility study. Talanta,

73 (4): 733-741 (2007).

117 Li, W., Woldu, A., Kelly, R., McCool, J., Bruce, R., Rasmussen, H., Cunningham,

J.and Winstead, D. Measurement of drug agglomerates in powder blending

144

simulation samples by near infrared chemical imaging. International journal of

pharmaceutics, 350 (1): 369-373 (2008).

118 Hilden, L. R., Pommier, C. J., Badawy, S. I. F.and Friedman, E. M. Nir chemical

imaging to guide/support bms-561389 tablet formulation development.

International journal of pharmaceutics, 353 (1): 283-290 (2008).

119 Sasic, S. Raman mapping of low-content api pharmaceutical formulations. I.

Mapping of alprazolam and alprazolam/xanax tablets. Pharmaceutical Research,

24 (1): 58-65 (2007).

120 Bell, S. E. J., Beattie, J. R., McGarvey, J. J., Peters, K. L., Sirimuthu, N. M. S. and

Speers, S. J. Development of sampling methods for Raman analysis of solid forms

of therapeutic and illicit drugs. Journal of Raman Spectroscopy, 35 (5): 409–417

(2004).

121 Vidal, M. and Amigo, J. M. Pre-processing of hyperspectral images. Essential steps

before image analysis. Chemometrics and Intelligent Laboratory Systems, 117:

138-148 (2012).

122 Šašić, S. Chemical imaging of pharmaceutical granules by Raman global

illumination and near-infrared mapping platforms. Analytica Chimica Acta, 611 (1):

73-79 (2008).

123 Shippert, P. Introduction to hyperspectral image analysis. Online Journal of Space

Communication, 3: (2003).

124 Vagni, F., Survey of hyperspectral and multispectral imaging technologies. 2007,

Research and Technology Organization.

125 Malik, I., Poonacha, M., Moses, J. and Lodder, R. A. Multispectral imaging of

tablets in blister packaging. AAPS PharmSciTech, 2 (2): 38-44 (2001).

126 Nippolainen, E., Fauch, L., Miridonov, S. and Kamshilin, A. A. Novel multispectral

imaging system for pharmaceutical applications. Pacific Science Review, 12 (2):

203-207 (2011).

127 Tahir, F., Fahiem, M. A., Tauseef, H. and Farhan, S. A survey of multispectral high

resolution imaging based drug surface morphology validation techniques. Life

Science Journal, 10 (7s): 1050-1059 (2013).

128 Cullen, P., Edelman, G. J., Van Leeuwen, T. G., Aalders, M. C. and Gaston, E.

Hyperspectral imaging for non-contact analysis of forensic traces. (2012).

129 Ramchandra, A. Filters a image enhancement and smoothing techniques. Paripex -

Indian Journal Of Research, 2 (7): 31-33 (2013).

130 Efstathiou, C. E., Signal smoothing algorithms. chem.

131 Rinnan, Å., Berg, F. v. d. and Engelsen, S. B. Review of the most common

pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical

Chemistry, 28 (10): 1201-1222 (2009).

132 Luo, J., Ying, K., He, P. and Bai, J. Properties of Savitzky–Golay digital

differentiators. Digital Signal Processing, 15: 122–136 (2005).

133 Jain, A. K., Fundamentals of digital image processing, Prentice-Hall,

Englewood Cliffs (1989).

134 Maini, R. and Aggarwal, H. A comprehensive review of image enhancement

techniques. Journal of Computing, 2 (3): 8-13 (2010).

135 Pohl, C. and Van Genderen, J. L. Review article: Multisensor image fusion in remote

sensing: Concepts, methods and applications. International Journal of Remote

Sensing, 19 (5): 823-854 (1998).

136 Pal, N. R. and Pal, S. K. A review on image segmentation techniques. Pattern

Recognition, 26 (9): 1277-1294 (1993).

137 Shakti, S. Comparative study of various image segmentation methods. International

Journal in Multidisciplinary and Academic Research (SSIJMAR), 2 (3): (2013).

138 Nixon, M. and Aguado, A. S., Feature extraction & image processing, Replika

Press Pvt Ltd, Delhi (2002).

139 Wechsler, H. Texture analysis—a survey. Signal Processing, 2 (3): 271-282

(1980).

140 Zhang, J. and Tan, T. Brief review of invariant texture analysis methods. Pattern

Recognition, 35 (3): 735-747 (2002).

141 Reed, T. R. and DuBuf, J. H. A review of recent texture segmentation and feature

extraction techniques. CVGIP: Image understanding, 57 (3): 359-372 (1993).

142 Xie, X. A review of recent advances in surface defect detection using texture

analysis techniques. Electronic Letters on Computer Vision and Image Analysis, 7

(3): 1-22 (2008).

143 Chen, Y. Q., Nixon, M. S. and Thomas, D. W. Statistical geometrical features for

texture classification. Pattern Recognition, 28 (4): 537-552 (1995).

144 Bharati, M. H., Liu, J. J. and MacGregor, J. F. Image texture analysis: Methods and

comparisons. Chemometrics and Intelligent Laboratory Systems, 72 (1): 57-71

(2004).

145 Srinivasan, G. N. and G, S. Statistical texture analysis. Proceedings of World

Academy of Science, Engineering and Technology, 36: 1264-1269 (2008).

146 Aioanei, S., Kurani, A., Xu, D.-H. and Undergraduates, C. T. I., Texture analysis for

computed tomography studies. 2002, DePaul University.

147 Materka, A. and Strzelecki, M., Texture analysis methods–a review. 1998,

Technical University of Lodz, Institute of Electronics, COST B11 report, Brussels,

p. 9-11.

148 Cai, T. T., Zhang, D. and Ben-Amotz, D. Enhanced chemical classification of Raman

images using multiresolution wavelet transformation. Applied Spectroscopy, 55 (9):

1124-1130 (2001).

149 Li, P., Du, G., Cai, W. and Shao, X. Rapid and nondestructive analysis of

pharmaceutical products using near-infrared diffuse reflectance spectroscopy.

Journal of Pharmaceutical and Biomedical Analysis, 70: 288-294 (2012).

150 Saeys, Y., Inza, I. and Larrañaga, P. A review of feature selection techniques in

bioinformatics. Bioinformatics, 23 (19): 2507-2517 (2007).

151 Novaković, J., Strbac, P. and Bulatović, D. Toward optimal feature selection using

ranking methods and classification algorithms. The Yugoslav Journal of Operations

Research, 21 (1): 2334-6043 (2011).

152 Hall, M. A. and Smith, L. A. 1998. Practical feature subset selection for machine

learning, in Proceedings of the 21st Australian Computer Science Conference.

153 Holte, R. C. Very simple classification rules perform well on most commonly used

datasets. Machine learning, 11 (1): 63-90 (1993).

154 Chatcharaporn, K., Kittidachanupap, N., Kerdprasop, K. and Kerdprasop, N.,

Comparison of feature selection and classification algorithms for restaurant

dataset classification.

155 Jorgensen, A., Clustering excipient near infrared spectra using different

chemometric methods. 2000, Pharmaceutical Technology Division, Department of

Pharmacy, University of Helsinki: Helsinki.

156 O'Connell, M.-L., Howley, T., Ryder, A. G., Leger, M. N. and Madden, M. G.

Classification of a target analyte in solid mixtures using principal component

analysis, support vector machines, and Raman spectroscopy. International Society

for Optics and Photonics: 340-350 (2005).

157 Rajalahti, T. and Kvalheim, O. M. Multivariate data analysis in pharmaceutics: A

tutorial review. International Journal of Pharmaceutics, 417 (1): 280-290 (2011).

158 Van der Meer, F. The effectiveness of spectral similarity measures for the analysis

of hyperspectral imagery. International journal of applied earth observation and

geoinformation, 8 (1): 3-17 (2006).

159 Sedgwick, P. Pearson’s correlation coefficient. BMJ, 345: (2012).

160 Hall, G. Pearson’s correlation coefficient. 2015. Available from:

http://www.hep.ph.ic.ac.uk/~hallg/UG_2015/Pearsons.pdf.

161 Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y. and Kaufman,

L., Chemometrics: A textbook, Elsevier Science Publishers B.V., Netherlands

(1988).

162 Wang, L., Zhang, Y. and Feng, J. On the Euclidean distance of images. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 27 (8): 1334-1339 (2005).

163 Sugiyama, M. Advanced data analysis: K-mean clustering. Available from:

http://www.google.com.tr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&ved=0CC8QFjAD&url=http%3A%2F%2Fwww.cis.upenn.edu%2F~cis519%2Ffall2014%2Flectures%2F13_Unsupervised%2520Learning.pdf&ei=T3JYVY2pIubIsQSll4DoBA&usg=AFQjCNE1EMrs5__RPisNIwZhwIdkEG6MuQ&bvm=bv.9.

164 Mirkin, B., Core concepts in data analysis: Summarization, correlation and

visualization, Springer Science & Business Media (2011).

165 Chevallier, S., Bertrand, D., Kohler, A. and Courcoux, P. Application of PLS-DA in

multivariate image analysis. Journal of Chemometrics, 20 (5): 221-229 (2006).

166 Nikam, S. S. A comparative study of classification techniques in data mining

algorithms. Orient. J. Comp. Sci. and Technol, 8 (1):

167 Gupta, M. and Aggarwal, N. 2010. Classification techniques analysis, in

NCCI2010-National Conference on Computational Instrumentation, CSIO,

Chandigarh, India.

168 Kotsiantis, S. B., Zaharakis, I. D. and Pintelas, P. E. Machine learning: A review of

classification and combining techniques. Artificial Intelligence Review, 26 (3):

159-190 (2006).

169 Wu, W., Walczak, B., Massart, D. L., Heuerding, S., Erni, F., Last, I. R. and Prebble, K. A.

Artificial neural networks in classification of NIR spectral data: Design of the

training set. Chemometrics and Intelligent Laboratory Systems, 33 (1): 35-46

(1996).

170 Kotsiantis, S. B. Supervised machine learning: A review of classification

techniques. Informatica (03505596), 31 (3): 249-268 (2007).

171 Li, D., Chen, L., Li, Y., Tian, S., Sun, H. and Hou, T. ADMET evaluation in drug

discovery. 13. Development of in silico prediction models for P-glycoprotein

substrates. Molecular Pharmaceutics, 11 (3): 716-726 (2014).

172 Sheng, T., Wang, J., Li, Y. and Xu, X. Drug-likeness analysis of traditional Chinese

medicines: Prediction of drug-likeness using machine learning approaches.

Molecular Pharmaceutics, 9 (10): 2875-2886 (2012).

173 Lei, C., Li, Y., Zhao, Q., Peng, H. and Hou, T. ADME evaluation in drug discovery.

10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive

Bayesian classification techniques. Molecular Pharmaceutics, 8 (3): 889-900

(2011).

174 Wang, S., Li, Y., Wang, J., Chen, L., Zhang, L., Yu, H. and Hou, T. ADMET

evaluation in drug discovery. 12. Development of binary classification models for

prediction of hERG potassium channel blockage. Molecular Pharmaceutics, 9 (4):

996-1010 (2012).

175 Phyu, T. N. 2009. Survey of classification techniques in data mining, in

Proceedings of the International MultiConference of Engineers and Computer

Scientists, Hong Kong: p. 18-20.

176 Anzanello, M. J., Ortiz, R. S., Limberger, R. P. and Mayorga, P. A

multivariate-based wavenumber selection method for classifying medicines into authentic or

counterfeit classes. Journal of Pharmaceutical and Biomedical Analysis, 83:

209-214 (2013).

177 Hou, T., Li, N., Li, Y. and Wang, W. Characterization of domain–peptide

interaction interface: Prediction of SH3 domain-mediated protein–protein

interaction network in yeast by generic structure-based models. Journal of

Proteome Research, 11 (5): 2982-2995 (2012).

178 Roggo, Y., Degardin, K. and Margot, P. Identification of pharmaceutical tablets by

Raman spectroscopy and chemometrics. Talanta, 81 (3): 988-995 (2010).

179 Dégardin, K., Roggo, Y., Been, F. and Margot, P. Detection and chemical profiling

of medicine counterfeits by Raman spectroscopy and chemometrics. Analytica

Chimica Acta, 705 (1): 334-341 (2011).

180 Ramirez, J. L., Bellamy, M. K. and Romañach, R. J. A novel method for analyzing

thick tablets by near infrared spectroscopy. AAPS PharmSciTech, 2 (3):

15-24 (2001).

181 Laitinen, N., Antikainen, O. and Yliruusi, J. Characterization of particle sizes in bulk

pharmaceutical solids using digital image information. AAPS PharmSciTech, 4 (4):

383-391 (2003).

182 Cruz, J. and Blanco, M. Content uniformity studies in tablets by NIR-CI. Journal of

Pharmaceutical and Biomedical Analysis, 56: 408–412 (2011).

183 Lopes, M. B. and Wolff, J. C. Investigation into classification/sourcing of suspect

counterfeit Heptodin™ tablets by near infrared chemical imaging. Analytica Chimica

Acta, 633 (1): 149-155 (2009).

184 Laksmana, F. L., Van Vliet, L. J., Kok, P. H., Vromans, H., Frijlink, H. W. and Van

der Voort Maarschalk, K. Quantitative image analysis for evaluating the coating

thickness and pore distribution in coated small particles. Pharmaceutical Research,

26 (4): 965-976 (2009).

185 Lopes, M. B., Wolff, J. C., Bioucas-Dias, J. M. and Figueiredo, M. A. Determination

of the composition of counterfeit Heptodin™ tablets by near infrared chemical

imaging and classical least squares estimation. Analytica Chimica Acta, 641 (1):

46-51 (2009).

186 Nikon, Nikoninstruments. Nikon.

187 Fichera, L. An implementation of imadjust in C#. 2012 October 11; Available from:

http://lorisfichera.github.io/blog/2012/10/11/an-implementation-of-imadjust-in-c-number/.

188 Szczypinski, P. M., Strzelecki, M., Materka, A. and Klepaczko, A. MaZda: A software

package for image texture analysis. Computer Methods and Programs in

Biomedicine, 94 (1): 66-76 (2009).

189 Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. The

WEKA data mining software: An update. SIGKDD Explorations, 11 (1): 11-18

(2009).

190 MATLAB and Statistics Toolbox Release R2013a. The MathWorks, Inc.,

Natick, Massachusetts, United States.

191 Dongwoo Optron Co., Ltd. Dongwoo Optron. 2012. Available from:

http://www.dwoptron.com/lib/download.asp?aDir=datas&file=2012+DW+catalog.pdf.

Plagiarism Report

List of Publications and Reprints

1. Tahir, F., and Fahiem, M. A. A Statistical-Textural-Features Based Approach for

Classification of Solid Drugs Using Surface Microscopic Images. Computational

and Mathematical Methods in Medicine, 2014: (2014).

2. Tahir, F., Fahiem, M. A., Tauseef, H. and Farhan, S. A Survey of Multispectral High

Resolution Imaging Based Drug Surface Morphology Validation Techniques. Life

Science Journal - Acta Zhengzhou University Overseas Edition, 10 (7s):

1050-1059 (2013).

3. Farhan, S., Fahiem, M. A., Tahir, F. and Tauseef, H. A Comparative Study of

Neuroimaging and Pattern Recognition Techniques for Estimation of Alzheimer’s

Disease. Life Science Journal - Acta Zhengzhou University Overseas Edition,

10 (7s): 1030-1039 (2013).

4. Tauseef, H., Fahiem, M. A., Farhan, S. and Tahir, F. A Review of Image and

Phylogenetic Analysis Based Techniques for Ischemic Stroke Risk Estimation.

Life Science Journal - Acta Zhengzhou University Overseas Edition, 10 (7s):

1040-1049 (2013).
