Automation of Preprocessing and Recognition of
Historical Document Images
A Thesis submitted to
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Belgaum
for the award of degree of
Doctor of Philosophy in Computer Science & Engineering
by
B Gangamma
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus (formerly P E S School of Engineering), Bangalore, Karnataka, India.
2013
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India.
CERTIFICATE
This is to certify that B Gangamma has worked under my supervision
for her doctoral thesis titled “Automation of Preprocessing and
Recognition of Historical Document Images”. I also certify that
the work is original and has not been submitted to any other University
wholly or in part for any other degree.
Dr. Srikanta Murthy K
Professor & Head,
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India.
Department of Computer Science & Engineering,
P E S Institute of Technology - Bangalore South Campus
(formerly P E S School of Engineering),
Bangalore, Karnataka, India
DECLARATION
I hereby declare that the entire work embodied in this doctoral thesis
has been carried out by me at Research Centre, Department of Com-
puter Science & Engineering, P E S Institute of Technology - Bangalore
South Campus (formerly P E S School of Engineering) under the super-
vision of Dr. Srikanta Murthy K. This thesis has not been submitted
in part or full for the award of any diploma or degree of this or any
other University.
B Gangamma
Research scholar
Department of Computer Science & Engineering
P E S Institute of Technology - Bangalore South Campus,
(formerly P E S School of Engineering), Bangalore.
Acknowledgements
Any accomplishment requires the efforts of many people and
this work is no exception. I would be failing in my duty if I
do not express my gratitude to those who have helped in my
endeavor.
With deep gratitude and reverence, I would like to express
my sincere thanks to my research supervisor Dr. Srikanta
Murthy K, Professor & Head, Department of Computer Sci-
ence & Engineering, P E S Institute of Technology - Banga-
lore South Campus, Bangalore, for his constant and untiring
efforts to guide me right through the research work. His tremen-
dous enthusiasm, inspiration, and constant support through-
out my research work have encouraged me to complete this
dissertation work. His wide knowledge and logical way of
thinking, detailed and constructive comments have provided
a good basis for the research work and thesis. I would like
to thank Dr. J Suryaprasad, Principal & Director, P E S In-
stitute of Technology - Bangalore South Campus, Bangalore,
for his constant support.
I owe special thanks and sincere gratitude to Mrs. Shylaja
S S, Professor & Head, Department of Information Science &
Engineering, P E S Institute of Technology, Bangalore for mo-
tivating, encouraging and providing the necessary support to
complete the research and thesis work. I am also thankful to Dr.
S Natarajan, Professor, Department of Information Science
and Engineering, P E S Institute of Technology, Bangalore,
for providing proper directions to my research work. I wish
to express my warm and sincere thanks to Dr. K. N. Bala-
subramanya Murthy, Principal & Director, P E S Institute of
Technology, Bangalore for inspiring me to take up research
and work towards a doctoral degree. I would like to express
sincere thanks to P E S management for providing motivation
and a platform to carry out the research.
I wholeheartedly thank Mr. Jayasimha, Mythic Society of
India, Bangalore, for providing me the scanned copies of the
palm leaf manuscripts. My warm thanks are due to Mr. M
P Shelva Thirunarayana, R Narayana Iyangar, Academy of
Sanskrit Research Center, Melukote and Sri. S N Cheluva-
narayana, Principal, Sanskrit College, Melukote, Karnataka,
for providing knowledge about the historical documents along
with sample manuscripts of paper and palm leaf.
I sincerely thank Dr. Veeresh
Badiger, Professor, Kannada University Hampi, Karnataka,
for providing information about resources and guiding my re-
search work. Further I would like to extend special thanks to
him for providing digitized samples of palm leaf manuscripts.
I would like to thank Dr. G. Hemantha Kumar, Professor
& Chairman, Department of Studies in Computer Science,
University of Mysore, for his valuable suggestions and direc-
tions during pre Ph.D. viva voce. I whole heartedly thank
Dr. M Ashwath Kumar, Professor, Department of Infor-
mation Science & Engineering, M S R Institute of Technol-
ogy, Bangalore, for his valuable directions given during the
pre-Ph.D. viva voce. I warmly thank Dr. Bhanumathi, Reader
at Manasa Gangothri, Mysore, for providing useful informa-
tion about palm leaf manuscripts. Detailed discussions about
manuscripts and interesting explorations with her have been
very helpful for my work.
I wish to thank Dr. Suryakantha Gangashetty, Assistant
Professor, IIIT Hyderabad, for his suggestions, and Dhanan-
jaya, Archana Ramesh, Dilip, research scholars at IIIT Hy-
derabad, for their valuable discussions. I am grateful to Dr.
Basavaraj Anami, Principal, K L E Institute of Technology,
Hubli, for his guidance and wonderful interactions, which
helped me in shaping my research work properly.
I would like to express my heartfelt thanks to Dr. Punitha
P Swamy, Professor & Head, Department of Master of Com-
puter Application, P E S Institute of Technology, Bangalore,
for her detailed review, constructive criticism and excellent
advice throughout my research work and also during prepa-
ration of the thesis. My sincere thanks to Dr. Avinash N.
Professor, Department of Information Science & Engineering,
P E S Institute of Technology, Bangalore, for his valuable dis-
cussions during thesis write up.
I owe my most sincere thanks to my brother-in-law Dr.
Mallikarjun Holi, Professor & Head, Department of Bio-medical
Engineering, Bapuji Institute of Engineering & Technology,
Davanagere, for reviewing my thesis and giving valuable sug-
gestions.
I owe my loving thanks to my husband Suresh Holi and
my children Anish and Trisha, who have extended constant
support in completing my work. Without their encourage-
ment and understanding it would have been impossible for
me to finish this work. I express my deepest sense of gratitude
to my father-in-law Prof. S. M. Holi, who has motivated me
towards research. His inspiring and encouraging nature has
stimulated me to take up research. I would like to express
my heartfelt thanks to my mother-in-law, Mrs. Rudramma
Holi for her loving support. I also extend my sincere thanks
to my sisters-in-law Dr. Prema S Badami, Mrs. Shivaleela S
Patil, Sharanu Holi, brother-in-law Mr. Sanganna Holi and
their families for giving me moral support.
I express my heartfelt thanks to my parents Mr. Somaraya
Biradar and Mrs. Shivalingamma Biradar for encouraging
and helping me in my activities. I would like to place my grat-
itude to my sisters Mrs. Nirmala Marali, Suvarna Patil and
brothers Manjunath Biradar and Vishwanath Biradar along
with their families for providing moral support during my re-
search work.
During this work, I have collaborated with many colleagues
for whom I have great regard, and wish to extend my warmest
thanks to all my faculty colleagues in the Department of Information
Science and Engineering at P E S Institute of Technology,
Bangalore. I wish to thank my teammates Mr. Arun Vikas,
Jayashree, Mamatha H R, Karthik S and friends Sangeetha
J, Suvarna Nandyal, Srikanth H R for their support. Lastly,
and most importantly, I am indebted to my faculty colleagues
for providing a stimulating and healthy environment to learn
and grow. It is a pleasure to thank many people who have
helped me directly or indirectly and who made this thesis
possible. I also express my sincere gratitude to the external review-
ers for providing critical comments which significantly helped
in improving the standard of the thesis. I take this oppor-
tunity to thank the VTU e-learning centre for providing the
template used to prepare my doctoral thesis in LaTeX.
B Gangamma
DEDICATED TO MY FAMILY, MENTORS AND WELL
WISHERS
Abstract
Historical documents are the priceless property of any country, providing
insight and information about ancient culture and civilization.
These documents are found in the form of inscriptions on a variety of
hard and fragile materials such as stones, pillars, rocks, metal plates, palm
leaves, birch leaves, cloth, and paper. Most of these documents are
nearing the end of their natural lifetime and suffer from various
problems caused by climatic conditions, methods of preservation, the
materials used for inscription, etc. Some of the problems stem from the
worn-out condition of the material: brittleness, strains and stains,
sludge and smudges, fossil deposits, fungal attack, dust accumulation,
wear and tear of the material, breakage, and other damage. These damages
hamper the processing of historical documents, making the
inscriptions illegible and the documents indecipherable.
Although preservation through digitization is in progress at various
organizations, deciphering the documents is very difficult and demands
the expertise of paleographers and epigraphists. Since such experts are
few in number and their expertise may be lost in the near future, there is a
need to automate the process of deciphering these document images.
The problems and complexities posed by these documents have led
to the design of a robust system that automates the processing and
deciphering of these document images, which in turn demands thorough
preprocessing algorithms to enhance them.
The accuracy of a recognition system always depends on the seg-
mented characters and their extracted features. Historical document im-
ages usually exhibit uneven line spacing, inscriptions along curved lines,
overlapping text lines, etc., making segmentation of the document difficult.
In addition, the documents pose challenges such as low contrast, dark
and uneven backgrounds, and blotched (stained) characters, usually re-
ferred to as noise. The presence of noise also leads to erroneous segmen-
tation of the document image. Therefore, thorough preprocessing
techniques are needed to eliminate the noise and enhance the doc-
ument image. To decipher documents belonging to various eras,
we need the character set pertaining to each era. Hence, a
recognition system is required to recognize the era of a character.
In this context, this research work focuses on developing algorithms
to preprocess and enhance historical document images of Kannada,
a South Indian language; to eliminate noise; to segment the enhanced
document image into lines and characters; and to predict the era of the
scripts.
To preprocess the noisy document images, three image enhancement
algorithms in the spatial domain and two algorithms in the frequency domain
are proposed. Among the spatial domain methods, the first
utilizes the morphological reconstruction technique to eliminate the
dark, uneven, noisy background. This algorithm serves as the background
elimination step in the other four algorithms proposed for image
enhancement. Although gray-scale morphological operations elim-
inate the noisy dark background, this method fails to enhance severely
degraded document images and is unable to preserve sharp edges.
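The background elimination idea can be pictured with a gray-scale black top-hat: morphological closing estimates the slowly varying background over thin dark strokes, and subtracting the original image lifts the strokes onto a flat background. This is only an illustrative sketch in NumPy, not the reconstruction algorithm implemented in this thesis; the window size `k` is a hypothetical parameter.

```python
import numpy as np

def gray_dilate(img, k):
    """Gray-scale dilation: each pixel becomes the max of its k x k window."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + k, j:j + k].max() for j in range(w)]
                     for i in range(h)])

def gray_erode(img, k):
    """Gray-scale erosion: each pixel becomes the min of its k x k window."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + k, j:j + k].min() for j in range(w)]
                     for i in range(h)])

def black_tophat(img, k=3):
    """Closing (dilate then erode) fills in thin dark strokes, estimating
    the background; subtracting the image leaves the strokes as bright
    features on a flat (zero) background."""
    closing = gray_erode(gray_dilate(img, k), k)
    return closing - img
```

On a flat bright leaf with a dark stroke, the result is zero everywhere except at the stroke, which becomes bright: the uneven background is gone.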
To enhance the image by eliminating noise without smoothing the
edges, a second algorithm is developed using the bilateral filter, which com-
bines domain and range filtering. The third algorithm, proposed to
denoise the document images, is a non-local means filter based on a
similarity measure between non-local windows.
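The domain/range combination of the bilateral filter can be sketched as follows: each output pixel is a weighted mean of its neighbours, where the weight is the product of a spatial (domain) Gaussian and an intensity (range) Gaussian, so averaging stops at strong edges. A minimal NumPy sketch, not the thesis implementation; `radius`, `sigma_d`, and `sigma_r` are illustrative parameters.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=30.0):
    """Bilateral filter sketch: combine a spatial (domain) Gaussian with
    an intensity (range) Gaussian so smoothing does not cross edges."""
    img = img.astype(float)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    # precompute the domain (spatial) kernel once
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    domain = np.exp(-(xx**2 + yy**2) / (2 * sigma_d**2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # range kernel: penalize intensity difference from the center
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            weights = domain * rng
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out
```

Across a sharp step edge the range weights of the far side are nearly zero, so both sides keep their values instead of blurring into each other.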
Frequency domain transforms and their variants are used
in image denoising, feature extraction, compression, and reconstruction.
An algorithm based on the wavelet transform is developed to analyze and
restore the degraded document images. The wavelet transform
handles point discontinuities well but fails to handle curve
discontinuities. To overcome this problem, a
curvelet transform based approach is proposed, which provides better
results in comparison with the wavelet transform approach. The
performance of all the image enhancement techniques is compared
using Peak Signal-to-Noise Ratio (PSNR), computational time and human
visual perception.
Two segmentation algorithms have been developed to address the
problem of segmenting historical document images: one based on a
piecewise projection profile method and the other based on mor-
phological closing and connected component analysis (CCA). The first
method addresses uneven line spacing by dividing the image into
vertical strips, extracting each line from each strip and combining the lines
across all the vertical strips. The second method addresses the problems of
both uneven spacing and touching (overlapping) lines using the closing
operation and CCA.
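The horizontal projection profile step underlying the first method can be sketched as follows (NumPy assumed). This toy version operates on a full-width binary image, whereas the thesis applies the profile per vertical strip and then stitches the strips to cope with uneven spacing.

```python
import numpy as np

def segment_lines(binary):
    """Horizontal projection profile line segmentation: sum the ink pixels
    in every row; maximal runs of non-zero rows are text lines and
    zero-valued valleys are the gaps between them.
    Returns (start_row, end_row) pairs, end exclusive."""
    profile = binary.sum(axis=1)
    lines, start = [], None
    for r, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = r                      # a text line begins
        elif ink == 0 and start is not None:
            lines.append((start, r))       # the line ends at this gap
            start = None
    if start is not None:                  # a line running to the last row
        lines.append((start, len(profile)))
    return lines
```

Each returned row band can then be passed to a vertical projection profile in the same fashion to extract individual characters.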
Document skew may be introduced during image capture and needs
to be corrected. Since historical documents usually contain uneven
spacing between lines, correcting the global document skew will not
suffice to segment a handwritten document image correctly: uneven line
spacing usually causes multiple skews within the document. To correct
the skew within the document lines, an extended version of the second
segmentation algorithm is developed.
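The per-line skew idea can be illustrated by fitting a straight line through a text line's foreground pixels; the slope of the fit gives that line's skew angle, and rotating by its negative deskews the line. A sketch in NumPy, not the thesis's exact CCA-based procedure.

```python
import numpy as np

def line_skew_degrees(line_img):
    """Estimate the skew of a single extracted text line: least-squares
    fit y = slope * x + c through its foreground pixel coordinates and
    convert the slope to an angle in degrees."""
    ys, xs = np.nonzero(line_img)          # coordinates of ink pixels
    slope, _ = np.polyfit(xs, ys, 1)       # degree-1 least-squares fit
    return float(np.degrees(np.arctan(slope)))
```

Because the angle is computed per line, each line of a document with multiple internal skews can be corrected independently.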
To predict the era of a script/character, a curvelet transform based
algorithm is designed to extract the characteristic features, and a mini-
mum distance classifier is employed to recognize the era of the charac-
ters. To sum up, in this research work: three spatial domain techniques
and two frequency domain approaches have been implemented for
denoising and enhancing the degraded historical document images;
two segmentation algorithms have been designed to segment the lines
and characters from the document images; one algorithm is designed to
detect and correct the multiple skews within a document; and another
algorithm is presented to predict the era of a segmented character, so
that the character set belonging to that particular era can
be consulted in order to decipher the documents.
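The minimum distance classifier used for era prediction reduces to nearest-class-mean matching in feature space. A sketch (NumPy assumed), with made-up era labels and toy feature vectors standing in for the curvelet features:

```python
import numpy as np

def class_means(features, labels):
    """Training step of a minimum distance classifier: compute one mean
    feature vector per era label."""
    feats = np.asarray(features, dtype=float)
    return {era: feats[[l == era for l in labels]].mean(axis=0)
            for era in set(labels)}

def predict_era(means, feature):
    """Assign the era whose mean feature vector is nearest in
    Euclidean distance to the query feature vector."""
    feature = np.asarray(feature, dtype=float)
    return min(means, key=lambda era: np.linalg.norm(feature - means[era]))
```

In the thesis the feature vectors come from the curvelet transform of a segmented character; here any fixed-length vectors work.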
Contents
1 Preface 1
1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Historical Documents . . . . . . . . . . . . . . . . . . . 3
1.2.1 Kannada Scripts/Character . . . . . . . . . . . 6
1.3 Motivation for the Research Work . . . . . . . . . . . . 7
1.3.1 Data Collection . . . . . . . . . . . . . . . . . . 7
1.3.2 Enhancement/Preprocessing . . . . . . . . . . . 10
1.3.3 Segmentation . . . . . . . . . . . . . . . . . . . 12
1.3.4 Feature Extraction and Recognition . . . . . . . 13
1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Organization of the Thesis . . . . . . . . . . . . . . . . 16
2 Literature Survey 17
2.1 Computer Vision . . . . . . . . . . . . . . . . . . . . . 17
2.2 Preprocessing and Segmentation . . . . . . . . . . . . . 18
2.2.1 Enhancement of Historical Document Image . . 24
2.2.2 Segmentation of Historical Documents . . . . . 26
2.3 Character Recognition . . . . . . . . . . . . . . . . . . 28
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 34
3 Enhancement of Degraded Historical Documents : Spa-
tial Domain Techniques 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Gray Scale Morphological Reconstruction (MR) Based
Approach . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 Overview of Mathematical Morphology . . . . . 38
3.2.2 Adaptive Histogram Equalization(AHE) . . . . 42
3.2.3 Gaussian Filter . . . . . . . . . . . . . . . . . . 42
3.2.4 Proposed Methodology . . . . . . . . . . . . . . 43
3.2.5 Results and Discussion . . . . . . . . . . . . . . 48
3.3 Bilateral Filter (BF) Based Approach . . . . . . . . . . 54
3.3.1 Overview of Bilateral Filter . . . . . . . . . . . 55
3.3.2 Proposed Methodology . . . . . . . . . . . . . . 56
3.3.3 Results and Discussion . . . . . . . . . . . . . . 59
3.4 Non Local Means Filter (NLMF) Based Approach . . . 66
3.4.1 Overview of Non Local Means Filter . . . . . . 67
3.4.2 Proposed Algorithm . . . . . . . . . . . . . . . 68
3.4.3 Results and Discussion . . . . . . . . . . . . . . 73
3.5 Discussion of Three Spatial Domain Techniques . . . . 77
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 82
4 Enhancement of Degraded Historical Documents : Fre-
quency Domain Techniques 84
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 Wavelet Transform (WT) Based Approach . . . . . . . 85
4.2.1 Overview of Wavelet Transform . . . . . . . . . 86
4.2.2 Denoising Method . . . . . . . . . . . . . . . . 88
4.2.2.1 Thresholding Algorithms . . . . . . . . 88
4.2.3 Proposed Methodology . . . . . . . . . . . . . . 90
4.2.3.1 Stage 1: Mathematical Reconstruction 92
4.2.3.2 Stage 2: Denoising by Wavelet Transform 93
4.2.3.3 Stage 3: Postprocessing . . . . . . . . 94
4.2.3.4 Algorithm . . . . . . . . . . . . . . . . 94
4.2.4 Results and Discussions . . . . . . . . . . . . . 94
4.3 Curvelet Transform (CT) Based Approach . . . . . . . 98
4.3.1 Overview of Curvelet Transform . . . . . . . . . 100
4.3.2 Proposed Method . . . . . . . . . . . . . . . . 104
4.3.2.1 Denoising Using Curvelet Transform . 104
4.3.2.2 Algorithm . . . . . . . . . . . . . . . . 104
4.3.3 Results and Discussions . . . . . . . . . . . . . 106
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5 Discussion on Enhancement Algorithms . . . . . . . . . 108
5 Segmentation of Document Images 116
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 Proposed Methodologies . . . . . . . . . . . . . . . . . 117
5.3 Method 1: Piece-wise Horizontal Projection Profile Based
Approach . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3.1 Division into Vertical Strips . . . . . . . . . . . 120
5.3.2 Horizontal Projection Profile of a Strip . . . . . 120
5.3.3 Reconstruction of the Line Using Vertical Strips 120
5.3.4 Character Extraction . . . . . . . . . . . . . . . 122
5.3.5 Algorithm for Document Image Segmentation. . 122
5.3.6 Results and Discussion . . . . . . . . . . . . . . 125
5.4 Method 2: Mathematical Morphology and Connected
Component Analysis(CCA) Based Approach . . . . . . 126
5.4.1 Morphological Closing Operation . . . . . . . . 128
5.4.2 Line Extraction Using Connected Components
Analysis . . . . . . . . . . . . . . . . . . . . . . 129
5.4.3 Finding the Height of Each Line and Checking
the Touching Lines. . . . . . . . . . . . . . . . . 130
5.4.4 Character Extraction . . . . . . . . . . . . . . . 130
5.4.5 Algorithm for Segmentation of the Document Im-
age into Lines. . . . . . . . . . . . . . . . . . . . 131
5.4.6 Results and Discussion . . . . . . . . . . . . . . 132
5.5 Discussion on Method 1 and Method 2 . . . . . . . . . 133
5.6 Skew Detection and Correction Algorithm . . . . . . . 135
5.6.1 Skew Angle Detection . . . . . . . . . . . . . . 137
5.6.2 Skew Correction . . . . . . . . . . . . . . . . . . 138
5.6.3 Algorithm for Deskewing . . . . . . . . . . . . . 140
5.6.4 Results and Discussion . . . . . . . . . . . . . . 140
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 144
6 Prediction of Era of Character Using Curvelet Trans-
form Based Approach 146
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 146
6.2 Related Literature . . . . . . . . . . . . . . . . . . . . 147
6.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . 151
6.3.1 Data Set Creation . . . . . . . . . . . . . . . . 152
6.3.2 Preprocessing . . . . . . . . . . . . . . . . . . . 152
6.3.3 Feature Extraction using FDCT . . . . . . . . . 153
6.3.4 Classification . . . . . . . . . . . . . . . . . . . 153
6.3.5 Algorithm for Era Prediction . . . . . . . . . . 153
6.4 Experimentation and Results . . . . . . . . . . . . . . 154
6.4.1 Experimentation 1 . . . . . . . . . . . . . . . . 154
6.4.2 Experimentation 2 . . . . . . . . . . . . . . . . 155
6.4.3 Experimentation 3 . . . . . . . . . . . . . . . . 156
6.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . 157
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 159
7 Conclusion and Future Work 160
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . 164
A Palm Leaf Images 167
B Paper Images 170
C Stone Inscription Images 174
D Author’s Publications 178
List of Figures
1.1 6th Century Ganga Dynasty Inscription. . . . . . . . . 4
1.2 13th Century Hoysala Dynasty Inscription. . . . . . . . 5
1.3 Inscriptions on palm leaf belonging to 16th − 18th century. 6
1.4 Stone inscription belonging to 3rd century BC. . . . . . 7
3.1 (a) Input image. (b) Result of binary morphological
dilation operation. (c) Result of binary morphological
erosion operation. . . . . . . . . . . . . . . . . . . . . . 39
3.2 (a) Input image. (b) Result of binary morphological
opening operation. (c) Result of binary morphological
closing operation. . . . . . . . . . . . . . . . . . . . . . 40
3.3 (a) Original Gray scale image. (b) Result of gray scale
dilate operation on image. (c) Result of gray scale ero-
sion operation on image. . . . . . . . . . . . . . . . . . 41
3.4 (a) Original Gray scale image. (b) Result of gray scale
closing operation on image. (c) Result of gray scale
opening operation on image. . . . . . . . . . . . . . . . 41
3.5 Noisy palm leaf document image belonging to 16th century. 43
3.6 Binarized noisy images of Figure(3.5). . . . . . . . . . . 43
3.7 Original image of palm leaf script belonging to 16th century. 44
3.8 Binarized noisy image of Figure(3.7). . . . . . . . . . . 44
3.9 Flow chart for MR based method. . . . . . . . . . . . . 45
3.10 AHE result on images shown in Figure(3.5) and Figure(3.7) 46
3.11 Result of stage 2. (a), (b) are results of opening opera-
tion on images shown in Figure(3.10)(a), (b). and (c),
(d) are results of reconstruction technique. . . . . . . . 47
3.12 Result of stage 3. (a), (b) Results of closing operation
on stage 2 output images shown in Figure(3.11)(a), (b).
(c), (d) Subtraction of R1 from R4. (e), (f) Subtraction
of result of previous step from R2. . . . . . . . . . . . . 47
3.13 (a), (b) Results of Gaussian filter on images shown in
Figure(3.12)(e), (f). . . . . . . . . . . . . . . . . . . . . 48
3.14 Morphological reconstruction technique on images shown
in Figure(3.13)(a), (b). . . . . . . . . . . . . . . . . . . 48
3.15 Binarized images of Figure(3.14)(a),(b). . . . . . . . . 49
3.16 (a), (b), (c), (d) Results of MR based method on paper im-
ages shown in Appendix 1 Figure(B.1), Figure(B.2), Fig-
ure(B.3) and Figure(B.4) belonging to nineteenth and
beginning of twentieth century. . . . . . . . . . . . . . 51
3.17 (a), (b) Results of MR based method on image of palm
leaf shown in Appendix 1 Figure(A.1) and (A.3) belong-
ing to 16th to 18th century . . . . . . . . . . . . . . . . 52
3.18 Result of MR based method on sample image taken from
Belur temple inscriptions Figure(C.2) belonging to 17th
century AD. . . . . . . . . . . . . . . . . . . . . . . . . 53
3.19 (a), (b) Result of MR based method on stone inscriptions
shown in Appendix 1 Figure(C.1), Figures(C.3) belong-
ing to 14th - 17th century. . . . . . . . . . . . . . . . . 53
3.20 Comparison of proposed method with Gaussian, Aver-
age and Median filter. Figures (a), (b), (c), (d) show the
result of respective methods and figures (e), (f), (g), (h)
show the binarized images of (a), (b), (c), (d). . . . . . 54
3.21 Flow chart for BF based method. . . . . . . . . . . . . 57
3.22 (a) Input image of the palm leaf manuscript belonging
to 18th century. (b) Its binarized version. . . . . . . . . 58
3.23 (a) Filtered image using BF method. (b) Final result
of the BF method. (c) Binarized version of enhanced
image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.24 (a), (b),(c),(d) Results of BF based method on input
paper images in Figure(B.1), Figure(B.2), Figure(B.3)
and Figure(B.4) respectively. . . . . . . . . . . . . . . . 62
3.25 (a), (b) Results of BF based method on Figure(A.4) and
Figure(A.5). . . . . . . . . . . . . . . . . . . . . . . . 63
3.26 (a) Input image of palm leaf manuscript. (b) Result of
MR based method. (c) Enhanced image using BF based
method. . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.27 (a) (b) are results of BF based method on input image
in Figure(A.2) and Figure(3.7). . . . . . . . . . . . . . 64
3.28 Result of BF based method on image Figure(A.6) . . . 64
3.29 (a), (b) Results of BF based method on image in Fig-
ure(C.1) and Figure(C.3). . . . . . . . . . . . . . . . . 65
3.30 Result of BF based method on Figure(C.2) Belur temple
inscriptions belonging to 17th century AD. . . . . . . . 65
3.31 Non Local Mean Filter Approach. Small patch of size
2p + 1 by 2p + 1 centred at x is the candidate pixel, y
and y′ are the non local patch within search window size
2k + 1 by 2k + 1. . . . . . . . . . . . . . . . . . . . . . 66
3.32 Input palm script image with low contrast. . . . . . . . 68
3.33 Result of NLMF method with residual image on Fig-
ure(3.32). . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.34 (a) Result of NLMF based method on image shown in
Figure(3.32). (b) Binarized image. . . . . . . . . . . . . 70
3.35 Flow chart for NLMF based method. . . . . . . . . . . 71
3.36 (a) Original image. (b) Filtered image using NLMF. (c)
Binarized image of the proposed NLMF method. (d)
Binarized noisy image using Otsu method. . . . . . . . 72
3.37 Results of NLMF based method on input images in Ap-
pendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Fig-
ure(B.4) . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.38 (a) Result of MR based method, (b) enhanced image of
using BF based method, and (c) result of NLMF based
method on input image shown in Figure(3.26). . . . . . 76
3.39 (a) and (b) Results of NLMF based method on input
images shown in Figure(A.2) and Figure(A.1). . . . . . 76
3.40 Result of NLMF based method on input image in Fig-
ure(A.6). . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.41 Results of NLMF based method on images Figure(C.1)
and Figure(C.3). . . . . . . . . . . . . . . . . . . . . . 77
3.42 (a), (b) Results of NLMF based method on images shown
in Figure(C.2) and Figure(C.4). . . . . . . . . . . . . 78
4.1 Comparison of all thresholding methods . . . . . . . . 92
4.2 (a) Paper manuscript image-3 of previous century. (b)
Enhanced image using WT based approach. . . . . . . 95
4.3 Enhanced images using WT based approach on paper
manuscript images shown in Appendix 1: (a) Fig-
ure(B.2) and (b) Figure(B.3). . . . . . . . . . . . . . . 96
4.4 (a) Palm leaf manuscript image belonging to 16th - 18th
century. (b) Enhanced image using WT based approach. 96
4.5 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 97
4.6 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 98
4.7 (a) Palm leaf manuscript image belonging to 18th cen-
tury. (b) Enhanced image using WT based approach. . 99
4.8 (a) Stone inscription image belonging to seventeenth cen-
tury. (b) Result of WT based approach. . . . . . . . . 100
4.9 (a) and (c) Stone inscription images belonging to 14th -
17th century. (b) and (d) Results of WT based approach. 101
4.10 Result of WT based approach on stone inscription be-
longing to seventeenth century shown in Appendix 1 Fig-
ure (C.2). . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.11 (a)Wrapping data, initially inside a parallelogram, into
a rectangle by periodicity(Figures reproduced from pa-
per [172]). The shaded region represents trapezoidal
wedge.(b) Discrete curvelet frequency tiling. . . . . . . 102
4.12 (a), (c) and (e) Input images paper, palm leaf and stone.
(b), (d) and (f) Result of CT based approach. . . . . . 103
4.13 (a)-(b) Input images. (c)-(d) Results of first and second
stage of curvelet based approach. (e)-(f) Result of last
stage(image 15-49). . . . . . . . . . . . . . . . . . . . . 105
4.14 (a) Palm leaf manuscript image belonging in between
16th to 18th century. (b) Enhanced image using WT
based approach. (c) Result of CT based approach. . . . 106
4.15 (a) Input image of palm script. (b) Result of WT based
method. (c) Result of CT method. . . . . . . . . . . . 107
4.16 (a) Input image of palm script. (b) Result of WT based
method. (c) Result of CT method. . . . . . . . . . . . 107
4.17 (a) Result of WT based approach, (b) result of CT based
approach on image shown in Figure(4.8)(a). . . . . . . 108
4.18 Results of WT based method shown in (a), (c) and result
of CT based method shown in (b)-(d) for stone inscrip-
tion images shown in Figure(4.9)(a) and (c). . . . . . . 109
5.1 (a) Handwritten Kannada document image. (b) Hori-
zontal projection profile of handwritten document image. 118
5.2 Handwritten Kannada document image. . . . . . . . . 119
5.3 Horizontal projection profile of the input image Fig-
ure(5.2). . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.4 Non-Zero Rows (NZRs) and rows labelled NZR1 and
NZR2. . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Horizontal projection profile of a strip. . . . . . . . . . 121
5.6 Extracted text lines. . . . . . . . . . . . . . . . . . . . 122
5.7 Character extraction from line. . . . . . . . . . . . . . 123
5.8 (a), (c), (e) are the extracted lines and (b),(d),(f) are
showing extracted characters from lines(a), (c), (e). . . 123
5.9 Input handwritten image and extracted Lines. . . . . . 124
5.10 Extracted characters. . . . . . . . . . . . . . . . . . . . 124
5.11 Input image with uneven spacing between lines . . . . 126
5.12 Result of method 1 on the image shown in Figure(5.11). 126
5.13 Result of closing operation. . . . . . . . . . . . . . . . 127
5.14 Extracted text lines. . . . . . . . . . . . . . . . . . . . 127
5.15 (a) Line and extracted characters from line (a). . . . . 128
5.16 Input image. . . . . . . . . . . . . . . . . . . . . . . . . 128
5.17 Result of closing operation. . . . . . . . . . . . . . . . 130
5.18 Result of extraction of connected components(lines). . . 131
5.19 Result of binarization operation. . . . . . . . . . . . . . 133
5.20 Result of closing operation. . . . . . . . . . . . . . . . 133
5.21 Result of extraction of connected components and cor-
responding lines. . . . . . . . . . . . . . . . . . . . . . 134
5.22 (a) Touching line portion. (b) Result of closing and
opening operation. . . . . . . . . . . . . . . . . . . . . 135
5.23 Extraction of lines. . . . . . . . . . . . . . . . . . . . . 135
5.24 Input skewed image. . . . . . . . . . . . . . . . . . . . 137
5.25 Horizontal projection profile of the input image(5.24). . 138
5.26 Result of closing operation. . . . . . . . . . . . . . . . 139
5.27 Skew angle calculation from single connected component. 139
5.28 Result of deskewing. . . . . . . . . . . . . . . . . . . . 141
5.29 Reconstructed image of Figure(5.24). . . . . . . . . . . 142
5.30 (a) Input Image. (b) Deskewed image. . . . . . . 143
5.31 Input skewed image. . . . . . . . . . . . . . . . . . . . 143
5.32 Deskewed image. . . . . . . . . . . . . . . . . . . . . . 144
6.1 Sample epigraphical characters belonging to different era. 148
6.2 Prediction Rate for Gabor, Zernike and proposed method. 157
A.1 Original image of palm leaf script of 18th century. . . . 167
A.2 Input images of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.3 Palm leaf image belonging to 18th century: noisy input
image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.4 Input image of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.5 Input image of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . 169
A.6 Input images of palm leaf document belonging to 17th
century. . . . . . . . . . . . . . . . . . . . . . . . . . . 169
B.1 Sample paper image belonging to previous century. . . 170
B.2 Original paper image -1 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . . 171
B.3 Original paper image -2 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . . 172
B.4 Original paper image-3 belonging to nineteenth and be-
ginning of twentieth century. . . . . . . . . . . . . . . 173
C.1 Stone inscription image belonging to 14th–17th century. 174
C.2 Digitized image of Belur temple inscription belonging to
17th century AD. . . . . . . . . . . . . . . . . . . . . . 175
C.3 Digitized image of Belur temple inscriptions belonging
to 17th century AD. . . . . . . . . . . . . . . . . . . . 176
C.4 Digitized image of Shravanabelagola temple inscriptions
belonging to 14th century AD. . . . . . . . . . . . . . . 177
List of Tables
1.1 Evolution of Kannada Character . . . . . . . . . . . . . 8
3.1 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the paper document
images of 512× 512 size. . . . . . . . . . . . . . . . . . 79
3.2 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the palm leaf docu-
ment images of 512× 512 size. . . . . . . . . . . . . . . 80
3.3 Comparison of PSNR values and execution time for three
spatial domain methods to enhance the stone inscription
images of 512× 512 size. . . . . . . . . . . . . . . . . . 81
4.1 Comparison of various wavelet thresholding methods for
five images along with PSNR values. . . . . . . . . . . 91
4.2 PSNR values obtained from five different thresholding
methods for few images. . . . . . . . . . . . . . . . . . 93
4.3 Result of Curvelet Transform based approach. . . . . . 111
4.4 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on paper images. 112
4.5 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on palm leaf im-
ages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.6 Comparison of PSNR Values and execution time for Wavelet
and Curvelet Transform based methods on stone inscrip-
tion images. . . . . . . . . . . . . . . . . . . . . . . . . 114
4.7 Comparison of PSNR values of two frequency domain
based approaches. . . . . . . . . . . . . . . . . . . . . . 115
5.1 Result of skew detection and correction. . . . . . . . . 141
5.2 Skew angle detected for each line in the document image. 145
6.1 Confusion Matrix and Recognition Rate(RR) for char-
acter image size 100× 50. . . . . . . . . . . . . . . . . 155
6.2 Confusion Matrix and Recognition Rate (RR) for char-
acter image size 40× 40 with first scale. . . . . . . . . 156
6.3 Recognition Rate(RR) of the data set 64× 64 and Con-
fusion Matrix for character image size 64× 64 with first
scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.4 Comparison of the Recognition Rates(RR) for various
character image sizes 40× 40, 64× 64, 100× 50. . . . 157
Chapter 1
Preface
1.1 Preamble
Documents are a major source of data, information and knowledge; they are written, printed, circulated and stored for future use. Nowadays, computers are gaining dominance as they are used virtually everywhere to store information from handwritten as well as printed documents and also to produce printed documents [1], [2]. The oft-repeated slogan of the paperless office remains out of reach for most organizations because, to achieve it, information must first be entered into a computer manually. Given the substantial amount of labor this requires, the only practical solution is to make computers capable of reading paper documents efficiently without the intervention of human operators. There exists massive scope for research in the field of document image processing, particularly in the conversion of document images into editable forms [3].
Over the past few years, many ambitious large-scale projects have been proposed to make all written material available online in digital form. Universities initiated the Million Book Project and industry initiated projects such as Google Books Library to pursue this goal, yet many challenges remain in the processing of these documents [4]. The main purpose of a digital library is to consolidate documents that are spread across the globe and enable access to their digital contents. Optical Character Recognition (OCR) technology has helped convert document images into machine-editable form. Even though
the OCR system adequately recognizes printed documents, the recognition of handwritten documents is not completely reliable and is still an open challenge to researchers. Inaccurate recognition is due to many factors such as scanning errors, lighting conditions and the quality of the documents. Further inaccuracies stem from the age of these documents and the condition of the materials they are inscribed upon. Operations that can be performed on document images include pre-processing of the noisy image, enhancement of the low-contrast image, de-blurring of the blurred image, estimation of the skew introduced during image acquisition, segmentation of the document image into lines, words and characters, and recognition of the characters.
Historical documents contain vital information about our ancestors, encompassing every aspect of their lives, religion, education and more. They are inscribed or printed on a variety of materials and differ substantially from the documents prevalent today, mainly because of major differences in layout structure. Due to this variable structure, extraction of the contents of historical documents is a complicated task. Additional complexity is posed by
the various states of degradation in which historical documents are found. The primary causes of this degradation are factors like aging, faint typing, ink seepage and bleeding, holes, spots, ornamentation and seals. Historical documents also exhibit abnormalities such as narrowly spaced lines (with overlapping and touching components) and unusual, varying character and word shapes, arising from differences in writing techniques and from variations in location and the period in which they were drafted. These problems complicate segmenting the document image into lines, words and characters, which is required for extracting characteristic features for recognition. Thus, the removal of noise from the input document image and its segmentation into lines, words and characters are important factors in improving the efficiency of OCR. Since the processing of degraded documents plays a significant role in deciding the overall result of the recognition system, it is essential that it be handled effectively. With this background, this thesis explores efficient image enhancement algorithms to enhance degraded historical document images, segments the enhanced images into lines, words and characters, and recognizes the era of documents belonging to different periods. In this thesis, the terms document images and documents are used interchangeably to refer to historical document images.
In the subsequent section, we present a brief introduction to historical documents, their relevance and the need for their preservation. The next section presents the motivation for the research work, with a brief introduction to document image processing techniques: data acquisition/collection, pre-processing, segmentation, feature extraction and recognition. The contributions of the research work and the organization of the thesis are presented in the last two sections.
1.2 Historical Documents
Written scripts have been the primary mode of communication and information storage for millennia. Prehistoric humans inscribed on stones, rocks and cave walls. While some of these inscriptions served as a means of communication, others carried a more religious or ceremonial purpose. Over the ages, evolving from primitive surfaces like stones and rocks, media such as palm or birch leaves, cloth and paper became prevalent for information storage. In later centuries, they were predominantly used to record information about education, religion, health and socio-political advancement. These ancient artifacts are conventionally referred to as historical documents and are a crucial part of any nation's cultural heritage. The sample images shown in Figure(1.1) and Figure(1.2) are stone inscriptions of the 6th and 13th centuries, and Figure(1.3) is a palm leaf document.
According to Sircar [5], it has been confidently estimated that about 80 percent of all knowledge about Indian history (before the 10th century A.D.) has been derived from inscriptional sources. Inscriptions are commonly found on the walls of caves, pillars, large rocks, metal plates, coins etc. The remarkable durability of these materials led our ancestors to record on them vital information for future generations. Many of these inscriptions were made to preserve accounts of battles and to recognize acts of bravery and courage of our ancestors.
Some of them include edicts of rulers (recording their achievements), eulogies (inscriptions praising individuals) and commemorative inscriptions, the last of which has five subcategories: donatory inscriptions, hero stones, Sati stones, epitaphs (inscriptions on tombs) and miscellaneous.
Figure 1.1: 6th Century Ganga Dynasty Inscription.
These inscriptions not only comprise text/characters, but also contain paintings and carvings of humans, animals, nature and spiritual deities. An expert is required to study and decipher their contents in the context in which they were envisioned in a particular era. The study of such inscriptions is known as epigraphy, and an expert who deciphers inscriptions is known as an epigraphist. The inscriptions on rocks, stones, caves and metals are vital resources which enlighten the present generation about our past [6].
Stones, rocks and metals were also used to inscribe significant community messages. Detailed information and stories, however, could not be inscribed on materials like rocks and stone, so early ancestors used palm leaves and birch leaves as a medium for recording such information. These comprise mythological stories, spiritual teachings and knowledge spanning a plethora of fields like science, education, politics, law, medicine, literature etc. It has been estimated that India has more than a hundred lakh palm and birch leaf documents available in various conditions. The literature reveals that the earliest use of paper discovered through excavations was in China, from the 2nd century BC [7]. People in India started writing on paper
during the 17th century.
Figure 1.2: 13th Century Hoysala Dynasty Inscription.
As these documents contain vital information pertaining to our past and are emblematic of our cultural identity, there is a dire need to preserve them and prevent any further degradation.
It is rightly said that a nation or society which does not know its heritage cannot fully comprehend its present and hence is unable to shape its future. This heritage encompasses almost every aspect of human inquiry, be it culture, spirituality, philosophy, astronomy, medicine, religion, literature or education, that prevailed
during different ages [8]. The majority of details about a civilization can be obtained from its ancient scriptures, which help in understanding the past.
Figure 1.3: Inscriptions on palm leaf belonging to 16th–18th century.
Since these documents have degraded due to various factors like weather conditions, fossil deposition,
fungus attacks, wear and tear, strain and stain, brittleness due to dry weather, ink seepage, bleed-through and scratches, they cannot be preserved in their original form for a prolonged duration. Therefore, automated tools are required to capture the documents, enhance the document images, recognize the era to which they belong and finally convert them into digital form for long-term preservation. In our research work, we have considered Kannada historical document images for experimentation; hence information about the Kannada script and its evolution is provided in the next section.
1.2.1 Kannada Scripts/Character
In South and East Asia, including India, inscriptions are found in one of three scripts, namely Indus Valley, Brahmi and Kharosthi. The Kannada script, a South Indian language script, is one of the many evolved versions of Brahmi; Figure(1.4) shows an instance of the script inscribed during the 3rd century BC. The evolution of the script has brought changes in its structure and shape, mainly due to factors like writing materials, writing tools, methods of inscribing and the background of the inscriber [9], [10], [11].
The Kannada script has a history of more than 2000 years and has taken shape from the early Brahmi script to present-day Kannada, as shown in Table(1.1). It has undergone various changes and modifications during the dynasties of the Satavahanas (2nd century AD), Kadambas (4th–5th century AD), Gangas (6th century AD), Badami Chalukyas (6th century AD), Rashtrakutas (9th century AD), Kalyani Chalukyas (11th century AD), Hoysalas (13th century AD), Vijayanagara (15th century AD) and Mysore (18th century AD).
Figure 1.4: Stone inscription belonging to 3rd century BC.
Since experts are few in number and fast decreasing, it is the need of the hour to preserve these inscriptions and to automate the process of deciphering them.
1.3 Motivation for the Research Work
Historical documents are national treasures and provide valuable insight into past
cultures and civilizations, the significance of which has been extensively discussed in
the previous sections. The preservation of these documents is of vital importance and
is being actively carried out with the help of an assortment of advanced tools and technologies. Such documents are digitized, processed and preserved using a noteworthy set of image processing and pattern recognition techniques. The major steps involved in processing an image are: image acquisition/collection, preprocessing, segmentation, feature extraction and recognition [12], [13]. These and other related works are discussed in the following subsections.
1.3.1 Data Collection
The historical documents considered for this research work were collected from various libraries and universities across Karnataka, one of the prominent South Indian states. These digitized documents are inscribed/written in Kannada, the regional and official language of Karnataka. About 2700 digitized document images were considered for our study. The majority of these are palm leaf documents; the rest are paper documents and stone inscriptions spanning different eras from the 13th to the 19th centuries.
Table 1.1: Evolution of Kannada Character
Character ’a’ Century
Ashoka, 3rd Century B C
Satavahana, 2nd Century A D
Kadamba, 4th - 5th Century A D
Ganga, 6th Century A D
Badami Chalukya, 6th Century A D
Rashtrakuta, 9th Century A D
Kalyani Chalukya, 11th Century A D
Hoysala, 13th Century A D
Vijayanagara, 15th Century A D
Mysore, 18th Century A D
Since these images were collected using different setups, i.e. either a camera
or a scanner, the particular resolution details are unavailable. Differences in setup cause significant variations in image size and resolution and introduce complexities in setting parameter values for experimentation. Therefore, each image set had to be manually inspected and adjusted to obtain suitable image and character sizes. The image set consists of documents inscribed by different individuals, and the length of the palm leaves used also varies across the collection.
Paper documents are categorized into two groups: good-quality images and noisy images. Unevenly illuminated, brown-colored and low-contrast paper images without spots, stains and smears are grouped under good-quality images. Images with spots, stains, smears or smudges, varying amounts of background noise, wrinkles due to humidity, illumination variation, ink seeping from the other side of the page, oily pages, thin pen strokes, breaks, dark lines due to folding, de-coloring, etc. are grouped under noisy images. Approximately 200 documents were collected with varying resolutions. During experimentation, each image is divided into parts depending on its overall size, typically into 512 × 512 sized images. Higher-resolution images are re-sized and then divided into smaller images, while lower-resolution images are divided without re-sizing. Very large images cannot be processed as a whole due to hardware constraints, so they have to be divided into smaller images. More than 500 images were created out of the 200 originals.
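The tiling step described above can be sketched as follows. The 512 × 512 block size comes from the text; padding the border blocks by replicating edge pixels is an assumption of this sketch, not something the thesis specifies.

```python
import numpy as np

def tile_image(img, tile=512):
    """Split a large document image into tile x tile blocks.

    Borders are padded by replicating edge pixels so that every
    block has the full size (the padding strategy is assumed).
    """
    h, w = img.shape[:2]
    pad_h = (-h) % tile                 # extra rows to reach a multiple of tile
    pad_w = (-w) % tile                 # extra columns likewise
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), mode="edge")
    return [padded[r:r + tile, c:c + tile]
            for r in range(0, padded.shape[0], tile)
            for c in range(0, padded.shape[1], tile)]
```

A 1200 × 700 scan, for instance, yields six 512 × 512 tiles (a 3 × 2 grid after padding).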
Palm leaves are classified into two groups, viz. degraded and severely degraded. Leaves with low contrast due to repeated application of preservatives (oil), stains due to uneven application of oils, accumulation of dust, and holes introduced by tying the leaves together are classified as degraded. Leaves with dark and brown colored lines introduced by cracks, strains, breaks, wear and tear, and noise due to scanning errors are grouped under severely degraded; these documents are hard to enhance and segment. About 1000 palm leaf documents were collected, with sizes varying from 2 cm to 24 cm in length and 2 cm to 6 cm in width. The lengthy images (more than 10 cm in length) were re-sized and divided into two to three smaller segments, based on the overall size and the character size within the document, and used for subsequent experimentation. Approximately 2000 images were obtained from the 1000 originals.
The percentage of degradation was found to be significantly higher in earlier stone inscriptions, particularly those from the 3rd century BC to the 13th century AD. Capturing stone inscriptions under different lighting conditions creates illumination and intensity variation; in addition, the inscriptions themselves suffer from scratches, cracks, breaks and characters erased by wear and tear. Stone inscriptions therefore tend to be more severely degraded than palm leaves and paper, making it difficult to enhance the entire image. Approximately 200 digitized images of stone inscriptions were collected. Even though more than 400 images were created out of these 200, we considered only 200 resized samples for our study.
Some of the sample images belonging to paper, palm leaves and stone inscriptions
used for experimentation are shown in Appendix A, Appendix B and Appendix C.
1.3.2 Enhancement/Preprocessing
The primary objective of pre-processing is to improve image quality by suppressing unwanted distortions and enhancing the image features that are important for further processing. Even though we have a myriad of advanced photography and scanning equipment at our disposal, natural aging and perpetual deterioration have rendered many historical document images thoroughly unreadable. Aging has led to the deterioration of the writing media employed, through influences like seepage of ink, smearing along cracks, damage to the leaf from the holes used for binding the manuscript leaves, and other extraneous factors such as dirt and discoloration.
In order to preserve these fragile materials, digital images are predominantly captured using High Definition (HD) digital cameras in the presence of an appropriate light source, instead of platen scanners. Digitizing palm leaf and birch manuscripts poses a variety of problems: the leaves cannot be forced flat, the light source used with digital cameras is usually uneven, and the very process of capturing a digital image of the leaf introduces many complications. These factors lead to poor contrast between the background and the foreground text. Therefore, innovative digital image processing techniques are necessary to improve the legibility of the manuscripts. To sum up, historical document images pose several challenges to preprocessing algorithms, namely low contrast, nonuniform illumination, noise, scratches, holes, etc.
It has been observed from the literature that many spatial linear, nonlinear and spectral filters are used to denoise images [14], [15], [16], [17]. Gatos et al. [18] proposed a noise reduction technique based on Wiener filtering together with an adaptive binarization method. Unsharp masking has been proposed to enhance edge detail in degraded documents. Such filters eliminate noise and smooth the image, but introduce a blurring effect [19]. In degraded documents, the text information is crucial for the subsequent character recognition stages, so losing textual information while smoothing is unacceptable. A suitable algorithm is therefore required to eliminate noise without losing much of the textual content.
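As an illustration of this trade-off, the following sketch denoises with SciPy's adaptive Wiener filter and then restores edge detail with unsharp masking. The parameter values are hypothetical, and this is not the algorithm developed in this thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import wiener

def denoise_and_sharpen(img, wiener_size=5, sigma=2.0, amount=1.0):
    """Adaptive Wiener filtering smooths noise; unsharp masking then
    adds back a scaled high-frequency residual to counter the blur."""
    denoised = wiener(img.astype(float), mysize=wiener_size)
    blurred = gaussian_filter(denoised, sigma=sigma)
    sharpened = denoised + amount * (denoised - blurred)
    return np.clip(sharpened, 0, 255)
```

Increasing `amount` recovers more stroke contrast but also re-amplifies residual noise, which is exactly why smoothing alone is insufficient for degraded text.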
The literature survey reveals that very little work has been reported on Indian historical document processing, since the preservation of ancient physical resources has only recently become a priority. India has a vast cultural heritage and is one of the largest repositories of ancient manuscripts in the world, housing an estimated 5 million of them in various archives and museums throughout the country. Because their preservation was never a priority in the past, many of these resources have either vanished or left the country, and even the ones that survive have undergone massive degradation. Therefore, the preservation of this historical heritage through digitization is of utmost importance. However, any degradation in the original document is transferred directly to its digitized version, rendering it illegible. To improve legibility, the images have to be pre-processed to obtain an enhanced copy. This warrants the development of novel image processing algorithms to preprocess the digitized images.
1.3.3 Segmentation
Image segmentation is the process of splitting a digital image into multiple groups of pixels, each of which is assigned a label so that pixels with the same label share certain visual characteristics. In general terms, it can be considered as simplifying the representation of an image into something more meaningful and easier to analyze. Image segmentation is typically used to trace objects, boundaries and regions of interest. In the case of document images, segmentation refers to the extraction of lines, words and characters from the given document. Segmentation of a document image into text lines and words is a critical phase in moving towards unconstrained handwritten document recognition. Extracting lines from handwritten documents is complicated, as these documents contain non-uniform line spacing, narrow spacing between lines, scratches, holes and the other factors elaborated in the previous section on historical documents. Apart from variations of the skew angle between text lines or along the same text line, the existence of overlapping or touching lines, uneven character size and non-Manhattan layouts pose considerable challenges to text line extraction.
Due to inconsistency in writing styles, scripts, etc., methods that do not require prior knowledge but instead adapt to the properties of the document image, such as those proposed here, tend to be more robust. Line extraction techniques may be categorized as projection based, grouping, smearing and Hough transform based [20]. Global projection based approaches are very effective for machine-printed documents but cannot handle text lines with variable skew angles; they can, however, be applied for skew correction in documents with a constant skew angle [21]. Hough transform based methods handle documents with variation in the skew angle between text lines, but are not very effective when the skew of a text line varies along its width [22].
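A global projection-profile line finder of the kind categorized above can be sketched in a few lines. This is illustrative only; the `min_gap` parameter is a hypothetical knob for tolerating small gaps inside a line, and real historical pages with touching lines defeat this simple version, which is precisely the limitation discussed here.

```python
import numpy as np

def extract_lines(binary, min_gap=1):
    """Split a binary document image (1 = ink) into text lines using
    the horizontal projection profile: blank row runs wider than
    min_gap are treated as inter-line gaps."""
    profile = binary.sum(axis=1)           # ink pixels per row
    rows = np.where(profile > 0)[0]
    if rows.size == 0:
        return []
    lines, start = [], rows[0]
    for prev, cur in zip(rows, rows[1:]):
        if cur - prev > min_gap:           # blank band => line boundary
            lines.append((start, prev))
            start = cur
    lines.append((start, rows[-1]))
    return lines

# Two synthetic text lines separated by a blank band:
page = np.zeros((20, 30), dtype=int)
page[2:5, :] = 1
page[10:14, :] = 1
print(extract_lines(page))                 # -> [(2, 4), (10, 13)]
```

Because the profile is summed across the full page width, a constant skew tilts the valleys away from zero, which is why such profiles also serve as a skew-detection cue.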
The best-known segmentation algorithms include: X-Y cuts or projection profile based methods [23], the Run Length Smoothing Algorithm (RLSA) [24], component grouping [25], the document spectrum [26], constrained text lines [27], the Hough transform [28], [29], and scale space analysis [30]. All of the above algorithms were mainly devised for present-day documents. For historical and handwritten document segmentation, projection profiles [31], the Run Length Smoothing Algorithm [32], the Hough transform [33] and scale space analysis [34] are mainly used. As segmentation of historical document images is another focus of our research work, a detailed literature survey is given in the next chapter, and the algorithms developed for line segmentation are detailed in chapter 5.
1.3.4 Feature Extraction and Recognition
Feature extraction involves reducing the amount of resources required to describe a large set of data accurately. When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis involving a large number of variables generally requires a large amount of memory and computation power, and/or a classification algorithm which over-fits the training sample and generalizes poorly to new samples.
Feature extraction is a general term for methods that construct combinations of the variables to get around these problems while still describing the data with sufficient accuracy. Features are used as input to classifiers in order to classify and recognize objects; to recognize a character, features have to be extracted from the segmented document. The literature reveals a wide array of creative works in the diverse field of document image processing and recognition. Many authors have developed efficient algorithms for segmentation of the document into lines, words and characters [35], [36], and for feature extraction and classification of characters [37]. Feature extraction and recognition are important parts of the recognition system. Major feature extraction algorithms are based on structural features, statistical features and spectral methods. Structural features are based on topological and geometrical characteristics such as maxima and minima, reference lines, ascenders, descenders, strokes and their directions between two points, horizontal curves at the top or bottom, cross points, end points, branch points etc. [38]. A detailed literature survey on the enhancement, segmentation and recognition stages is presented in the next chapter.
Although significant efforts have been made to digitize historical content, understanding these documents remains beyond the reach of the common reader. The underlying reason is that the character set has evolved and changed from ancient times to the present; the scripts/characters used to inscribe the contents are no longer prevalent, so expert knowledge is required to decipher them. At present, expert epigraphists are few in number and fast decreasing, which could lead to a major problem in deciphering these precious resources in the future. Hence there is a need to develop supplementary tools, built on computer vision techniques, that recognize the era of a character, which in turn helps in referring to the corresponding character set to understand the document.
Only a few authors have attempted to recognize Brahmi scripts and predict the corresponding era. Fewer still have worked on deciphering South Indian Kannada epigraphical (stone inscription) scripts and proposed algorithms for predicting the era of the script [8]. In our research work, palm leaf and paper manuscripts belonging to various eras are considered for predicting the era of the document, and the algorithms devised for era prediction are provided in chapter 6.
1.4 Contribution
In this research work, the severe degradation of the documents is addressed by developing spatial and frequency domain algorithms. In the spatial domain, three algorithms have been designed, based on 1) gray scale Morphological Reconstruction (MR); 2) bilateral filtering; and 3) Non Local Means filtering, each in combination with morphological operations. In the frequency domain, two algorithms have been devised using wavelet and curvelet transforms.
In the spatial domain, a gray scale morphological reconstruction technique is devised using gray scale opening and closing operations. Gray scale opening is applied to compensate for non-uniform background intensity and to suppress bright details smaller than the structuring element, while the closing operation suppresses darker details. This algorithm is further used as a background elimination method in combination with the remaining algorithms in this thesis. The method works well for images with less degradation. Severely degraded images are handled using a Bilateral Filter (BF) in combination with the gray scale morphological reconstruction technique: the bilateral filter based method, along with the MR algorithm, is employed to eliminate noise, enhance the contrast and eliminate the dark background. The bilateral filter is a non-linear filter which combines range filtering and domain filtering. Similarly, a combination of the Non Local Means filter (NLMF) and the MR technique is employed in an enhancement algorithm that de-noises documents based on a similarity measure between non-local windows.
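To make the opening/closing roles concrete, here is a minimal sketch using SciPy's gray-scale morphology. The structuring-element sizes and the divide-out normalization are assumptions of this illustration, not the exact reconstruction procedure developed in the thesis.

```python
import numpy as np
from scipy.ndimage import grey_closing, grey_opening

def morphological_smooth(img, size=3):
    """Opening suppresses bright specks smaller than the structuring
    element; the following closing suppresses small dark specks."""
    opened = grey_opening(img, size=(size, size))
    return grey_closing(opened, size=(size, size))

def eliminate_background(img, bg_size=15):
    """Background elimination for dark text on a bright, unevenly lit
    page: a large closing removes the (dark) strokes, leaving an
    estimate of the illumination field, which is then divided out."""
    background = grey_closing(img.astype(float), size=(bg_size, bg_size))
    return np.clip(255.0 * img / np.maximum(background, 1.0), 0, 255)
```

The key design choice is the structuring-element size: it must exceed the stroke width so the closing flattens the text into the background estimate, yet stay small enough to track the illumination gradient.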
Since simple spatial domain techniques cannot handle all types of degradation, it becomes necessary to transform the problem into another domain to get better results. An attempt has been made to eliminate noise using frequency domain methods. An algorithm based on the wavelet transform is devised to analyse and enhance the image. Since the wavelet transform cannot handle curve discontinuities well, an approach based on the curvelet transform, an extension of the wavelet transform, is used to design the second algorithm for enhancing degraded documents.
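The wavelet-thresholding idea behind the first frequency-domain algorithm can be illustrated with a single-level 2-D Haar transform in plain NumPy. The thesis uses more elaborate wavelet machinery and threshold selection rules; this sketch shows only the core step of shrinking detail coefficients while keeping the approximation band.

```python
import numpy as np

def haar2(img):
    """One level of the 2-D Haar transform (rows/cols assumed even)."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # horizontal averages
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # horizontal details
    ll = (a[0::2] + a[1::2]) / 2.0            # approximation band
    lh = (a[0::2] - a[1::2]) / 2.0
    hl = (d[0::2] + d[1::2]) / 2.0
    hh = (d[0::2] - d[1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    rows, cols = ll.shape[0] * 2, ll.shape[1] * 2
    a = np.empty((rows, ll.shape[1]))
    d = np.empty((rows, ll.shape[1]))
    a[0::2], a[1::2] = ll + lh, ll - lh
    d[0::2], d[1::2] = hl + hh, hl - hh
    out = np.empty((rows, cols))
    out[:, 0::2], out[:, 1::2] = a + d, a - d
    return out

def soft(x, t):
    """Soft-thresholding shrinks coefficients toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def wavelet_denoise(img, threshold):
    """Keep the approximation band, shrink the three detail bands."""
    ll, lh, hl, hh = haar2(img)
    return ihaar2(ll, soft(lh, threshold), soft(hl, threshold),
                  soft(hh, threshold))
```

Noise spreads its energy thinly over many small detail coefficients while edges concentrate theirs in a few large ones, so thresholding removes noise while largely preserving strokes.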
Due to the presence of uneven spacing, curved lines and touching lines in a historical document, segmentation becomes quite complicated. To address this problem, two segmentation algorithms have been proposed. The first algorithm, based on piecewise projection profiles, is suitable for extracting curved lines but fails to segment touching lines. Therefore a second algorithm, based on mathematical morphology and Connected Component Analysis (CCA), is developed to segment the touching lines; it handles touching as well as curved lines. An extended version of this second algorithm, a combined approach of morphology and CCA, is designed to detect and correct skewed lines within the document. Handwritten documents usually contain uneven spacing that causes skewed lines, and detecting and correcting the skew of individual lines simplifies the segmentation task.
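The connected-component analysis underlying the second algorithm can be sketched as follows (an illustrative 4-connected flood-fill labelling in Python; the actual thesis algorithm combines such labelling with morphological operations):

```python
def label_components(img):
    """4-connected component labelling by flood fill: each group of touching
    ink pixels receives one label, the unit a CCA-based segmenter works on."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                current += 1                       # found a new component
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and img[y][x] and not labels[y][x]:
                        labels[y][x] = current
                        stack.extend([(y + 1, x), (y - 1, x),
                                      (y, x + 1), (y, x - 1)])
    return labels, current

# Two separate strokes on a tiny binary image -> two components.
img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels, n = label_components(img)
```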
The segmented characters are used in the further stages of image processing, viz. feature extraction, recognition and classification. To recognize and classify the characters, features of the individual characters have to be extracted and used as input to the classifiers. In this research work, recognizing the era of a character is taken up so that the character set belonging to that era can be used to decipher the document. Hence, an algorithm for era prediction of the segmented characters is devised using the curvelet transform.
1.5 Organization of the Thesis
The thesis is organized into seven chapters. Chapter one provides an introduction to historical document image processing, the motivation for the research and the contributions of the thesis. Chapter two presents the literature survey. Chapter three describes the algorithms designed using spatial-domain techniques. Chapter four explains the algorithms developed to enhance historical document images using frequency (wavelet) domain techniques. Chapter five presents the algorithms developed for segmenting handwritten documents into lines and characters, along with the skew detection and correction algorithms. Chapter six deals with the development of algorithms for feature extraction and recognition of the era of a character. Chapter seven provides the conclusions and future scope of the work.
Chapter 2
Literature Survey
2.1 Computer Vision
The visual system has been the greatest source of information for all living things since the beginning of history. To interact effectively with the world, a vision system must be able to extract, process, and recognize a large variety of visual structures from captured images [1]. The well-known saying "one picture is worth a thousand words" describes the importance of visual data. Visual information transmitted in the form of digital images is becoming the major method of communication in the present scenario. This has resulted in a new field of computer technology known as Computer Vision [2]. It is a rapidly growing field with increasing applications in science and engineering, and it carries the responsibility of developing machines that can perform the visual functions of the eye. It is mainly concerned with modeling and replicating human vision using computer software and hardware [12], [13], and it combines knowledge from many fields of engineering in order to understand and simulate the operation of the human vision system.
Computer vision finds applications in various fields such as the military, medicine, remote sensing, forensic science and transportation. Some of these applications are content-based image retrieval, automated image and video annotation, semantics retrieval, document image processing, mining, warehousing, augmented reality, biometrics, non-photorealistic rendering and knowledge extraction. These applications involve various subfields of Computer Vision, such as Digital Image Processing, Pattern Classification and/or Object Recognition, Video Processing, Data Mining and Artificial Intelligence, which are required to process the image/video data in various combinations to obtain the desired output.
One subfield of computer vision is Document Image Analysis and Recognition (DIAR), which aims to develop techniques that automatically read and understand the contents of documents through machines. A DIAR system consists of four major stages: document image acquisition, image preprocessing, feature extraction and recognition. Document image acquisition deals with capturing the document image using scanners and cameras. Image preprocessing mainly deals with noise elimination, restoration and segmentation. Feature extraction deals with extracting the characteristic features of the segmented character (document) for recognition. Pattern recognition or classification recognizes the object/pattern in the image using the extracted features. In our research work, algorithms for image enhancement of historical documents, segmentation of the document and prediction/recognition of the era of the document are presented, and a detailed literature survey of these topics is given in the following sections.
2.2 Preprocessing and Segmentation
Degraded documents often create problems in acquiring good-quality images. In large-volume document digitization projects, the main challenge is to automatically select the correct enhancement technique, since an enhancement technique may adversely affect image quality if applied to the wrong image. Boutros [39] proposed a prototype that can automate the image enhancement process. It is clear that the quality of image acquisition affects the later stages of document image processing; hence proper image preprocessing algorithms are needed.
Ideally, text line extraction would segment document images free of background noise and non-textual elements, but in practice it is very difficult to obtain noise-free document images, so some preprocessing must be performed before segmentation. Non-textual elements around the text, such as book bindings, book edges and parts of fingers, should be removed. On the document itself, holes and stains may be removed by high-pass filtering. Other non-textual elements (stamps, seals) as well as ornamentation and decorated initials can be removed using knowledge about the shape, color or position of these elements. Extracting text from figures (text segmentation) can also be performed on texture grounds [40], [41], or by morphological filters.
Intensive research has gone into developing methods that handle text line distortion [42], [43], [44]. These methods aim at correcting the nonlinear folding of documents; folding (warping) can sometimes become so severe that the contents of the document become unreadable. Fan et al. [45] proposed a hybrid method combining two cropping algorithms, the first based on line detection and the second on text region growing, to achieve robust cropping.
Javadevan et al. [46] presented a survey on bank cheque processing that covers the relevant aspects of document image processing. Almost all documents that are part of any organization, viz. business letters, newspapers, technical reports, legal documents and bank cheques, need to be processed to extract information. The authors discussed various aspects of cheque processing techniques. As cheques are scanned under various conditions, low contrast, slant and tilt are common problems. Cheques may also contain scratches, lines and overwritten ink marks on the cheque leaf, which create problems in recognizing the correct date, account number, amount and cheque number. Cheque writers also commonly cross the printed text lines and write above them.
Suen et al. [47] proposed a method to process bank cheques in which the image was first smoothed using a mean filter and the background was then eliminated through iterative thresholding. Madasu and Lovell [48] proposed a bank cheque processing method in which gradient and Laplacian values are used to decide whether an image pixel belongs to the background or the foreground. The binarization approach proposed in [49] was based on Tsallis entropy to find the best threshold value, and histogram specification was adopted for preprocessing some images. To eliminate the background from the cheque image in [51], a stored background sample image was subtracted from the skew-corrected test image; this background subtraction method was adapted to extract written information from Indian bank cheques. Erosion and dilation operations were used to eliminate the residual background noise. In [57], logical smearing was applied, with the help of the end-point co-ordinates of detected lines, to deal with broken lines.
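The erosion and dilation operations mentioned above can be sketched as follows (illustrative Python on a tiny binary image; the cross-shaped structuring element and the test image are arbitrary choices). An opening, erosion followed by dilation, removes isolated residual noise pixels smaller than the structuring element while retaining the core of larger ink regions:

```python
CROSS = ((0, 1), (1, 0), (0, 0), (-1, 0), (0, -1))

def erode(img, se=CROSS):
    """Binary erosion: a pixel stays set only if EVERY neighbour under the
    structuring element is set (out-of-bounds counts as background)."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in se) else 0
             for x in range(w)] for y in range(h)]

def dilate(img, se=CROSS):
    """Binary dilation (the dual): a pixel is set if ANY neighbour under the
    structuring element is set."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in se) else 0
             for x in range(w)] for y in range(h)]

# One isolated noise pixel at (1, 1) next to a 3x3 ink block.
noisy = [[0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
opened = dilate(erode(noisy))   # opening: noise pixel is gone
```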
Binarization is a very important step in any recognition system, and a lot of work on finding suitable threshold values can be found in the literature. Sahoo et al. [52] compared the performance of more than 20 global thresholding algorithms using uniformity or shape measures. The comparison showed that Otsu's class separability method [53] performed best.
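Otsu's class-separability criterion can be sketched as follows (illustrative Python on a hypothetical 8-level histogram): the chosen threshold maximizes the between-class variance w0·w1·(μ0−μ1)², which lands in the valley between the text and background modes.

```python
def otsu_threshold(hist):
    """Otsu's method: pick the threshold t maximizing the between-class
    variance w0*w1*(mu0-mu1)^2 of the two classes it induces."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = cum = 0.0
    for t in range(len(hist) - 1):
        w0 += hist[t]                 # mass of class 0 (levels <= t)
        cum += t * hist[t]            # first moment of class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = cum / w0, (total_sum - cum) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bimodal histogram over 8 grey levels: dark text around level 1,
# bright background around level 6.
hist = [4, 8, 4, 0, 0, 5, 9, 5]
t = otsu_threshold(hist)   # pixels <= t are classed as text
```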
Sezgin and Sankur [54] discussed various thresholding techniques in their survey paper. The binarization algorithm proposed in [55] defines an initial threshold value using the desired percentage of black pixels in the final binarized image; to improve the efficiency of the algorithm, a cubic function was used to establish the relationship between the initial threshold value and the final one. In [56], the grey-scale image was binarized with a threshold value calculated dynamically from the number of connected components in the courtesy amount area.
Slant is the deviation of handwritten strokes from the vertical direction (Y-axis) due to different writing styles, while skew may be introduced while scanning the documents and can be detected by finding the angle that the baseline makes with the horizontal direction. Both have to be detected and corrected for successful segmentation and recognition of handwritten input. Skew correction is done by simply rotating the image in the opposite direction by an angle equal to the inclination of the guidelines. A comprehensive survey of skew detection techniques was reported in [50]. Due to the presence of guidelines, the longest peak of the histogram corresponds to the skew of the image. To correct the rotation and translation introduced during image acquisition, a method based on projection profiles has been used in [51].
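The projection-profile idea behind skew detection can be sketched as follows (illustrative Python on synthetic ink coordinates; the search range, bin layout and offset are arbitrary choices): the candidate angle whose rotated horizontal profile has maximum variance is taken as the skew, and the page is then rotated by its negative to correct it.

```python
import math

def row_profile(points, nbins, angle_deg):
    """Horizontal projection profile of (y, x) ink pixels after rotating them
    by -angle_deg: at the true de-skew angle each text line collapses into a
    narrow band of rows, so the profile's variance is maximized."""
    a = math.radians(angle_deg)
    prof = [0] * nbins
    for y, x in points:
        # rotated row coordinate, shifted so it stays inside the bin range
        ry = int(round(-x * math.sin(a) + y * math.cos(a))) + nbins // 4
        if 0 <= ry < nbins:
            prof[ry] += 1
    return prof

def estimate_skew(points, nbins=80, search=range(-10, 11)):
    def var(p):
        m = sum(p) / len(p)
        return sum((v - m) ** 2 for v in p)
    return max(search, key=lambda ang: var(row_profile(points, nbins, ang)))

# Synthetic page: two text lines skewed by 5 degrees, stored as (y, x)
# ink-pixel coordinates (floats, for simplicity of the illustration).
slope = math.tan(math.radians(5))
points = [(x * slope + c, x) for c in (5, 20) for x in range(60)]
angle = estimate_skew(points)   # detected skew; rotate by -angle to correct
```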
Kim and Govindaraju [58] proposed a chain code representation for calculating the slant angle of handwritten information. In [59] and [60], the average slant of a word was determined by an algorithm based on the analysis of slanted vertical histograms [61]; the heuristic was to search for the greatest positive derivative over all the slanted histograms, after which the slant was corrected through a shear transformation in the opposite direction. In [62] and [63] as well, the slant of handwritten information was computed using the histogram of the directions of the contour pixels.
Many techniques have been developed for page segmentation of printed documents, viz. newspapers, scientific journals, magazines and business letters produced with modern editing tools [64], [65], [66], [26]. The segmentation of handwritten documents has also been addressed, for the segmentation of address blocks on envelopes and mail pieces [68], [67], [69], [70] and for authentication or recognition purposes [71], [72].
Various methods are available for text line extraction. One of the fundamental methods is the projection profile method, used for printed documents and for handwritten documents with proper spacing between lines. The vertical projection profile is obtained by summing the pixel values along the horizontal axis for each y value. The profile curve can be smoothed by a Gaussian or median filter to eliminate local maxima [34], and is then analyzed to find its maxima and minima. The method has two drawbacks: short lines produce low peaks, and very narrow lines, as well as those containing many overlapping components, do not produce significant peaks. In case of skew or moderate fluctuations of the text lines, the image may be divided into vertical stripes and profiles sought inside each stripe [73]; these piecewise projections are thus a means of adapting to local fluctuations within a more global scheme.
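The basic projection-profile line segmentation can be sketched as follows (illustrative Python on a tiny binary page with clean inter-line gaps, exactly the situation in which the method works):

```python
def segment_lines(img):
    """Sum the ink pixels in each row (the projection profile), then cut the
    page at runs of empty rows -- the valleys between text-line peaks."""
    profile = [sum(row) for row in img]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # entering a text line
        elif ink == 0 and start is not None:
            lines.append((start, y - 1))   # leaving it
            start = None
    if start is not None:
        lines.append((start, len(img) - 1))
    return lines

# Tiny binary page: two text lines separated by one blank row.
page = [[0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 0, 1, 1],
        [1, 1, 0, 1]]
lines = segment_lines(page)
```

The two drawbacks discussed above show directly in this sketch: a short line contributes only a low peak, and overlapping lines never produce the zero-valued valley the cut depends on.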
In the work of Shapiro et al. [74], the global orientation or skew angle of a handwritten page was first found by applying a Hough transform to the entire image. Once this skew angle was obtained, projections were computed along this angle. The number of maxima of the profile gives the number of lines; low maxima were discarded based on their value, compared to the highest maximum. Lines were delimited by strips, searching for the minima of the projection profile around each maximum.
In the work of Antonacopoulos and Karatzas [75], each minimum of the profile curve was a potential segmentation point. Potential points were then scored according to their distance to adjacent segmentation points, with the reference distance obtained from the histogram of distances between adjacent potential segmentation points. The highest-scored segmentation point was used as an anchor to derive the remaining ones. The method was applied to printed records of the Second World War, which have regularly spaced text lines; the logical structure was used to derive the text regions where the names of interest could be found. The RXY cuts method applied by He and Downton [31] uses alternating projections along the X and Y axes, resulting in a hierarchical tree structure. Cuts were found within white spaces, and thresholds were necessary to derive inter-line or inter-block distances. This method can be applied to printed documents (which are assumed to have these regular distances) or to well-separated handwritten lines.
For printed and binarized documents, smearing methods such as the Run-Length Smoothing Algorithm [76] can be applied. Consecutive black pixels along the horizontal direction are smeared: the white space between them is filled with black pixels if their distance is within a predefined threshold. The bounding boxes of the connected components in the smeared image then enclose text lines. A variant of this method, adapted to gray-level images and applied to printed books from the sixteenth century, consists of accumulating the image gradient along the horizontal direction [77]. This method has been adapted to old printed documents within the Debora project [78]; for this purpose, numerous adjustments in the method concern the tolerance for character alignment and line justification. Shi and Govindaraju [79] proposed a method for text line separation using fuzzy run length, which imitates an extended running path through a pixel of a document image.
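The horizontal smearing step of the RLSA can be sketched as follows (illustrative Python on a single scanline; the threshold value is arbitrary): short white gaps between ink pixels are filled, so the characters of one line merge into a single connected run.

```python
def rlsa_horizontal(row, threshold):
    """Run-Length Smoothing on one scanline: flip a run of white (0) pixels
    to black (1) if it is no longer than `threshold` and is bounded by ink
    on both sides."""
    out = row[:]
    i = 0
    while i < len(row):
        if row[i] == 0:
            j = i
            while j < len(row) and row[j] == 0:
                j += 1                      # j is one past the white run
            # fill the run only if it is short and bounded by black pixels
            if i > 0 and j < len(row) and (j - i) <= threshold:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out

# Inter-character gap (length 2) is smeared shut; the wide gap (length 5)
# before the last stroke is left open.
row = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
smeared = rlsa_horizontal(row, threshold=3)
```

Connected components of the smeared image then give the text-line bounding boxes, as the method above describes.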
The Hough transform [28] is a very popular technique for finding straight lines in images, and it can also be applied to the fluctuating lines in handwritten drafts [80]. An approach based on attractive-repulsive forces was presented by Oztop et al. [81]. It works directly on gray-level images and consists of iteratively adapting the y position of a predefined number of baseline units. Baselines are constructed one by one from the top of the image to the bottom: pixels of the image act as attractive forces for baselines, while already extracted baselines act as repulsive forces. Tseng and Lee [82] presented a method based on a probabilistic Viterbi algorithm, which derives non-linear paths between overlapping text lines. In the method of Likforman-Sulem et al. [33], touching and overlapping components are detected using the Hough transform. Pal and Datta [35] proposed a line segmentation method based on the piecewise projection profile.
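The voting scheme of the Hough transform can be sketched as follows (illustrative Python with a coarse dictionary accumulator; the discretization is an arbitrary choice): every ink pixel votes for all lines through it, and collinear pixels pile their votes into one (ρ, θ) cell.

```python
import math

def hough_strongest_line(points, n_theta=180):
    """Each ink pixel (x, y) votes for every line rho = x*cos(t) + y*sin(t)
    passing through it; the accumulator cell with the most votes identifies
    the dominant straight line."""
    acc = {}
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, i)] = acc.get((rho, i), 0) + 1
    return max(acc, key=acc.get)    # (rho, theta index) of the strongest line

# 20 collinear pixels on the horizontal line y = 7: the winning cell is
# rho = 7 at theta close to 90 degrees (the line's normal is vertical).
points = [(x, 7) for x in range(20)]
rho, theta_idx = hough_strongest_line(points)
```

Because every pixel votes independently, the transform tolerates gaps and noise along the line, which is why it also copes with broken baselines in handwritten pages.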
Some solutions for separating units belonging to several text lines can be found in the literature on recognition. In Bruzzone and Coffetti's method [83], the contact point between ambiguous strokes was detected and processed from their external border; an accurate analysis of the contour near the contact point was performed in order to separate the strokes according to two registered configurations, a loop in contact with a stroke or two loops in contact. Khandelwal et al. [84] presented a methodology based on the comparison of neighboring connected components to check whether components belong to the same text line. Components shorter than the average height are ignored and addressed later in postprocessing.
A new algorithm for the segmentation of overlapping lines and multi-touching components was proposed by Zahour et al. [85] using a block covering method with three steps. The first step classifies the document using fractal analysis and the fuzzy C-means algorithm; the second step classifies the blocks using a statistical analysis of block heights; the last step is a neighborhood analysis for constructing the text lines. Fractal analysis and a fuzzy C-means classifier were used to determine the type of the document with high accuracy.
Bloomberg's text line segmentation algorithm [87] was specially designed for separating text and halftone images in a document image, but it was unable to discriminate between text and drawing-type non-text components and therefore failed to separate them from each other. Hence Syed et al. [88] presented a method to overcome this limitation of Bloomberg's algorithm, able to properly separate text and non-text regions including halftones, drawings, maps, graphs, etc.
Bansal and Sihna [89] proposed an algorithm based on the structural properties of the Devanagari script. They implemented it in two passes: first, words were segmented into characters/composite characters; second, the height and width of the character box were used to check whether a segmented character was single or composite. Ashkan et al. [90] proposed a skew estimation algorithm that uses an eigenvalue technique to detect and correct the skew in a document.
2.2.1 Enhancement of Historical Document Image
Ancient and historical documents differ strongly from recent documents because their layout structure is completely different. As these documents have variable structure, extracting their contents is complicated. Besides, historical documents are degraded in nature due to ageing, faint typing, ink seepage and bleed-through. They include various disturbing artifacts such as holes, spots, ornamentation and seals. Handwritten pages contain narrowly spaced lines with overlapping and touching components, and characters and words have unusual and varying shapes, depending on the writer, the period and the place.
Relatively good progress can be found in the area of historical document image processing. Shi and Govindaraju [91] proposed a method for the enhancement of degraded historical document images using background light normalization; the method captures the background intensity with a best-fit linear function and normalizes the image with respect to this approximation. Shi and Govindaraju [92] also proposed a method for the segmentation of historical document images using background light intensity normalization. Yan and Leedham [93] proposed a thresholding technique for the binarization of historical documents that uses local feature vectors for analysis.
Gatos et al. [18] presented a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter tuning by the user and can deal with degradations due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. It consists of several distinct steps: a pre-processing procedure using a low-pass Wiener filter; a rough estimation of the foreground regions; a background surface calculation by interpolating neighboring background intensities; thresholding by combining the calculated background surface with the original image while incorporating image up-sampling; and finally a post-processing step to improve the quality of the text regions and preserve stroke connectivity.
Gatos et al. [95] presented a new approach for document image binarization. The proposed method was mainly based on the combination of several state-of-the-art binarization methodologies as well as on the efficient incorporation of the edge details of the gray-scale image. An enhancement step based on mathematical morphology operations was also involved in order to produce a high-quality result while preserving stroke information. The proposed method demonstrated superior performance against six well-known techniques on numerous degraded handwritten and machine-printed documents.
Shi et al. [96] proposed methods for enhancing digital images of palm leaf and other historical manuscripts. They approximated the background of a gray-scale image using piecewise linear and nonlinear models, and normalization algorithms were applied to the color channels of the palm leaf image to obtain an enhanced gray-scale image. Experimental results showed a significant improvement in readability. An adaptive local connectivity map was used to segment lines of text from the enhanced images, with the objective of furthering techniques such as keyword spotting or partial OCR and thereby making it possible to index these documents for retrieval from a digital library.
A probabilistic model for text extraction from degraded document images has been presented in [86]. The document image was considered a mixture of Gaussian densities corresponding to the groups of pixels belonging to the foreground and background of the document image. The Expectation-Maximization (EM) algorithm was used to estimate the parameters of the Gaussian mixture, and using these parameters the image was divided into two classes, text foreground and background, with a Maximum Likelihood approach.
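The Gaussian-mixture/EM formulation can be sketched as follows (an illustrative 1-D two-component EM in Python on hypothetical grey values, not the implementation of [86]): after convergence the two means sit on the text and background modes, and each pixel can then be assigned to the component with the higher posterior, the Maximum Likelihood step.

```python
import math

def em_two_gaussians(data, iters=50):
    """EM for a 2-component 1-D Gaussian mixture. E-step: compute each
    sample's responsibility for the 'text' and 'background' components.
    M-step: re-estimate weights, means and variances from them."""
    mu = [min(data), max(data)]          # crude initialization
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component per sample
        resp = []
        for x in data:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: responsibility-weighted re-estimation
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-3)   # guard against variance collapse
    return mu, var, w

# Grey values around two modes: dark text (~30) and light background (~200).
data = [28, 30, 31, 29, 32, 198, 200, 202, 199, 201, 200]
mu, var, w = em_two_gaussians(data)
```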
2.2.2 Segmentation of Historical Documents
Louloudis et al. [94] presented a new text line detection method for unconstrained handwritten documents. The proposed technique was based on a strategy consisting of three distinct steps. The first step includes preprocessing for image enhancement, connected component extraction and average character height estimation. In the second step, a block-based Hough transform was used for the detection of potential text lines, while the third step was used to correct possible false alarms. The performance of the proposed methodology was measured with a consistent and concrete evaluation technique that relies on the comparison between the text line detection result and the corresponding ground-truth annotation.
Surinta and Chamchong [36] presented a paper on the image segmentation of historical handwriting from palm leaf manuscripts. The process is composed of the following steps: background elimination to separate text from background using Otsu's algorithm, followed by line segmentation and character segmentation using image histograms.
Shi et al. [97] presented a new text line extraction method for handwritten Arabic documents. The proposed technique was based on a generalized adaptive local connectivity map using a steerable directional filter. The algorithm was designed to solve particularly complex problems seen in handwritten documents, such as fluctuating, touching or crossing text lines.
Nikolaou et al. [98] presented a method for the development of efficient techniques to segment document pages resulting from the digitization of historical machine-printed sources. To address the problems posed by degraded documents, they implemented an algorithm with the following steps: first, an Adaptive Run-Length Smoothing Algorithm (ARLSA) handles the problem of dense and complex document layout; second, the noise areas and punctuation marks that are usually present in historical machine-printed documents are detected; third, possible obstacles created by background areas are detected in order to separate neighboring text columns or text lines; and the last step performs segmentation using segmentation paths in order to isolate possibly connected characters.
The enhancement of documents with ink bleed-through using a recursive unsupervised classification technique has been proposed by Fadoua et al. [99]. The presented method recursively applies the K-means algorithm to the degraded image together with a principal component analysis of the document image; the cluster values are then back-projected onto the image space. The iterative method computes a logarithmic histogram and separates background and foreground using K-means until a clear separation of the background and foreground of the document is achieved.
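The K-means separation step at the core of this approach can be sketched as follows (illustrative 1-D Python on hypothetical grey values; the recursion and PCA of [99] are omitted): ink pixels gather in the dark cluster, while bleed-through and background land in the light one.

```python
def kmeans_1d(values, iters=20):
    """Two-cluster K-means on grey levels: assign each pixel to the nearest
    centre, then recompute the centres, repeated until stable."""
    c = [min(values), max(values)]       # initial centres at the extremes
    for _ in range(iters):
        clusters = ([], [])
        for v in values:
            clusters[0 if abs(v - c[0]) <= abs(v - c[1]) else 1].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(clusters)]
    return c                             # [dark/ink centre, light centre]

# Hypothetical grey values: genuine ink around 40, faint bleed-through and
# background around 180.
pixels = [35, 40, 45, 38, 42, 180, 185, 178, 182, 184]
fg, bg = kmeans_1d(pixels)
```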
Kishore and Rege [19] used unsharp masking to enhance the edge detail information in degraded documents. Gatos et al. [100] proposed a method mainly based on the combination of several state-of-the-art binarization methodologies as well as on the efficient incorporation of the edge information of the gray-scale source image; an enhancement step based on mathematical morphology operations was also involved in order to produce a high-quality result while preserving stroke information.
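The unsharp-masking operation used in [19] can be sketched as follows (illustrative Python on a single scanline; the 3-tap blur and the amount are arbitrary choices): subtracting a blurred copy from the original and adding the difference (the "mask") back steepens faint edges.

```python
def unsharp_mask_row(row, amount=1.0):
    """Unsharp masking on one scanline: sharp = orig + amount*(orig - blur),
    where the blur is a simple 3-tap moving average (edges clamped)."""
    n = len(row)
    blurred = [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3
               for i in range(n)]
    return [row[i] + amount * (row[i] - blurred[i]) for i in range(n)]

# A faint edge between grey levels 90 and 110 gains overshoot on both sides,
# which the eye perceives as a crisper stroke boundary.
row = [90, 90, 90, 110, 110, 110]
sharp = unsharp_mask_row(row)
```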
Halabi and Zaid [101] presented an enhanced system for degraded old documents. The developed system was able to deal with degradations due to shadows, non-uniform illumination, low contrast and noise. Ferhat et al. [102] proposed image restoration using Singular Value Decomposition and restored even blurred images.
Lu and Tan [14] proposed a technique that estimates the document background surface using an iterative polynomial smoothing procedure. Various types of document degradation are then compensated using the estimated background surface intensity. Using the L1-norm image gradient, text stroke edges are detected from the compensated document image, and finally the document text is segmented by a local threshold estimated from the detected text stroke edges. Ntogas and Ventzas [15] proposed a binarization procedure consisting of five discrete image-processing steps for different classes of document images.
Badekas and Papamarkos [103] proposed a new method that estimates the best parameter values for each of the document binarization techniques and also estimates the best binarization result among all the techniques. Likforman-Sulem et al. [16] presented a novel method for document enhancement that combines two recent, powerful noise-reduction steps, the first based on the total variation framework and the second on Non-Local Means; the computational complexity of the Non-Local Means filter depends on the patch and window sizes. Layout analysis is required to extract text lines and identify the reading order properly, which provides proper input to the classifiers.
A generic layout analysis for a variety of typed, handwritten and ancient Arabic document images has been proposed in [104]. The proposed system performs text and non-text separation, then text line detection, and lastly reading-order determination; this method can be combined with an efficient OCR engine for the digitization of documents. A considerable amount of work on the segmentation of historical documents can be found in [105]. Hanault et al. [106] proposed a method based on the linear level set concept for binarizing degraded documents, taking advantage of local probabilistic models and a flexible active contour scheme. In the next section, we present a detailed literature survey on character recognition.
2.3 Character Recognition
The history of character recognition can be traced back as far as 1940, when the Russian scientist Tyuring attempted to develop an aid for the visually handicapped [107]. The first character recognizers appeared in the mid-1940s with the development of digital computers. Early work on the automatic recognition of characters concentrated either on machine-printed text or on a small set of well-distinguished handwritten texts or symbols. Machine-printed OCR systems of that period generally used template matching, in which an image is compared to a library of images. For handwritten text, low-level image processing techniques were used on the binary images to extract feature vectors, which were then fed to statistical classifiers. With the explosion of information technology, the previously developed methodologies found a very fertile environment for rapid growth in many application areas, as well as in OCR systems development [108], [109]. Structural approaches were initiated in many systems in addition to statistical methods [110], [111].
Early character recognition research focused basically on shape recognition techniques without using any semantic information. This led to an upper limit on the recognition rate, which was not sufficient for many practical applications. A historical review of OCR research and development during this period, covering both the offline and online cases, can be found in [112].
Stubberud et al. [113] proposed a method to improve the performance of an optical character recognition (OCR) system by using an adaptive technique that restores touching or broken character images. Using the output from an OCR system and a distorted text image, this technique trains an adaptive restoration filter and then applies the filter to the distorted text that the OCR system could not recognize.
Indian-language character recognition systems are still at the research stage. Most of the research work concerns Devanagari and Bangla scripts, the two most popular languages in India; research on Bangla character recognition started in the early 1990s. Chaudhuri and Pal [114] discussed different works on Indian script identification, as well as the various steps needed to improve Indian-script OCR development, and developed a complete OCR system for printed Bangla script. Their approach involved skew correction, segmentation and noise removal, with recognition implemented through feature and template matching; a high recognition rate was achieved with this method.
Sural and Das [115] proposed a Hough transform based fuzzy feature extraction method for Bangla script recognition. Some studies have been reported on the recognition of other languages such as Tamil, Telugu, Oriya, Kannada, Punjabi and Gujarati. Pal et al. [116] presented an OCR with an error detection and correction technique for a highly inflectional Indian language, Bangla. The technique was based on morphological parsing: using two separate lexicons of root words and suffixes, the candidate root-suffix pairs of each input string were detected, their grammatical agreement was tested, and the root/suffix part in which the error occurred was noted. The correction was then applied to the erroneous part of the input string by means of a fast dictionary access technique.
Pal and Chaudhuri [117] proposed a system for the classification of machine-printed and handwritten text lines, using a method based on structural and statistical features of the two kinds of lines; they achieved a recognition score of 98.6%. This technique used string features extracted through row- and column-wise scanning of the character matrix.
Pal et al. [118] proposed a new method for the automatic segmentation of touching numerals using water reservoirs. A reservoir is a metaphor for the region where numerals touch: it is obtained by considering the accumulation of water poured from the top or from the bottom of the numerals. The touching position (top, middle or bottom) can be decided from the reservoir's location and size. Next, by analyzing the reservoir boundary, the touching position and the topological features of the touching pattern, the best cutting point can be determined; combined with morphological structural features, the cutting path is generated for segmentation. Tree classifiers based on structural and topological features, and neural network classifiers, have been used for most of the Indian languages [119].
Some work on the recognition of Telugu characters can be traced in the literature. Elastic matching using eigen-deformations for handwritten character recognition was proposed by Uchida and Sakoe [120]; the recognition accuracy was found to be 99.47%. The deformations within each character category are of an intrinsic nature and can be estimated by principal component analysis of the actual deformations automatically collected by the elastic matching.
Pujari et al. [121] proposed an algorithm for Telugu character recognition that uses wavelet multi-resolution analysis to extract features and an associative memory model to accomplish the recognition task. A multi-font Telugu character recognition algorithm was proposed by Rasanga et al. [122] using spatial histogram-of-orientation (HOG) features. Sastry et al. [123] implemented a methodology to extract and recognize Telugu characters from palm leaves using a decision tree concept.
Human-machine interaction using optical character recognition for Devanagari
scripts has been designed in [124]. Shelke and Apte [125] proposed a novel method
to recognize handwritten characters, with feature extraction based on structural
features and classification based on the extracted parameters. The final stage of
feature extraction was done using the Radon transform, and classification was
carried out with a combination of Euclidean distance and feed-forward back-
propagation neural networks. The extended version of their work performs feature
extraction by generating kernels using the wavelet transform [126] and neural
networks [127]. Malayalam character recognition was proposed by John et al. [128],
using the Haar wavelet transform for feature extraction and a support vector
machine as the classifier. Pal et al. [129] proposed a method to recognize
unconstrained Malayalam handwritten numerals using the reservoir method. The
main reservoir based features used were the number of reservoirs, the positions of
the reservoirs with respect to the bounding box of the touching pattern, the height
and width of the reservoirs, and the water flow direction. Topological and structural
features were also used along with the water reservoir method.
Nagabhushan and Pai [130] have worked in the area of Kannada character
recognition. They proposed a method for the recognition of Kannada characters,
which can spread in both the vertical and horizontal directions. The method uses a
standard sized rectangle which can circumscribe standard sized characters. This
rectangle is interpreted as a 2-dimensional 3×3 structure of nine parts, defined as
bricks; the structure can also be interpreted as three consecutively placed row
structures of three bricks each, or as three adjacently placed column structures of
three bricks each. Recognition is done using an optimal depth logical decision tree
developed during the learning phase and does not require any mathematical
computation.
A printed Kannada character recognition system was designed by Ashwin and
Sastry [131] using a zonal approach and a support vector machine (SVM). In their
zonal approach, the character image is divided into a number of circular tracks and
sectors, since Kannada characters are round in appearance. Counts of text (ON)
pixels in the radial and angular directions are effective in capturing the shape of the
characters, and the number of ON pixels in each zone is taken as the feature set for
recognition. They claimed that their method was eight times faster than the Zernike
moment based method.
Kunte and Samuel [37] presented an OCR system developed for the recognition
of basic characters (vowels and consonants) in printed Kannada text, which can
handle different font sizes and font types. Hu's invariant moments and Zernike
moments, which have been used progressively in pattern recognition, are employed
in the system to extract the features of printed Kannada characters, and neural
classifiers are used effectively for classification based on the moment features. An
encouraging recognition rate of 96.8% has been obtained.
Chaudhuri and Bera [132] have proposed a method for text line identification of
handwritten Indian scripts, especially Bangla, as well as English, Hindi, Malayalam
etc. They have used a new dual method based on the interdependency between
text lines and inter-line gaps. The method draws curves simultaneously through the
text and inter-line gap points found from strip-wise histogram peaks and inter-peak
valleys. The approach worked well on text of different scripts with various geometric
layouts, including poetry. Lakshmi and Patvardhan [133] developed a Telugu OCR
system. Kokku and Chakravarthy [134] have developed a complete OCR system for
Tamil magazine documents which uses an RBF neural network for text identification
and character recognition. Shashikiran et al. [135] compared HMM (Hidden Markov
Model) and Statistical Dynamic Time Warping (SDTW) classifiers for Tamil on-line
handwritten character recognition and showed that SDTW performed better than
the other methods.
Hirabara et al. [136] presented a two-level character recognition method in which
a dynamic zoning selection scheme is first applied and features are then extracted
from the selected zones for character recognition. A neural network and a look-up
table were employed to find the best zoning scheme for an unknown English
character. Zoning is a simple way to obtain local information and has been used for
extracting topological information from patterns [137]. The goal of zoning is to
obtain local characteristics as opposed to global characteristics: the resulting
partitions make it possible to determine the positions of specific features of the
pattern to be recognized [138], using fixed or symmetrical zoning [139], [140], [141].
Online handwriting recognition of Kannada characters was implemented by
combining the Direction based Stroke Density (DSD) principle with a Kohonen
Neural Network: the DSD principle forms the basis for feature selection, whereas the
subsequent classification stage is carried out by the K-nearest neighbour rule [142].
Another work on online handwritten Kannada character recognition, proposed by
Prasad et al. [143], used the divide and conquer technique to reduce the number of
combinations in compound characters, which contain more than one consonant
together with vowels. Structural and dynamic features are used to segment the
compound Kannada characters into 282 distinct symbols; this reduction helped to
overcome the huge data collection problem and also reduced the computational
complexity. In the next step, these symbols are further divided into three distinct
sets of stroke groups, further reducing the search space for the recognition engine,
since combining one or more of these stroke groups forms thousands of Kannada
compound characters. PCA was used as the dimensionality reduction method. The
subspace features of the distinct stroke groups are fed to the respective classifiers in
order, and the outputs of these classifiers are combined to obtain the Unicode of the
recognized akshara. This work was the first attempt in the Kannada language to
consider all possible combinations of symbols, including Kannada numerals.
Kunte and Samuel [144] have proposed another algorithm to address the problem
of Vatthaksharas using connected component analysis (CCA) and projection
profiles. Initially CCA is used to extract the individual characters, and then the
vertical projection profile is employed to extract the remaining characters. The
authors claimed that the proposed method works well for other languages too.
Urolagin et al. [145] proposed a method for Braille translation of Kannada
characters, employing a decision tree and three modular classifiers. Similar shaped
characters were grouped and then partitioned into categories at various levels to
effectively create a decision tree, and the Braille equivalents of Kannada characters
were obtained using translation rules. The authors claimed 93.8% accuracy in
classification and translation. Sheshadri et al.
[146] proposed a method for segmenting Kannada characters by decomposing each
character into components from three base classes; the K-means clustering
technique was then employed to recognize the characters.
Dandra et al. [147] proposed a zonal method for the recognition of handwritten
Kannada and English characters. Each character image was normalized to a size of
32 × 32 pixels and divided into 64 zones, and the pixel density of each zone was
calculated. Two different classifiers were employed to classify the characters and
their performance was compared. The authors claimed that their system works for
non-thinned and slanted characters.
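The zoning feature used in several of the works above can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the 8 × 8 grid over a 32 × 32 normalized image follows the 64-zone setup just described, while the helper name and the test image are hypothetical.

```python
import numpy as np

def zone_densities(char_img, grid=8):
    """Split a normalized binary character image into grid x grid zones and
    return the ON-pixel density of every zone as the feature vector."""
    h, w = char_img.shape
    zh, zw = h // grid, w // grid          # zone height and width
    feats = []
    for r in range(grid):
        for c in range(grid):
            zone = char_img[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw]
            feats.append(zone.mean())      # density = ON pixels / zone area
    return np.array(feats)

# A 32 x 32 "character" whose top half is ON yields a 64-dimensional vector:
img = np.zeros((32, 32))
img[:16, :] = 1
f = zone_densities(img)                    # f[0] = 1.0, f[-1] = 0.0
```

Classifiers can then be trained and compared on such 64-dimensional density vectors.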
The recognition of Indian and Arabic handwriting has drawn increasing attention
in recent years. To test the promise of existing handwritten numeral recognition
methods and to provide new benchmarks for future research, Chang et al. [148]
presented results of handwritten Bangla and Farsi numeral recognition on binary
and gray-scale images. For the recognition of gray-scale images, they proposed a
method with proper image pre-processing and feature extraction. Experiments on
three databases (ISI Bangla numerals, CENPARMI Farsi numerals and IFHCDB
Farsi numerals) achieved very high accuracies using various recognition methods.
2.4 Summary
In this chapter, we have given a detailed literature survey on image enhancement,
segmentation, skew detection and correction, feature extraction and recognition. In
the next chapter, we present image enhancement algorithms using spatial domain
techniques to enhance historical document images.
Chapter 3
Enhancement of Degraded
Historical Documents : Spatial
Domain Techniques
3.1 Introduction
A digital image is an image f(x, y) that has been discretized in both spatial
coordinates and intensity; a spatial coordinate together with its intensity value
constitutes a pixel, or picture element. Processing of the image is performed through pixel operations
1Some of the material of this chapter appeared in the following research papers
1. B. Gangamma, Srikanta Murthy K, “Enhancement of Degraded Historical Kannada Documents”, Interna-
tional Journal of Computer Applications (0975-8887), Volume 29, No. 11, pages 1-6, September 2011.
2. B. Gangamma, Srikanta Murthy K, “An Effective Technique using Non Local Means and Morphological
Operations to Enhance Degraded Historical Document”, International Journal of Electrical, Electronics and
Computer Systems, Volume 4, Issue 2, pages 1-10, 2011.
3. B. Gangamma, Srikanta Murthy K, “Enhancement of Historical Document Image using Non Local Means
Filtering Technique”, IEEE International Conference on Computational Intelligence and Computing Research
(ICCIC), Kanyakumari, pages 1264-1267, 2011.
4. B. Gangamma, Srikanta Murthy K , Arun Vikas Singh, “Hybrid Approach Using Bilateral Filter and Set
Theory for Enhancement of Degraded Historical Document Image”, CiiT International Journal of Digital
Image Processing, DOI: DIP052012012, Volume 4, No 9, Issue May 2012, pages 488-496, 2012.
and any change in the pixel intensity or spatial coordinate values changes the input
image. In general, image processing operations can be divided into four categories:
pixel operations, local operations, global operations, and geometric operations [149].
Pixel operations operate on individual pixels; examples include image intensity
addition and subtraction, contrast stretching, inversion of an image, and logarithmic
and power-law transformations. In local operations, the output pixel value is
influenced by the values of neighboring pixels, and the size of the neighborhood
depends on the type of application; morphological filters, convolution, edge
detection, smoothing filters and sharpening filters fall under this category. Global
operations take the entire image into consideration when processing a pixel. To
name a few: the distance transform of an image, histogram equalization and
specification, image warping, the Hough transform, spatial-frequency domain
transforms, and connected components analysis. Geometric operations change the
value of a specified pixel using the set of source pixels determined by a geometric
transformation.
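The four categories can be illustrated with a small numpy sketch; the particular operation chosen for each category below is an illustrative example, not the only possibility.

```python
import numpy as np

img = np.array([[10, 20], [30, 40]], dtype=np.uint8)

# Pixel operation: the output depends only on the corresponding input pixel.
inverted = 255 - img                       # image inversion

# Local operation: the output depends on a neighborhood (3x3 mean, edge-padded).
padded = np.pad(img.astype(float), 1, mode='edge')
smooth = np.zeros(img.shape)
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        smooth[i, j] = padded[i:i + 3, j:j + 3].mean()

# Global operation: the output depends on all pixels (a gray-level histogram).
hist, _ = np.histogram(img, bins=256, range=(0, 256))

# Geometric operation: pixels are moved by a coordinate transform (mirror).
flipped = img[:, ::-1]
```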
However, the use of these techniques for enhancing historical documents is not
straightforward. As discussed in Chapter 1, Kannada historical documents pose
various problems such as low contrast, uneven background, noise accumulation,
broken, erased and blotched characters, cracks, and holes (in palm leaf and paper),
and the enhancement of such degraded document images is a real challenge to the
research community. Hence there is a dire need to develop image processing
algorithms that enhance these degraded document images by eliminating noise and
uneven backgrounds and that also enhance the characters [149] for further
recognition. In this chapter, we present three spatial domain preprocessing
techniques to enhance degraded historical Kannada document images.
It has been observed from the literature survey that a reasonable amount of work
has been reported in the area of historical document image processing [18], [95], [15],
[16]. A noticeable amount of work can also be found in the area of historical document
processing of Indic scripts [4], [91], [92], [96], [19], [8]. A few authors have worked
on noise elimination, segmentation and era prediction for Kannada stone inscriptions.
However, the literature survey reveals that not much work has been carried out on
Kannada documents inscribed on palm leaf and paper. In this research work, we
consider all three types (stone inscriptions, palm leaf and paper documents) under
the one heading of historical documents. In this chapter, we present three spatial
domain techniques to enhance the degraded documents.
The remaining part of this chapter is organized as follows. In Section 3.2, we
explain the methodology based on the gray scale morphological reconstruction
technique; in Section 3.3, we present another image enhancement method based on
the bilateral filter; in Section 3.4, we explain a non-local means filter based
approach to image enhancement. Experimental results of the three techniques are
compared in Section 3.5, and the summary of the work is provided in Section 3.6.
3.2 Gray Scale Morphological Reconstruction (MR)
Based Approach
The basic goal of any document image enhancement technique is to enhance the
image for binarization, so that the binarized image can be separated into two classes:
a foreground containing the text, and a clear background. Global thresholding
algorithms work well for clean images, and local thresholding methods such as
Sauvola and Niblack work for low contrast and unevenly illuminated images;
however, these methods cannot be used directly on degraded documents. The
results of binarization (thresholding) using a global thresholding method on the
noisy images shown in Figure(3.5) and Figure(3.7) are shown in Figure(3.6) and
Figure(3.8). These binarized images are not suitable for segmenting the document
image into lines, words and characters, which are further used to recognize the
characters, and any recognition system depends completely on the output of its
previous stages. So, there is a need for image enhancement techniques, or
combinations of such techniques, that enhance the image so that the enhanced
image can be binarized directly using global thresholding methods. In this section,
we present an algorithm for degraded document image enhancement using a
combination of Adaptive Histogram Equalization (AHE), the MR technique and a
Gaussian filter. AHE is used for contrast enhancement of low contrast images. The
gray scale morphological operations of opening and closing are used to eliminate
the uneven background and to suppress finer details. The Gaussian filter
is employed to suppress the background and to normalize the background intensity
along with smoothing. In the following subsections, we present a brief explanation
of the techniques used in this method.
3.2.1 Overview of Mathematical Morphology
Image-processing techniques have developed rapidly over the past five decades, and
among them mathematical morphology has received a great deal of interest because
it provides a quantitative description of geometric structure and shape while also
having a firm mathematical grounding in algebra, topology, probability and integral
geometry [150]. Very few authors have used mathematical morphology for
document image enhancement. Ye et al. [151] proposed a method for extraction of
bank check items using morphological operations. Shetty and Sridhar [152] proposed
a method for background elimination of bank checks using gray scale morphological
operations. Mengucci and Granado [153] implemented a method for separating text
and figures from the book using morphological operations.
Mathematical morphology is a tool used for extracting image components that
are useful for representation and description of region of shape, such as boundaries,
skeletons and the convex hull. It can also be used as a tool for pre or post processing
such as, morphological filtering, thinning, and pruning [154].
The two basic morphological set transformations are erosion and dilation. These
transformations involve the interaction between an image A (the object of interest)
and a set B, called the structuring element. Typically the structuring element B is
a circular disc or a rectangle in the plane, but it can be of any shape and any
dimension.
Dilation: With A and B as sets in Z2 (pairs of integers), the dilation of A by B,
denoted A ⊕ B, is defined as
A ⊕ B = {z | (B̂)z ∩ A ≠ ∅} (3.1)
where B̂ is the reflection of B about its origin. Erosion: With A and B as sets in
Z2, the erosion of A by B, denoted A ⊖ B, is defined as
A ⊖ B = {z | (B)z ⊆ A} (3.2)
Figure 3.1: (a) Input image. (b) Result of binary morphological dilation operation.
(c) Result of binary morphological erosion operation.
where A in equations (3.1) and (3.2) is the object image and B is a structuring
element of any size up to the size of A. Dilation expands the regions of the object,
while erosion shrinks or thins objects in an image. Figure(3.1) shows the result of
dilation and erosion on an object image A by a structuring element B.
Opening and Closing: Erosion and dilation can be used in a variety of ways,
in parallel and in series, to give other transformations including thickening, thinning,
skeletonization and many others. Opening and closing are two very important such
transformations. Opening generally smooths the contour of an object, breaking
narrow isthmuses and eliminating thin protrusions. Closing also tends to smooth
sections of contours but, in contrast, it eliminates small holes, fills gaps in contours,
and fuses narrow breaks and long thin gulfs [154]. The opening of image A by
structuring element B, denoted by
A ◦ B, is given by the erosion of A by B, followed by the dilation by B, that is
A ◦ B = (A ⊖ B) ⊕ B (3.3)
Opening is like rounding from the inside of an object or structure: the opening of
A by B is obtained by taking the union of all translates of B that fit inside A, and
parts of A that are smaller than B are removed. Closing is the dual operation of
opening and is denoted by A • B. It is produced by the dilation of A by B, followed
by the erosion by B:
A • B = (A ⊕ B) ⊖ B (3.4)
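Equations (3.1)-(3.4) translate directly into code. The sketch below is illustrative only: it assumes a symmetric structuring element, so the reflection B̂ in the dilation can be ignored, and all function names are hypothetical.

```python
import numpy as np

def dilate(A, B):
    """Binary dilation A (+) B: a pixel is ON if B, centred there, hits A.
    (Assumes a symmetric B, so the reflection of B is omitted.)"""
    h, w = A.shape
    bh, bw = B.shape
    ph, pw = bh // 2, bw // 2
    P = np.pad(A, ((ph, ph), (pw, pw)))          # zero padding outside A
    out = np.zeros_like(A)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.any(P[i:i + bh, j:j + bw] & B)
    return out

def erode(A, B):
    """Binary erosion A (-) B: a pixel is ON if B, centred there, fits inside A."""
    h, w = A.shape
    bh, bw = B.shape
    ph, pw = bh // 2, bw // 2
    P = np.pad(A, ((ph, ph), (pw, pw)))
    out = np.zeros_like(A)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.all(P[i:i + bh, j:j + bw][B == 1] == 1)
    return out

def opening(A, B):      # equation (3.3): erosion followed by dilation
    return dilate(erode(A, B), B)

def closing(A, B):      # equation (3.4): dilation followed by erosion
    return erode(dilate(A, B), B)
```

With a 3 × 3 structuring element of ones, opening removes an isolated ON pixel while preserving a 3 × 3 block, and closing fills a single-pixel hole inside a solid region.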
Figure(3.2) shows the application of the opening and closing operations on an input
image. Closing smooths the object from the outside: holes are filled in and narrow
Figure 3.2: (a) Input image. (b) Result of binary morphological opening operation.
(c) Result of binary morphological closing operation.
valleys are closed by the closing operation. Because opening suppresses bright
details smaller than the structuring element and closing suppresses dark details, the
two are used in combination for image smoothing and noise removal. Opening can
also be used to compensate for non-uniform background illumination; subtracting
an opened image from the original image produces an even background [154].
In binary morphology both the image A and the structuring element B are binary
images, and the operations applied to the two sets are logical operations such as
AND, OR and COMPLEMENT; the output is also a binary image. Gray scale
morphological operations are instead based on finding local maxima and minima in
a specified window. In many cases, gray scale morphological processing adopts
symmetrical structuring elements so as to reduce computational complexity [150].
The erosion of A by structuring element B at any location (x, y) is defined as the
minimum value of the image in the local region outlined by B and centred at (x, y).
Gray scale erosion is defined by the following equation:
[A ⊖ B](x, y) = min(s,t)∈B {A(x + s, y + t)} (3.5)
Gray scale erosion computes the minimum intensity value of A in every local region,
so the eroded image is darker than the original image, and bright noise smaller than
the structuring element is eliminated. The gray scale dilation of A by B is defined
by finding the maximum value of the image in the window outlined by B, and is
given by
[A ⊕ B](x, y) = max(s,t)∈B {A(x − s, y − t)} (3.6)
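Equations (3.5) and (3.6) amount to local minimum and maximum filters. The following is a minimal sketch; the square window and the edge padding at the image border are illustrative assumptions.

```python
import numpy as np

def gray_erode(A, size=3):
    """Gray scale erosion, eq. (3.5): local minimum over a size x size window."""
    r = size // 2
    P = np.pad(A, r, mode='edge')          # replicate borders outside the image
    out = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            out[i, j] = P[i:i + size, j:j + size].min()
    return out

def gray_dilate(A, size=3):
    """Gray scale dilation, eq. (3.6): local maximum over a size x size window."""
    r = size // 2
    P = np.pad(A, r, mode='edge')
    out = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            out[i, j] = P[i:i + size, j:j + size].max()
    return out
```

Erosion darkens the image (every pixel is replaced by its local minimum) while dilation brightens it, matching the behaviour shown in Figure(3.3).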
Figure 3.3: (a) Original Gray scale image. (b) Result of gray scale dilate operation
on image. (c) Result of gray scale erosion operation on image.
Figure 3.4: (a) Original Gray scale image. (b) Result of gray scale closing operation
on image. (c) Result of gray scale opening operation on image.
Results of the gray scale dilation and erosion operations on an input image are
shown in Figure(3.3).
Gray scale opening and closing: The formulae for opening and closing in
gray scale morphology are the same as in binary morphology, as specified in
equations (3.3) and (3.4). Figure(3.4) shows the results of the gray scale closing and
opening operations. In order to enhance the degraded document, the gray scale
morphological opening and closing operations are used together with the
reconstruction technique; before applying these operations, the contrast of the
input image is enhanced using adaptive histogram equalization.
Morphological image processing also includes a further concept called morphological
reconstruction, which is built on the dilation, erosion, opening and closing
operations. Morphological reconstruction involves two images, a marker and a
mask, rather than an image and a structuring element, and processing is based on
the concept of connectivity rather than on a structuring element. In our method
we use a morphological reconstruction technique based on opening and closing; the
main aim is to obtain a clear background by suppressing the noise.
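Reconstruction by dilation can be sketched as repeated geodesic dilation of the marker, clipped by the mask, until stability. The 4-connectivity and the dilation-based variant below are illustrative assumptions, since the text leaves these choices open.

```python
import numpy as np

def reconstruct_by_dilation(marker, mask, max_iter=1000):
    """Morphological reconstruction of `mask` from `marker`: repeat one
    geodesic dilation step (3x3-cross maximum, then pointwise minimum with
    the mask) until the marker stops changing."""
    marker = np.minimum(marker, mask).astype(float)   # marker must lie under mask
    mask = mask.astype(float)
    for _ in range(max_iter):
        P = np.pad(marker, 1, mode='edge')
        dil = np.maximum.reduce([                      # 4-connected dilation
            P[1:-1, 1:-1],                             # centre
            P[:-2, 1:-1], P[2:, 1:-1],                 # up, down
            P[1:-1, :-2], P[1:-1, 2:],                 # left, right
        ])
        nxt = np.minimum(dil, mask)                    # clip by the mask
        if np.array_equal(nxt, marker):                # stability reached
            return nxt
        marker = nxt
    return marker
```

Only the connected components of the mask that are touched by the marker are recovered; everything else is suppressed, which is exactly how reconstruction keeps wanted structures while discarding noise.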
3.2.2 Adaptive Histogram Equalization(AHE)
Histogram equalization is a technique that adjusts image intensities in order to
enhance contrast. Sometimes the overall histogram of an image has a wide
distribution while the histograms of local regions are highly skewed towards one
end of the gray spectrum. In such cases it is often desirable to enhance the contrast
of these local regions, rather than applying global histogram equalization to the
entire image. To enhance the contrast effectively, the AHE method is employed: the
image is divided into a number of regions, and different regions of the image are
processed differently depending on their local properties.
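A simplified AHE can be sketched by equalizing each tile with its own local histogram. Practical AHE/CLAHE implementations also interpolate between tiles to hide seams, which this illustrative sketch omits; the tile count is an assumed parameter, and the image size is assumed divisible by it.

```python
import numpy as np

def equalize(tile):
    """Standard histogram equalization of one region via its CDF."""
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf = cdf / cdf[-1]                          # normalise to [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)   # gray-level look-up table
    return lut[tile]

def ahe(img, tiles=4):
    """Simplified AHE: equalize each tile using its own local histogram."""
    out = np.zeros_like(img)
    h, w = img.shape
    th, tw = h // tiles, w // tiles
    for r in range(tiles):
        for c in range(tiles):
            sl = np.s_[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            out[sl] = equalize(img[sl])
    return out
```

Because each tile is stretched by its own cumulative distribution, a region whose intensities crowd one end of the gray spectrum is expanded to the full range even when the global histogram is already wide.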
3.2.3 Gaussian Filter
Smoothing filters are used to smooth a noisy image; smoothing eliminates noise but
also blurs the image. One such filter is the Gaussian smoothing filter (operator), a
2-D convolution operator that is used to blur images and remove detail and noise.
It is like a mean filter, but it uses a different kernel that represents the shape of a
Gaussian (bell-shaped) hump. The Gaussian distribution in 1-D has the form:
G(x) = (1 / (√(2π) σ)) e^(−x² / (2σ²)) (3.7)
where σ is the standard deviation of the distribution of the Gaussian kernel. In
2-D, an isotropic (i.e. circularly symmetric) Gaussian has the form:
G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²)) (3.8)
The degree of smoothing is determined by the standard deviation of the
Gaussian kernel. The Gaussian filter outputs a weighted average of each pixel's
neighborhood, with the average weighted more towards the value of the central
pixels. Because of this, a Gaussian provides gentler smoothing and preserves edges better
than a similarly sized mean filter. In the next subsection, we present the proposed
methodology, which enhances the degraded document using a combination of AHE
and gray scale morphological operations, with a Gaussian filter applied to normalize
the background intensity.
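Equation (3.8) can be turned into a filter by sampling the Gaussian on a grid, renormalizing, and convolving; the kernel size and σ below are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sampled, normalised 2-D Gaussian kernel from equation (3.8)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()                 # renormalise after sampling/truncation

def gaussian_smooth(img, size=5, sigma=1.0):
    """Smooth by convolving with the Gaussian kernel (edge-padded)."""
    k = gaussian_kernel(size, sigma)
    r = size // 2
    P = np.pad(img.astype(float), r, mode='edge')
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (P[i:i + size, j:j + size] * k).sum()
    return out
```

Because the kernel sums to one, flat regions are left unchanged while intensity transitions are softened in proportion to σ.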
Figure 3.5: Noisy palm leaf document image belonging to 16th century.
Figure 3.6: Binarized noisy images of Figure(3.5).
3.2.4 Proposed Methodology
The proposed method consists of four major stages, shown in the flow chart of
Figure(3.9) and detailed in the following paragraphs.
Stage 1: The degraded, noisy original color image, with low contrast and uneven
illumination, is taken as input. The color image is converted to a gray scale image using
Figure 3.7: Original image of palm leaf script belonging to 16th century.
Figure 3.8: Binarized noisy image of Figure(3.7).
equation(3.9).
Y = 0.2126R+ 0.7152G+ 0.0722B (3.9)
where Y is the gray scale image and R, G, B are the red, green and blue components
of the color image. Processing a color image is complex and is not necessary for our
work, since we need to binarize the document image for segmentation and
recognition and we concentrate on the foreground (character) and background pixel
intensities. AHE is then applied to the gray scale image to obtain a histogram
equalized, contrast enhanced image; AHE calculates multiple local histograms and
equalizes the image intensity accordingly. This image is referred to as R1, and the
results on the images of Figure(3.5) and Figure(3.7) are shown in Figure(3.10)(a)
and (b).
Figure 3.10: AHE result on images shown in Figure(3.5) and Figure(3.7)
Stage 2: This stage consists of two steps: a morphological gray scale opening
step and a reconstruction step. In the first step, the morphological gray scale
opening operation is applied to the stage 1 output R1 to obtain the opened image
R2. In the second step, the result of the opening operation is added to the adaptive
histogram equalized image R1 to give image R3, which has a clearer background
and is used as input to the next stage. Here the concept of morphological
reconstruction is applied, with the opened image acting as the marker and the
histogram equalized image as the mask. Since the opening operation removes small
bright features, the opened image is darker than the original; opening also
compensates for non-uniform background illumination. The addition of R1 and R2
produces a bright background image, and the intermediate result R3 is an enhanced
image with good contrast showing a clear separation of text and background. The
images shown in Figure(3.11)(a), (b) are the results of the opening operation on
R1, and Figure(3.11)(c), (d) are the reconstructed images.
Stage 3: This stage also contains two steps: a morphological closing applied to
R3, the output of stage 2, followed by reconstruction steps. As the closing operation
suppresses dark details, it is used to remove darker pixels smaller than the
structuring element; the result is shown in Figure(3.12)(a), (b). The reconstruction
step consists of subtracting the closed image R4 from R1, giving the intermediate
reconstructed image R5, and R5 is then subtracted from the opened image R2 to
obtain R6. A combination of opening and closing operations is well suited to image
smoothing and noise removal, and the result of this stage is a smoothed image with
uniform background intensity. The results of stage 3 are shown in
Figure(3.12)(a)-(f).
Figure 3.11: Result of stage 2. (a), (b) Results of the opening operation on images
shown in Figure(3.10)(a), (b); (c), (d) results of the reconstruction technique.
Figure 3.12: Result of stage 3. (a), (b) Results of closing operation on stage 2 output
images shown in Figure(3.11)(a), (b). (c), (d) Subtraction of R1 from R4. (e), (f)
Subtraction of result of previous step from R2.
47
Stage 4: The enhanced image is further subjected to Gaussian filtering to
eliminate noise larger than the structuring element, since the Gaussian provides
gentler smoothing and preserves edges better than a similarly sized mean filter.
The Gaussian filter applied to R6 produces the smoothed image R7. The
reconstruction technique is then applied to R7 and R1 with an addition operation,
giving the enhanced image R8 with a clear, bright background. Lastly, thresholding
is applied to R8 to obtain the binarized image, using the global thresholding
method of Otsu [53]. The results of the Gaussian filter, the reconstruction step and
the binarization are shown in Figure(3.13)(a), (b), Figure(3.14)(a), (b) and
Figure(3.15)(a), (b). The proposed method is very fast, as it uses simple linear and
non-linear filters; its computational cost is low, of the order of M × N, where M
and N are the dimensions of the image.
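Stages 1-4 can be summarized in one sketch. This is an illustrative reading of the pipeline rather than the exact implementation: plain global equalization stands in for AHE, the clipping of intermediate results to [0, 255] is an assumption the text leaves open, and the structuring element size and σ are arbitrary choices.

```python
import numpy as np

def _rank(img, size, fn):
    """Sliding-window min/max filter (edge-padded), used for gray morphology."""
    r = size // 2
    P = np.pad(img, r, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = fn(P[i:i + size, j:j + size])
    return out

def grey_opening(img, size):    # erosion (min) followed by dilation (max)
    return _rank(_rank(img, size, np.min), size, np.max)

def grey_closing(img, size):    # dilation (max) followed by erosion (min)
    return _rank(_rank(img, size, np.max), size, np.min)

def equalize(img):
    """Global histogram equalization (stands in for AHE in this sketch)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    cdf = hist.cumsum() / hist.sum()
    return np.round(255 * cdf)[img.astype(np.uint8)].astype(float)

def gaussian(img, sigma=1.0, size=5):
    """Convolution with a sampled, normalised Gaussian kernel."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    k /= k.sum()
    P = np.pad(img, r, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (P[i:i + size, j:j + size] * k).sum()
    return out

def otsu(img):
    """Otsu's global threshold: maximise the between-class variance."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total, sum_all = hist.sum(), float(np.dot(np.arange(256), hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def enhance(rgb, se=5, sigma=1.0):
    gray = (0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1]
            + 0.0722 * rgb[..., 2])                  # equation (3.9)
    R1 = equalize(gray)                              # stage 1: contrast
    R2 = grey_opening(R1, se)                        # stage 2: opening
    R3 = np.clip(R1 + R2, 0, 255)                    #   reconstruction by addition
    R4 = grey_closing(R3, se)                        # stage 3: closing
    R5 = np.clip(R1 - R4, 0, 255)
    R6 = np.clip(R2 - R5, 0, 255)
    R7 = gaussian(R6, sigma)                         # stage 4: smoothing
    R8 = np.clip(R7 + R1, 0, 255)
    t = otsu(R8.astype(np.uint8))                    # global Otsu binarization
    return np.where(R8 > t, 255, 0).astype(np.uint8)
```

Running `enhance` on a synthetic low-contrast document (a dark stroke on a mid-gray background) yields a two-level image ready for line, word and character segmentation.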
Figure 3.13: (a), (b) Results of Gaussian filter on images shown in Figure(3.12)(e),
(f).
Figure 3.14: Morphological reconstruction technique on images shown in Fig-
ure(3.13)(a), (b).
3.2.5 Results and Discussion
Experimentation was conducted on the Kannada historical document data set
containing 2700 images of palm, paper and stone inscriptions of various sizes. Enhanced
Figure 3.15: Binarized images of Figure(3.14)(a),(b).
images have a clear background, which in turn produces proper binarized images.
Paper documents pose problems such as discoloration, bleed-through, ink seepage,
stains and holes. The proposed method has been applied to enhance such degraded
paper documents. The images shown in Appendix 1, Figure(B.1), Figure(B.2),
Figure(B.3) and Figure(B.4), are paper manuscripts that are nearly one and a half
centuries old; the dark brown color, stains and dust accumulation have resulted in
low contrast images. The results of the proposed method on these documents are
shown in Figure(3.16)(a), (b), (c), (d) respectively.
Palm scripts become dark brown after the repeated application of disinfectant
used to prolong their life. These scripts usually have low contrast, with characters
submerged in the background, making the smoothing operation difficult; the dark
background also makes the script illegible. Such scripts therefore need thorough
preprocessing before preservation. The proposed method works well even for these
documents: it enhances the images by eliminating the noisy background and
produces a clear image with an almost white background. The palm leaves are
taken from a collection of manuscripts from the 16th to 18th centuries. The images
shown in Figure(A.1) and (A.3) are palm leaf scripts with low contrast and noise;
the enhanced images are shown in Figure(3.17)(a), (b).
Experimentation has also been performed on stone inscription images. Results
of the MR based method on stone inscriptions are shown in Figure(3.19)(a) and (b)
and Figure(3.18), along with the input images of stone inscriptions belonging to the
14th to 17th centuries shown in Appendix 1, Figure(C.1), Figure(C.3) and
Figure(C.2). However, this method is unable to enhance the images shown in
Figure(C.1) and Figure(C.2) properly; these stone inscriptions are severely
degraded, and it is difficult to process them and extract the information. Improving
the algorithm is therefore a real challenge.
The proposed method is compared with the average, median and Gaussian
filters, which are well known smoothing algorithms used to eliminate noise.
However, these methods sometimes also smooth out edge information, causing a
blurring effect; the larger the filter, the more blurred the resulting image, which
makes binarization very difficult. Binarized images are used as the criterion for
evaluating the performance of the enhancement techniques. The binarized images
of the three smoothing filters are compared against the binarized image of the MR
method in Figure(3.20); MR outperforms the three smoothing filters.
The proposed MR method is able to enhance low contrast and noisy documents
with reasonable complexity, but fails to eliminate the noise completely from severely
degraded document images. The MR method requires proper selection of the size of
the structuring element and the filter mask, and selecting these parameters is
difficult because of the many factors behind the degradation. The size of the
inscription, the size of the characters, the condition of the inscribed material and
the age of the document also make the enhancement process much more complex.
The resolution of the camera with which the document was captured and the
lighting conditions during image acquisition further contribute to severe
Figure 3.16: (a), (b), (c), (d) Results of MR based method on paper images shown
in Appendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4), belonging to
the nineteenth and beginning of the twentieth century.
Figure 3.17: (a), (b) Results of MR based method on images of palm leaves shown in
Appendix 1 Figure(A.1) and (A.3), belonging to the 16th to 18th centuries.
Figure 3.18: Result of MR based method on sample image taken from Belur temple
inscriptions, Figure(C.2), belonging to the 17th century AD.
Figure 3.19: (a), (b) Results of MR based method on stone inscriptions shown in
Appendix 1 Figure(C.1) and Figure(C.3), belonging to the 14th to 17th centuries.
degradation. Hence there is a need for an improved technique which addresses some of
these problems and also attempts to enhance the severely degraded documents. In the
next section, we present another method based on the bilateral filter in combination
with morphological operations and a Gaussian filter.
Figure 3.20: Comparison of proposed method with Gaussian, average and median
filters. Figures (a), (b), (c), (d) show the results of the respective methods and figures
(e), (f), (g), (h) show the binarized images of (a), (b), (c), (d).
3.3 Bilateral Filter (BF) Based Approach
The second spatial domain technique, based on bilateral filtering in combination with
the morphological reconstruction of the first method, has been developed to address severely
degraded documents. Bilateral filtering was introduced by Tomasi and Manduchi
[155] as a nonlinear filter which combines domain and range filtering. Bilateral
filtering is widely used to eliminate noise without losing edge information.
Many authors have used the bilateral filter in various applications. Barash [156] implemented
a common framework for nonlinear diffusion, adaptive smoothing, bilateral
filtering and the mean shift paradigm. Hamarneh and Hradsky [157] extended the
well-known scalar image bilateral filtering method to diffusion tensor (DT) magnetic
resonance images, performing edge-preserving smoothing of diffusion tensors; they
applied bilateral DT filtering in the Log-Euclidean framework to guarantee valid output
tensors. Bazan and Blomgren [158] proposed a new image smoothing and edge detection
technique by combining nonlinear diffusion and bilateral filtering, and developed a simple
diffusion criterion function based on the correlation between the noisy image and the
filtered image. Many authors have used bilateral filters directly, indirectly, or in
combination with other techniques to denoise noisy images and to enhance degraded
images. The following section provides a brief overview of the bilateral filter.
3.3.1 Overview of Bilateral Filter
The bilateral filter is a nonlinear filter in the spatial domain which performs averaging
without smoothing the edges. It takes a weighted sum of the pixels in a local
neighborhood, where the weights depend on both the spatial distance and the intensity
distance. Each weight is the product of two Gaussian weights: one depends on the
spatial distance and the other on the intensity difference. Hence no smoothing occurs
when one of the weights is close to 0: the product becomes negligible in regions where
the intensity changes rapidly, which usually represent sharp edges. As a result, the
bilateral filter preserves sharp edges. Mathematically, at a pixel location p of image
I, the output of a bilateral filter is calculated as follows.
$$BF[I](p) = \frac{1}{W_p}\sum_{q\in S} G_{\sigma_r}\big(|I(q)-I(p)|\big)\,G_{\sigma_s}\big(\|p-q\|\big)\,I(q) \qquad (3.10)$$
where the normalization factor $W_p$ ensures that the pixel weights sum to 1:

$$W_p = \sum_{q\in S} G_{\sigma_r}\big(|I(q)-I(p)|\big)\,G_{\sigma_s}\big(\|p-q\|\big) \qquad (3.11)$$

where $G_\sigma$ is the Gaussian kernel given by

$$G_\sigma(x) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2}{2\sigma^2}\right) \qquad (3.12)$$
Gaussian filtering computes a weighted average of the intensities of the adjacent positions,
with a weight decreasing with the spatial distance to the center position p. The
weight for pixel q is defined by the Gaussian Gσ(‖p − q‖), where σ is a parameter
defining the neighborhood size.
The parameters σs and σr control the fall-off of the weights in the spatial
and intensity domains, respectively. σs controls the amount of standard Gaussian
spatial filtering, while σr controls the discrimination between true features and
noise. It has been observed that a large σs causes more smoothing [155]. The
bilateral filter behaves like a normal low pass filter for a very large σr and performs almost
no filtering for a very small σr; when σr is infinite, the bilateral filter reduces to a
normal low pass filter. Therefore, the choice of these two parameters is essential,
as it affects the performance of bilateral filtering. The time complexity of this filter
is O(N ∗M ∗D2), where N ∗M is the size of the image and D is the diameter of the
filter window, so that D2 is the number of pixels over which the two Gaussian weights
are evaluated at each location.
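Equations (3.10)–(3.12) can be transcribed almost directly into a brute-force implementation. The sketch below is our own illustration, not the thesis's code; the constant 1/(2πσ²) of Eq. (3.12) is omitted because it cancels in the W_p normalization. Its cost visibly follows the O(N ∗M ∗D2) complexity stated above.

```python
import numpy as np

def bilateral_filter(img, d=5, sigma_s=2.0, sigma_r=30.0):
    """Brute-force bilateral filter following Eqs. (3.10)-(3.11).

    d       : window diameter (the D in O(N*M*D^2))
    sigma_s : spatial fall-off parameter
    sigma_r : intensity (range) fall-off parameter
    """
    img = img.astype(np.float64)
    r = d // 2
    pad = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    # Precompute the spatial Gaussian G_sigma_s(||p - q||) over the window.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    g_s = np.exp(-(yy**2 + xx**2) / (2 * sigma_s**2))
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + d, j:j + d]
            # Range kernel G_sigma_r(|I(q) - I(p)|).
            g_r = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r**2))
            w = g_s * g_r
            out[i, j] = (w * patch).sum() / w.sum()   # W_p normalization
    return out

# Edge-preserving behaviour: a step edge survives while flat-region noise shrinks.
rng = np.random.default_rng(1)
step = np.full((16, 16), 50.0)
step[:, 8:] = 200.0
noisy = step + rng.normal(0, 5, step.shape)
filtered = bilateral_filter(noisy)
print("edge contrast kept:", filtered[:, 9].mean() - filtered[:, 6].mean() > 100)
```

Across the step, the intensity difference of 150 makes the range weight essentially zero, so pixels on opposite sides of the edge do not mix; within each flat region the filter acts like ordinary Gaussian smoothing.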
3.3.2 Proposed Methodology
We present a second enhancement method based on a combination of the bilateral filter,
mathematical morphology reconstruction techniques and Gaussian smoothing. The
flow chart of the proposed method, which consists of three stages, is shown in Figure(3.10)
and is detailed in the following paragraphs.
Stage 1 : The color image is converted into a gray scale image, shown in Figure(3.22)(a),
using equation(3.9). The bilateral filter is applied on the gray scale image
to get the filtered image R1. As mentioned in subsection 3.3.1, the bilateral filter denoises
the image without smoothing out the edges. The result of stage 1 is shown in
Figure(3.23)(a).
Figure 3.22: (a) Input image of the palm leaf manuscript belonging to 18th century.
(b) Its binarized version.
Stage 2 : Morphological gray scale opening is applied on R1, the output of
Stage 1, with a disk structuring element, and the output is the image R2. The output of
the opening operation, R2, is subjected to a closing operation to get the image R3. R1
and R3 are added to get R4 as the morphological reconstruction step. As explained in
the previous MR method, the opening and closing operations are suitable for suppressing
noise, bridging gaps between objects and filling holes, and are effective in reconstructing
the objects.
Stage 3 : Intensity normalization is done to suppress the background with uneven
intensity using a Gaussian filter with a large window size and standard deviation. The
reconstructed image R4 is blurred, and the blurred image R5 is subtracted from the
bilateral filtered image R1 to get R6. The resulting image R6 is then added to the
bilateral filtered image R1 in order to get the enhanced image R7. Morphological
dilation is applied to smooth the edges, and the result image R8 is binarized using
Otsu's [53] global thresholding method. The
results of stages 2 and 3 are shown in Figure(3.23)(a) and (b).
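The three stages can be sketched as a short pipeline. This is our own reconstruction from the prose, with hypothetical function names: a light Gaussian stands in for the Stage 1 bilateral filter to keep the sketch compact, and a flat square footprint from `scipy.ndimage` stands in for the disk structuring element.

```python
import numpy as np
from scipy import ndimage

def enhance_bf_pipeline(gray, selem_size=3, blur_sigma=15.0):
    """Sketch of the three-stage BF-based enhancement (R-numbering as in the text)."""
    # Stage 1: denoise; a light Gaussian stands in for the bilateral filter here.
    r1 = ndimage.gaussian_filter(gray.astype(np.float64), 1.0)
    # Stage 2: opening, closing, then reconstruction by addition (R4 = R1 + R3).
    footprint = np.ones((selem_size, selem_size))
    r2 = ndimage.grey_opening(r1, footprint=footprint)
    r3 = ndimage.grey_closing(r2, footprint=footprint)
    r4 = r1 + r3
    # Stage 3: background normalization with a large-sigma Gaussian blur.
    r5 = ndimage.gaussian_filter(r4, blur_sigma)      # blurred background estimate
    r6 = r1 - r5                                      # subtract uneven background
    r7 = r1 + r6                                      # enhanced image
    r8 = ndimage.grey_dilation(r7, footprint=footprint)  # smooth the edges
    return r8

gray = np.clip(np.random.default_rng(2).normal(180, 10, (32, 32)), 0, 255)
out = enhance_bf_pipeline(gray)
print(out.shape)
```

A global Otsu threshold would then be applied to the returned image to obtain the binarized result of Stage 3.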
3.3.3 Results and Discussion
The bilateral filter (BF) based approach is applied to enhance the Kannada historical
document images. Experimentation has been performed on three types of digitized images
of varying size, as well as on 512×512 images. The proposed method enhances the
degraded image by eliminating noise and uneven background. The binarized image
of the preprocessed image can be further used to segment the document into lines,
words and characters for recognition purposes, so the preprocessing stage plays a very
important role in pattern recognition. The accuracy of the recognition system depends
completely on the features extracted, and feature extraction in turn depends on the
segmentation of the characters from the binarized image. A preprocessed image thus gives
better binarized images and improves the recognition rate of any recognition system.
Therefore preprocessing algorithms are required to enhance the degraded document
image.
Figure 3.23: (a) Filtered image using BF method. (b) Final result of the BF method.
(c) Binarized version of enhanced image.
The result images of the proposed BF method are shown in Figure(3.23): the
input noisy image in Figure(3.22)(a), the result of the BF filtering in Figure(3.23)(a),
and the enhanced image after the morphological operations in Figure(3.23)(b). The
binarized version of the enhanced image is shown in Figure(3.23)(c). Results of the BF
method on paper document images are shown in Figure(3.24)(a), (b), (c), (d)
for the images shown in Appendix 1 Figure(B.1), Figure(B.2),
Figure(B.3) and Figure(B.4) respectively.
The palm leaf document images are taken from a collection of manuscripts from the
16th to 18th centuries. The results of the BF method on palm leaf images are shown
in Figure(3.25)(a) and (b) for the input palm leaf images
shown in Figure(A.2) and Figure(A.5). The MR and BF method results are compared
in Figure(3.26). Experimentation on Figure(A.2), Figure(3.7) and Figure(A.6) is
shown in Figure(3.27)(a), (b) and Figure(3.28) respectively.
The experimentation on stone inscriptions is shown in Figure(3.29)(a),
(b) and Figure(3.30), which are the enhanced images of Figure(C.1), Figure(C.3) and
Figure(C.2) respectively. The results of the BF method are better than those of the MR
method: the edges are sharper, and BF enhances severely degraded images while
preserving sharp edges. The performance of the proposed method depends completely
on the parameters σs and σr, as these are the controlling parameters
in the spatial and intensity domains, respectively. Experimentation has been conducted
using a Gaussian window size of 4, σs = 7 and σr = 0.3. These values were selected based
on experimentation, and this choice of parameters has produced good results
compared to the first method. However, this method is unable to handle all types
of degradation and fails to enhance some of the degraded documents. To
enhance these documents, another algorithm based on the Non Local Means filter is
developed in the next section.
Figure 3.24: (a), (b), (c), (d) Results of BF based method on input paper images in
Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4) respectively.
Figure 3.25: (a), (b) Results of BF based method on Figure(A.4) and Figure(A.5).
Figure 3.26: (a) Input image of palm leaf manuscript. (b) Result of MR based
method. (c) Enhanced image using BF based method.
Figure 3.27: (a), (b) Results of BF based method on input images in Figure(A.2)
and Figure(3.7).
Figure 3.28: Result of BF based method on image Figure(A.6).
Figure 3.29: (a), (b) Results of BF based method on images in Figure(C.1) and
Figure(C.3).
Figure 3.30: Result of BF based method on Figure(C.2) Belur temple inscriptions
belonging to 17th century AD.
Figure 3.31: Non Local Means Filter approach. The small patch of size 2p + 1 by 2p + 1
is centred at the candidate pixel x; y and y′ are non-local patches within the search
window of size 2k + 1 by 2k + 1.
3.4 Non Local Means Filter (NLMF) Based Ap-
proach
Smoothing algorithms are usually employed for noise elimination, but they blur the
image in the process. Isotropic, median and
mean filters average the pixel values along the direction of contours; they tend to
preserve straight contours but fail to preserve corners. Wavelet denoising is another
widely used filtering technique. Wavelet filters try to separate the input image
into a true image and a noise image by removing the high frequency components from
the low frequency components, under the assumption that high frequencies correspond to
noise. When the high frequencies are removed, the high frequency content of the
true image is removed along with the high frequency noise, because the method
cannot distinguish noise components from true image components [17], resulting
in a loss of finer detail in the denoised image. Noise in the low frequency components
is also left unaddressed after filtering. To address this loss of detail in the filtered
image, Antoni Buades et al. [17] developed the non-local means filter.
3.4.1 Overview of Non Local Means Filter
By taking advantage of similar sub-windows in the same image, Antoni Buades
et al. [17] introduced the Non-Local Means filter for image denoising. This filter utilizes
spatial correlation in the entire image for noise removal and adjusts each pixel
value with a weighted average of neighborhood pixels that have a similar geometrical
structure. Even movie denoising can be achieved by employing the Non Local Means
Filter (NLMF) [159]. Given a discrete noisy image S = {S(x) | x ∈ I}, where I is the
set of pixel locations, the computed value NL{S(x)} for a pixel x is given by

$$NL\{S(x)\} = \sum_{y\in N(x)} w(x,y)\,S(y) \qquad (3.13)$$

where x is the candidate pixel and y is a pixel in the neighborhood N(x). The similarity
between two patches is computed using a Gaussian-weighted Euclidean distance, and the
weight function is given by

$$w(x,y) = \frac{1}{C(x)}\exp\!\left(-\frac{\|S(x)-T(y)\|_{2,a}^2}{h^2}\right) \qquad (3.14)$$

where a > 0 is the standard deviation of the Gaussian kernel, h is the parameter which
controls the degree of filtering, T(y) denotes the patch centred at y, and C(x) is the
normalizing constant given by

$$C(x) = \sum_{y\in N(x)} \exp\!\left(-\frac{\|S(x)-T(y)\|_{2,a}^2}{h^2}\right) \qquad (3.15)$$
The size of the patch window shown in Fig(3.31) should be 2p + 1 by 2p + 1, and the size
of the search window, which should be greater than the patch window, is 2k + 1 by 2k + 1.
Figure(3.32) shows a palm script image with low contrast and dark letters. The document
image has to be preprocessed before performing segmentation on it. The non-local
means filter technique works well for denoising noisy pixel values using the similarity
measure of the non-local neighborhood windows. The complexity of the non-local filter
algorithm is K2 ∗ P 2 ∗ N ∗M , where P = 2p + 1, K = 2k + 1 and N ∗M is the total
number of pixels in the image. The center pixel in Figure(3.31) is the candidate pixel
for evaluation within the search window of size 2k + 1 by 2k + 1, and the surrounding
pixels are the centers of the neighboring patches.
Figure 3.32: Input palm script image with low contrast.
A new value is calculated for the candidate pixel based on the similarity measure between
the candidate pixel's patch and the surrounding patches. The candidate pixel value is then
replaced with the weighted average over the non-local windows, the patches with the
highest similarity values having the strongest influence on the denoising of pixel x.
Eq(3.14) is used to find the similarity measure between two patches, and the new value
of the candidate pixel is calculated by Eq(3.13). The parameter h
controls the filtering process and gives prominent results in the range of 10 to 15 in
the case of document images. For smaller documents with small font sizes, k and p
should be between 4 and 5 and between 2 and 4 respectively, to preserve the edge
details and sharp continuity.
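A direct, brute-force transcription of Eqs. (3.13)–(3.15) might look as follows. This is our own sketch with hypothetical names, using a mean squared patch distance in place of the Gaussian-weighted norm ‖·‖_{2,a}; its running time follows the K2 ∗ P 2 ∗ N ∗M complexity stated above.

```python
import numpy as np

def nl_means(img, k=4, p=2, h=10.0):
    """Brute-force non-local means following Eqs. (3.13)-(3.15).

    k : search window half-size (window is 2k+1 x 2k+1)
    p : patch half-size        (patch is 2p+1 x 2p+1)
    h : parameter controlling the degree of filtering
    """
    img = img.astype(np.float64)
    pad = np.pad(img, k + p, mode="reflect")
    H, W = img.shape
    out = np.empty_like(img)
    for i in range(H):
        for j in range(W):
            ci, cj = i + k + p, j + k + p        # candidate pixel, padded coords
            ref = pad[ci - p:ci + p + 1, cj - p:cj + p + 1]
            weights, values = [], []
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    yi, yj = ci + di, cj + dj
                    patch = pad[yi - p:yi + p + 1, yj - p:yj + p + 1]
                    d2 = ((ref - patch) ** 2).mean()   # patch distance
                    weights.append(np.exp(-d2 / h**2)) # Eq. (3.14) weight
                    values.append(pad[yi, yj])
            w = np.array(weights)
            out[i, j] = (w / w.sum()) @ np.array(values)  # Eq. (3.13)
    return out

rng = np.random.default_rng(3)
clean = np.full((16, 16), 120.0)
clean[6:10, :] = 30.0
noisy = clean + rng.normal(0, 8, clean.shape)
den = nl_means(noisy, k=3, p=1, h=12.0)
print("noise reduced:", np.abs(den - clean).mean() < np.abs(noisy - clean).mean())
```

Patches lying across the dark band differ strongly from flat-region patches, so they receive negligible weight and the edge is preserved while flat-region noise is averaged away.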
3.4.2 Proposed Algorithm
The proposed algorithm for enhancement of the degraded document image uses a
combination of mathematical morphology and the Non Local Means filter (NLMF). It
employs the gray scale morphological opening and closing operations in
combination with the NLMF technique. The method consists of three stages and is
explained in detail in the following paragraphs.
Stage 1 : The color image is converted into a gray scale image R using equation(3.9).
As discussed in the MR method, opening and closing operations are
helpful in removing noise around characters using suitable structuring elements
and in bridging the gaps between strokes of the characters, shown in Figure(3.33).
The gray scale morphological opening operation is applied on the input image R to get the R1
Figure 3.33: Result of NLMF method with residual image on Figure(3.32).
Figure 3.34: (a) Result of NLMF based method on image shown in Figure(3.32). (b)
Binarized image.
Figure 3.36: (a) Original image. (b) Filtered image using NLMF. (c) Binarized image
of the proposed NLMF method. (d) Binarized noisy image using Otsu method.
image. Morphological reconstruction is performed to get R2 by adding R1 and R.
Further, the gray scale closing operation is applied to get R3. Subtraction of R3 from
the input image R reconstructs the image R4.
Stage 2 : The dilation operation on image R4 dilates the boundary of the characters
so that it bridges the gaps between broken characters. The result image R5 is
shown in Figure. NLMF is then applied to remove the noise present in the image
by replacing each pixel value with the value of a similar geometrical structure in the
neighborhood window.
Stage 3 : Postprocessing is performed to eliminate the background by applying opening
on the filtered image; the result is then added back to the NLM filtered image.
The main advantage of NLMF is that denoising is performed only if the image
contains noise; otherwise the image is not altered, as no sufficiently similar
patch is found in the neighboring window. Finally, Otsu's method is applied to get the
binarized image. The proposed algorithm is given below.
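Since the algorithm listing itself did not survive reproduction here, the three stages can be sketched from the prose. This is our own hedged reconstruction with hypothetical names: any NLMF implementation can be passed as `denoise`, a flat square footprint stands in for the structuring element, and a simple mean threshold stands in for Otsu's method.

```python
import numpy as np
from scipy import ndimage

def enhance_nlmf_pipeline(rgb, denoise, selem_size=3):
    """Sketch of the three-stage NLMF-based algorithm (R-numbering as in the text).

    denoise : any NLMF-style denoiser taking and returning a 2-D array.
    """
    # Stage 1: gray conversion, opening/closing with reconstruction.
    r = rgb.mean(axis=2) if rgb.ndim == 3 else rgb.astype(np.float64)
    footprint = np.ones((selem_size, selem_size))
    r1 = ndimage.grey_opening(r, footprint=footprint)
    r2 = r1 + r                                    # reconstruction by addition
    r3 = ndimage.grey_closing(r2, footprint=footprint)
    r4 = r - r3                                    # subtraction reconstructs image
    # Stage 2: dilate to bridge broken strokes, then NLM filtering.
    r5 = ndimage.grey_dilation(r4, footprint=footprint)
    r6 = denoise(r5)
    # Stage 3: background elimination by opening, added back, then thresholding.
    r7 = r6 + ndimage.grey_opening(r6, footprint=footprint)
    t = r7.mean()                                  # stand-in for Otsu's threshold
    return (r7 > t).astype(np.uint8)

img = np.random.default_rng(4).uniform(0, 255, (24, 24, 3))
binary = enhance_nlmf_pipeline(img, denoise=lambda a: ndimage.median_filter(a, 3))
print(binary.shape, binary.dtype)
```

In practice a brute-force or library NLMF routine would replace the median-filter placeholder, and Otsu's method would replace the mean threshold.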
3.4.3 Results and Discussion
Experimentation has been conducted on images of palm scripts, paper and stone
inscriptions of variable size and resolution. The parameters patch window size
p, search window size k and filter controlling parameter h are selected by observing
the character size. Usually the values of k, p and h lie in the ranges 4 to 5, 2 to 4 and 10 to 15
respectively. Initially the values of k, p and h are taken as 4, 2 and 10 respectively
and experimentation is carried out. The result of the proposed method, along with the
original image, is shown in Figure(3.33). An enlarged version of the result of the NLMF
method on the image in Figure(3.32) is shown in Figure(3.34)(a), along with the
postprocessed image in Figure(3.34)(b). Experimentation has also been carried out to verify
the results for various combinations of k, p and h values, with k set to 4, p to 2 and 3,
and h to 10 and 15. As the search window size k, patch
size p and degree of filtering h increase, the image becomes blurred and
the computational time also increases. For documents with larger characters, k
has been set to 5 and p to 3 to get a sharp edged image.
The major problem with the NLMF based method is the selection of the sizes of
the patch and search windows. If the search window is large, the image
becomes over smoothed, and if it is small, the smoothing effect is barely
noticeable. The time taken to denoise depends directly on the square of the patch window
size and the square of the search window size. If both window sizes are small and the
image is very large, many more patch comparisons are required to denoise the image
and the method takes more time, as the time complexity is more than quadratic,
as explained earlier.
Experimentation has been carried out using the NLMF based approach to enhance
palm leaf document images that are 3 to 5 centuries old. The image shown in Figure(3.36)(a)
is the input noisy palm leaf image. Figure(3.36)(b) shows the filtered image using
the NLMF method without eliminating the background. Figure(3.36)(c) and Figure(3.36)(d)
are the binarized enhanced image of Figure(3.36)(a) and the binarized image
without enhancement, respectively.
Experimentation has been carried out on digitized paper documents from the previous
century, and results are shown in Figure(3.37)(a), (b), (c), (d) for the input images
shown in Appendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4). The
NLMF method enhances paper document images better than
the MR and BF methods.
Some more results of the NLMF method on palm document images are shown in
Figure(3.38) along with the results of the previous methods. Experimentation on the palm
leaf images shown in Appendix 1 Figure(3.5) and Figure(A.1) gives the results shown
in Figure(3.39) and Figure(3.40) respectively.
The images shown in Figure(3.41)(a), (b) and Figure(3.42) are the results of the
NLMF method on the stone inscription images shown in Figure(C.1), Figure(C.3) and
Figure(C.2) respectively. The proposed method has enhanced the images properly, but
is unable to eliminate noise completely.
Figure 3.37: Results of NLMF based method on input images in Appendix 1
Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4).
Figure 3.38: (a) Result of MR based method, (b) enhanced image using BF based
method, and (c) result of NLMF based method on input image shown in Figure(3.26).
Figure 3.39: (a) and (b) Results of NLMF based method on input images shown in
Figure(A.2) and Figure(A.1).
Figure 3.40: Result of NLMF based method on input image in Figure(A.6).
Figure 3.41: Results of NLMF based method on images Figure(C.1) and Figure(C.3).
3.5 Discussion of Three Spatial Domain Techniques
The performance of denoising and compression techniques is usually measured
using the Peak Signal to Noise Ratio (PSNR), given by

$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{MAX_I^2}{\mathrm{MSE}}\right) \qquad (3.16)$$
Figure 3.42: (a), (b) Results of NLMF based method on images shown in Figure(C.2)
and Figure(C.4).
Table 3.1: Comparison of PSNR values and execution time for three spatial domain
methods to enhance the paper document images of 512× 512 size.
PSNR in dB Time in seconds
S. No MR BF NLMF MR BF NLMF
1 23.7489 37.7467 37.9339 1.3809 4.1930 75.5781
2 23.0110 37.2470 37.3458 0.8599 3.9922 75.0680
3 21.7989 36.4558 36.6667 0.7258 3.6989 74.9188
4 27.9908 38.9503 49.2923 0.9177 4.2560 75.2277
5 21.6121 31.8781 49.0835 0.5370 3.3715 75.5161
6 32.9706 36.5272 42.0805 0.7368 3.8580 80.6541
7 23.3884 31.8267 36.7855 0.5735 3.4506 77.8826
8 24.3084 33.9875 38.0856 0.6086 3.5448 77.3176
9 21.7715 38.6051 37.2253 0.8701 4.2231 74.8562
10 18.8089 38.4077 38.7401 0.8700 4.2708 75.0643
11 27.5821 38.5781 42.3505 0.8114 4.1390 73.8697
12 24.7502 34.3364 39.4167 0.6264 3.6351 74.9942
13 28.3042 36.3690 41.1888 0.6785 3.7626 74.8971
14 26.0303 33.3616 37.1162 0.6218 3.5696 76.4342
15 22.0632 32.7488 36.3494 0.5879 3.4778 74.6217
where $MAX_I$ represents the maximum intensity in the gray scale image and MSE is
the Mean Square Error given by

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[s(i,j)-g(i,j)\big]^2 \qquad (3.17)$$
where s is the input noisy image and g is the enhanced output image.
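Eqs. (3.16) and (3.17) translate directly into a few lines (our own sketch, not the thesis's code; note that the PSNR is infinite, not zero, when the two images are identical, since the MSE is then zero):

```python
import numpy as np

def psnr(s, g, max_i=255.0):
    """PSNR between input image s and output image g, Eqs. (3.16)-(3.17)."""
    s = s.astype(np.float64)
    g = g.astype(np.float64)
    mse = np.mean((s - g) ** 2)              # Eq. (3.17)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(max_i ** 2 / mse)  # Eq. (3.16)

a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)
print(round(psnr(a, b), 2))                  # prints 24.05
```

Values such as those in Tables 3.1 to 3.3 are obtained by applying this measure to each noisy input and its enhanced output.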
The performance of the three spatial methods in this chapter is measured using the
PSNR value, execution time and human interpretation. The PSNR value is a quantitative
measure of performance based on the intensity difference between the input image
and the output image, expressed as the mean square error. The MSE is zero, and the
PSNR infinite, when the two images are identical. A high PSNR value corresponds to a small mean squared difference in the
Table 3.2: Comparison of PSNR values and execution time for three spatial domain
methods to enhance the palm leaf document images of 512× 512 size.
PSNR in dB Time in seconds
S. No MR BF NLMF MR BF NLMF
1 26.4524 42.2957 47.9365 0.5969 3.4360 74.9510
2 26.2149 43.1558 51.8227 0.6293 3.5938 75.0180
3 28.0176 36.1681 42.8645 0.5465 3.3711 74.6831
4 37.7392 43.9641 50.7459 0.6046 3.5207 74.4475
5 36.5024 45.4206 53.4520 0.6582 3.7449 74.3839
6 35.1399 42.8637 55.7108 0.6127 3.6044 73.0795
7 36.3590 39.1353 36.7321 0.5780 3.3639 72.8865
8 26.1604 43.5994 53.8239 0.7203 3.7519 73.1462
9 35.3636 35.9913 40.6100 0.7166 3.3461 73.6913
10 28.9509 34.3862 42.0768 0.5607 3.3302 73.8049
11 36.1223 36.2438 48.1984 0.5725 3.2981 73.7186
12 35.7097 47.4333 68.1295 0.7405 3.8808 74.3674
13 25.1643 43.6651 52.2960 0.6647 3.5375 71.8197
14 31.6850 40.2207 54.0870 0.6022 3.4445 74.0001
15 33.1248 36.5673 41.2524 0.5893 3.3587 73.7465
16 29.8257 35.5526 46.7688 0.5671 3.3819 71.9199
17 29.8723 40.3023 46.9310 0.7836 3.8665 70.9062
18 29.8161 38.2893 46.0705 0.6401 3.7022 71.4467
19 31.4859 37.2854 43.0458 0.5720 3.4136 72.5182
20 27.0621 44.8477 66.3683 0.6410 3.6374 71.6499
21 28.8259 34.1294 41.2640 0.5590 3.4780 71.0915
22 31.4629 39.4394 70.8347 0.6927 3.8897 71.2999
23 27.7132 42.3511 53.9032 0.7264 3.7395 70.8606
24 33.4980 36.5833 48.6999 0.5746 3.3302 71.0554
25 23.1851 41.2683 49.8445 0.6972 3.8921 71.5517
intensity of input and output images, and is considered a good measure for obtaining
better results. However, it is very difficult to prove that the method having high
Table 3.3: Comparison of PSNR values and execution time for three spatial domain
methods to enhance the stone inscription images of 512× 512 size.
PSNR in dB Time in seconds
S. No MR BF NLMF MR BF NLMF
1 27.5511 34.5556 55.3837 0.7176 7.0044 72.7901
2 24.1430 37.8203 59.7441 0.9204 6.8328 74.8493
3 32.3258 46.0989 49.5172 0.9360 6.7236 72.9305
4 29.3418 42.6803 51.0786 1.1076 7.0200 71.9633
5 24.7338 34.7385 41.8901 0.7020 6.7236 71.9165
6 27.2829 37.3984 37.2663 0.7644 6.8328 72.2753
7 27.8336 34.1124 38.5350 0.6240 6.8172 72.8993
8 28.2949 37.7342 34.9951 0.6552 6.9108 72.2285
9 21.6276 29.5958 34.9956 0.7332 6.7860 73.4609
10 21.7366 32.9754 43.9985 0.7644 6.5988 73.4921
PSNR value will give better results. Human interpretation is required along with the
PSNR value to judge the quality of the output of the proposed system. The performance
of the algorithms is also measured using execution time. All these methods are
implemented on an Intel Core i5-560M processor running at 2.66 GHz.
algorithm with low computational time would be considered as the better method.
However the computational time is not the standard way of evaluating the perfor-
mance of any algorithm, we have added to demonstrate actual time taken in real
scenario to enhance 512 × 512 size image. Therefore all these three parameters are
used to analyze the quality of the output image. The PSNR values and execution
time in seconds for three methods are calculated and tabulated in the three tables.
Results on paper document image, palm leaf images and stone inscription images of
size 512 × 512 are given in Table(3.1), Table(3.2) and Table(3.3) respectively. The
detail discussion on all enhancement algorithms is given in the last section of the next
chapter.
The PSNR values for the MR method provided in Table(3.1) show that this
method works well for document images with clean and moderate degradation.
It properly enhances low resolution, low contrast, stained and discolored paper,
palm leaf images and some of the stone inscriptions. It is, however, unable
to enhance severely degraded historical document images, which include stone
inscription images, and its PSNR values are lower than the BF and NLMF values. The
execution time is very low, as the MR method is based entirely on simple set theory
operations and requires O(M × N) operations. The BF method enhances paper, palm
leaf and stone inscriptions better than the MR method. The time complexity of the BF method
is O(M × N × D2), as discussed in subsection (3.3.1). The time taken by the
BF method is more than the MR method but less than the NLMF method. The PSNR values
and the enhanced appearance of the images, along with the execution times, show
that the BF method provides better results for severely degraded document images. The
NLMF based method performs better in denoising noisy documents and preserves
smoothness based on the similarity measure of the non-local neighborhood (window)
means. The time complexity of the NLMF based method is N ∗M ∗K2 ∗ P 2. This method
enhances paper, some of the palm leaf documents and stone inscriptions. Its PSNR
values are better than those of the previous two methods, but its main drawback
lies in its computational time, which is about ten times that of the BF method.
Enhancement of such images is a really challenging task and demands the development
of suitable algorithms. Another drawback of these three methods is
the selection of proper values for the structuring element size and for the controlling
parameters of the BF and NLMF methods. It is very difficult to select suitable parameter
values to address all types of degradation. The time required to enhance large
documents is very high in the case of the NLMF based method. However, the NLMF based
method enhances the contrast of palm leaf document images, and a well formed
binarized image can be obtained without background elimination. As enhancement
results play a vital role in the subsequent segmentation and recognition stages of
document processing, the results of these three methods can be used.
3.6 Summary
Three spatial domain techniques have been implemented to enhance degraded historical
documents, which usually exhibit low contrast, uneven background and noise. These
methods effectively handle the problems present in the images due to the various factors
mentioned in previous chapters. The first method uses adaptive histogram equalization
for contrast enhancement followed by morphological operations, as these are
simple and powerful tools to eliminate noise, remove the background and produce an
enhanced image with uniform background intensity and clear text content. This simple
and computationally efficient method is used as a background elimination technique
and also for character enhancement in the other two enhancement algorithms.
However, this method is unable to handle all types of degraded document images.
Therefore a second method is developed to address these problems.
The second method uses a simple and computationally efficient bilateral
filter approach in combination with set theory techniques. The bilateral filter is
an efficient technique for eliminating noise without smoothing the edges, while mathematical
morphology is used to eliminate the background and enhance the characters. These
operations are very useful in bridging the gaps between the broken parts of
characters. This method enhances the stone inscription images properly and performs
better than the MR method. It takes more time than the MR method, but less
time than the NLMF method.
The third method is developed using mathematical morphology and the NLMF technique.
Both are powerful in maintaining edge and contour details, and NLMF is
also able to address the noise present in the low frequency components of the image.
The proposed hybrid approach is compared with the MR and BF methods,
and its results outperform them by
preserving edge details while eliminating noise. However, the NLMF method takes more
time than the MR and BF methods, and it is also unable to enhance
the stone inscription images properly.
The limitations of these methods lie in properly selecting the size of the structuring
element and the values of the parameters, as discussed earlier. These limitations
have motivated us to further explore frequency domain based approaches,
which are explained in the next chapter.
Chapter 4
Enhancement of Degraded
Historical Documents : Frequency
Domain Techniques
4.1 Introduction
In the previous chapter, spatial domain approaches were implemented and experimented
on historical document images. However, the limitations of spatial domain
techniques explained in the previous chapter lead us to explore approaches in a different
domain. Some complex operations and measurements are carried
out better in the frequency domain than in the spatial domain. Images in the spatial domain
can be transformed into the frequency domain by the Fourier transform.
The signal can then be analyzed for its frequency content, because the Fourier
coefficients of the transformed function represent the contribution of each sine and
1Some of the material of this chapter appeared in the following research papers
1. B. Gangamma, Srikanta Murthy K , Priyanka Chandra G C, Shishir Kaushik, Saurabh Kumar, “A Combined
Approach for Degraded Historical Documents Denoising Using Curvelet and Mathematical Morphology”,
IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, India,
pages 824-829, 2010.
84
cosine function at each frequency. It is a powerful tool for analyzing the components
of a stationary signal where there is no change in the properties of signal. But it
is unable to analyze and process non-stationary signal where there is a change in
the properties of signal. Extended version of Fourier transform, Short Time Fourier
Transform(STFT) is used to analyze the frequency at different time. But it is not
possible to measure the signals at different scales(frequencies). Wavelet theory has
been used to address the problem of analyzing the signal properties of both varying
and stationary signals.
Wavelets allow complex information such as music, speech, images and patterns to be decomposed into elementary forms at different positions and scales, and subsequently reconstructed with high precision. Wavelet transforms enable us to represent signals with a high degree of sparsity. Due to its excellent localization property, the wavelet transform has rapidly become an essential signal and image processing tool for a variety of applications, including denoising, reconstruction, feature extraction and compression. Wavelet transforms are widely used in character recognition of various languages [160]. Wavelet transform based denoising methods attempt to remove the noise present in the signal while preserving the signal characteristics, regardless of its frequency content. In our research work, a wavelet transform based approach is presented to enhance the degraded documents; a brief introduction to the wavelet transform and thresholding algorithms is given in the following sections.
4.2 Wavelet Transform (WT) Based Approach
As historical document images are degraded in nature, enhancement of such document images becomes important, both for preservation and for the further stages of image processing. Wavelet transform based denoising employs a thresholding technique to remove the noise present in the image [161]; thresholding is applied to each part of the decomposed image to obtain the denoised image. Using the wavelet transform, a 2D image is decomposed into four sets of coefficients: approximation, horizontal, vertical and diagonal. The approximation coefficients contain the low frequency components of an image, which usually carry the useful information. The detail coefficients are contained in the remaining three sets. Thresholding is applied either to the diagonal coefficients alone or to all three detail sets. The thresholding may be hard or soft. Hard thresholding sets to zero those coefficients whose absolute values are lower than the threshold. Soft thresholding sets to zero the values lower than the threshold and then shrinks the non-zero coefficients towards zero, eliminating the discontinuity introduced by hard thresholding. The inverse wavelet transform is then applied to reconstruct the decomposed image. Various authors have proposed algorithms for finding the threshold value using the information available in the image.
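The two thresholding rules can be written down directly. The following Python/NumPy sketch is illustrative only (the experiments in this thesis were carried out in Matlab); it contrasts hard and soft thresholding on a small coefficient vector:

```python
import numpy as np

def hard_threshold(coeffs, t):
    """Zero out coefficients whose magnitude falls below the threshold."""
    out = coeffs.copy()
    out[np.abs(out) < t] = 0.0
    return out

def soft_threshold(coeffs, t):
    """Zero out small coefficients and shrink the rest towards zero,
    which removes the discontinuity that hard thresholding introduces."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

d = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
hard_threshold(d, 1.0)   # -> [-3. ,  0. ,  0. ,  1.5,  4. ]
soft_threshold(d, 1.0)   # -> [-2. ,  0. ,  0. ,  0.5,  3. ]
```

Note how the surviving coefficients are unchanged under hard thresholding but shifted towards zero under soft thresholding, which is why soft thresholding produces smoother reconstructions.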
4.2.1 Overview of Wavelet Transform
The 1-Dimensional Discrete Wavelet Transform (1D DWT) coefficients of a function f(x), sampled as f(n) with n = 0, 1, ..., M-1, are given by

W_\phi(j_0, k) = \frac{1}{\sqrt{M}} \sum_n f(n)\,\phi_{j_0,k}(n) \qquad (4.1)

W_\psi(j, k) = \frac{1}{\sqrt{M}} \sum_n f(n)\,\psi_{j,k}(n), \quad j \geq j_0 \qquad (4.2)

where \phi_{j_0,k}(n) and \psi_{j,k}(n) in these equations are sampled versions of the basis functions \phi_{j_0,k}(x) and \psi_{j,k}(x). The inverse discrete wavelet transform is applied to reconstruct the signal using the following equation:

f(n) = \frac{1}{\sqrt{M}} \sum_k W_\phi(j_0, k)\,\phi_{j_0,k}(n) + \frac{1}{\sqrt{M}} \sum_{j=j_0}^{\infty} \sum_k W_\psi(j, k)\,\psi_{j,k}(n) \qquad (4.3)
The wavelet transform for a 2D signal is given by one scaling function \phi(x, y) and three two-dimensional wavelets \psi^H(x, y), \psi^V(x, y) and \psi^D(x, y). Each is the product of two one-dimensional functions. Excluding products that produce one-dimensional results, like \phi(x)\psi(x), the four remaining products produce the separable scaling function

\phi(x, y) = \phi(x)\phi(y) \qquad (4.4)

and the separable “directionally sensitive” wavelets

\psi^H(x, y) = \psi(x)\phi(y) \qquad (4.5)

\psi^V(x, y) = \phi(x)\psi(y) \qquad (4.6)

\psi^D(x, y) = \psi(x)\psi(y) \qquad (4.7)
These wavelets measure functional and intensity variations of images along different directions: \psi^H measures variations along columns (for example, horizontal edges), \psi^V responds to variations along rows (vertical edges), and \psi^D responds to variations along diagonals [154].
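As an illustration of Eqs. (4.4)-(4.7), the separable 2-D filters can be formed as outer products of 1-D filters. The NumPy sketch below is illustrative only and uses the Haar filters as an assumed example of \phi and \psi:

```python
import numpy as np

# 1-D Haar scaling (low-pass) and wavelet (high-pass) filters.
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)
psi = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Separable 2-D filters as outer products, mirroring Eqs. (4.4)-(4.7).
phi2 = np.outer(phi, phi)   # phi(x)phi(y): smooths in both directions
psiH = np.outer(psi, phi)   # psi(x)phi(y): oscillates along one axis only
psiV = np.outer(phi, psi)   # phi(x)psi(y): oscillates along the other axis
psiD = np.outer(psi, psi)   # psi(x)psi(y): oscillates along both (diagonal)
```

The scaling filter sums to a nonzero constant (it passes low frequencies), while each wavelet filter sums to zero (it passes only variations), which is exactly the directional behaviour described above.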
Given separable two-dimensional scaling and wavelet functions, the extension of the 1-D DWT to two dimensions uses the scaled and translated basis functions:

\phi_{j,m,n}(x, y) = 2^{j/2}\,\phi(2^j x - m,\; 2^j y - n) \qquad (4.8)

\psi^i_{j,m,n}(x, y) = 2^{j/2}\,\psi^i(2^j x - m,\; 2^j y - n), \quad i = H, V, D \qquad (4.9)

where the index i indicates the directional wavelets of Eqs. (4.5), (4.6) and (4.7). The discrete wavelet transform of an image f(x, y) of size M × N is then

W_\phi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,\phi_{j_0,m,n}(x, y) \qquad (4.10)

W^i_\psi(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,\psi^i_{j,m,n}(x, y), \quad i = H, V, D \qquad (4.11)
As in the one-dimensional case, j_0 is an arbitrary starting scale and the W_\phi(j_0, m, n) coefficients define an approximation of f(x, y) at scale j_0. The W^i_\psi(j, m, n) coefficients add horizontal, vertical and diagonal details for scales j ≥ j_0. Normally the initial value of j_0 is set to 0 and N = M = 2^J is selected, so that j = 0, 1, 2, ..., J-1 and m = n = 0, 1, 2, ..., 2^j - 1. Given the W_\phi and W^i_\psi of Eqs. (4.10) and (4.11), f(x, y) is obtained via the inverse discrete wavelet transform

f(x, y) = \frac{1}{\sqrt{MN}} \sum_m \sum_n W_\phi(j_0, m, n)\,\phi_{j_0,m,n}(x, y) + \frac{1}{\sqrt{MN}} \sum_{i=H,V,D} \sum_{j=j_0}^{\infty} \sum_m \sum_n W^i_\psi(j, m, n)\,\psi^i_{j,m,n}(x, y) \qquad (4.12)
4.2.2 Denoising Method
A general wavelet transform procedure for denoising an image is as follows:
1. Select a wavelet and a number of levels (scales) P for the decomposition, then compute the discrete wavelet transform of the noisy image.
2. Apply a threshold to the detail coefficients from scales J-1 down to J-P. This can be accomplished by hard thresholding or by soft thresholding; soft thresholding eliminates the discontinuity.
3. Compute the inverse wavelet transform using the original approximation coefficients at level J-P and the modified detail coefficients for levels J-1 to J-P.
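A minimal single-level version of this procedure can be sketched in Python/NumPy. This is illustrative only; the actual experiments used Daubechies wavelets in Matlab, whereas this sketch uses the Haar wavelet for brevity (images are assumed to have even dimensions):

```python
import numpy as np

def haar2d(img):
    """One-level 2-D Haar DWT: returns the approximation and the three
    detail subbands (analogous to LL, LH, HL, HH)."""
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2.0)   # row low-pass
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2.0)   # row high-pass
    LL = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
    LH = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
    HL = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2.0)
    HH = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2.0)
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2] = (LL + LH) / np.sqrt(2.0)
    a[:, 1::2] = (LL - LH) / np.sqrt(2.0)
    d[:, 0::2] = (HL + HH) / np.sqrt(2.0)
    d[:, 1::2] = (HL - HH) / np.sqrt(2.0)
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :] = (a + d) / np.sqrt(2.0)
    img[1::2, :] = (a - d) / np.sqrt(2.0)
    return img

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def denoise(img, t):
    """Steps 1-3: decompose, soft-threshold the detail subbands,
    keep the approximation untouched, reconstruct."""
    LL, LH, HL, HH = haar2d(img)
    return ihaar2d(LL, soft(LH, t), soft(HL, t), soft(HH, t))
```

With a threshold of zero the pipeline reconstructs the input exactly, which is a useful sanity check that the transform pair is correct before any thresholding is applied.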
4.2.2.1 Thresholding Algorithms
Various thresholding algorithms have been implemented for denoising images using the wavelet transform. Based on the literature survey, five thresholding algorithms were selected and implemented, and their results compared using PSNR values. The thresholding method with the highest PSNR value is selected for the wavelet transform based approach to denoise the degraded historical document images. The following section gives a brief explanation of the five thresholding algorithms.
1. Bayes Shrink
Chang et al. [162] proposed BayesShrink, an adaptive data-driven threshold for image denoising that uses soft thresholding. The aim of the method is to minimize the Bayesian risk, hence the name. The Bayes threshold λ_s is defined as

\lambda_s = \frac{\sigma_n^2}{\sigma_x} \qquad (4.13)

where \sigma_n^2 is the estimated noise variance. The noise standard deviation \sigma_n is found as the median absolute deviation of the diagonal detail coefficients on the finest level (subband HH1):

\sigma_n = \frac{\mathrm{median}(|X_{ij}| : X_{ij} \in HH1)}{0.6745} \qquad (4.14)

\sigma_x is the estimated signal standard deviation on the subband:

\sigma_x = \sqrt{\max(\sigma_y^2 - \sigma_n^2,\; 0)} \qquad (4.15)

where \sigma_y^2, the estimate of the variance of the observations, is given by

\sigma_y^2 = \frac{1}{N_s} \sum_{k=1}^{N_s} W_k^2 \qquad (4.16)

where N_s is the number of wavelet coefficients W_k in the selected subband. The value 0.6745 is the median absolute deviation of a normal distribution with zero mean and unit variance.
2. Visu Shrink
VisuShrink applies the universal threshold proposed by Donoho and Johnstone [163]:

t = \sigma \sqrt{2 \log M} \qquad (4.17)

where \sigma is the noise standard deviation (estimated as in Eq. (4.14)) and M is the number of pixels in the image. For denoising images, VisuShrink is found to yield an overly smoothed estimate.
3. SURE Shrink
Donoho and Johnstone proposed a threshold chooser based on the concept of Stein's Unbiased Risk Estimator (SURE), known as SureShrink. It combines the features of the universal threshold and the SURE threshold [164][165], and suggests a level-dependent threshold value for each resolution level. The goal of SureShrink is to minimize the mean squared error [166], defined as

MSE = \frac{1}{mn} \sum_{x,y} (Z(x, y) - S(x, y))^2 \qquad (4.18)

where Z(x, y) is the output image, S(x, y) is the original image without noise, and m × n is the size of the image. SureShrink suppresses noise by thresholding the empirical wavelet coefficients. The SureShrink threshold t^* is defined as

t^* = \min(t,\; \sigma\sqrt{2 \log N}) \qquad (4.19)

where t is the value that minimizes Stein's Unbiased Risk Estimator, \sigma is the noise standard deviation and N is the size of the image. The method is smoothness adaptive: if the unknown function contains abrupt changes or boundaries, the reconstructed image contains them as well [167][168].
4. Norm Shrink
The threshold value, which depends on the subband characteristics of the transform, is given by

T_N = \frac{\beta \sigma^2}{\sigma_y} \qquad (4.20)

where the scale parameter \beta is computed once for each scale using the following equation:

\beta = \sqrt{\log\!\left(\frac{L_k}{K}\right)} \qquad (4.21)

where L_k is the length of the subband at the kth scale and K is the number of decomposition scales. \sigma^2 is the noise variance [163], which can be estimated from the subband of diagonal coefficients.
5. Universal Shrink
The universal shrinkage method was introduced in [169] to denoise a noisy image using the wavelet transform. The threshold T is defined as

T = \sigma \sqrt{2 \log N} \qquad (4.22)

where N is the size of the image and \sigma is the local noise standard deviation in the subband, whose variance is calculated as

\sigma^2 = \frac{1}{N} \sum_{j=0}^{N-1} X_j^2 \qquad (4.23)
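The noise estimate of Eq. (4.14) and the BayesShrink and universal thresholds of Eqs. (4.13) and (4.17)/(4.22) can be sketched as follows. This is a Python/NumPy illustration; the function names are ours, and the fallback for a pure-noise subband in the BayesShrink routine is an assumption (a common convention, since the threshold diverges there):

```python
import numpy as np

def noise_sigma(HH1):
    """Robust noise estimate from the finest diagonal subband, Eq. (4.14):
    median absolute deviation divided by 0.6745."""
    return np.median(np.abs(HH1)) / 0.6745

def bayes_threshold(subband, sigma_n):
    """BayesShrink threshold sigma_n^2 / sigma_x, Eqs. (4.13)-(4.16)."""
    sigma_y2 = np.mean(subband ** 2)                      # Eq. (4.16)
    sigma_x = np.sqrt(max(sigma_y2 - sigma_n ** 2, 0.0))  # Eq. (4.15)
    if sigma_x == 0.0:             # subband is pure noise: remove everything
        return np.abs(subband).max()
    return sigma_n ** 2 / sigma_x                         # Eq. (4.13)

def universal_threshold(sigma_n, M):
    """VisuShrink / universal threshold, Eqs. (4.17) and (4.22)."""
    return sigma_n * np.sqrt(2.0 * np.log(M))
```

In practice the noise estimate is computed once from HH1 and then reused for every detail subband being thresholded.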
4.2.3 Proposed Methodology
The proposed method uses a discrete wavelet transform based approach for denoising the noisy document image. It uses the BayesShrink method for thresholding, and Daubechies wavelets are selected based on experimentation on 500 images. Figure (4.1) shows the result of the five thresholding algorithms. The results of the five thresholding algorithms for five images are shown in Table (4.1), and PSNR values for 20
Table 4.1: Comparison of various wavelet thresholding methods for five images along with PSNR values (in dB).
Image No →      1        2        3        4        5
Bayes        27.5350  28.6793  28.5738  26.1720  24.8080
SURE soft    27.4348  28.1360  28.1144  25.9753  24.4478
Visu soft    27.4597  28.1038  28.1196  25.9671  24.4574
Norm soft    27.5298  28.6736  28.5581  26.1685  24.8017
Univ soft    27.4433  27.9932  28.0337  25.9598  24.4888
Figure 4.1: Comparison of all thresholding methods (panels a-f).
images are shown in Table (4.2). The proposed method also uses adaptive histogram equalization to enhance the contrast of the image and mathematical morphology for background suppression; post-processing is done by applying the bottom-hat morphological operation, which is equivalent to subtracting the input image from the result of a morphological closing on the input image. The algorithm is explained in detail in the following subsections.
4.2.3.1 Stage 1: Mathematical Reconstruction
1. R1 ← Apply adaptive histogram equalization on noisy input image.
2. R2 ← Perform gray scale opening on R1.
3. R3 ← Add R1 and R2.
4. R4 ← Perform morphological closing on R3.
5. R5 ← Reconstruct the image by subtracting closed image R4 from R3.
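Stage 1 can be sketched with naive grey-scale morphology. This is a Python/NumPy illustration with a flat square structuring element (the thesis experiments used Matlab); step 1, adaptive histogram equalization, is assumed to have produced R1 already:

```python
import numpy as np

def dilate(img, k):
    """Grey-scale dilation with a flat k x k structuring element."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def erode(img, k):
    """Grey-scale erosion with a flat k x k structuring element."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def opening(img, k):   # erosion followed by dilation
    return dilate(erode(img, k), k)

def closing(img, k):   # dilation followed by erosion
    return erode(dilate(img, k), k)

def stage1(r1, k=3):
    """Steps 2-5 of Stage 1 on the equalized image r1."""
    r2 = opening(r1, k)   # step 2
    r3 = r1 + r2          # step 3
    r4 = closing(r3, k)   # step 4
    return r3 - r4        # step 5: reconstructed image R5
```

The structuring element size k is the parameter whose manual selection is discussed in the results section; real implementations would also use faster separable min/max filters rather than these explicit loops.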
Table 4.2: PSNR values obtained from five different thresholding methods for 20 images.
S.No  BayesSoft  SURESoft  VisuSoft  NormSoft  UniSoft
1 24.4761 24.3609 24.3372 24.4711 24.4690
2 24.6531 24.4182 24.4140 24.6449 24.4513
3 24.8936 24.4925 24.5077 24.8842 24.5345
4 26.4264 26.2376 26.2288 26.4264 26.2450
5 26.1738 25.9753 25.9671 26.1685 25.9598
6 25.1758 24.5030 24.5185 25.1532 24.5213
7 24.8116 24.4478 24.4574 24.8017 24.4888
8 25.2862 24.7684 24.7735 25.2426 24.7675
9 24.6178 24.3007 24.3324 24.6081 24.3688
10 24.5663 24.3703 24.4560 24.5513 24.4721
11 25.5967 24.9730 24.9772 25.5880 24.9895
12 24.7232 24.2524 24.2662 24.7137 24.2689
13 26.1956 25.7236 25.6325 26.1956 25.7726
14 28.9752 28.7342 28.7704 28.9520 28.7376
15 28.7500 28.3716 28.3746 28.7398 28.2950
16 28.4078 28.2830 28.2827 28.4225 28.2398
17 28.9265 28.4545 28.4810 28.9007 28.3533
18 27.9564 27.4455 27.4085 27.9569 27.3642
19 29.0562 28.6055 28.5946 29.0443 28.5176
20 27.5408 27.4348 27.4597 27.5298 27.4433
4.2.3.2 Stage 2: Denoising by Wavelet Transform
Apply the wavelet transform denoising method to the reconstructed image R5 from subsection (4.2.3.1) to obtain the denoised image R6. The Daubechies wavelet is used, and thresholding is applied to the detail coefficients of the first level as well as to the second-level vertical, horizontal and diagonal decomposition coefficients. BayesShrink is employed for thresholding, as it denoises better than the other four methods, which are
explained in subsection(4.2.2.1).
1. R6 ← Apply wavelet transform to R5.
4.2.3.3 Stage 3: Postprocessing
1. R7 ← Add output of 2nd stage(4.2.3.2)(R6) to R1 of 1st stage(4.2.3.1).
2. R8 ← Apply bottom hat operation (the difference between the closing of the
original image and the original image) to R7.
3. R9 ← Reconstruct the image by adding complemented versions of R8 and R7.
4.2.3.4 Algorithm
This algorithm takes a noisy degraded document and produces an output image with adjusted contrast, enhanced characters, and with noise and uneven background eliminated.
Input: Degraded historical document image.
Output: Enhanced image.
begin
1. Perform stage 1 (sub section[4.2.3.1]) operations on input noisy image to get
partially enhanced image.
2. Perform stage 2 (sub section[4.2.3.2]) operation on output of stage 1.
3. Perform postprocessing operations mentioned in stage 3 (sub section[4.2.3.3])
on output of stage 2.
end.
4.2.4 Results and Discussions
Experimentation has been conducted on the entire data set using the wavelet transform in combination with morphological operations to enhance degraded documents. To select a suitable thresholding algorithm, experimentation was initially performed on more than 500 images using the five wavelet thresholding methods, and PSNR values were calculated. Out of the 500 results, only 20 values are tabulated in Table (4.2). Table (4.1) shows the result images of the thresholding algorithms and the corresponding PSNR values. Based on these PSNR values and human visual perception of the denoised images, the BayesShrink method gives the best result and is thus selected as the thresholding algorithm to denoise degraded historical documents. Soft thresholding has been employed since it gives a smoothing effect to contours and edges.
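The PSNR comparison used throughout the tables can be computed as follows. This is a Python/NumPy sketch; the peak value of 255 (8-bit grey-scale) and the particular pairing of images being compared are assumptions:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    grey-scale images; higher means the images are closer."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

As noted later in this chapter, a high PSNR value does not by itself guarantee a visually better enhancement, which is why human visual inspection is used alongside it.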
(a) (b)
Figure 4.2: (a) Paper manuscript image-3 of previous century. (b) Enhanced image
using WT based approach.
Experimentation is carried out on paper documents belonging to the nineteenth and the beginning of the twentieth century, from a private collection. The result of the WT based approach on the input image of Figure (4.2)(a) is shown in Figure (4.2)(b). Figure (4.3) shows the results of the WT based approach on the input images shown in Appendix 1, (a) Figure (B.2) and (b) Figure (B.3). The proposed method enhances the images reasonably well, but again the selection of the proper size of the structuring element is not automated: each image set must be inspected and normalized manually so that a proper structuring element size can be used. However, compared to the NLMF method, the WT based approach takes less time, and it is not governed by a controlling parameter as in the BF and NLM methods. The proposed method was also tested on palm leaf manuscripts belonging to various centuries, from the 16th to the 18th. The input images and results
(a) (b)
Figure 4.3: Enhanced images using WT based approach on the paper manuscript images shown in Appendix 1, (a) Figure (B.2) and (b) Figure (B.3).
are shown in (a) and (b) of Figure (4.4), Figure (4.5), Figure (4.6) and Figure (4.7). The method performs better on the palm leaf images than on the paper images.
Figure 4.4: (a) Palm leaf manuscript image belonging to 16th - 18th century. (b)
Enhanced image using WT based approach.
(a)
(b)
Figure 4.5: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced
image using WT based approach.
Results of the WT based approach on the stone inscription images shown in Figure (4.8)(a), Figure (4.9)(a), (c), and Appendix 1 Figure (C.2) are shown in Figure (4.8)(b), Figure (4.9)(b), (d), and Figure (4.10)(b). The method enhances some of the stone inscriptions properly, but it is unable to enhance severely degraded images. A further limitation of the proposed method is that the wavelet is unable to handle curve discontinuities. Hence a curvelet transform based approach, explained in the next section, is implemented to address this issue.
(a)
(b)
Figure 4.6: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced
image using WT based approach.
4.3 Curvelet Transform (CT) Based Approach
Remarkable efforts of researchers have produced significant contributions in the spectral domain. Although intense research work has happened in the wavelet field, wavelet transforms are suitable only for addressing point discontinuities and fail to address edge and curve discontinuities. Apart from the edge discontinuity problem, the discrete wavelet transform uses only three directional wavelets (horizontal, vertical and diagonal) to capture image information, so the wavelet spectral domain cannot represent images that contain a high level of directionality. Because of this limitation of the discrete wavelet transform, researchers have tried to introduce spectral approaches with more directional information, which resulted in the development of the ridgelet and curvelet transforms [170]. The curvelet transform has been
developed to overcome the limitations of wavelets and Gabor filters.

(a)
(b)
Figure 4.7: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced image using WT based approach.

The multiple orientation approach of Gabor filters has proved to be better than the wavelet transform
in representing textures and retrieving images. However, Gabor filters are unable to provide complete spectral information, because of which they cannot be used effectively to represent images; this degrades classifier performance. Hence curvelet transforms are a promising method for capturing spectral information and can be employed in denoising, reconstruction and feature extraction problems. The curvelet transform also provides flexibility in the degree of orientation localisation, which varies with scale: fine-scale curvelet basis functions are long ridges, and the shape of the basis functions at scale j is 2^{-j} × 2^{-j/2}.
(a) (b)
Figure 4.8: (a) Stone inscription image belonging to seventeenth century. (b) Result
of WT based approach.
4.3.1 Overview of Curvelet Transform
A curvelet transform implemented via the ridgelet transform proved to be less efficient [171] because of the complex nature of the ridgelet transform. Candes et al. [172] proposed two new curvelet transforms based on the Fast Fourier Transform (FFT), referred to as Fast Discrete Curvelet Transforms (FDCT). The first form is based on the Unequally-Spaced Fast Fourier Transform (USFFT) and the second is wrapping based. The wrapping based curvelet transform has a faster computation time and is more robust than the ridgelet transform and the USFFT based curvelet transform [171]. The curvelet transform based on wrapping of Fourier samples takes a 2-D image as input in the form of a Cartesian array f[m, n], 0 ≤ m < M, 0 ≤ n < N, and generates a number of curvelet coefficients indexed by a scale j, an orientation l and two spatial location parameters k_1, k_2. The discrete curvelet coefficients are defined by

C^D(j, l, k_1, k_2) = \sum_{0 \le m < M,\; 0 \le n < N} f[m, n]\,\phi^D_{j,l,k_1,k_2}[m, n] \qquad (4.24)

where \phi^D_{j,l,k_1,k_2}[m, n] is a digital curvelet waveform.
Wrapping based FDCT[172] is a multi scale transform with a pyramid structure
and includes several subbands at different scales in the frequency domain. Orientation
and positions of the subbands at high frequency are different from subbands at low
(a) (b)
(c) (d)
Figure 4.9: (a) and (c) Stone inscription images belonging to 14th - 17th century. (b)
and (d) Results of WT based approach.
frequency. The curvelet waveform looks like a needle-shaped element at high scales, whereas it is non-directional at the coarsest scale. Curvelets become finer and smaller at higher scales and address curved edges more sensitively. The FDCT uses an effective parabolic scaling approach on the subbands in the frequency domain to capture curved edges within an image more effectively. Since curvelets effectively capture the curves in an image, curved singularities can be well approximated. The best results can be achieved in the frequency domain: both the curvelet and the image are transformed to the Fourier frequency domain and then multiplied. The frequency response of the curvelet transform is a trapezoidal wedge, shown in Figure (4.11)(a). This wedge data cannot be accommodated directly in a rectangle of size 2^j × 2^{j/2}. To
Figure 4.10: Result of WT based approach on stone inscription belonging to seven-
teenth century shown in Appendix 1 Figure (C.2).
(a) (b)
Figure 4.11: (a) Wrapping data, initially inside a parallelogram, into a rectangle by periodicity (figures reproduced from paper [172]); the shaded region represents the trapezoidal wedge. (b) Discrete curvelet frequency tiling.
overcome this problem, Candes et al. [172] implemented the wrapping based FDCT, where a parallelogram with sides 2^j × 2^{j/2} is chosen as a support for the wedge data. The wrapping procedure is applied by periodically tiling the spectrum inside the wedge and then collecting the rectangular coefficient area in the center; Figure (4.11)(b) shows the wrapping of the data into a rectangular tile. Taking the inverse FFT then gives the curvelet coefficients in the spatial domain. The fastest curvelet transform currently available is curvelets via wrapping [173], [174], [175], which is used in our work.
(a) (b)
(c) (d)
(e) (f)
Figure 4.12: (a), (c) and (e) Input images paper, palm leaf and stone. (b), (d) and
(f) Result of CT based approach.
4.3.2 Proposed Method
The curvelet transform is used to eliminate noise and enhance the degraded noisy
image. Mathematical morphological operators opening and closing are used to elim-
inate the background of the document image. The following steps are used in the
method.
4.3.2.1 Denoising Using Curvelet Transform
The Curvelet Toolbox is used to extract curvelet coefficients as explained in subsection (4.3.1). These curvelet coefficients are in the spatial domain. A threshold is applied to the curvelet coefficients, and the inverse curvelet transform is then applied to obtain the output image. Three levels (scales) of decomposition are applied in the enhancement process.
1. OutputImage1 ← Extract the curvelet coefficients using the curvelet transform, threshold the coefficients, and take the inverse curvelet transform.
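The coefficient thresholding step can be sketched generically. This is a Python/NumPy illustration; the nested list layout (scale → orientation → 2-D array) is an assumed convention modeled on curvelet toolboxes, and leaving the coarsest scale untouched mirrors the wavelet procedure, where the approximation is preserved:

```python
import numpy as np

def threshold_coeffs(coeffs, t):
    """Hard-threshold every detail subband of a multi-scale transform.
    `coeffs` is a list whose first entry is the coarsest-scale
    (approximation-like) coefficients, followed by one list of 2-D
    arrays per finer scale, one array per orientation wedge."""
    out = [coeffs[0]]                       # keep the coarsest scale
    for scale in coeffs[1:]:
        out.append([np.where(np.abs(c) >= t, c, 0.0) for c in scale])
    return out
```

After thresholding, the modified coefficient structure is passed to the inverse curvelet transform exactly as in step 1 above.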
4.3.2.2 Algorithm
This algorithm takes a noisy degraded document and produces an output image with adjusted contrast, enhanced characters, and with noise and uneven background eliminated.
Input: Degraded historical document image.
Output: Enhanced image.
begin
1. Perform stage 1 of sub section[4.2.3.1] operations on input noisy image to get
partially enhanced image.
2. Perform stage 2 of sub section[4.3.2.1] operation on output of stage 1.
3. Perform postprocessing operations mentioned in stage 3 of sub section[4.2.3.3]
on output of stage 2.
end
(a) (b)
(c) (d)
(e) (f)
Figure 4.13: (a)-(b) Input images. (c)-(d) Results of first and second stage of curvelet
based approach. (e)-(f) Result of last stage(image 15-49).
(a) (b) (c)
Figure 4.14: (a) Palm leaf manuscript image belonging to between the 16th and 18th centuries. (b) Enhanced image using WT based approach. (c) Result of CT based approach.
4.3.3 Results and Discussions
The experimentation has been carried out using the Matlab Curvelet Toolbox downloaded from Curvelab.org [176]. The proposed method has been tested on historical documents in the Kannada language. It enhances severely degraded images by eliminating the dark background. Figure (4.12) shows the results of the curvelet transform based approach on paper, palm leaf and stone inscription images, and Figure (4.13) shows the results of the CT based approach along with the intermediate results on palm leaf manuscript images.
The results of the CT based approach for five images, including palm leaf and paper document images, are given in Table (4.3). Results on some more palm leaf images are shown in Figure (4.14), Figure (4.15) and Figure (4.16). The proposed method also enhances the stone inscriptions; results are shown in Figure (4.17) and Figure (4.18).
106
(a) (b) (c)
Figure 4.15: (a) Input image of palm script. (b) Result of WT based method. (c)
Result of CT method.
(a) (b) (c)
Figure 4.16: (a) Input image of palm script. (b) Result of WT based method. (c)
Result of CT method.
(a) (b)
Figure 4.17: (a) Result of WT based approach, (b) result of CT based approach on
image shown in Figure(4.8)(a).
4.4 Summary
Two frequency domain based approaches have been developed, and experimentation has been performed on the historical Kannada document images. The first method is based on the wavelet transform and the second on the curvelet transform. The curvelet transform is better suited to handling curve discontinuities and gives smoother curves than the wavelet transform. Low contrast and uneven background intensity have been handled using mathematical reconstruction techniques. The two frequency domain methods are compared using PSNR values, execution time and human visual perception. The curvelet transform based approach outperforms the wavelet transform based approach with respect to visual appearance by human interpretation and PSNR values, but takes slightly more time than the WT based method; results are given in Table (4.7). Table (4.4), Table (4.5) and Table (4.6) show the PSNR values and execution times for paper document images, palm leaf document images and stone inscriptions respectively.
4.5 Discussion on Enhancement Algorithms
In the current and the previous chapter, we have presented five enhancement algo-
rithms and experimented on large data sets containing approximately 2700 images.
(a) (b)
(c) (d)
Figure 4.18: Results of WT based method shown in (a), (c) and result of CT based
method shown in (b)-(d) for stone inscription images shown in Figure(4.9)(a) and
(c).
The proposed methods are able to enhance the degraded documents and produce better results in terms of quality binary images. The proposed methods are compared against basic filtering/denoising techniques only. They are tested on Kannada historical document images, but would also work on documents in other languages; in our research work, however, we concentrate on era prediction of characters belonging to the Kannada language only, because the inclusion of other languages creates further complexity and demands additional algorithms for identifying the language. We are unable to compare the proposed methods with state-of-the-art methods, because we have used our own data sets to conduct the experiments. We could
not completely implement the state-of-the-art methods and experiment with them on our data sets; we wanted to implement simple techniques to enhance Kannada documents. Performance evaluation parameters like PSNR, SSIM and MSE are not conclusive here: PSNR is a quantitative measurement whose high value may not always signify an enhanced image, and the Structural Similarity Index Measure (SSIM) is applicable only when ground truth images are available, so that similarity can be measured between the original image and the restored image. In our research work, original images are not available and therefore SSIM cannot be used to measure performance. Even if the state-of-the-art methods were implemented and used for experimentation, we could not compare our results with theirs, because the data sets used to obtain their results differ from our data set. One more evaluation criterion is to use the enhanced binarized image for segmentation: if the segmentation algorithm segments the binarized image properly into lines, words and characters, then the performance of the enhancement methods can be said to be satisfactory. The output of the enhancement techniques is used as input to the segmentation algorithms, which are explained in the next chapter.
Table 4.3: Result of Curvelet Transform based approach.
Input Image Result of 1st and 2nd stage Result of last stage
Table 4.4: Comparison of PSNR Values and execution time for Wavelet and Curvelet
Transform based methods on paper images.
S.No    PSNR (WT Based)    PSNR (CT Based)    Time in Sec (WT Based)    Time in Sec (CT Based)
1 25.8798 35.5900 3.0985 5.7283
2 25.7759 35.2788 2.4512 5.0381
3 24.9246 33.8267 2.1122 4.5498
4 27.3457 37.8142 2.3272 4.5129
5 24.6156 30.0907 1.9518 4.5297
6 25.1462 34.0457 2.1306 4.5294
7 25.6768 29.4456 1.9917 4.5357
8 24.4999 31.7419 1.9671 4.5847
9 24.5966 36.1923 2.2618 4.5223
10 25.0807 36.1255 2.5894 4.5129
11 24.3833 37.0358 2.5262 4.5744
12 24.5652 32.2564 2.3669 4.5386
13 26.7434 34.2467 2.3850 4.5189
14 25.3168 31.2967 2.3612 4.5554
15 24.9640 30.2086 2.3910 4.5514
Table 4.5: Comparison of PSNR Values and execution time for Wavelet and Curvelet
Transform based methods on palm leaf images.
S.No    PSNR (WT Based)    PSNR (CT Based)    Time in Sec (WT Based)    Time in Sec (CT Based)
1 24.2724 35.6228 2.3424 4.6382
2 24.2116 36.6294 2.3034 4.6146
3 24.3478 31.7468 2.2459 4.5703
4 24.2225 36.0736 1.9751 4.5886
5 24.6479 36.7832 2.0093 4.6487
6 24.7061 36.0069 1.9547 4.6589
7 24.4497 30.2781 1.8661 4.8125
8 24.2690 37.7530 2.0014 4.5264
9 24.3078 30.6214 1.8734 4.5395
10 24.4159 31.3910 1.8745 4.5329
11 24.3806 31.4680 1.8825 4.4984
12 24.1759 38.9257 2.0604 4.7344
13 24.2226 36.1537 1.9738 4.5569
14 24.3169 34.9385 1.9947 4.5872
15 24.4767 31.6551 1.8945 4.5200
16 24.3498 32.2874 1.9326 4.5080
17 24.3430 36.9334 2.0311 4.5991
18 24.3146 33.2810 1.9806 4.5533
19 24.3899 31.9061 1.8903 4.5287
20 24.2313 37.4947 1.9488 4.6408
21 24.4372 31.0819 1.9853 4.5355
22 24.1572 39.0294 2.3300 4.7449
23 24.3488 38.2007 2.0846 4.6863
24 24.5303 31.5589 1.9545 4.6738
25 24.3335 37.4303 2.0544 4.6256
Table 4.6: Comparison of PSNR Values and execution time for Wavelet and Curvelet
Transform based methods on stone inscription images.
S.No    PSNR (WT Based)    PSNR (CT Based)    Time in Sec (WT Based)    Time in Sec (CT Based)
1 24.4136 31.1071 2.7333 5.3786
2 24.4923 37.5022 2.1230 4.9018
3 24.5634 35.9363 2.1270 4.8408
4 24.6208 39.2543 2.2215 7.5410
5 25.1249 32.9747 1.9283 6.8818
6 24.9396 33.1379 1.9560 6.7669
7 24.6928 30.6739 1.8946 6.8962
8 24.3710 31.7174 1.9159 6.9271
9 24.9423 29.7077 1.8633 7.7404
10 25.0671 33.1831 1.9341 6.8797
Table 4.7: Comparison of PSNR values of two frequency domain based approaches.
Input Images Wavelet Transform Curvelet Transform
24.6479 36.7832
24.3335 37.4303
24.1836 38.7004
24.6156 30.0907
25.6768 29.4456
Chapter 5
Segmentation of Document Images
5.1 Introduction
In the previous two chapters, enhancement algorithms were developed to enhance
degraded historical documents using spatial and frequency domain techniques. In
this chapter, document image segmentation algorithms are presented. Document
image segmentation is the process of partitioning the document image into lines,
words and characters. Segmented characters are used further in the classification
and recognition stages. The efficiency of the classifier depends entirely on the
character features extracted, which in turn depend on the segmentation of the
characters. Hence the development of efficient segmentation algorithms to extract
lines, words and characters is very important. Extracting lines from printed
documents is comparatively simpler than extracting lines from handwritten
documents, as lines in handwritten documents usually contain nonuniform spacing between
Some of the material of this chapter appeared in the following research papers:
1. B. Gangamma, Srikanta Murthy K, Hemanth Kumar G, Riddhi J Shah, Swati D V, Sandhya B, “Text Line
Extraction from Kannada Handwritten Document”, IEEE International Conference on Computer Engineering
and Technology, Jodhpur, India, pages E 8-11, November 2010.
2. B. Gangamma, Srikanta Murthy K, Riddhi J. Shah, Swati D V, “Text Line Extraction from Palm Script
Documents Using Morphological Approach”, International Conference on Computer Engineering and
Applications, Dubai, UAE, pages 1452-1455, January 29-31, 2012.
them. Apart from this, inscribed or written historical document images usually
exhibit uneven line spacing, text inscribed along curved lines, overlapping text
lines, etc., making segmentation of the document difficult. Therefore, there is a
need to develop efficient segmentation algorithms to address these problems.
This chapter deals with the segmentation of the document image into lines. The
chapter is organized into six sections. Section one provides a brief introduction,
the second section describes the proposed methodologies, sections 3, 4 and 5
detail the three different segmentation algorithms, and the last section provides
a summary of all the methods.
5.2 Proposed Methodologies
Tremendous efforts have been expended to address the segmentation problem.
However, existing algorithms address only a specific set of problems and are
unable to address all of them. It is still an open challenge for the research
community to devise a suitable algorithm that addresses the segmentation problem
in its entirety. It is noticed from the literature survey that only a handful of
researchers have addressed the segmentation of historical documents, and little
work can be traced on South Indian language documents. This has motivated us to
design suitable segmentation algorithms to segment Kannada language historical
documents into lines and characters and to extract character features to
recognize the era of the characters.
The segmentation algorithms require a binarized image as input. Binarization is
the process of separating (segmenting) the document into foreground and background
groups. This process in turn requires thorough preprocessing to enhance the
documents, as these documents suffer significant degradation due to the various
factors discussed in the previous chapters. Hence there is a need to develop
efficient preprocessing algorithms. In this chapter, an attempt is made towards
developing efficient segmentation algorithms for historical document images. The
segmentation algorithms require a well formed binary image. The results
Figure 5.1: (a) Handwritten Kannada document image. (b) Horizontal projection
profile of handwritten document image.
of the preprocessing algorithms are considered as input to the segmentation
process and are binarized using the global thresholding method of Otsu[53].
Two algorithms have been developed for the extraction of text lines and characters
from historical document images, and a third algorithm is developed to detect and
correct skew in the document. All of these are explained in detail in the
following sections.
5.3 Method 1: Piece-wise Horizontal Projection Profile Based Approach
The global horizontal projection profile is the most widely used method for
segmenting lines. It is well suited to printed documents, where the spacing
between lines is prominent, and individual lines are segmented at the valley
points of the histogram. It can also segment handwritten documents into lines
when there is sufficient spacing between them. The document image shown in
Figure(5.1)(a) is a handwritten Kannada document with uniform spacing between
lines and Figure(5.1)(b) is its horizontal projection profile. The projection
profile is used to extract individual lines from the document: the gap between
two valleys separates the lines. This
method works well for images with uniform spacing between lines. However, not
all handwritten documents possess this uniformity: lines are often skewed or
curved, with uneven spacing between them, and the global projection profile
method fails to segment the lines in such situations. A sample input document
with uneven spacing between lines is shown in Figure(5.2), and its projection
profile in Figure(5.3). The proposed method is devised to segment such document
images, with uneven line spacing, into lines and characters.
This method consists of four stages: the first divides the document into vertical
strips, the second obtains the horizontal projection profile of each strip, the
third reconstructs each line from the vertical strips, and the last extracts the
characters. The following subsections explain the algorithm steps in detail.
Figure 5.2: Handwritten Kannada document image.
Figure 5.3: Horizontal projection profile of the input image Figure(5.2).
5.3.1 Division into Vertical Strips
In this approach, the image is divided into vertical strips of equal width W, as
shown in Figure(5.4). The value of W can be chosen in multiples of 100, depending
on the size of the image: if the image width is below 500 pixels, 100 is a better
value; above 1000, 200 is better. The more strips there are, the better the
extraction, but the arrangement of the pieces into blocks of lines becomes
imprecise. The number of vertical strips should therefore reasonably be between
5 and 10.
To calculate the number of strips N, the total number of columns C in the image
is divided by W. If C is exactly divisible by W, then all strips have equal size.
Otherwise N + 1 strips are obtained: N strips of width W, with N obtained by
integer division N = C/W, and an (N + 1)th strip whose width is the remainder,
size = C − N × W.
5.3.2 Horizontal Projection Profile of a Strip
For each strip obtained using the above method, the horizontal projection profile
is calculated and the text pixel count of each row is stored in a pixel count
array. The plot of pixel count versus row number yields a projection profile
containing clear peaks and valleys. Valley points on the plot represent rows with
zero text pixel count, and peak points represent rows with the maximum number of
text pixels. Rows with zero text pixels are labelled Zero Rows (ZR) and rows with
non-zero pixel counts Non Zero Rows (NZR). Keeping only one ZR between two NZRs
and eliminating the other ZRs makes extraction of the lines easier. Figure(5.4)
depicts the vertical division, zero pixel count rows and non-zero pixel count
rows. The horizontal projection profile of a strip is shown in Figure(5.5).
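The per-strip profile and the ZR/NZR labelling can be sketched as follows
(numpy assumed; the function name is illustrative, not from the thesis):

```python
import numpy as np

def strip_profile(strip):
    """Horizontal projection profile: number of text (ON) pixels per row."""
    return strip.sum(axis=1)

strip = np.zeros((6, 4), dtype=np.uint8)
strip[1:3] = 1                     # rows 1 and 2 carry text
profile = strip_profile(strip)
nzr_mask = profile > 0             # True = NZR, False = ZR
# nzr_mask marks rows 1 and 2 as Non Zero Rows
```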
5.3.3 Reconstruction of the Line Using Vertical Strips
By scanning each row of the profile array h, find the first NZR and store its
row number in Rn as the starting row number (NZR1), where n represents the strip
number. Continue to scan until the next row with zero text pixel count is
reached, and store the previous row
Figure 5.4: Non-Zero Rows (NZRs) and rows labelled NZR1 and NZR2.
Figure 5.5: Horizontal projection profile of a strip.
as the ending row number (NZR2) of the first line. Continue to scan the profile
array until the next NZR is found, store its row number as NZR3, and proceed
until the ending row number of the second line is found, as for the first line.
This process is repeated until all the lines are scanned and all the strips are
processed.
Once all the potential NZRs are extracted, the distances between consecutive
pairs of potential NZRs are calculated. The average distance is used to check
whether a pair of NZRs represents the start and end of a single line, with the
average height of all the lines serving as the threshold value. If the difference
between the corresponding NZRs of adjacent strips is less than the threshold,
then both NZR1 and NZR2 values are considered; otherwise they are ignored for the
current line. To extract the first line, the contents between the first pair of
NZRs are extracted from each strip and
joined. This method is applied repeatedly to all pairs of NZRs from each strip in
order to extract all the lines.
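The search for starting/ending row pairs (NZR1, NZR2, ...) in a strip's profile
amounts to run detection over the NZR mask, which can be sketched as follows
(a hedged numpy sketch; `line_bounds` is an illustrative name, not from the
thesis):

```python
import numpy as np

def line_bounds(profile):
    """Return (start_row, end_row) pairs for each run of consecutive NZRs."""
    nzr = (profile > 0).astype(int)
    # +1 marks a run start, -1 marks one row past a run end
    diff = np.diff(nzr, prepend=0, append=0)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1) - 1
    return list(zip(starts, ends))

profile = np.array([0, 3, 5, 0, 0, 4, 4, 4, 0])
bounds = line_bounds(profile)
# two text lines: rows 1-2 and rows 5-7
```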
(a) Line 1
(b) Line 2
(c) Line 3
Figure 5.6: Extracted text lines.
5.3.4 Character Extraction
The extracted lines are used to extract words and characters using connected
component analysis (CCA). The vertical projection profile is suitable for
extracting words from paper document images, but is unable to separate words in
palm leaf document images because of improper spacing between words and
characters. Moreover, prediction of the era requires individual characters, so
character extraction is performed. However, reconstruction of characters from
segmented pieces and broken characters is not performed, as it is out of the
scope of our research work.
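A minimal CCA-based character extraction, sketched here with `scipy.ndimage`
(the thesis does not name a library; this is one possible realization under
that assumption):

```python
import numpy as np
from scipy import ndimage

def extract_characters(line_img):
    """Label connected components of a binary line image and return one
    cropped sub-image per component (each a candidate character)."""
    labels, n = ndimage.label(line_img)
    return [line_img[sl] for sl in ndimage.find_objects(labels)]

line = np.zeros((5, 10), dtype=np.uint8)
line[1:4, 1:3] = 1                 # first blob
line[1:4, 6:9] = 1                 # second blob
chars = extract_characters(line)
# two components yield two candidate characters
```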
5.3.5 Algorithm for Document Image Segmentation.
Input: Enhanced document image.
Output: Segmented lines and characters.
1. Binarize the input image using Otsu[53] method.
2. Divide the image into vertical strips of size W.
Figure 5.7: Character extraction from line.
(a)
(b)
(c)
(d)
(e)
(f)
Figure 5.8: (a), (c), (e) are the extracted lines; (b), (d), (f) are the
characters extracted from lines (a), (c), (e).
(a) Line 1
(b) Line 2
(c) Line 3
Figure 5.9: Input handwritten image and extracted Lines.
(a) Line 1
(b) Line 2
(c) Line 3
Figure 5.10: Extracted characters.
3. For each strip, obtain the horizontal projection profile h, that is the array
containing number of text pixels in each row.
4. Find the potential NZR and store the corresponding row numbers in separate
array Rn where R represents an array to store row number and n represents
the strip number.
5. Repeat steps 3 and 4 for all the strips.
6. Extract the first row-number pair from each Rn, join the corresponding strip
contents to form the first line and store it as a separate line image.
7. Extract the characters from the extracted line using CCA.
8. Repeat steps 6 and 7 until all the lines are formed.
5.3.6 Results and Discussion
Experimentation has been performed on 200 different Kannada language handwritten
documents, and some of the results are shown here. Two sets of documents are
considered for experimentation: one containing paper documents and the other
containing palm leaf documents. Figure(5.2) shows a handwritten paper document
image. Its projection profile, shown in Figure(5.3), does not have distinct gaps
between two valleys, so direct separation of the lines is not possible. The
division of the document into vertical strips helps us extract each vertical
strip separately. From each vertical strip, individual lines are extracted and
stored separately. Furthermore, each piece of line from each strip is taken and
joined to form a single line. The extracted lines from the document image are
shown in Figure(5.6). The vertical projection profile is employed to extract
individual characters from each line, and the extracted characters are shown in
Figure(5.7). Figure(5.8) shows the extracted characters for all the lines in the
document. One more experimental result for line extraction is shown in
Figure(5.9), and Figure(5.10) shows the corresponding extracted characters.
When compared to the global projection profile method and the Hough transform
method, the proposed method works well for document images with uneven spacing
between lines and words. If the gap between two lines is very small or the lines
are strongly curved, the strip size has to be reduced so that the projection of
each individual strip contains prominent gaps between lines. However, dividing
the image into too many vertical strips makes the extraction and joining process
complicated, increases the reconstruction time and degrades the quality of the
reconstructed line, because there are too many pieces. Therefore this method
cannot be used to segment touching lines or lines with too much curvature. One
such image, shown in Figure(5.11), is subjected to the proposed method, and the
result is shown in Figure(5.12). The 1st, 2nd, 3rd and 6th vertical strips in
Figure(5.12) have uneven spacing between lines, which makes segmentation of the
row pieces much more difficult. Therefore, the present research work proposes
another algorithm, explained in the next section, to segment touching lines from
the document image.
Figure 5.11: Input image with uneven spacing between lines.
Figure 5.12: Result of method 1 on the image shown in Figure(5.11).
5.4 Method 2: Mathematical Morphology and Connected Component Analysis (CCA) Based Approach
As the piece-wise projection profile method is unable to handle skewed and
touching lines, a second method is devised to segment lines from documents with
curved lines, uneven spacing between lines, and touching lines. The proposed
method uses mathematical morphology and connected component analysis. The
following sections explain the procedure used to segment an image with uneven
spacing between lines.
Figure 5.13: Result of closing operation.
(a) Line 1
(b) Line 2
(c) Line 3
Figure 5.14: Extracted text lines.
The proposed method requires a binarized image of the historical document.
Degraded historical document images are enhanced using the methods explained in
the previous chapters 3 and 4: the outputs of the Curvelet transform and Non
Local Means filter techniques are used as the enhanced images. The efficient
global thresholding algorithm of Otsu[53] is then used to binarize the enhanced
image.
(a)
(b)
Figure 5.15: (a) Extracted line. (b) Characters extracted from line (a).
Figure 5.16: Input image.
5.4.1 Morphological Closing Operation
The morphological closing operation is applied to the binarized image. It is
mainly used for connecting and merging the characters in a line. A line
structuring element of length L at zero degrees is used for the closing
operation, where the value of L can be between 20 and 60. The value of L can be
selected based on the length of the palm script: a higher value of L bridges the
gaps between words and fills the holes that are prominent in palm scripts (these
holes are used to tie the scripts together). If the gap between words is large,
a small value of L will create many components. To avoid this, the length of the
line structuring element is chosen carefully, so that the closing operation
yields a single component for each line. The result of the closing operation on
the sample input image Figure(5.11) is shown in Figure(5.13).
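Closing with a horizontal line structuring element can be sketched in pure numpy
as a row-wise sliding maximum followed by a sliding minimum (an illustrative
sketch; the border-padding convention and function name are assumptions, and a
library routine such as `scipy.ndimage.binary_closing` would serve equally well):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def close_with_line(img, L):
    """Binary closing with a horizontal line structuring element of length L:
    a row-wise sliding maximum (dilation) followed by a sliding minimum
    (erosion). Borders are padded with 0 for dilation and 1 for erosion."""
    def slide(a, reduce_fn, pad_value):
        padded = np.pad(a, ((0, 0), (L // 2, L - 1 - L // 2)),
                        constant_values=pad_value)
        return reduce_fn(sliding_window_view(padded, L, axis=1), axis=-1)
    dilated = slide(img.astype(bool), np.max, False)
    return slide(dilated, np.min, True).astype(np.uint8)

row = np.zeros((1, 12), dtype=np.uint8)
row[0, [2, 5, 8]] = 1              # isolated "characters" on one line
merged = close_with_line(row, 5)
# gaps narrower than L are bridged into a single connected run
```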
5.4.2 Line Extraction Using Connected Component Analysis
Once the closing operation is performed on the image, connected component
analysis is used to extract the connected components. The following steps are
followed to extract lines from the connected component image.
1. Scan the rows from the beginning until a text (ON) pixel is found.
2. Store this row number as the starting row in an array called Row Numbers (RN).
Copy the pixel values to another array called Extracted Connected Component
Line (ECCL).
3. Continue to scan each row and copy the pixels to the ECCL array until all the
pixels belonging to one connected component are copied. Store the end row of
the connected component in RN.
4. Repeat steps 2 and 3 for all connected components. The line components and
row pairs of every connected component line are extracted and stored separately
in the ECCL and RN arrays; an ECCL is maintained for each component.
Once the connected components are extracted, the original lines have to be
extracted from the image as follows:
• Select the pair of connected component coordinates from the RN array and
extract the pixels between the pair of RN values (the starting and ending row
numbers) from the original document image. Since the lines are not straight,
the content extracted between two rows will usually contain pixels from the
next line as well. The corresponding connected line component ECCL is therefore
taken and a logical AND operation is performed with the extracted pixels of the
original document.
• The extracted lines are then stored separately as line segments. These line
segments can then be used to extract words and characters for further
processing in the recognition system.
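The steps above can be sketched as follows (a hedged sketch using
`scipy.ndimage` for the labelling; the function name and test image are
illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

def extract_lines(original, closed):
    """For each connected component of the closed image, crop the row band
    it spans from the original image and AND it with the component mask,
    suppressing pixels that leak in from neighbouring curved lines."""
    labels, n = ndimage.label(closed)
    lines, row_pairs = [], []
    for k in range(1, n + 1):
        mask = (labels == k)
        rows = np.flatnonzero(mask.any(axis=1))
        r1, r2 = int(rows[0]), int(rows[-1])   # the RN start/end pair
        band = original[r1:r2 + 1] & mask[r1:r2 + 1].astype(original.dtype)
        lines.append(band)
        row_pairs.append((r1, r2))
    return lines, row_pairs

doc = np.zeros((8, 6), dtype=np.uint8)
doc[1:3] = 1                      # first text line band
doc[5:7] = 1                      # second text line band
lines, pairs = extract_lines(doc, doc)
# two line components spanning rows (1, 2) and (5, 6)
```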
5.4.3 Finding the Height of Each Line and Checking for Touching Lines
Using the starting and ending row numbers stored in the RN array for each line,
the height of the line can be calculated: the difference between the starting
and ending rows gives the height of the connected component, which is also the
height of the line. The average height of all connected components is then
calculated and used to check whether each component contains one line or more.
If a component's height is more than the average height, then the component
contains more than one line; this occurs when two or more lines are touching,
and the connectivity needs to be broken. If the height of a component is less
than the average height, then the line is extracted directly. Otherwise the
extracted touching line component is given as input to the opening operation to
break the connectivity between the lines, the connected components are extracted
again and the line heights are recalculated. This is repeated from the beginning
until all the single lines are extracted.
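The average-height test for touching lines can be expressed compactly (a minimal
numpy sketch; the function name is an illustrative assumption):

```python
import numpy as np

def needs_splitting(row_pairs):
    """Flag components taller than the average component height; such a
    component is assumed to contain two or more touching lines and is
    sent back through the opening operation."""
    heights = np.array([r2 - r1 + 1 for r1, r2 in row_pairs])
    return heights > heights.mean()

pairs = [(0, 9), (12, 21), (24, 45)]    # third block is twice as tall
flags = needs_splitting(pairs)
# only the third component is flagged as containing touching lines
```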
5.4.4 Character Extraction
Simple CCA is used to extract the individual characters from each extracted line.
The vertical projection profile may not yield proper results because of the
spacing problem between characters. Since the Kannada script contains vattus and
matras, it is difficult to segment the characters together with their vattus and
matras properly, and the CCA method proved to be better for character extraction.
As mentioned in the previous section, reconstruction of broken characters is not
performed.
Figure 5.17: Result of closing operation.
(a) First Line.
(b) Second Line.
(c) Third Line.
Figure 5.18: Result of extraction of connected components(lines).
5.4.5 Algorithm for Segmentation of the Document Image into Lines
Input: Binarized image obtained from the first stage of the proposed method.
Output: Segmented lines of the document image.
1. Apply morphological closing with a line structuring element.
2. Apply connected component analysis and extract the components.
3. Calculate the height of each connected component.
4. Find the average height of the connected components.
5. Check the height of each connected component: if it is less than the average
height, then the original line is extracted using the method explained above;
if it is greater, then the morphological opening operation is applied to break
the touching lines and steps 2 to 5 are repeated.
6. Use CCA to extract the characters from the extracted line.
Each extracted line component is painted with a different colour to show the
different components, as shown in Figure(5.14). The characters are then extracted
using the CCA method; the CCA is applied to the extracted lines and the results
are shown in Figure(5.15).
5.4.6 Results and Discussion
Experiments have been conducted on 200 historical document images of palm leaf
and paper scripts with varying spacing between lines; only a few results are
presented here. More than 50% of the lines are extracted in the first iteration
and the remaining lines in subsequent iterations. No more than two iterations
are required to extract all the lines if the touching lines join along a narrow
width. The same segmentation procedure can be used for word extraction with a
smaller value of L for the structuring element. Lines connected to the lines
above and below may lose some information, which is addressed when each
character is extracted. Some of the segmentation results are shown in the
following figures: the input image is shown in Figure(5.16) and its closed image
in Figure(5.17). The extracted lines are given in Figure(5.18). One more
experimental result, for the image in Figure(5.19), is shown in Figure(5.20),
that is, the closed (painted) image; the result of line extraction is shown in
Figure(5.21).
The touching lines (two lines) in the image are shown in red (the second, red
line) in Figure(5.22), together with its opened image. Applying the opening
operation again on the touching line component segments it into distinct lines.
Results of the segmentation of touching lines are shown in Figure(5.22) and
Figure(5.23). If the lines touch along a minimal (narrow) width, then the
opening operation can be applied with the same structuring element. If the
touching width is larger, then segmentation of the lines becomes very difficult
and the structuring element size has to be changed. Setting the structuring
element size depends completely on the portion of the line that is touching;
developing an algorithm for automatic selection of the structuring element size
remains a challenging task for researchers.
Figure 5.19: Result of binarization operation.
Figure 5.20: Result of closing operation.
5.5 Discussion on Method 1 and Method 2
The two algorithms presented in the previous sections are designed to segment
and extract lines from the document image. The performance of a segmentation
algorithm is usually measured by parameters such as the number of lines
extracted, the number of characters extracted and the number of characters
recognized correctly by an OCR system; the performance therefore depends
completely on the samples present in the data set and on the language OCR. In
this research work, data sets containing historical Kannada language documents,
inscribed on palm leaf and paper, are considered. OCR systems for such
characters (old Kannada, middle Kannada) are not available, so it is practically
impossible to measure the performance of the segmentation algorithms through OCR
performance. Almost all state-of-the-art methods are tested on standard data
sets, where performance can be measured by the number of lines extracted and the
number of characters recognized. The proposed algorithms are in turn based on
the state-of-the-art method-
(a) First Line.
(b) Second Line.
(c) Third Line.
(d) Fourth Line.
(e) Fifth Line.
Figure 5.21: Result of extraction of connected components and corresponding lines.
ologies available for segmenting the document image into lines, words and
characters, and will work for any language. As the reconstruction of segmented
pieces of characters and broken characters is not within the scope of the
present research work, only whole/complete characters are used in the
recognition stage, which is described in the next chapter. In this research
work, an attempt has been made to design simple and efficient algorithms that
address some of the issues present in existing methods. We also wanted to use
the segmentation algorithms as a further measure of the performance of the
enhancement algorithms presented in the previous chapters. For these reasons,
the proposed segmentation algorithms cannot be compared directly with
state-of-the-art methods. We have, however, tested the performance of the
algorithms on a small portion of our data set with a clear background; this
selection is made manually. In the
(a)
(b)
Figure 5.22: (a) Touching line portion. (b) Result of closing and opening operation.
(a)
(b)
Figure 5.23: Extraction of lines.
next section, one more algorithm, which finds the skew within the document and
corrects it, is explained. The motivation behind the development of this
algorithm is to reconstruct the lines after skew correction, so that the simple
global projection profile algorithm can then be applied to segment the document
into lines.
5.6 Skew Detection and Correction Algorithm
Document skew is a common problem that occurs during digitization with scanners
or cameras. Skew or tilt in the images is caused by incorrect positioning of the
document on the scanner, and may also be introduced while capturing a
photograph. The skew angle of a digital document can be defined as the angle
made by the text lines of the document with the direction of the x-axis of the
coordinate system.
Skew may cause problems in text line, word and character extraction, and
incorrect segmentation leads to incorrect classification. Therefore, it is often
necessary to determine the skew angle and correct the skew before proceeding to
the subsequent steps, i.e. segmentation, feature extraction, classification,
document layout analysis and representation, in order to make recognition in the
document image analysis stage more reliable. Hence, skew angle detection is a
major and fundamental step in document image analysis.
From the literature survey, it is observed that document skew correction is
applied to printed and handwritten documents over the whole page. All the
above-mentioned methods work well for a whole document with a single skew, but
none of the authors have addressed skew detection and correction within
handwritten documents, where skew can also be introduced by the writer. It is
hard to find handwritten documents that resemble printed documents: each line
may be skewed or slanted upwards or downwards at a different angle, as shown in
Figure(5.26) for the image in Figure(5.24); the horizontal profile is shown in
Figure(5.25). Such a document can be viewed as multi-skewed, and each line needs
to be deskewed separately. If the skew correction is done properly on each line
and the document is reconstructed, then the simple and efficient horizontal
projection profile method can be used to segment the lines accurately.
The proposed method is based on a line smearing approach. The binarized document
image is subjected to morphological closing and each line is smeared (painted)
by merging all the words and characters into a single line block. Connected
component analysis is used to extract the components and their boundary values.
The Upper Left Corner (ULC) and Lower Right Corner (LRC) points are used to find
the skew angle. The pixel value of f(x′, y′) is then obtained by copying the
pixel value of f(x, y), i.e., f(x′, y′) = f(x, y).
Figure 5.24: Input skewed image.
The proposed method is explained in detail in the following subsections. Once
the binarized image is obtained, the size of the image is calculated; the image
length (column size) is used to calculate the size of the line structuring
element. The next two steps are the same as in the second line extraction
algorithm: the morphological closing operation is applied to the binarized
document to merge each text line, and the CCA method is used to extract the
connected blocks of lines.
5.6.1 Skew Angle Detection
The skew angle is calculated using the two opposite corners of the connected
block, as shown in Figure(5.27). The Upper Left Corner (ULC) and Lower Right
Corner (LRC) points are used to calculate the length and width of the connected
line block. The skew angle
Figure 5.25: Horizontal projection profile of the input image(5.24).
can be calculated using the simple formula

tan(θ) = R/C (5.1)

where R is the row difference R2 − R1 and C is the column difference C2 − C1;
R1 and R2 are the row values of the ULC and LRC, and C1 and C2 the column values
of the ULC and LRC. Once the skew angle is calculated, the new points are
obtained in the skew correction step.
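Eq. (5.1) translates directly into code (a minimal sketch; the function name and
`atan2` formulation are illustrative assumptions):

```python
import math

def skew_angle_deg(ulc, lrc):
    """Skew angle from Eq. (5.1): tan(theta) = R / C, where R = R2 - R1
    and C = C2 - C1 are the row and column differences between the
    Upper Left Corner (R1, C1) and Lower Right Corner (R2, C2)."""
    (r1, c1), (r2, c2) = ulc, lrc
    return math.degrees(math.atan2(r2 - r1, c2 - c1))

angle = skew_angle_deg((10, 5), (30, 505))
# a 20-row rise over 500 columns is roughly a 2.29 degree skew
```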
5.6.2 Skew Correction
Using the calculated skew angle, the actual line is rotated using the formulas
given in Eq(5.2) and Eq(5.3):

x′ = x cos θ − y sin θ (5.2)

y′ = x sin θ + y cos θ (5.3)
Segmentation of the line has to be done to get the actual line: the skewed lines
are extracted using the second line extraction algorithm discussed in the
previous section.
Figure 5.26: Result of closing operation.
Figure 5.27: Skew angle calculation from single connected component.
Skew correction is then applied to the actual text line to obtain the deskewed
line. The corrected lines are transferred to another image to reconstruct the
complete document with uniform spacing between lines. These steps yield the
deskewed line shown in Figure(5.28), and are repeated for all the lines in the
entire document.
5.6.3 Algorithm for Deskewing
Input: Binarized document image.
Output: Deskewed image.
1. Calculate the width w of the line structuring element as one tenth of the
column width of the image, w = C/10, where C represents the number of columns
in the image.
2. Apply morphological closing with a line element of width w at zero angle.
3. Extract the connected block of the merged line using CCA.
4. Find the width c and height r of the block using the ULC and LRC values of
the connected block.
5. Calculate the skew angle using the formula θ = tan−1(r/c).
6. Rotate the line using the skew angle detected in the previous step, with

x′ = x cos θ − y sin θ (5.4)

y′ = x sin θ + y cos θ (5.5)

7. Append the deskewed line to another image.
8. Repeat steps 3 to 7 until all the lines are deskewed.
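The rotation of Eqs. (5.4)-(5.5) can be sketched for coordinate pairs as follows
(a minimal numpy sketch; the function name is an illustrative assumption, and a
full implementation would also resample pixel values):

```python
import numpy as np

def rotate_points(points, theta):
    """Apply Eqs. (5.4)-(5.5) to an array of (x, y) coordinates:
    x' = x cos(theta) - y sin(theta),  y' = x sin(theta) + y cos(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T

pts = np.array([[1.0, 0.0], [0.0, 1.0]])
rotated = rotate_points(pts, np.pi / 2)
# a quarter turn sends (1, 0) to (0, 1) and (0, 1) to (-1, 0)
```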
5.6.4 Results and Discussion
To substantiate the efficiency of the proposed methodology, several experiments
have been conducted on historical document images of various scripts with
different skews; only a few results are presented here. Historical documents of
the Kannada language, in the form of palm leaf and paper images, are considered.
Since handwritten documents usually have curved and
(a) Extracted connected component of a line. (b) Extracted document line.
(c) Deskewed line.
Figure 5.28: Result of deskewing.
Table 5.1: Result of skew detection and correction.
Merged Line  Extracted Line  Deskewed Line  Skew angle
3.697
4.454
5.434
3.727
5.327
8.045
6.295
7.002
6.545
5.228
6.702
4.618
Figure 5.29: Reconstructed image of Figure(5.24).
skewed lines, detecting the skew angle and correcting the skew is a major
challenge. The documents are scanned using a flatbed scanner at a resolution of
300 dpi. The results obtained for Kannada document images scanned at different
orientations are shown in Table(5.1), and the final reconstructed image is shown
in Figure(5.29). Two more input images, shown in Figure(5.30)(a) and
Figure(5.31), are deskewed and the results are shown in Figure(5.30)(b) and
Figure(5.32).
Figure 5.30: (a) Input Image. (b) Deskewed image.
Figure 5.31: Input skewed image.
The experimentation conducted on original historical documents with varying skew
is listed in Table(5.2). The corrected skew lines can be used further for word
and character segmentation and for feature extraction and classification. These
deskewed lines can also be used to reconstruct documents with sufficient
inter-line spacing.
In the proposed method, there is no need to thin or skeletonize the connected
component, as the two opposite corner values are sufficient to calculate the skew angle.
Once the skew angle is obtained, the proposed algorithm works well irrespective of
the type of script, even for a wide range of skew angles within ±90◦.
Figure 5.32: Deskewed image.
5.7 Summary
Segmentation of the document image is essential, as the recognition of any character
from the document image is carried out by segmenting the document into lines, words
and characters. Segmentation of a handwritten document completely depends on the
way each line is written. Usually, uneven spacing between lines, curved and touching
lines, and touching characters create problems in proper segmentation. In this chapter,
two efficient algorithms for segmenting the document image into lines and words and
one skew detection and correction algorithm are presented. The first algorithm works
well for curved lines but fails to address the touching-lines problem. A second algorithm
has been developed to address this problem, and it segments touching lines properly.
Another algorithm efficiently detects skew within the lines and suitably corrects it.
The segmented characters are used to extract the characteristic features and
subsequently to recognize the extracted characters. In the next chapter, we propose
an era prediction algorithm
Table 5.2: Skew angle detected for each line in the document image.
No. Line 1 Line 2 Line 3 Line 4 Line 5 Line 6 Line 7 Line 8
1 -4.433 -0.122 -1.959
2 -9.047 -5.526 -6.431
3 -1.581 -1.453 0.251
4 -0.554 -0.368 0.114
5 9.335 9.659 11.085
6 12.980 7.812 8.506
7 0.407 0.623 0.470 3.807
8 7.625 8.595 7.762 8.665 8.002 8.458
9 2.023 2.443 1.959 2.632 2.066 2.952
10 -7.362 -8.084 -7.094 -8.054 -7.618 -7.447
11 5.429 6.219 5.928 5.993 5.023 5.238 5.526 5.702
12 3.697 4.454 5.434 3.727 5.327 8.045 6.295 7.002
to identify the era of the characters, so that the character set corresponding to that
particular era can be referred to in order to decipher the contents of the document.
Chapter 6
Prediction of Era of Character
Using Curvelet Transform Based
Approach
6.1 Introduction
Recognition of characters from handwritten document images and categorizing them
into various classes is one of the major challenges in the area of document image
analysis and recognition. The characters written on paper depend on the author's mood,
style and the materials used for writing, so the extraction of characters from
handwritten document images is a profoundly complex task. Historical documents inscribed
on a variety of materials usually pose many challenges to researchers, particularly in
the pre-processing, segmentation and feature extraction stages. In order to recognize the
characters and decipher a given document, there is a prerequisite to know the period
1Some of the material in this chapter appears in the following research papers:
1. B. Gangamma, Srikanta Murthy K, Punitha P, “Curvelet Transform Based Approach for Prediction of Era
of the Epigraphical Scripts”, IEEE International Conference on Computational Intelligence and Computing
Research, Coimbatore, pages 636-641, 2012.
of the character, so that the character set pertaining to that era can be used to
appropriately decipher the document. Hence there is a pressing need to predict the era
of the character efficiently. In this chapter, we present an algorithm for predicting
the era of a script based on a curvelet transform approach. The curvelet
transform is effective in handling curve features [172]. In this research work, a Fast
Discrete Curvelet Transform (FDCT) based algorithm is designed to predict the era
of the character/script. The characters are extracted using the segmentation techniques
discussed in the previous chapter and used as input to the algorithm to suitably
predict the era of the script.
Writing or inscribing on hard materials was the usual practice in early days. Our
ancestors used both hard and soft materials, such as rocks, metal plates and palm leaves,
to inscribe information. The current practice is to decipher these documents manually.
Expert epigraphists use a few characters (a, e, ka, cha, la) as standard characters
for predicting the era of a script. These are key characters having distinct
shapes and structural variations, as shown in Figure(6.1). In this research, an attempt
has been made to develop a method for predicting the era of various
scripts of Kannada, a South Indian language, so that deciphering can be done
by selecting the character set belonging to that particular era.
This chapter is organized as follows: Section 2 deals with related work; the
proposed methodology is discussed in detail in Section 3; the experimental results
are provided in Section 4; and finally Section 5 provides a summary of the proposed
method.
6.2 Related Literature
Research efforts in the field of character recognition have grown exponentially and
a substantial number of articles have been published during the last few decades.
Designing an OCR system is one of the most fascinating and challenging areas of
pattern recognition and it can contribute immensely to the advancement of automa-
tion processes. OCR is one of the most important components of pattern recognition
and has many applications in automatic document processing.
Figure 6.1: Sample epigraphical characters belonging to different era.
Handwriting identification and recognition are of great practical interest in the
extraction of discriminating and invariant information from a handwritten specimen.
One of the major difficulties in offline word recognition originates from the variation
in the handwriting of the same writer over time, or across different writers. There is
no perfect mathematical model that can describe such extreme variations, and hence
it is very difficult to find characteristic features that are invariant across different
writing styles.
Literature survey reveals that an enormous amount of work has been done in the area
of document image processing and recognition. Many authors have developed efficient
algorithms for the enhancement of degraded documents, the segmentation of documents
into lines, words and characters, and subsequently for feature extraction and
classification of the characters [190], [35], [36], [37]. Feature extraction and recognition
are crucial steps in any recognition system. Almost all feature extraction algorithms
are based on spectral, statistical and structural features, which in turn are based on
the topological and geometrical characteristics of the character [38].
The most widely used statistical methods are zonal and projection profile methods.
Local and global image statistics such as the mean, variance and deviation are used as
feature sets along with other methods. Zonal methods count the number of ON pixels
in different image zones. The horizontal and vertical profiling method counts the number
of ON pixels in each row and column respectively. Dholakia et al. [191] proposed an
algorithm to recognize Gujarati printed text using the zonal method. This method
deals with the identification of various zones for text regions. Zones in the image
are identified by the slope of the lines created by the upper left corner of the rectangle
formed by the boundaries of connected components. They attempted to simplify the
task of OCR design by developing algorithms for character zone extraction.
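The two statistical feature families described above can be sketched as follows (a generic NumPy illustration, not the exact feature sets of the cited papers; the zone grid size is an illustrative parameter):

```python
import numpy as np

def projection_profiles(img):
    """Horizontal and vertical projection profiles: the number of ON
    pixels in each row and in each column of a binary image."""
    img = np.asarray(img)
    return img.sum(axis=1), img.sum(axis=0)

def zonal_counts(img, n_zones=4):
    """Zonal features: split the image into an n_zones x n_zones grid
    and count the ON pixels in each zone."""
    img = np.asarray(img)
    counts = []
    for band in np.array_split(img, n_zones, axis=0):
        for zone in np.array_split(band, n_zones, axis=1):
            counts.append(int(zone.sum()))
    return counts
```

Both functions produce fixed-length vectors for a fixed image size, which is why character images are usually size-normalized before these features are extracted.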
Desai [193] suggested a multi-layered feed-forward neural network classifier for the
recognition of Gujarati digits using the zonal profile method. The author takes
four different profiles of the digits: two in the diagonal directions and
the other two in the horizontal and vertical directions. This method is applied
on isolated characters after thinning and skew correction have been performed,
and an admirable recognition rate of 82% is claimed. Gatos
and Kesidis [194] presented the idea of Adaptive Zoning Features based on local
pattern information; every zone's pixel density was extracted after adjusting the
position of each zone by maximizing the local pixel density around it. Khanale
and Chitnis [195] proposed a method for recognition of Devanagari characters using
the directional plane method, where the character image is decomposed into directional
planes, each plane is partitioned into equal zones, and the sum of the pixels in
each zone is used as the feature value. A texture based method was employed by
Murthy et al. [196] to extract epigraphical character features; the era of the
characters was adequately predicted using a template matching method.
A modified algorithm with scale and translation invariant properties has been designed
by Amayeh et al. [192]. Image normalization is performed in a different
manner, by rescaling the coordinates of the image instead of the usual technique of
re-sampling it; the ratio of the image area to the area of the unit disk is set to
a constant value. The authors claim that their algorithm results in faster
computation and a higher recognition rate.
Kan et al. [197] presented a novel approach combining two different invariant
moment methods, Orthogonal Fourier-Mellin Moments [198] and Zernike moments [199],
for the recognition of alphanumeric characters. The combined method was useful
in characterizing images with large variability. Kunte and Samuel [200] developed
an OCR system for the recognition of basic printed Kannada characters, which
works for different font sizes and styles; Hu invariant moments [201] and Zernike
moments were used to extract the invariant moments that serve as feature
vectors. These moments work fine as low order moments, which represent less
information, but higher order moments cannot be handled reliably as they are
prone to noise. The Zernike moments are used as efficient shape descriptors for
images that cannot be defined by a single contour. Zernike moments have rotation
invariance and noise robustness properties, but they do not have the scale
and translation invariance properties required for efficient shape recognition
algorithms. These moments can represent global information accurately,
but smaller images are represented with comparatively less accuracy.
Spectral methods consider images in the frequency domain and locate the Fourier
components. These methods are invariant to rotation, translation and scale and
adequately address the recognition of characters with these variations. The wavelet
transform of the input coordinates and angles was used as the feature set for
classification of Malayalam characters with a simplified Fuzzy ARTMAP network, which
takes comparatively little time for training; it also supports incremental learning,
making it suitable for practical implementation [202]. In the last two decades,
extensive research has taken place in the field of mathematical and computational
tools based on multi-resolution analysis. This research has led to the design of
newer tools for analysing information. The development of wavelets and related
transforms has provided many methods for addressing problems related to large data
sets, such as image compression, de-noising and reconstruction of objects. Various
efforts have been made, including simple ideas like thresholding the orthogonal
wavelet coefficients of noisy data, followed by reconstruction. As a further
improvement, translation invariance was achieved by using the un-decimated wavelet
transform [173].
Literature survey reveals that sufficient work has been done in the spectral domain
for various applications. Even though enormous work is found in the wavelet
transform field, wavelet transforms are suitable only for addressing point
discontinuities and fail to address edge and curve discontinuities. Apart from the
edge discontinuity problem, the discrete wavelet transform uses only three directional
wavelets, horizontal, vertical and diagonal, to capture image information. Another
limitation of the wavelet transform is that it is unable to represent images containing
high levels of directionality; because of this, researchers have tried to find
other spectral approaches that capture more of the directional information in an image.
Attempts to overcome the disadvantages of wavelets have resulted in the development
of the ridgelet and curvelet transforms [175], [170]. Furthermore, the Curvelet
Transform overcomes the limitations of Gabor filters. Even though the
multiple-orientation approach of Gabor filters gives better results than the wavelet
transform in representing textures and retrieving images, it is unable to provide
complete spectral information; therefore Gabor filters cannot effectively represent
images, which degrades classifier performance. Hence Curvelet Transforms
are used for feature extraction. The Curvelet Transform also provides flexibility in
the degree of localisation in orientation, which varies with scale. Fine scale basis
functions in the curvelet transform are long ridges, and the shape of the basis
functions at scale j is given by 2^{-j} × 2^{-j/2} [170]. A brief explanation of the
Curvelet Transform is provided in section 4.3.1 of chapter 4.
6.3 Proposed Method
The proposed method uses the Curvelet Transform to extract character features and a
minimum distance classifier to predict the era of the character. The proposed model
comprises four stages, viz. 1) data set generation, 2) preprocessing, 3) feature
extraction and 4) classification.
1) The data set generation step deals with the collection of sample characters
belonging to various eras.
2) The preprocessing step covers the binarization of the scanned documents
and the segmentation and extraction of individual characters.
3) Curvelet coefficients are extracted using the Fast Discrete Curvelet Transform (FDCT)
for each of the segmented characters in the feature extraction step.
4) The Nearest Neighbour method is employed in the classification step to predict the
era of the character.
6.3.1 Data Set Creation
The database of era characters contains 4145 samples belonging to 6 different eras,
extracted from various documents. A minimum of 13 and a maximum of 19
characters are considered from the 6 eras, and 40 samples are collected for each
character. Out of the 4145 samples, 2600 characters were taken for training and 1545
were considered for testing. These characters exhibit translation, scale and rotational
variations, with different image sizes.
6.3.2 Preprocessing
Characters are extracted from documents having uniform intensity. Character images
with pixel value 1 as the foreground and a plain background with pixel value 0 have
been considered. Several binarization techniques are available for binarizing document
images; Otsu's method [53] has been used to binarize the documents containing
epigraphical characters. Once the binarized image is obtained, characters are extracted
using Connected Component Analysis. Characters are normalized to 100 × 50 size to
maintain an aspect ratio of 1:2, since many characters have rectangular rather than
square shapes. Characters belonging to the same era are labeled using era numbers,
referred to as class labels. Two more data sets have been created from the same
collection: the samples are preprocessed, skeletonized, dilated once and normalized to
40 × 40 and 64 × 64, in order to study the curvelet transform response for
skeletonized and dilated characters with an equal aspect ratio.
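The binarization and size-normalization steps above can be sketched in NumPy. Otsu's threshold is implemented directly; the connected-component extraction step is omitted here, and a simple nearest-neighbour resize stands in for a proper resampler:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the 8-bit grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Foreground (ink) = 1, background = 0, as in the text;
    assumes dark ink on a lighter background."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)

def normalize(char_img, size=(100, 50)):
    """Nearest-neighbour resize of a binary character to the
    normalized size (100 x 50 in the text)."""
    h, w = char_img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return char_img[np.ix_(rows, cols)]
```

In practice the binarized page is first passed through connected-component labeling, and `normalize` is applied to each extracted character's bounding box.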
6.3.3 Feature Extraction using FDCT
In this stage, FDCT is used to extract the features of the segmented characters. The
curvelet coefficients of a character image f(m, n), 0 ≤ m < M, 0 ≤ n < N, where
M × N is the size of the character image, are calculated using equation(4.24). This
equation computes an array of coefficients at scale j and orientation l, along with the
location parameters (k1, k2), as explained in section 4.3.1. These coefficients are used
as the representative feature vector of the segmented character. Such features are
computed for all the characters in the database.
6.3.4 Classification
Characters belonging to various eras are predicted in the classification stage. The
test characters are subjected to the preprocessing and feature extraction steps: the
curvelet coefficients of the test character are extracted using FDCT and used as its
feature vector. This feature vector is compared with the feature vectors in the
database. A Minimum Distance Classifier is employed to predict the era of the
characters: the distance between the test feature vector and each database feature
vector is computed using the Euclidean distance, the minimum distance is selected,
and the era is predicted from the index of the corresponding character using the class
label table. The algorithm for era prediction is given below.
6.3.5 Algorithm for Era Prediction
Input: Set of test character images belonging to different eras.
Output: Classification of the test characters based on their eras.
1. Curvelet coefficients are extracted using curvelet transform method which forms
the feature vector of the test character.
2. The feature vector of the test character is compared with the database. A
Euclidean distance classifier is used to find the match. It calculates the distance
between the test feature vector and each trained feature vector using the equation
d(p, q) = d(q, p) = √((p1 − q1)² + (p2 − q2)² + . . . + (pn − qn)²)
where p is the training feature vector and q is the test feature vector. The index of
the minimum distance is used to find the era of the character.
3. Repeat steps 1 and 2 for the entire set of test characters.
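Steps 2 and 3 amount to a one-nearest-neighbour search under the Euclidean distance. A minimal sketch (the curvelet feature extraction of step 1 is assumed to have already produced the vectors):

```python
import numpy as np

def predict_era(test_vec, train_vecs, train_labels):
    """Euclidean distance between the test feature vector and every
    training vector; the era label of the nearest one is returned."""
    dists = np.linalg.norm(train_vecs - test_vec, axis=1)
    return train_labels[int(np.argmin(dists))]

# Illustrative 2-D feature vectors (real curvelet vectors have
# hundreds of coefficients).
train = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["BC3", "AD1"]
print(predict_era(np.array([1.0, 1.0]), train, labels))  # → BC3
```

Each prediction scans the whole training set, so the cost per test character is linear in the number of training vectors times the feature-vector length.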
6.4 Experimentation and Results
The characters belonging to the 6 different eras are shown in Fig(6.1). Experimentation
was conducted on a normalized character image size of 100 × 50 and also compared
with 40 × 40 and 64 × 64 character images.
Experimentation has been conducted using the curvelet toolbox CurveLab-2.1.2 [176]
to extract the curvelet coefficients, and has been performed to understand the curvelet
transform at different scales. Curvelet coefficients at the coarsest level are good at
capturing the low level approximation of the function; the remaining scales give the
finer details, especially those corresponding to edges. Curvelet coefficients are
extracted for the first scale, the second scale, and both scales together, with 16
orientations. The size of the feature vector depends on the size of the image as well
as the number of scales considered. For an image of size 40 × 40, scale 1 contains 169
coefficients and scale 2 contains 2352 coefficients; both scales together contain 2521
coefficients. For a character image of size 64 × 64, scale 1 consists of 441 coefficients,
and scales 1 and 2 together contain 6425 coefficients. A character image of size
100 × 50 has 3 scales, with 561 coefficients in scale 1 and 4131 coefficients in scales
1 and 2 together.
6.4.1 Experimentation 1
Experimentation on image size 100 × 50 was performed and feature vectors were
obtained with scale 1, scale 2 and both scales. Table 6.1 shows the recognition results
and the confusion matrix for image size 100 × 50 with 1 scale, having 561 coefficients.
The feature vector size is 1 × 561 and the training data set is a 2600 × 561 coefficient
matrix. The testing set contains 1545 images, each again with 561 coefficients in its
feature vector, giving 1545 × 561 total coefficients in the testing set.
The recognition accuracy for 3rd century B.C. characters is 267 out of 285, as these
characters have distinct shapes and features and stand out as significantly
different from the characters of other eras. The error rate is 6.32% because 6 to 7 characters have
similar structure and shape. A few of the initial characters are similar to characters
of the first three eras; therefore there is a chance of predicting the era incorrectly.
The second set of characters, belonging to the 1st century A.D., is classified correctly
at 77.41% and wrongly at 22.59%: the 1st century A.D. characters evolved from those
of the previous century, and most of them have a similar structure, so the similar
characters are classified wrongly. The 5th century A.D. characters overlap with those
of the previous and next centuries, because of which the recognition rate decreases
to 78.15%. The recognition rates for the 6th, 9th and 11th centuries A.D. are 88.72%,
90.42% and 86.32% respectively.
Table 6.1: Confusion Matrix and Recognition Rate(RR) for character image size
100× 50.
Era BC3 AD1 AD5 AD6 AD9 AD11 Total RR in %
BC3 267 10 3 4 0 1 285 93.68
AD1 31 209 18 5 5 2 270 77.41
AD5 13 13 211 11 8 7 265 78.15
AD6 1 0 12 173 4 5 195 88.72
AD9 1 2 5 4 217 11 240 90.42
AD11 2 1 4 9 23 246 285 86.32
6.4.2 Experimentation 2
In the second experimentation, the images were normalized to a square size of 40 × 40,
features were extracted from the training data set, and the test images were subjected
to classification. This training set has 2600 × 169 coefficients and the testing set has
1545 × 169. The recognition rates for the various era characters are shown in Table 6.2.
Some of the era characters have geometrical structures similar to characters of the
next and previous centuries, causing misclassification. In addition, normalizing images
to square dimensions causes many characters to lose significant features; therefore
normalizing images to a square size deteriorates the recognition rate.
Table 6.2: Confusion Matrix and Recognition Rate (RR) for character image size
40× 40 with first scale.
40× 40 BC3 AD1 AD5 AD6 AD9 AD11 Samples RR in %
BC3 243 17 16 6 0 3 285 85.263
AD1 41 190 25 8 2 4 270 70.370
AD5 13 18 216 15 3 5 270 80.000
AD6 3 3 14 165 5 5 195 84.615
AD9 3 1 6 8 205 17 240 85.417
AD11 3 1 7 8 15 251 285 88.070
6.4.3 Experimentation 3
The results of experimentation on image size 64 × 64 are shown in Table 6.3, along
with the misclassified characters. As observed, the era prediction rate decreases as
the square size increases from 40 to 64. The recognition rates for the 3rd century
B.C. and the 1st, 5th, 6th, 9th and 11th centuries A.D. were 89.825%, 65.614%,
75.439%, 56.491%, 69.123% and 81.404% respectively.
Table 6.3: Confusion Matrix and Recognition Rate (RR) for character image size
64 × 64 with first scale.
64× 64 scale1 BC3 AD1 AD5 AD6 AD9 AD11 Samples RR in %
BC3 256 13 10 4 1 1 285 89.825
AD1 49 187 24 4 1 5 270 65.614
AD5 16 15 215 7 7 10 270 75.439
AD6 4 2 17 161 7 4 195 56.491
AD9 9 0 9 6 197 19 240 69.123
AD11 8 4 15 6 20 232 285 81.404
Table 6.4: Comparison of the Recognition Rates(RR) for various character image
sizes 40× 40, 64× 64, 100× 50.
Era ↓ 40 × 40 Size 64 × 64 Size 100 × 50 Size
RR in % RR in % RR in %
BC3 85.263 89.825 93.68
AD1 70.370 65.614 77.41
AD5 80.000 75.439 78.15
AD6 84.615 56.491 88.72
AD9 85.417 69.123 90.42
AD11 88.070 81.404 86.32
Average 82.289 72.982 85.78
Figure 6.2: Prediction Rate for Gabor, Zernike and proposed method.
6.4.4 Discussion
The proposed method was implemented to find the era to which a character belongs.
In this thesis, era prediction has been implemented using a synthetic data set
generated by various persons. The following assumptions have been made: 1) isolated
characters are selected and collected manually from various persons; 2) these
characters are free from noise; 3) some of the characters are taken from the results of
the segmentation methods explained in the previous chapter; 4) these characters have
non-uniform resolution (dots per inch), as the documents were acquired from
different scanners at different resolutions.
The characters contained in the data set are rotated/slanted, translated and of
non-uniform size. The characters also have variable resolutions, as the images were
acquired at different resolutions under various conditions. Some characters were
created synthetically and some were extracted from the enhanced and segmented
historical documents. The features extracted from these characters should be able to
represent the unique characteristics of the individual characters; therefore the feature
extraction method should be efficient, so that the era can be predicted properly. The
main focus of this work is to select a suitable feature extraction technique that
addresses these variations and classifies the character eras. As discussed in section
4.3.1, the Curvelet Transform is used to extract the features, as it is more efficient
in handling curve details and the complete spectral information of the image.
The distinguishing patterns in the image provide the unique features used as
representative feature vectors. Gabor filters are suitable for extracting features from
objects with different scales and orientations, but there is a prerequisite to understand
and analyze the Gabor filter bank in detail. There are a total of 40 filters in the
bank, selected based on the features that need to be extracted, and the selection of
the proper scale and number of filters plays a vital role. Furthermore, Gabor filters
are unable to represent complete spectral information or handle curve discontinuities.
Moreover, after preprocessing the characters have only a uniform background, which
tends to deteriorate the recognition rate. Therefore the recognition rate of the Gabor
filter approach is low compared to the other methods.
Popular methods such as Hu and Zernike moments are usually used to extract shape
descriptors and are employed in character recognition. These moments work fine as
low order moments carrying less information, but higher order moments cannot be
handled reliably as they are prone to noise. Hu and Zernike moments are invariant
to rotation but are not scale invariant. Digital images must be mapped onto the
unit disk before the Zernike moments can be calculated; since Zernike moments are
not natural scaling invariants, the scaling invariance must be provided by this
mapping, and the correct mapping of objects into the unit disk is therefore a crucial
step. Hence the recognition of characters with variable scale is not possible using
Zernike moments alone.
6.5 Summary
The shapes of the characters in historical scripts have evolved over the centuries.
Hence, in order to competently understand a script, it is necessary to know the
corresponding era so that its character set can be used. Lines and curves happen to
be the dominant features in these character sets. Since the curvelet transform is
effective in handling these features [172], a Fast Discrete Curvelet Transform (FDCT)
based model was designed to predict the era of the script. Experiments were conducted
on data sets comprising 4145 images belonging to six different eras. The resulting
recognition rate of the proposed method was 85.78%. The proposed method was
compared with Gabor filter and Zernike moment based approaches; the results showed
that, on average, the proposed method had 20% to 25% better accuracy than these
approaches in predicting the era of epigraphical scripts.
Chapter 7
Conclusion and Future Work
7.1 Conclusion
Historical documents are immeasurably crucial resources which provide valuable
information about our past. It is necessary to preserve them for posterity's sake in a
suitable format. There are various issues which need to be adequately addressed
during the preservation and processing of these documents. One of the major issues
is the legibility of the document content, which is impacted by the numerous factors
that have affected the health of these materials. Since these issues are carried over
when the documents are transferred into digital form, they need to be handled
appropriately.
Therefore, there is a dire need to address these issues using appropriate image
processing techniques. A literature survey reveals that many admirable works exist
in the field of Kannada historical document image processing. In our research work,
we have presented several image enhancement algorithms to enhance the quality
of degraded historical Kannada documents. These algorithms satisfactorily de-noise
the input document image and subsequently binarize it for further processing. The
resultant image produced by these methods is an enhanced image with sharp edges,
which is further used to segment the document and predict the era of the scripts.
To enhance the degraded document images, five image enhancement algorithms have
been developed: three in the spatial domain and two in the frequency domain. In the
spatial domain, morphological reconstruction (MR) techniques have been used to
develop an algorithm to eliminate dark, uneven, noisy backgrounds; MR has also been
used in combination with the remaining four algorithms as a background elimination
technique. However, the MR method was unable to effectively handle severely
degraded noisy images, and furthermore it was unable to address all types of problems
posed by degraded documents. Therefore a Bilateral Filter (BF) based approach
combining domain and range filtering was devised to de-noise severely degraded
images without smoothing the edges, and it proved to be quite proficient in this
undertaking. The BF method performed better than the MR method and was also
found to enhance stone inscription images. However, the computational time of the
BF method turned out to be slightly more than that of the MR method. The MR
method has complexity proportional to the size of the image, but was unable to
enhance all types of degraded document images; its major limitation is the selection
of the controlling parameter value and the structuring element size.
Furthermore, an algorithm based on the Non Local Means Filter (NLMF) was
implemented to de-noise the document images using a similarity measure between
non-local windows. This method adequately addressed severely degraded document
images by eliminating noise while preserving edge information. Although the NLMF
method proved to be a better solution than the previous two methods, its
computational cost was very high. Moreover, the need for proper selection of the
search and patch window sizes complicates matters further, because as the search
window size increases, so does the computational time.
The performances of the above three spatial methods were measured using PSNR
values, execution time, human interpretation and the binarized image after
enhancement. The PSNR value is a quantitative measure of performance based on
the intensity difference between the input image and the output image; a higher
PSNR value signifies a smaller mean-squared difference between them. However, it
is very difficult to prove that the method having the highest PSNR value will always
give the best results. Although computational time cannot by itself measure the
quality of an algorithm's output, we have considered this parameter to signify, from
a practical point of view, the duration taken by the different methods.
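The PSNR measure referred to above follows the standard definition for 8-bit images; a minimal sketch:

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images:
    10 * log10(MAX^2 / MSE). A higher PSNR corresponds to a smaller
    mean-squared intensity difference between the two images."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

As a pure intensity-difference statistic, PSNR says nothing about edge sharpness or legibility, which is why it is used here alongside visual inspection and the binarized output.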
Therefore time complexity plays a major role in evaluating the performance of any
algorithm. Image enhancement algorithms need to produce images which are not only
good in quality but also visually appealing, irrespective of their PSNR values and
computational time; human interpretation, along with the binarized output, PSNR
and computational time, is therefore required.
The performance of an image enhancement algorithm is also measured through the
segmentation algorithm, which indicates how well the binarized image can be
segmented into lines, words and characters. These, therefore, are the deciding
parameters used to evaluate the performance of an image enhancement algorithm.
While the PSNR values of the BF method are higher than those of the MR method,
the PSNR values of the NLMF method are higher than those of all the other methods.
The execution time of the MR method is considerably less than that of the other two
methods; the BF method takes about ten times longer than the MR method, and
although the NLMF method satisfactorily enhances the image, it consumes
approximately ten times more time than the BF method. The NLMF and BF methods
enhance the image well in terms of PSNR values and human visual interpretation,
but the longer running time of the NLMF method turns out to be a major drawback.
Frequency domain based transforms were also employed in preprocessing the documents.
An algorithm based on the Wavelet Transform (WT) was developed to analyze and
restore degraded document images. Since the wavelet transform handles only point
discontinuities and not curve discontinuities, another algorithm based on the
Curvelet Transform (CT) was devised, and it proved to be better than the other
preprocessing algorithms developed in this research work. The major advantage of
the frequency domain based approach lies in its computational efficiency and in
the fact that it does not require the selection of any parameters controlling
the output of the filter. However, the selection of the structuring element for
the morphological operations is completely dependent on the size of the characters
in the document image. The wavelet transform can be applied to images of any size,
whereas the curvelet transform requires the image to be square; this is a further
limitation of the curvelet based approach, in addition to the selection of the
structuring element size. The two frequency domain methods were compared using
PSNR values, computational time and human visual perception. The CT based method
takes slightly more time than the WT based method, but gives a better enhanced
image in terms of PSNR values and human visual interpretation.
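The idea behind wavelet based restoration can be sketched with a single-level Haar decomposition in which the detail sub-bands, where high-frequency noise concentrates, are soft-thresholded before reconstruction. The Haar basis, the threshold, and all function names below are illustrative assumptions, not the transform actually used in this work:

```python
import numpy as np

def haar2d(x):
    """One-level 2-D Haar transform of an array with even dimensions."""
    a = (x[::2] + x[1::2]) / 2          # row averages
    d = (x[::2] - x[1::2]) / 2          # row differences
    cols = lambda m: ((m[:, ::2] + m[:, 1::2]) / 2, (m[:, ::2] - m[:, 1::2]) / 2)
    LL, LH = cols(a)
    HL, HH = cols(d)
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    def icols(lo, hi):
        m = np.empty((lo.shape[0], lo.shape[1] * 2))
        m[:, ::2] = lo + hi
        m[:, 1::2] = lo - hi
        return m
    a, d = icols(LL, LH), icols(HL, HH)
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[::2] = a + d
    x[1::2] = a - d
    return x

def wavelet_denoise(img, thresh):
    """Soft-threshold the detail sub-bands, keep the approximation."""
    LL, LH, HL, HH = haar2d(img.astype(float))
    st = lambda c: np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
    return ihaar2d(LL, st(LH), st(HL), st(HH))
```

With a zero threshold the reconstruction is exact; raising the threshold suppresses progressively more of the high-frequency content.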
To segment the historical document image, two segmentation algorithms have been
developed: one based on a piecewise projection profile method and the other on
morphological operations and Connected Component Analysis (CCA). The first method
addresses uneven spacing between lines by dividing the image into vertical stripes;
it then extracts each line segment from each of the stripes and combines them to
form a complete line. Although this method segments document images with uneven
spacing between lines, it is unable to segment touching lines. The second method
was developed to address both uneven line spacing and touching lines using
morphological operations and connected component analysis.
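The piecewise projection profile idea can be sketched as follows for a binarized page in which text pixels equal 1; the stripe count and function name are illustrative assumptions:

```python
import numpy as np

def stripe_line_boundaries(binary, n_stripes=4):
    """For each vertical stripe of a binarized page (text = 1), compute the
    horizontal projection profile and return, per stripe, the (start, end)
    rows of each text line, i.e. the transitions between empty and
    non-empty rows."""
    H, W = binary.shape
    bounds = []
    for s in range(n_stripes):
        stripe = binary[:, s * W // n_stripes:(s + 1) * W // n_stripes]
        profile = stripe.sum(axis=1)                # ink pixels per row
        ink = profile > 0
        # a line starts where an inked row follows an empty one, and ends
        # where an inked row precedes an empty one
        starts = np.flatnonzero(ink & ~np.r_[False, ink[:-1]])
        ends = np.flatnonzero(ink & ~np.r_[ink[1:], False])
        bounds.append(list(zip(starts.tolist(), ends.tolist())))
    return bounds
```

Because each stripe is profiled independently, lines whose vertical position drifts across the page can still be located stripe by stripe and then merged, which is precisely what the stripe-wise method exploits.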
Skew is a common error introduced during the image acquisition process, whether
performed with a camera or a scanner, and it needs to be corrected. Handwritten
documents typically contain multiple skewed lines, commonly caused by uneven
spacing between lines, with each line skewed independently. Global skew correction
algorithms were therefore not helpful in segmenting handwritten documents
correctly. To address skew within the document lines, an extended version of the
second segmentation algorithm, based on gray scale morphology and connected
component analysis, was developed. To recognize the segmented characters, the
character set pertaining to the particular era needs to be identified, which in
turn necessitates predicting the era of the script. To predict the period of the
script, a recognition algorithm based on the curvelet transform has been
implemented: the curvelet transform is employed to extract the character features,
and a minimum distance classifier is used to classify the characters according
to their eras.
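The final classification step can be sketched as a minimum distance classifier over feature vectors; the era labels, mean vectors, and function name below are illustrative assumptions (in this work the features themselves come from the curvelet transform):

```python
import numpy as np

def min_distance_classify(feature, class_means):
    """Assign a feature vector to the era whose mean feature vector is
    nearest in Euclidean distance."""
    return min(class_means,
               key=lambda era: np.linalg.norm(feature - class_means[era]))
```

For example, with two hypothetical era prototypes, a feature vector is simply labelled with whichever prototype it lies closest to.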
7.2 Future Work
The present research work implemented various preprocessing techniques, such as
the elimination of noise, segmentation of lines and characters, and skew detection
and correction, which are essential for historical document image processing.
Historical documents are usually plagued by low contrast, noise and broken
characters, and are typically found in worn out condition. In this research work,
several algorithms have been developed for preprocessing low contrast and noisy
documents. Although the methods developed in this work are suitably efficient,
they were not extended to address all types of degradation.
Certain issues in historical documents make processing harder. Ink bleed-through
creates double contours for characters, which makes the binarization task much
more difficult; the extraction of characters from documents affected by ink
bleed-through is a limitation of our research work and can be taken up as future
work. Folding of the paper introduces unwanted lines and may create distortions
when a fold passes through characters; suitable elimination of these lines will
also be future work. Even though the enhancement techniques improved legibility,
they failed to eliminate the noise completely. Post-processing algorithms are
therefore required to eliminate this residual noise and reconstruct broken
characters and words; these could prove to be grounds for future work.
Extracting useful information from severely degraded stone inscriptions and palm
leaf documents poses an array of challenges. The image acquisition task has to
handle the issues related to capturing and scanning the document. Furthermore,
several additional complexities are introduced during image acquisition, depending
on the lighting conditions, image resolution, document size, character size,
paper or palm leaf position, blurring, illumination and so on. The elimination
of these problems can also be taken up for future research work.
Documents also contain calligraphic and ornamental styles and carvings of animals,
birds and other figures, typically inscribed during or after the writing process.
These pose significant challenges, not only to document processing but also to
the extraction of characters for recognition purposes. The algorithms developed
in our research have not been extended to address these issues; the elimination
of such styles creates more avenues for the research community.
Palm leaf documents are available in large volumes and need to be efficiently
deciphered. Digitizing and enhancing these documents requires fully automated
systems. As palm leaves vary in size, acquiring proper images from their raw form
requires technical expertise, and the scanning process introduces a variety of
challenges that make the enhancement task much more difficult. In this work, we
have considered already digitized document images and subsequently processed
them; the problems that occur during image acquisition have therefore not been
addressed. This could be a motivation for researchers to take up.
Stone inscriptions pose a wide range of problems, from image acquisition all the
way to recognition. In our research work, we have enhanced a few images that were
available in digital format, but most of these images posed a multitude of
problems even after enhancement. These problems need to be handled effectively
and can be taken up in future work.
Handwritten documents are typically very hard to segment into lines, words and
characters, as they contain very narrow spaces between lines, touching characters,
and skewed and curved lines. Efficient algorithms need to be designed to
appropriately extract curved lines. The algorithms developed in this research
work adeptly segmented touching lines by breaking their connectivity. The
reconstruction of the broken characters, however, has not been handled; it has
been deferred to our future work, where we will attempt to reconstruct the broken
characters in their entirety. Efforts were made to use morphological operations
for reconstruction during enhancement, but they were not proficient at that task.
As mentioned earlier, we have used a myriad of digitized images throughout our
research work, converted from their raw form using different setups. The methods
implemented in this research work required that the document be analysed manually
for character size, and that the resolution of the image acquisition device be
approximately estimated, in order to select optimal values for the controlling
parameters and the size of the structuring element.
This could be an effective motivation in paving the path for further work in the
automation of deciphering of historical document images. Since Stone inscriptions
pose several unprecedented challenges, they provide a lot of compelling avenues for
future work. Future work can also be directed towards effectively handling broken
and touching characters and efforts can be guided towards the complete automation
of the deciphering process.
The digitization of historical documents has immense potential, with applications
spanning numerous fields and domains. Research in historical document processing
remains continually intriguing and offers fertile ground for researchers to
immerse themselves in. Since research in this field has mostly remained passive,
there is not only wide scope but also a dire need to bring it into mainstream
focus through a number of lively studies.
Appendix A
Palm Leaf Images
Figure A.1: Original image of palm leaf script of 18th century.
Figure A.2: Input images of palm leaf document belonging to 17th century.
Figure A.3: Noisy input image of palm leaf document belonging to 18th century.
Figure A.4: Input image of palm leaf document belonging to 17th century.
Figure A.5: Input image of palm leaf document belonging to 17th century.
Figure A.6: Input images of palm leaf document belonging to 17th century.
Appendix B
Paper Images
Figure B.1: Sample paper image belonging to previous century.
Figure B.2: Original paper image 1, belonging to the nineteenth and beginning of the twentieth century.
Figure B.3: Original paper image 2, belonging to the nineteenth and beginning of the twentieth century.
Figure B.4: Original paper image 3, belonging to the nineteenth and beginning of the twentieth century.
Appendix C
Stone Inscription Images
Figure C.1: Stone inscription image belonging to the 14th-17th century.
Figure C.2: Digitized image of Belur temple inscription belonging to 17th century
AD.
Figure C.3: Digitized image of Belur temple inscriptions belonging to 17th century
AD.
Figure C.4: Digitized image of Shravanabelagola temple inscriptions belonging to
14th century AD.
Appendix D
Author’s Publications
List of Publications in Journal
1. B. Gangamma, Srikanta Murthy K, Arun Vikas Singh, “Hybrid Approach Using
Bilateral Filter and Set Theory for Enhancement of Degraded Historical Doc-
ument Image”, CiiT International Journal of Digital Image Processing, Volume
5, pages 488-496, May 2012.
2. B. Gangamma, Srikanta Murthy K, Arun Vikas Singh, “Restoration of De-
graded Historical Document Image”, Journal of Emerging Trends in Computing
and Information Sciences, Volume 3, No. 5, pages 792-798, May 2012.
3. B. Gangamma, Srikanta Murthy K, “An Effective Technique using Non Local
Means and Morphological Operations to Enhance Degraded Historical Docu-
ment”, International Journal of Electrical, Electronics and Computer Systems,
Volume 4, Issue 2, pages 1-10, 2011.
4. B. Gangamma, Srikanta Murthy K, “Enhancement of Degraded Historical Kan-
nada Documents”, International Journal of Computer Applications (0975-8887),
Volume 29, No. 11, pages 1-6, September 2011.
5. B. Gangamma, Srikanta Murthy K, “A Collective Approach for Enhancement
and Segmentation of Historical Document Image using Mathematical Morphology
and Non Local Means”, communicated to International Journal on Image
and Graphics, World Scientific Publications.
List of Publications in Conferences
1. B. Gangamma, Srikanta Murthy K, Punitha P, “Curvelet Transform Based Ap-
proach for Prediction of Era of the Epigraphical Scripts”, IEEE International
Conference on Computational Intelligence and Computing Research, Coimbat-
ore, pages 636-641, 2012.
2. B. Gangamma, Srikanta Murthy K, Riddhi J Shah, Swati D V, “Extraction
of Text Lines from Historical Documents using Mathematical Morphology”,
National Conference on Indian Language Computing, Cochin, pages 1-4, 2012.
3. B. Gangamma, Srikanta Murthy K, Riddhi J. Shah, Swati D V, “Text Line Ex-
traction from Palm Script Documents Using Morphological Approach”, Inter-
national Conference on Computer Engineering and Applications, Dubai, UAE,
pages 1452-1455, January 29-31, 2012.
4. B. Gangamma, Srikanta Murthy K, “Enhancement of Historical Document Im-
age using Non Local Means Filtering Technique”, IEEE International Confer-
ence on Computational Intelligence and Computing Research, Kanyakumari,
pages 1264-1267, 2011.
5. B. Gangamma, Srikanta Murthy K, Priyanka Chandra G C, Shishir Kaushik,
Saurabh Kumar, “A Combined Approach for Degraded Historical Documents
Denoising Using Curvelet and Mathematical Morphology”, IEEE International
Conference on Computational Intelligence and Computing Research, Coimbat-
ore, pages 824-829, 2010.
6. B. Gangamma, Srikanta Murthy K, Hemanth Kumar G, Riddhi J Shah, Swati
D V, Sandhya B, “Text Line extraction from Kannada Handwritten Docu-
ment”, IEEE, International Conference on Computer Engineering and Tech-
nology, Jodhpur, India, pages E 8-11, 2010.
7. B. Gangamma, Srikanta Murthy K, Priyanka Chandra G C, Shishir Kaushik,
Saurabh Kumar, “Degraded Historical Documents Enhancement Using Curvelet
and Mathematical Morphology ”, IEEE, International Conference on Computer
Engineering and Technology, Jodhpur, pages E105-111, 2010.
Bibliography
[1] Lap K. H., Guan L., Perry S. W., Wong H. S., “Adaptive image processing, a
computational Intelligent perspective”, CRC Press, Taylor and Francis Group,
2010.
[2] Umbaugh S. E., “Digital Image Processing and Analysis: Human and Com-
puter Vision Applications with CVIPtools”, CRC Press, Taylor & Francis
Group, Second Edition, 2010.
[3] Cheriet M., Kharma N., Liu C. L., “Character Recognition System: A Guide
for Students and Practitioners”, John Wiley & Sons Publications, 2007.
[4] Govindaraju V., Setlur S., “Guide to OCR for Indic Scripts: Document Recog-
nition and Retrieval”, Springer-Verlag London Ltd, 2009.
[5] Sircar D C, “Indian Epigraphy”, Motilal Banarsidass Publications, Delhi, 1996.
[6] Narasimhacharya R, “History of Kannada Literature”, Madras: Asian Educa-
tional Services, 1988.
[7] Tsien, Tsuen-Hsuin, “Paper and Printing”, Joseph Needham, Science and Civil-
isation in China, Chemistry and Chemical Technology, Cambridge University
Press, Volume 5, part 1, 1985.
[8] Murthy K. S., “Transformation of Epigraphical Objects into Machine Recog-
nizable Image Patterns”, Ph.D Thesis, University of Mysore, 2005.
[9] Murthy A. V. N., “Kannada Lipiya Ugama mattu Vikasa”, Kannada Ad-
hyayana Samsthe, Mysore University, 1968.
[10] Parpola A., “The Indus Script: a Challenging Puzzle”, World Archaeology,
Volume 17, No. 3, pages 399-419, Feb 1986.
[11] www.ancientscripts.com
[12] Fisher R., Ken D. H., Fitzgibbon A., Robertson C., Trucco E., “Dictionary
of Computer Vision and Image Processing”, John Wiley & Sons, Publications,
2005.
[13] Haralick R. M, Shapiro L. G., “Glossary of Computer Vision Terms”, Pattern
Recognition, Volume 24, Issue 1, pages 69-93, 1990.
[14] Lu S., Tan C. L., “Binarization of Badly Illuminated Document Images through
Shading Estimation and Compensation”, Ninth International Conference on
Document Analysis and Recognition, pages 321-316, 2007.
[15] Ntogas, N., Ventzas D., “A Binarization Algorithm for Historical Manuscripts”,
12th WSEAS International Conference on Communications, Greece, July 23-25,
pages 41-51, 2008.
[16] Likforman-Sulem L., Darbon J., Smith E. H. B., “Enhancement of Historical
Printed Document Images by Combining Total Variation Regularization and
Non-Local Means Filtering”, Image and Vision Computing, Volume 29, Issue 5,
pages 351-363, April 2011.
[17] Buades A., Coll B., Morel J. M., “A Non-Local Algorithm for Image Denois-
ing”, Proceedings IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, Volume 2, pages 60-65, 2005.
[18] Gatos B., Pratikakis I., Perantonis S. J., “Adaptive Degraded Document Image
Binarization”, Journal of Pattern Recognition, Volume 39, pages 317-327, 2005.
[19] Kishore N. K., Rege P. P., “Adaptive Enhancement of Historical Document
Images”, IEEE International Symposium on Signal Processing and Information
Technology, pages 983-88, 2007.
[20] Razak Z., Zulkiflee K., Idris M. Y. I., Tamil E. M., Noor M. N. M., Salleh R.,
Yusof M. Y. M., Yaacob M., “Off-line Handwriting Text Line Segmentation:
A Review”, International Journal of Computer Science and Network Security,
Volume 8, Issue 7, pages 12-20, 2008.
[21] Yanikoglu B., Sandon P. A., “Segmentation of Off-line Cursive Handwriting
Using Linear Programming”, Pattern Recognition, Volume 31, Issue 12, pages
1825-1833, 1998.
[22] Louloudis G., Gatos B., Halatsis C., “Text Line Detection in Unconstrained
Handwritten Documents Using a Block-Based Hough Transform Approach”,
Proceedings of International Conference on Document Analysis and Recognition,
pages 599-603, 2007.
[23] Nagy G., Seth S., “Hierarchical Representation of Optically Scanned Docu-
ments”, Seventh International Conference on Pattern Recognition, pages 347-
349, 1984.
[24] Wahl F. M., Wong K.Y., Casey R. G., “Block Segmentation and Text Extrac-
tion in Mixed Text/Image Documents”, Computer Graphics and Image Pro-
cessing, pages 375-390, 1982.
[25] Feldbach M., Tonnies K. D., “Line Detection and Segmentation in Historical
Church Registers”, Proceedings of the 6th International Conference on Document
Analysis and Recognition, pages 743-747, 2001.
[26] O’Gorman L., “The document spectrum for page layout analysis”, IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, Volume 15, Issue 11,
pages 1162-1173, 1993.
[27] Breuel T. M., “Two geometric algorithms for layout analysis”, Proceedings of
the 5th International Workshop on Document Analysis Systems, pages 188-199,
2002.
[28] Hough P. C. V., “Methods and Means for Recognizing Complex Patterns”, US
Patent 3069654, 1962.
[29] Duda R. D., Hart P.E., “Use of the Hough Transform to Detect Lines and
Curves in Pictures”, Communications of the ACM, Volume 15, Issue 1, pages
11-15, 1972.
[30] Manmatha R., Rothfeder J. L., “A scale space approach for automatically seg-
menting words from historical handwritten documents”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Volume 27, Issue 8, pages 1212-
1225, 2005.
[31] He J., Downton A. C., “User-assisted archive document image analysis for
digital library construction”, Seventh International Conference on Document
Analysis and Recognition, pages 498-502, 2003.
[32] Shi Z., Govindaraju V., “Line separation for complex document images using
fuzzy runlength”, Proceedings of the First International Workshop on Document
Image Analysis for Libraries (DIAL 2004), pages 306-312, 2004.
[33] Likforman-Sulem L., Hanimyan A., Faure C., “A Hough based algorithm for
extracting text lines in handwritten documents”, Proceedings of International
Conference on Document Analysis and Recognition, pages 774-777, 1995.
[34] Manmatha R., Srimal N., “Scale space technique for word segmentation in
handwritten manuscripts”, Proceedings 2nd International Conference on Scale
Space Theories in Computer Vision, pages 22-33, 1999.
[35] Pal U., Datta S., “Segmentation of Bangla Unconstrained Handwritten Text”,
Proceedings of Seventh International Conference on Document Analysis and
Recognition, pages 1128-1132, 2003.
[36] Surinta O., Chamchong R., “Image Segmentation of Historical Handwriting
from Palm Leaf Manuscripts”, Intelligent Information Processing IV series,
pages 182-189, 2008.
[37] Kunte R. S., Samuel R. D. S., “A Simple and Efficient Optical Character Recog-
nition System for Basic Symbols in Printed Kannada Text”, Sadhana, Volume
32, pages 521-533, 2007.
[38] Arica N., Vural F. Y., “An Overview of Character Recognition Focused on
Off-line Handwriting”, IEEE Transactions on Systems, Man, and Cybernetics,
Part C: Applications and Reviews, Volume 31, Issue 2, pages 216-233, 2001.
[39] Boutros G., “Automating Degraded Image Enhancement Processing”, Sympo-
sium on Document Image Understanding Technology, College Park, Maryland,
2005.
[40] Jain A., Bhattacharjee S., “Text Segmentation using Gabor Filters for Auto-
matic Document Processing”, Machine Vision and Applications, Volume 5,
pages 169-184, 1992.
[41] Mello C. A. B., Cavalcanti C. S. V. C., Carvalho C., “Colorizing Paper Texture
of Green-Scale Image of Historical Documents”, In: Proceedings of the 4th
IASTED Conference on Visualization, Imaging and Image Processing, 2004.
[42] Ulges A., Lampert C. H, Breuel T. M., “Document Image Dewarping using Ro-
bust Estimation of Curled Text Lines”, Proceedings of 8th International Con-
ference on Document Analysis and Recognition, pages 1001-1005, 2005.
[43] Zhang L., Tan C. L, “Warped Image Restoration with Applications to Digital
Libraries”, Proceedings of 8th International Conference on Document Analysis
and Recognition, pages 192-196, 2005.
[44] Cao H., Ding X., Liu C., “Rectifying the bound document image captured by
the camera: a model Based Approach”, Proceedings 7th International Confer-
ence on Document Analysis and Recognition, pages 71-75, 2003.
[45] Fan J., Lin X., Simske S, “A Comprehensive Image Processing Suite for Book
Re-mastering”, Proceedings of 8th International Conference on Document Anal-
ysis and Recognition, pages 447-451, 2005.
[46] Jayadevan R., Kolhe S. R., Patil P. M., Pal U., “Automatic processing of hand-
written bank cheque images: a survey”, International Journal on Document
Analysis and Recognition, Volume 15, Issue 4, pages 267-297, July 2011.
[47] Suen C. Y., Lam L., Guillevic D., Strathy N. W., Cheriet M., Said J. N., Fan
R., “Bank check processing system”, International Journal on Imaging Systems
Technology, Volume 7, pages 392-403, 1996.
[48] Madasu V. K., Lovell B. C., “Automatic Segmentation and Recognition of Bank
Cheque Fields”, Proceedings of the Digital Imaging Computing: Techniques and
Applications, pages 33-38, 2005.
[49] Neves R. F. P., Mello C. A. B., Silva M. S., Bezerra B. L. D., “A New Algorithm
to Threshold the Courtesy Amount of Brazilian Bank Checks”, Proceedings of
IEEE International Conference on Systems Man and Cybernetics, pages 1226-
1230, 2008.
[50] Hull J. J., “Document Image Skew Detection: Survey and Annotated Bibliogra-
phy”, Document Analysis Systems II, pages 40-64. World Scientific, Singapore,
1998.
[51] Lee L. L., Lizarraga M.G, Gomes N.R, Koerich A.L, “A Prototype for Brazil-
ian Bank Check Recognition”, International Journal on Pattern Recognition
Artificial Intelligence, Volume 11, Issue 4, pages 549-570, 1997.
[52] Sahoo P. K., Soltani S., Wong A. K. C, Chen Y. C., “A Survey of Thresholding
Techniques”, Computer Vision Graphics and Image Processing, Volume 41,
Issue 2, pages 233-260, 1988.
[53] Otsu N., “A Threshold Selection Method from Gray Level Histograms”, IEEE
Transaction on Systems Man and Cybernetics, Volume 9, Issue 1, pages 62-66,
1979.
[54] Sezgin M, Sankur B, “Survey Over Image Thresholding Techniques and Quan-
titative Performance Evaluation”, Journal on Electronic Imaging, Volume 13,
Issue 1, pages 317-327, 2004.
[55] Mello C. A. B, Bezerra B. L. D., Zanchettin C., Macario V., “An efficient thresh-
olding algorithm for Brazilian bank checks”, Proceedings of 9th International
Conference on Document Analysis and Recognition, pages 193-197, 2007.
[56] Palacios R, Gupta A, “A System for Processing Handwritten Bank Checks
Automatically”, Image and Vision Computing Journal, Volume 26, Issue 10,
pages 1297-1313, October, 2008.
[57] Chandra L., Gupta R., Kumar P., Ganotra D., “Automatic Courtesy Amount
Recognition for Indian Bank Checks”, Proceedings of IEEE Region 10 Confer-
ence, pages 1-5, 2008.
[58] Kim G., Govindaraju V., “Bank Check Recognition using Cross Validation Be-
tween Legal and Courtesy Amounts”, International Journal on Pattern Recog-
nition Artificial Intelligence, Volume 11, Issue 4, pages 657-674, 1997.
[59] Guillevic D., Suen C. Y., “Recognition of Legal Amounts on Bank Cheques”,
International Journal on Pattern Analysis and Application, Volume 1, Issue 1,
pages 28-41, 1998.
[60] Guillevic D., Suen C. Y., “Cursive Script Recognition Applied to the Processing
of Bank cheques”, Proceedings of 3rd International Conference on Document
Analysis and Recognition, pages 11-14, 1995.
[61] Guillevic D., “Unconstrained Handwriting Recognition Applied to the Recog-
nition of Bank Cheques”, PhD thesis, Concordia University, 1995.
[62] Kaufmann G., Bunke H., “Automated Reading of Cheque Amounts”, Interna-
tional Journal on Pattern Analysis and Application, Volume 3, pages 132-141,
2000.
[63] Kimura F., Tsuruoka S., Miyake Y., Shridhar M., “A Lexicon Directed Algo-
rithm for Recognition of Unconstrained Handwritten Words”, IEICE Transac-
tions on Information Systems, pages 785-793, 1994.
[64] Antonacopoulos A., “Flexible Page Segmentation using The Background”, Pro-
ceedings of the 12th International Conference on Pattern Recognition, Volume
2, pages 339-344, 1994.
[65] Fletcher L. A., Kasturi R., “Text String Segmentation From Mixed
Text/Graphics Images”, IEEE Pattern Analysis and Machine Intelligence, Vol-
ume 10, Issue 3, pages 910-918, 1988.
[66] Nagy G., Seth S., “Hierarchical Representation Of Optically Scanned Docu-
ments”, 7th International Conference on Pattern Recognition, pages 347-349,
1984.
[67] Downton A., Leedham C. G., “Preprocessing And Presorting of Envelope Im-
ages for Automatic Sorting using OCR”, International Journal on Pattern
Recognition, Volume 23, Issue 3-4, pages 347-362, 1990.
[68] Cohen E., Hull J., Srihari S., “Understanding Handwritten Text in a Structured
Environment: Determining Zip Codes From Addresses”, International Journal
on Pattern Recognition, Volume 5, Issue 1-2, pages 221-264, 1991.
[69] Govindaraju V., Srihari S. N., “Handwritten Text Recognition”, Proceedings of
Document Analysis Systems, pages 157-171, 1994.
[70] Seni G., Cohen E., “External Word Segmentation of Off-line Handwritten Text
Lines”, Journal of Pattern Recognition, Volume 27, Issue 1, pages 41-52, 1994.
[71] Srihari S., Kim G., “Penman: A System for Reading Unconstrained Handwrit-
ten Page Image”, Symposium on Document Image Understanding Technology,
pages 142-153, 1997.
[72] Zhang B., Srihari S. N., Huang C., “Word Image Retrieval Using Binary Fea-
tures”, SPIE Conference on Document Recognition and Retrieval XI, pages
18-22, January 2004.
[73] Zahour A., Taconet B., Mercy P., Ramdane S., “Arabic Handwritten Text-
Line Extraction”, Proceedings of the 6th International Conference on Document
Analysis and Recognition, Seattle, pages 281-285, 2001.
[74] Shapiro V., Gluhchev G., Sgurev V., “Handwritten Document Image Segmen-
tation and Analysis”, Pattern Recognition Letters, Volume 14, pages 71-78,
1993.
[75] Antonacopoulos A., Karatzas D., “Document Image Analysis for World War II
Personal Records”, First International Workshop on Document Image Analysis
for Libraries (DIAL 2004), pages 336-341, 2004.
[76] Wong K., Casey R., Wahl F., “Document Analysis Systems”, IBM J. Res. Dev.
Volume 26, Issue 6, pages 647-656, 1982.
[77] LeBourgeois F., “Robust Multifont OCR System From Gray Level Images”,
4th International Conference on Document Analysis and Recognition, Volume 1,
pages 1-5, 1997.
[78] LeBourgeois F., Emptoz H., Trinh E., Duong J., “Networking Digital Document
Images”, 6th International Conference on Document Analysis and Recognition,
pages 379-383, 2001.
[79] Shi Z., Govindaraju V.,“Line Separation for Complex Document Images using
Fuzzy Runlength”, Proceedings of International Workshop on Document Image
Analysis for Libraries, pages 23-24, January 2004.
[80] Pu Y., Shi Z., “A natural learning algorithm based on Hough transform for
text lines extraction in handwritten documents”, In: Proceedings of the 6th
International Workshop on Frontiers in Handwriting Recognition, Korea, pages
637-646, 1998.
[81] Oztop E., Mulayim A. Y., Atalay V., Yarman-Vural F., “Repulsive Attrac-
tive Network for Baseline Extraction on Document Images”, Signal Processing,
Volume 75, pages 1-10, 1999.
[82] Tseng Y. H., Lee H. J., “Recognition-Based Handwritten Chinese Character
Segmentation using a Probabilistic Viterbi Algorithm”, Pattern Recognition
Letters, Volume 20, Issue 8, pages 791-806, 1999.
[83] Bruzzone E., Coffetti M. C., “An Algorithm for Extracting Cursive Text Lines”,
Proceedings of International Conference on Document Analysis and Recogni-
tion, pages 749-752, 1999.
[84] Khandelwal A., Choudhury P., Sarkar R., Basu S., Nasipuri M., Das N., “Text
Line Segmentation for Unconstrained Handwritten Document Images Using
Neighborhood Connected Component Analysis”, Pattern Recognition and Ma-
chine Intelligence, pages 369-374, 2009.
[85] Zahour A., Taconet B., Likforman-Sulem L., Boussellaa W., “Overlapping and
Multi-Touching Text-Line Segmentation by Block Covering Analysis”, Interna-
tional Journal on Pattern Analysis and Applications, Volume 12, Issue 4, pages
335-351, 2009.
[86] Boussellaa W., Bougacha A., Zahour A., Abed H. E, Alimi A., “Enhanced Text
Extraction from Arabic Degraded Document Images using EM Algorithm”,
International Conference on Document Analysis and Recognition, pages 743-
747, 2009.
[87] Bloomberg D. S., “Multiresolution Morphological Approach to Document Im-
age Analysis”, Proceedings of International Conference on Document Analysis
and Recognition, pages 963-971, 1991.
[88] Bukhari S. S., Shafait F., Breuel T. M., “Improved Document Image Seg-
mentation Algorithm using Multiresolution Morphology”, IS&T/SPIE Elec-
tronic Imaging, International Society for Optics and Photonics, paper 78740D,
January 24, 2011.
[89] Bansal V., Sinha R. M. K., “Segmentation of Touching and Fused Devanagari
Characters”, Pattern Recognition, Volume 35, Number 4, pages 875-893, 2002.
[90] Ashkan M.Y., Guru D. S., Punitha P., “Small Eigen Value Based Skew Es-
timation in Persian Digitized Documents”, Proceedings of the International
Conference on Computer Graphics, Imaging and Visualisation, pages 64-70,
2006.
[91] Shi Z., Govindaraju V., “Historical Document Image Enhancement Using Back-
ground Light Intensity Normalization”, 17th International Conference on Pat-
tern Recognition, Volume 1, pages 473-476, 2004.
[92] Shi Z., Govindaraju V., “Historical Handwritten Document Image Segmenta-
tion Using Background Light Intensity Normalization”, Proceedings of SPIE
5676, Document Recognition and Retrieval XII, pages 167-174, 2005.
[93] Yan C., Leedham G., “Decompose Threshold Approach to Handwriting Extrac-
tion in Degraded Historical Document”, Proceedings of the 9th International
Workshop on Frontiers in Handwriting Recognition, pages 239-244, 2004.
[94] Louloudis G., Gatos B., Pratikakis I., Halatsis K., “A Block-Based Hough
Transform Mapping for Text Line Detection in Handwritten Documents”, Pro-
ceedings of the Tenth International Workshop on Frontiers in Handwriting
Recognition, pages 515-520, 2006.
[95] Gatos B., Pratikakis I., Perantonis S. J., “Efficient Binarization of Historical
and Degraded Document Images”, 8th International Workshop on Document
Analysis Systems(DAS’08), pages 447-454, Japan, 2008.
[96] Shi Z., Setlur S., Govindaraju V., “Digital Image Enhancement of Indic His-
torical Manuscripts, Guide to OCR for Indic Scripts”, Advances in Pattern
Recognition, Springer-Verlag London Ltd, pages 249-267, 2009.
[97] Shi Z., Setlur S., Govindaraju V., “A Steerable Directional Local Profile Tech-
nique for Extraction of Handwritten Arabic Text Lines”, Proceedings of the
10th International Conference on Document Analysis and Recognition, pages
176-180, 2009.
[98] Nikolaou N., Makridis M., Gatos B., Stamatopoulos N., Papamarkos N.,
“Segmentation of Historical Machine-Printed Documents using Adaptive Run
Length Smoothing and Skeleton Segmentation Paths”, Image and Vision Com-
puting Journal, Volume 28, Issue 4, pages 590-604, 2010.
[99] Fadoua D., Bourgeois F. L., Emptoz H., “Restoring Ink Bleed-Through Degraded
Document Images Using a Recursive Unsupervised Classification Technique”,
Springer-Verlag Berlin Heidelberg, DAS LNCS 3872, pages 38-49, 2006.
[100] Gatos B., Pratikakis I., Perantonis S.J., “Improved Document Image Bina-
rization by Using a Combination of Multiple Binarization Techniques and
Adapted Edge Information”, 19th International Conference on Pattern Recog-
nition, pages 1-4, 2008.
[101] Halabi Y. S., Zaid S. A., “Modeling Adaptive Degraded Document Image Bi-
narization and Optical Character System”, European Journal of Scientific Re-
search, Volume 28, No. 1, pages 14-32, 2009.
[102] Fillali F., Benmahammed K., Abid G., “Image Restoration using SVD and
Adaptive Regularization”, Journal of Automation and Systems Engineering,
Volume 4, Issue 3, pages 173-181, 2010.
[103] Badekas E., Papamarkos N., “Estimation of Appropriate Parameter Values
for Document Binarization Techniques”, International Journal of Robotics and
Automation, Volume 24, No. 1, pages 66-78, 2009.
[104] Bukhari S. S., Shafait F., Breuel T. M., “Layout Analysis of Arabic Script
Documents”, Book Chapter 2, Guide to OCR for Arabic Scripts, pages 35-53,
Springer-Verlag London, 2012.
[105] Asi A., Saabni R., El-Sana J., “Text Line Segmentation for Gray Scale His-
torical Document Images”, Proceedings of the 2011 Workshop on Historical
Document Imaging and Processing, Beijing, China, pages 120-126, 2011.
[106] Rivest-Hénault D., Moghaddam R. F., Cheriet M., “A Local Linear Level Set
Method for the Binarization of Degraded Historical Document Images”, Inter-
national Journal on Document Analysis and Recognition, Volume 15, Issue 2,
pages 101-124, June 2012.
[107] Mantas J., “An Overview of Character Recognition Methodologies”, Pattern
Recognition, Volume 19, No. 6, pages 425-430, 1986.
[108] Govindan V. K., Shivaprasad A. P., “Character Recognition - A Review”, Pattern
Recognition, Volume 23, No. 7, pages 701-709, 1990.
[109] Tian Q., Peng Z., Thomas A., Yongmin K., “Survey: Omni Font Printed Char-
acter Recognition”, Proceedings of SPIE Visual Communication and Image
Processing, Volume 1606, pages 260-268, 1991.
[110] Belaid A., Haton J. P., “A Syntactic Approach for Handwritten Mathematical
Formula Recognition”, IEEE Transaction on Pattern Analysis and Machine
Intelligence, Volume 1, pages 105-111, 1984.
[111] Sridhar M., Badreldin A., “High Accuracy Syntactic Recognition Algorithm for
Handwritten Numerals”, IEEE Transaction on Systems Man and Cybernetics,
Volume 15, Issue 1, pages 152-158, 1985.
[112] Tappert C. C., Suen C. Y., Wakahara T., “The State of the Art in On-line
Handwriting Recognition”, IEEE Transaction on Pattern Analysis and Machine
Intelligence, Volume 12, No. 8, pages 787-808, 1990.
[113] Stubberud P, Kanai J., Kalluri V., “Adaptive Image Restoration of Text Im-
ages That Contain Touching or Broken Characters”, Proceedings of the Third
International Conference on Document Analysis and Recognition, pages 778-
781, 1995.
[114] Chaudhuri B. B., Pal U., “A Complete Printed Bangla OCR System”, Pattern
Recognition, Volume 31, Issue 5, pages 531-549, 1998.
[115] Sural S., Das P. K., “Fuzzy Hough transform, Linguistic Sets and Soft Decision
MLP for Character Recognition”, Proceedings of Fifth International Conference
on Soft Computing and Information/Intelligent Systems, pages 975-978, 1998.
[116] Pal U., Kundu P. K., Chaudhuri B. B., “OCR Error Correction of Inflectional
Indian Language using Morphological Parsing”, Journal of Information Science
and Engineering, Volume 16, pages 903-922, 2000.
[117] Pal U., Chaudhuri B. B., “Machine Printed and Handwritten Text Lines Iden-
tification”, Pattern Recognition Letters, Volume 22, pages 431-441, 2001.
[118] Pal U., Belaid A., Choisy Ch., “Touching Numeral Segmentation using Water
Reservoir Concept”, Pattern Recognition Letters, Volume 24, pages 261-272,
2003.
[119] Pal U., Chaudhuri B. B., “Indian Script Character Recognition: A Survey”,
Journal of Pattern Recognition, Volume 37, Issue 9, pages 1128-1132, 2004.
[120] Uchida S., Sakoe H., “Eigen Deformations for Elastic Matching Based Hand-
written Character Recognition”, Pattern Recognition, Volume 36, pages 2031-
2040, 2003.
[121] Pujari A. K., Naidu C. D., Jinaga B. C., “An Intelligent Character Recognizer
for Telugu Scripts using Multiresolution Analysis and Associative Memory”,
Image and Vision Computing, Volume 22, Issue 14, pages 1221-1227, 2004.
[122] Rasagna V., Jinesh K. J., Jawahar C. V., “On Multifont Character Classifi-
cation in Telugu”, Information Systems for Indian Languages, Communications
in Computer and Information Science, Volume 139, pages 86-91, 2011.
[123] Sastry P. N., Krishnan R., Ram B. V. S., “Classification and Identification
of Telugu Handwritten Characters Extracted from Palm leaves Using Decision
Tree Approach”, ARPN Journal of Engineering and Applied Sciences, Volume
5, Issue 3, pages 22-32, 2010.
[124] Goyal P., Diwakar S., Agrawal A., “Devanagari Character Recognition Towards
Natural Human-Computer Interaction”, India HCI, Interaction Design and In-
ternational Development, Indian Institute of Technology, pages 20-24, 2010.
[125] Shelke S., Apte S., “A Novel Multi-feature Multi-Classifier Scheme for Uncon-
strained Handwritten Devanagari Character Recognition”, Proceedings of 12th
International Conference on Frontiers in Handwriting Recognition, Kolkata,
pages 215-219, 2010.
[126] Shelke S., Apte S., “A Novel Multistage Classification and Wavelet Based Ker-
nel Generation For Handwritten Marathi Compound Character Recognition”,
Proceedings of International Conference on Communications and Signal Pro-
cessing, pages 193-197, 2011.
[127] Shelke S., Apte S., “Multistage Handwritten Marathi Compound Character
Recognition Using Neural Networks”, Journal of Pattern Recognition Research,
Volume 2, pages 253-268, 2011.
[128] John J., Pramod K. V., Balakrishnan K., “Unconstrained Handwritten Malay-
alam Character Recognition using Wavelet Transform and Support Vector Ma-
chine Classifier”, Procedia Engineering, Volume 30, pages 598-605, 2012.
[129] Pal U., Kundu S., Ali Y., Islam H., Tripathy N., “Recognition of Unconstrained
Malayalam Handwritten Numeral”, Proceedings of the Fourth Indian Confer-
ence on Computer Vision, ICVGIP, pages 423-428, 2004.
[130] Nagabhushan P., Pai R. M., “Modified Region Decomposition Method and Op-
timal Depth Decision Tree in the Recognition of Non-uniform Sized Characters-
An Experimentation with Kannada Characters”, Pattern Recognition Letters,
Volume 20, pages 1467-1475, 1999.
[131] Ashwin T. V., Sastry P. S., “A Font and Size Independent OCR System for
Printed Kannada Documents using Support Vector Machines”, Sadhana, Vol-
ume 27, pages 35-58, 2002.
[132] Chaudhuri B. B., Bera S., “Handwritten Text Line Identification In Indian
Scripts”, 10th International Conference on Document Analysis and Recognition,
pages 636-640, 2009.
[133] Lakshmi C. V., Patvardhan C., “An optical character recognition system for
printed Telugu text”, Journal on Pattern Analysis and Application, Volume 7,
Issue 2, pages 190-204, 2004.
[134] Kokku A., Chakravarthy S., “A Complete OCR System for Tamil Magazine
Documents, A Guide to OCR for Indic Scripts”, Springer-Verlag London Lim-
ited, pages 147-162, 2009.
[135] Shashikiran K., Kolli S. P., Kunwar R., Ramakrishnan A. G., “Comparison of
HMM and SDTW for Tamil Handwritten Character Recognition”, 2010 Inter-
national Conference on Signal Processing and Communications, pages 1-4.
[136] Hirabara L. Y., Aires S. B. K., Freitas C. O. A., “Dynamic Zoning Selection for
Handwritten Character Recognition”, Progress in Pattern Recognition, Image
Analysis, Computer Vision, and Applications, LNCS 7042, pages 507-514, 2011.
[137] Di Lecce V., Dimauro G., Guerriero A., Impedovo S., Pirlo G., Salzo A., “Zoning
Design for Handwritten Numeral Recognition”, 7th International Workshop
on Frontiers in Handwriting Recognition, pages 583-588, 2000.
[138] Freitas C. O. A., Oliveira L. E. S., Bortolozzi F., Aires S. B. K., “Handwrit-
ten Character Recognition using Non-Symmetrical Perceptual Zoning”, Inter-
national Journal of Pattern Recognition and Artificial Intelligence, Volume 21,
Issue 1, pages 1-21, 2007.
[139] Poisson E., Gaudin V. C., Lallican P. M., “Multi-Modular Architecture Based
on Convolutional Neural Networks for Online Handwritten Character Recogni-
tion”, In: International Conference on Neural Information Processing, Volume
5, pages 2444-2448, 2002.
[140] Tay Y. H., Lallican P. M., Khalid M., Gaudin C. V., Knerr S., “An Offline Cur-
sive Handwritten Word Recognition System”, In: Proceedings of IEEE Region
10 International Conference on Electrical and Electronic Technology, Volume
2, pages 519-524, 2001.
[141] Avila S. D., Matos L., Freitas C., Carvalho J. M. D., “Evaluating a Zon-
ing Mechanism and Class-Modular Architecture for Handwritten Characters
Recognition”, CIARP’07: Proceedings of the 12th Iberoamerican Congress on
Pattern Recognition, Progress in Pattern Recognition, Image Analysis and Ap-
plications, pages 515-524, 2007.
[142] Vishwaas M., Arjun M. M., Dinesh R., “Handwritten Kannada Character
Recognition Based on Kohonen Neural Network”, International Conference on
Recent Advances in Computing and Software Systems, pages 91-97, 2012.
[143] Prasad M. M., Sukumar M., Ramakrishnan A. G., “Divide and Conquer Tech-
nique in Online Handwritten Kannada Character Recognition”, Proceedings of
the International Workshop on Multilingual OCR, Article No. 11, ACM New
York, 2009.
[144] Kunte R. S., Samuel R. D. S., “A two-stage Character Segmentation Tech-
nique for Printed Kannada Text”, GVIP Special Issue on Image Sampling and
Segmentation, pages 1-8, 2006.
[145] Urolagin S., Prema K. V., Reddy N. V. S., “Kannada Alphabets Recognition
with Application to Braille Translation”, International Journal on Image and
Graphics, Volume 11, No. 3, pages 293-314, 2011.
[146] Sheshadri K., Ambekar P., Prasad D. P., Kumar R. P., “An OCR System for
Printed Kannada using K-Means Clustering”, 2010 IEEE International Con-
ference on Industrial Technology, pages 183-187, 2010.
[147] Dhandra B. V., Mukarambi G., Hangarge M., “A Recognition System for Hand-
written Kannada and English Characters”, International Journal of Computa-
tional Vision and Robotics, Volume 2, No. 4, pages 290-301, 2011.
[148] Liu C. L., Suen C. Y., “A New Benchmark on the Recognition of Handwritten
Bangla and Farsi Numeral Characters”, Pattern Recognition, Volume 42, pages
3287-3295, 2009.
[149] Sonka M., Hlavac V., Boyle R., “Image Processing, Analysis, and Machine
Vision”, Brooks and Cole Publishing, 1998.
[150] Shih F., “Image Processing and Mathematical Morphology: Fundamentals and
Applications”, Wiley Publications, IEEE Press, 2010.
[151] Ye X., Cheriet M., Suen C. Y., Liu K., “Extraction of Bank Check Items by
Mathematical Morphology”, International Journal on Document Analysis and
Recognition, Springer Link, Volume 2, No. 2, pages 53-66, 1999.
[152] Shetty S., Sridhar M., “Background Elimination in Bank Cheques using Gray
Scale Morphology”, Proceedings of the 7th International Workshop on Frontiers
in Handwriting Recognition, pages 83-92, 2000.
[153] Mengucci M., Granado I., “Morphological Segmentation of Text and Figures in
Renaissance Books XVI Century”, Mathematical Morphology and its Applica-
tions to Image Processing, Goutsias J., Vincent L., Bloomberg D. (eds.), pages
397-404, 2000.
[154] Gonzalez R. C., Woods R. E., “Digital Image Processing”, PHI Publication,
Third Edition, 2008.
[155] Tomasi C., Manduchi R., “Bilateral Filtering for Gray and Color Images”, Pro-
ceedings of the IEEE International Conference on Computer Vision, Bombay,
India, pages 839-846, 1998.
[156] Barash D., “A Fundamental Relationship Between Bilateral Filtering, Adap-
tive Smoothing, and the Nonlinear Diffusion Equation”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Volume 24, No. 6, pages 844-847,
2002.
[157] Hamarneh G., Hradsky J., “Bilateral Filtering of Diffusion Tensor Magnetic
Resonance Images”, IEEE Transactions on Image Processing, Volume 16, No.
10, pages 2463-2475, October 2007.
[158] Bazan C., Blomgren P., “Image Smoothing and Edge Detection by Nonlinear
Diffusion and Bilateral Filter”, Research Report CSRCR, Volume 21, pages
2-15, 2007.
[159] Buades A., Coll B., Morel J. M., “Nonlocal Image and Movie Denoising”,
International Journal of Computer Vision, Volume 76, Issue 2, pages 123-139,
2008.
[160] Chacko B. P., Krishnan V. R. V., Raju G., Anto P. B., “Handwritten Character
Recognition using Wavelet Energy and Extreme Learning Machine”, Interna-
tional Journal of Machine Learning and Cybernetics, Volume 3, No. 2, pages
149-161, 2012.
[161] Tan C. L., Cao R., Shen P., “Restoration of Archival Documents using a Wavelet
Technique”, IEEE Transactions on Pattern Analysis and Machine Intelligence,
Volume 24, Issue 10, pages 1399-1404, 2002.
[162] Chang S. G., Yu B., Vetterli M., “Adaptive Wavelet Thresholding for Image
Denoising and Compression”, IEEE Transaction on Image Processing, Volume
9, No. 9, pages 1532-1546, September 2000.
[163] Donoho D. L., Johnstone I. M., “Adapting to Unknown Smoothness via Wavelet
Shrinkage”, Journal of American Statistical Association, Volume 90, pages
1200-1224, 1995.
[164] Luisier F., Blu T., Unser M., “A New SURE Approach to Image Denoising:
Interscale Orthonormal Wavelet Thresholding”, IEEE Transactions on Image
Processing, Volume 16, No. 3, pages 593-606, March 2007.
[165] Zhang X. P., Desai M., “Adaptive Denoising Based On SURE Risk”, IEEE
Signal Processing Letters, Volume 5, Issue 10, pages 265-267, 1998.
[166] Rao R. M., Bopardikar A. S., “Wavelet Transforms: Introduction to Theory
and Applications”, Fundamentals of Electronic Image Processing, Addison-
Wesley, page 126, 2001.
[167] Blu T., Luisier F., “The SURE-LET Approach to Image Denoising”, IEEE
Transactions On Image Processing, Volume 16, Issue 11, pages 2778-2786, 2007.
[168] Chipman H. A., Kolaczyk E. D., McCulloch R. E., “Adaptive Bayesian Wavelet
Shrinkage”, Journal of American Statistical Association, Volume 92, Issue 440,
pages 1413-1421, Dec 1997.
[169] Donoho D. L., “De-noising by Soft-Thresholding”, IEEE Transaction on Infor-
mation Theory, Volume 41, Issue 3, pages 613-627, May 1995.
[170] Zhang B., Zhang Y., Lu W., Han G., “Phenotype Recognition by Curvelet
Transform and Random Subspace Ensemble”, Journal of Applied Mathematics
Bio-informatics, Volume 1, Issue 1, pages 79-103, 2011.
[171] Fadili M. J., Starck J. L., “Curvelets and Ridgelets”, In Encyclopedia of Com-
plexity and Systems Science, Volume 3, pages 1718-1738, 2007.
[172] Candes E., Demanet L., Donoho D., Ying L., “Fast Discrete Curvelet Trans-
forms”, Multiscale Modeling and Simulation, Volume 5, No. 3, pages 861-899,
2006.
[173] Starck J. L., Candès E. J., Donoho D. L., “The Curvelet Transform for Image
Denoising”, IEEE Transactions on Image Processing, Volume 11, No. 6, pages
670-684, 2002.
[174] Starck J. L., Murtagh F, Candes E. J., Donoho D. L., “Gray and Color Image
Contrast Enhancement by the Curvelet Transform”, IEEE Transactions on
Image Processing, Volume 12, Issue 6, pages 706-717, 2003.
[175] Sumana I., Islam M., Zhang D., Lu G., “Content Based Image Retrieval using
Curvelet Transform”, IEEE 10th Workshop on Multimedia Signal Processing,
pages 11-16, 2008.
[176] http://www.curvelet.org/software.html, last updated 24 August 2007.
[177] Louloudis G., Gatos B., Pratikakis I., Halatsis C., “Text Line and Word Seg-
mentation of Handwritten Documents”, Journal of Pattern Recognition, Vol-
ume 42, pages 3169-3183, 2009.
[178] Basu S, Chaudhuri C., Kundu M., Nasipuri M., Basu D. K., “Text Line Extrac-
tion from Multi-Skewed Handwritten Documents”, Journal of Pattern Recog-
nition, Volume 40, Issue 6, pages 1825-1839, 2007.
[179] Kennard D. J., Barrett W. A., “Separating Lines of Text in Free-Form Hand-
written Historical Documents”, The Second International Conference on Doc-
ument Image Analysis for Libraries, pages 23, 2006.
[180] Likforman-Sulem L., Faure C., “Extracting Text Lines in Hand-Written Doc-
uments by Perceptual Grouping”, Advances in Handwriting and Drawing: A
multidisciplinary approach, Paris, pages 21-38, 1994.
[181] Likforman-Sulem L., Hanimyan A., Faure C., Nat E., “A Hough Based Al-
gorithm for Extracting Text Lines in Handwritten Documents”, Proceedings
of the Third International Conference on Document Analysis and Recognition,
Volume 2, pages 774-777, 1995.
[182] Aradhya V. N. M., Kumar G. H., Shivakumara P., “Skew Detection Technique
for Binary Document Images Based on Hough Transform”, International Jour-
nal of Information Technology, Volume 3, Issue 3, pages 194-200, 2006.
[183] Nandini N., Murthy K. S., Kumar G. H., “Estimation of Skew Angle in Bi-
nary Document Images Using Hough Transform”, World Academy of Science,
Engineering and Technology, Volume 42, pages 44-49, 2008.
[184] Chaudhuri B. B., Bera S., “Handwritten Text Line Identification In Indian
Scripts”, 10th International Conference on Document Analysis and Recognition,
pages 636-640, 2009.
[185] Papavassiliou V., Katsouros V., Carayannis G., “A Morphological Approach for
Text-Line Segmentation in Handwritten Documents”, International Conference
on Frontiers in Handwriting Recognition, pages 19-24, 2010.
[186] Papavassiliou V., Stafylakis T., Katsouros V., Carayannis G., “Handwritten
Document Image Segmentation into Text Lines and Words”, Pattern Recogni-
tion, Volume 43, pages 369-377, 2010.
[187] Rashid S. F., Shafait F., Breuel T. M., “Scanning Neural Network for Text
Line Recognition”, 10th IAPR Workshop on Document Analysis Systems, Gold
Coast, Australia, pages 105-109, March 2012.
[188] Dani A. H., “Indian Palaeography”, Munshiram Manoharlal Publishers, ISBN-
10: 8121500281, 1997.
[189] http://www.indianetzone.com/7/kannada.html, last updated on: 01/01/2009.
[190] Buades A., Coll B., Morel J. M., “Nonlocal Image and Movie Denoising”, In-
ternational Journal of Computer Vision, Volume 76, Issue 2, pages 123-139,
2008.
[191] Dholakia J., Negi A., Mohan S. R., “Zone Identification in the Printed Gujarati
Text”, Proceedings of the Eighth International Conference on Document Anal-
ysis and Recognition, pages 272-276, 2005.
[192] Amayeh G., Kasaei S., Tavakkoli A., “A Modified Algorithm to Obtain Trans-
lation, Rotation and Scale Invariant Zernike Moment Shape Descriptors”, In-
ternational Workshop on Computer Vision, Tehran, April 2004.
[193] Desai A., “Gujarati Handwritten Numeral Optical Character Reorganization
Through Neural Network”, Pattern Recognition, Volume 43, pages 2582-2589,
2010.
[194] Gatos B., Kesidis A. L., Papandreou A., “Adaptive Zoning Features for Char-
acter and Word Recognition”, International Conference on Document Analysis
and Recognition, pages 1160-1164, 2011.
[195] Khanale R. R., Chitnis S. D., “Handwritten Devanagari Character Recognition
using Artificial Neural Network”, Journal of Artificial Intelligence, Volume 4,
Issue 1, pages 55-62, 2011.
[196] Murthy K. S., Doreswamy, Kumar G. H., Nagabhushan P., “Texture Features
for the Prediction of Period of an Epigraphical Script”, In Proceedings of the
National Workshop on Document Analysis and Recognition, P E S College of
Engineering, Mandya, India, pages 192-196, 2003.
[197] Kan C., Srinath M. D., “Invariant Character Recognition with Zernike and
Orthogonal Fourier-Mellin Moments”, Pattern Recognition, Volume 35, Issue 1,
pages 143-154, 2002.
[198] Sheng Y., Shen L., “Orthogonal Fourier-Mellin Moments for Invariant Pattern
Recognition”, Journal of Optical Society of America, Volume 11, Issue 6, pages
1748-1757, 1994.
[199] Khotanzad A., Hong Y. H., “Invariant Image Recognition by Zernike Mo-
ments”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol-
ume 12, Issue 5, pages 489-497, 1990.
[200] Kunte S. R., Samuel R. D. S., “Hu’s Invariant Moments and Zernike Moments
Approach for the Recognition of Basic Symbols in Printed Kannada Text”,
Sadhana, Volume 32, Issue 5, pages 521-533, 2007.
[201] Hu M. K., “Visual Pattern Recognition by Moment Invariants”, IRE Transac-
tions on Information Theory, Volume 8, pages 179-187, 1962.
[202] Primekumar K. P., Idiculla S. M., “On-line Malayalam Handwritten Character
Recognition Using Wavelet Transform and SFAM”, 3rd International Confer-
ence on Electronics Computer Technology, pages 49-53, 2011.