Classification of Fonts and Calligraphy Styles based on Complex Wavelet Transform
Alican Bozkurt, Pınar Duygulu Şahin, A. Enis Çetin
GRC 2013, Bilkent University
OFR as a means: Optical Character Recognition (OCR)
• As of August 2010, there were 129,864,880 books in the world¹.
• Only 20 million of them have been digitized.
• Digitization ≠ Scanning
  – Image vs. content
  – Additional processing
• Optical Character Recognition
¹ http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counted.html
OFR as a means: Optical Character Recognition (OCR)
• Inter-typeface variability
  – Vast number of typefaces (>50,000)
• OCR is like finding a needle in a haystack
• Knowing the font significantly reduces the size of the haystack
OFR as an end: Dead Sea Scrolls
• Digitized by Google
• Currently 5 scrolls are available
• Classification of new scripts
OFR as an end: Identifont
• Font search service
• Fonts are expensive! ($25–$1000)
• Finding cheaper alternatives, e.g. Museo (free) vs. Adelle ($599)
How to Recognize Fonts?
Local
• Information from individual letters
• Higher resolution (decision per word/letter)
• Needs OCR as preprocessing
Global
• Information from blocks of words
• Faster
• Lower resolution (decision per block)
Dual Tree Complex Wavelet Transform (DT-CWT)
• Why CWT?
  – Directional selectivity
  – Shift invariance
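[Figure: DWT vs. CWT compared on directional selectivity and shift invariance; the real DWT subbands lie at 0°, 90° and an ambiguous 45°, while the CWT is directionally selective and shift invariant]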
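To make "shift invariance" concrete, here is a small one-dimensional toy example. It is an illustration under stated assumptions, not part of the proposed method: it uses the real Haar DWT (not the DT-CWT filters) and hypothetical helper names. Moving an edge by a single sample changes the level-1 detail energy of the decimated real DWT from 0 to 0.5; the DT-CWT is designed so that subband energies change very little under such shifts [23].

```python
import math


def haar_detail(x):
    """Level-1 Haar DWT detail coefficients (filter, then decimate by 2)."""
    return [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]


def step(n, k):
    """Length-n unit step that jumps from 0 to 1 at index k."""
    return [0.0 if i < k else 1.0 for i in range(n)]


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


# The same edge, shifted by one sample:
e_even = energy(haar_detail(step(16, 4)))  # edge falls between the pairs (2,3) and (4,5)
e_odd = energy(haar_detail(step(16, 5)))   # edge falls inside the pair (4,5)

print(f"edge at 4: detail energy = {e_even:.3f}")  # edge at 4: detail energy = 0.000
print(f"edge at 5: detail energy = {e_odd:.3f}")   # edge at 5: detail energy = 0.500
```

A feature built from such subband statistics would depend on where the text happens to fall relative to the sampling grid, which is the motivation for using a shift-invariant transform.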
Demonstration
• Train images
  – Printscreens
  – No noise
  – White background
  – ~1900x750 px image size
  – 168x480 px sample size
  – One paragraph per font
• Test image
  – Random image for “typewriter”
  – Real noise
  – Colored background
  – 1169x1142 px image size
  – 96x96 px sample size
Demonstration
• Smaller subsample size
  – Different height/width ratio
• Noise
• Different background
• Not the exact font
• 96% success rate (125/130)
  – Blue: Courier New Regular
  – Red: Bookman Regular
Demonstration
[Figures: test image; train image for “Courier New regular”; train image for “Bookman regular”]
Feature extraction
[Figure: a subsample and its 3-level DT-CWT subbands, shown for levels 1–3 at angles 75°, 45° and 15°]
Feature Extraction
Example feature values for one subsample (one row per statistic; columns are the six directional subbands of that level)
μ1: 0.082091 0.084891 0.060045 0.080689 0.085836 0.060873
σ1: 0.14791 0.15201 0.11201 0.14617 0.15402 0.11424
μ2: 0.22597 0.24064 0.11976 0.23731 0.24072 0.12753
σ2: 0.36203 0.35692 0.17401 0.37765 0.34842 0.19024
μ3: 0.49943 0.54883 0.35954 0.55623 0.56736 0.30949
σ3: 0.6949 0.65361 0.46078 0.72141 0.68851 0.39779
Φ = [μ1, μ2, μ3, σ1, σ2, σ3]
(1x36 feature vector: 3 levels × 6 subbands × {mean, std})
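The construction of Φ can be sketched as follows. This is a minimal sketch under assumptions the slides do not specify: the DT-CWT output is already available as complex coefficients grouped by level and by the six orientations, the statistics are taken over coefficient magnitudes, and the population standard deviation is used. `cwt_feature_vector` is a hypothetical name.

```python
import statistics


def cwt_feature_vector(subbands):
    """Build the 1x36 vector [mu1, mu2, mu3, sigma1, sigma2, sigma3].

    subbands[level][orientation] is a flat list of complex DT-CWT
    coefficients for one subsample (3 levels x 6 orientations assumed).
    Statistics are computed over coefficient magnitudes (an assumption).
    """
    means, stds = [], []
    for level in subbands:
        mags = [[abs(c) for c in band] for band in level]
        means.extend(statistics.fmean(m) for m in mags)   # mu_l: 6 values
        stds.extend(statistics.pstdev(m) for m in mags)   # sigma_l: 6 values
    # All means first (levels 1..3), then all standard deviations.
    return means + stds


# Toy input: every band at level l holds the magnitudes [5*(l+1), 0].
toy = [[[complex(3, 4) * (lvl + 1), 0j] for _ in range(6)] for lvl in range(3)]
phi = cwt_feature_vector(toy)
print(len(phi))         # 36
print(phi[0], phi[18])  # 2.5 2.5
```

Putting all means before all standard deviations follows the ordering of Φ given on the slide.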
Step 0 • Input image
Step 1 • Convert the image to binary using Otsu’s method
Step 2 • Divide the image into subsamples
Step 3 • For each subsample, compute a 3-level DT-CWT
Step 4 • Compute the mean and std of each subband
Step 5 • Concatenate them into the feature vector Φ
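Steps 1 and 2 can be sketched in plain Python. This is an illustrative assumption, not the authors' code: the image is a list of rows of 8-bit gray values, pixels at or below the Otsu threshold are treated as ink, and subsamples are non-overlapping tiles with partial edge tiles dropped (the slides only say the subsample size is variable). `otsu_threshold`, `binarize` and `subsamples` are hypothetical helper names. Step 3 would then run a DT-CWT implementation, such as the MATLAB code of Cai, Li and Selesnick [7], on each tile.

```python
def otsu_threshold(gray):
    """Otsu's method [19]: choose the threshold t that maximizes the
    between-class variance of pixels <= t versus pixels > t."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class (values <= t)
    sum0 = 0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray):
    """Step 1: 1 = ink (dark pixel), 0 = background (assumes dark text)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def subsamples(img, h, w):
    """Step 2: non-overlapping h x w tiles; partial edge tiles are dropped."""
    return [
        [row[c:c + w] for row in img[r:r + h]]
        for r in range(0, len(img) - h + 1, h)
        for c in range(0, len(img[0]) - w + 1, w)
    ]


gray = [[30] * 3 + [220] * 3 for _ in range(4)]  # toy 4x6 image: dark left, light right
print(otsu_threshold(gray))                       # 30
tiles = subsamples(binarize(gray), 2, 3)
print(len(tiles), tiles[0])                       # 4 [[1, 1, 1], [1, 1, 1]]
```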
Results: English Font Recognition
• Dataset
  – Printscreen, small natural noise, artificial noise, large natural noise
  – 1 paragraph per font/emphasis pair
  – 8 fonts:
    • Arial, Bookman, Century Gothic, Comic Sans, Courier, Computer Modern, Impact, Times New Roman
Results: English Font Recognition
• Competition
Algorithm | Preprocessing | Subsampling | Feature | Classifier
Proposed | Otsu’s method | Variable | Mean, std of CWT | SVM (one against one)
Aviles-Cruz [3] | Text line detection, normalization, texture formation | 100 random 64x64 subsamples | Skewness & kurtosis | EM-trained Bayes classifier
Ramanathan [21] | Normalization, Otsu’s method | 3x3 grid | Mean, std, max of Gabor responses | SVM (one against all)
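The proposed method uses a one-against-one multi-class SVM, while Ramanathan et al. use one-against-all. With K classes, one-against-one trains K(K-1)/2 binary classifiers (28 for the 8 English fonts) and lets each vote, whereas one-against-all trains K. The sketch below shows only the voting rule; toy distance-based functions stand in for trained binary SVMs (an assumption for illustration; a library such as LIBSVM [8] trains the pairwise SVMs itself). `ovo_predict` and the toy class centres are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """One-against-one prediction: each pairwise classifier casts one vote.

    pairwise[(a, b)] is a binary decision function returning a value > 0
    for class a and <= 0 for class b (hypothetical stand-in for an SVM).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if pairwise[(a, b)](x) > 0 else b] += 1
    # Ties go to the class listed first in `classes` (an arbitrary choice).
    return max(classes, key=lambda c: votes[c])


centers = {"A": 0.0, "B": 5.0, "C": 10.0}  # toy 1-D class centres
pairwise = {
    # Positive when x is closer to centre a than to centre b.
    (a, b): (lambda x, a=a, b=b: abs(x - centers[b]) - abs(x - centers[a]))
    for a, b in combinations(centers, 2)
}
print(ovo_predict(6.0, list(centers), pairwise))  # B
print(len(pairwise))                              # 3
```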
Results: English Font Recognition
Low Natural Noise (recognition rate, %)
Font Proposed Aviles-Cruz Ramanathan
A 96.88 81.75 100
B 100 87 100
CG 98.45 69.75 97.22
CS 100 75.5 100
C 100 96.25 100
I 100 99 100
M 100 97 100
T 100 91 100
Mean 99.41625 87.15625 99.6525
[Bar chart: per-font recognition rates, Low Natural Noise; Proposed vs. Aviles-Cruz vs. Ramanathan]
Results: English Font Recognition
Low Natural Noise + Artificial Noise (recognition rate, %)
Font Proposed Aviles-Cruz Ramanathan
A 95.31 78.25 97.22
B 100 83 100
CG 98.44 67.5 97.22
CS 100 73 100
C 100 91.5 97.22
I 98.44 98.5 100
M 100 91.25 100
T 98.44 79.25 97.22
Mean 98.82875 82.78125 98.61
[Bar chart: per-font recognition rates, Low Natural Noise + Artificial Noise; Proposed vs. Aviles-Cruz vs. Ramanathan]
Results: English Font Recognition
High Natural Noise (recognition rate, %)
Font Proposed Aviles-Cruz Ramanathan
A 98.44 - 91.67
B 98.44 - 88.89
CG 92.19 - 94.44
CS 100 - 97.22
C 100 - 94.44
I 100 - 94.44
M 98.44 - 88.88
T 98.44 - 100
Mean 98.24375 - 93.7475
[Bar chart: per-font recognition rates, High Natural Noise; Proposed vs. Aviles-Cruz vs. Ramanathan]
Results: English Font Recognition
Recognition means (%)
Condition | Proposed | Aviles-Cruz | Ramanathan
Printscreen | 100 | 100 | 100
Low Natural Noise | 99.41625 | 87.15625 | 99.6525
Low Natural Noise + Artificial Noise | 98.82875 | 82.78125 | 98.61
High Natural Noise | 98.24375 | - | 93.7475
Results: Farsi Font Recognition
• Dataset
  – Small natural noise
  – 1 paragraph per font/emphasis pair
  – 10 fonts:
    • Homa, Lotus, Mitra, Nazanin, Tahoma, Times New Roman, Titr, Traffic, Yaghut, and Zar
[Figure: samples (a) Lotus italic, (b) Homa bold italic, (c) Times New Roman bold]
Results: Farsi Font Recognition
• Competition
Algorithm | Preprocessing | Subsampling | Feature | Classifier
Proposed | Otsu’s method | Variable | Mean, std of CWT | SVM (one against one)
Khosravi and Kabir [14] | Text line detection, normalization, texture formation | 4x4 grid | Mean, std of Sobel-Roberts | AdaBoost
Senobari and Khosravi | Yes, but not explained | 128x128 subsamples | PCA of Sobel, Roberts, Symlet wavelets | MLP classifier
Results: Farsi Font Recognition
Low Natural Noise (recognition rate, %)
Font Proposed Khosravi Senobari
L 92.2 92.2 90.7
M 95.3 93.4 93.7
N 90.6 85.2 92
TR 98.4 97.6 95.9
Y 96.9 97.6 98.5
Z 92.2 87.4 90.9
H 100 99.2 99.8
TI 100 95.2 97
T 100 96.6 98.3
TN 98.4 97.2 98.8
Mean 96.41 94.16 95.56
[Bar chart: per-font recognition rates, Low Natural Noise; Proposed vs. Khosravi vs. Senobari]
Results: Arabic Font Recognition
• Dataset
  – ALPH-REGIM database
  – 749 samples of different sizes/lengths
  – 10 fonts:
    • Ahsa, Andalus, Arabic_transparant, Badr, Buryidah, Dammam, Hada, Kharj, Koufi, Naskh
[Figure: samples (a) Ahsa, (b) Badr, (c) Naskh, (d) Dammam]
Results: Arabic Font Recognition
• Competition
Algorithm | Preprocessing | Subsampling | Feature | Classifier
Proposed | Otsu’s method | Variable | Mean, std of CWT | SVM (one against all)
Ben Moussa [4] | No | No | Fractal based | NN
Results: Arabic Font Recognition
ALPH-REGIM database (recognition rate, %)
Font Proposed Ben Moussa
AH 99.633 94
AN 98.1595 94
AT 99.734 92
B 99.5968 100
BU 98.2955 100
D 99.8592 100
H 90.4424 100
K 90.4037 88
KO 99.3478 98
N 98.2418 98
Mean 97.3714 96.4
[Bar chart: per-font recognition rates, ALPH-REGIM database; Proposed vs. Ben Moussa]
Results: Speed Test
Results: Ottoman Style Recognition
• Dataset
  – Ottoman Archives
  – 6 pages per style
  – Different backgrounds
  – 5 styles:
    • Divani, Nesih, Matbu, Talik, Rika
[Figure: samples (a) Divani, (b) Matbu, (c) Nesih, (d) Rika, (e) Talik]
Results: Ottoman Font Recognition
Conclusion
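• New feature for font recognition:
  – Mean and std of a 3-level CWT
  – Higher mean accuracy than the compared state-of-the-art methods on English, Farsi and Arabic fonts in most test conditions (on English with low natural noise, Ramanathan is slightly ahead: 99.65% vs. 99.42%)
  – Faster than the state of the art
  – Robust to noise
  – Performs well on Ottoman texts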
References
[1] Abuhaiba, I., 2004. Arabic font recognition using decision trees built from common words. Journal of Computing and Information Technology 13 (3), 211–224.
[2] Amin, A., 1998. Off-line Arabic character recognition: the state of the art. Pattern Recognition 31 (5), 517–530.
[3] Aviles-Cruz, C., Rangel-Kuoppa, R., Reyes-Ayala, M., Andrade-Gonzalez, A., Escarela-Perez, R., 2005. High-order statistical texture analysis: font recognition applied. Pattern Recognition Letters 26 (2), 135–145.
[4] Ben Moussa, S., Zahour, A., Benabdelhafid, A., Alimi, A., 2008. Fractal-based system for Arabic/Latin, printed/handwritten script identification. In: Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, pp. 1–4.
[5] Borji, A., Hamidi, M., 2007. Support vector machine for Persian font recognition. International Journal of Intelligent Systems and Technologies, 184–187.
[6] Boser, B., Guyon, I., Vapnik, V., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, pp. 144–152.
[7] Cai, S., Li, K., Selesnick, I., ????. Matlab implementation of wavelet transforms. Tech. rep., Polytechnic University.
[8] Chang, C., Lin, C., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 27.
[9] Chaudhuri, B., Garain, U., 1998. Automatic detection of italic, bold and all-capital words in document images. In: Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on. Vol. 1. IEEE, pp. 610–612.
[10] Cortes, C., Vapnik, V., Sep. 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
[11] Duan, K., Keerthi, S., 2005. Which is the best multiclass SVM method? An empirical study. Multiple Classifier Systems, 732–760.
[12] Hsu, C., Chang, C., Lin, C., et al., 2003. A practical guide to support vector classification.
[13] Jung, M., Shin, Y., Srihari, S., 1999. Multifont classification using typographical attributes. In: Document Analysis and Recognition, 1999. ICDAR ’99. Proceedings of the Fifth International Conference on. IEEE, pp. 353–356.
[14] Khosravi, H., Kabir, E., 2010. Farsi font recognition based on Sobel–Roberts features. Pattern Recognition Letters 31 (1), 75–82.
[15] Kingsbury, N., 1997. Image processing with complex wavelets. Phil. Trans. Royal Society London A 357, 2543–2560.
[16] Kingsbury, N., 1998. The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement. In: Proc. EUSIPCO. Vol. 98. pp. 319–322.
[17] Kingsbury, N., 2000. A dual-tree complex wavelet transform with improved orthogonality and symmetry properties. In: Image Processing, 2000. Proceedings. 2000 International Conference on. Vol. 2. IEEE, pp. 375–378.
[18] Ma, H., Doermann, D., 2003. Gabor filter based multi-class classifier for scanned document images. In: 7th International Conference on Document Analysis and Recognition (ICDAR). pp. 968–972.
[19] Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9 (1), 62–66.
[20] Petkov, N., Wieling, M., 2008. Gabor filter for image processing and computer vision. Tech. rep., University of Groningen.
[21] Ramanathan, R., Soman, K., Thaneshwaran, L., Viknesh, V., Arunkumar, T., Yuvaraj, P., Oct. 2009. A novel technique for English font recognition using support vector machines. In: Advances in Recent Technologies in Communication and Computing, 2009. ARTCom ’09. International Conference on. pp. 766–769.
[22] Rashedi, E., Nezamabadi-pour, H., Saryzadi, S., 2007. Farsi font recognition using correlation coefficients (in Farsi). In: 4th Conf. on Machine Vision and Image Processing, Ferdosi Mashhad.
[23] Selesnick, I., Baraniuk, R., Kingsbury, N., 2005. The dual-tree complex wavelet transform. Signal Processing Magazine, IEEE 22 (6), 123–151.
[24] Villegas-Cortez, J., Aviles-Cruz, C., 2005. Font recognition by invariant moments of global textures. In: Proceedings of International Workshop VLBV05 (Very Low Bit-Rate Video-Coding 2005). pp. 15–16.
[25] Zhu, Y., Tan, T., Wang, Y., Oct. 2001. Font recognition based on global texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 23 (10), 1192–1200.
[26] Zramdini, A., Ingold, R., 1998. Optical font recognition using typographical features. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 877–882.