Classification of Fonts and Calligraphy Styles Based on Complex Wavelet Transform

Alican Bozkurt, Pınar Duygulu Şahin, A. Enis Çetin

GRC 2013, Bilkent University

OFR as a means: Optical Character Recognition (OCR)

• As of August 2010, there are 129,864,880 books in the world¹.

• Only 20 million of them have been digitized.

• Digitization ≠ Scanning
– Image vs. context
– Additional processing

• Optical Character Recognition

¹ http://booksearch.blogspot.com/2010/08/books-of-world-stand-up-and-be-counted.html

OFR as a means: Optical Character Recognition (OCR)

• Inter-typeface variability
– Vast number of typefaces (>50000)

• OCR is like finding a needle in a haystack

• Knowing the font significantly reduces the size of the haystack

OFR as an end: Dead Sea Scrolls

• Digitized by Google
• Currently 5 scrolls are available
• Classification of new scripts

OFR as an end: Identifont

• Font search service
• Fonts are expensive! ($25-$1000)
• Finding cheaper alternatives: Museo (free) vs. Adelle ($599)

How to Recognize Fonts?

Local
• Information from individual letters
• Higher resolution (decision per word/letter)
• Needs OCR as preprocessing

Global
• Information from blocks of words
• Faster
• Lower resolution (decision per block)

Dual Tree Complex Wavelet Transform (DT-CWT)


• Why CWT?
– Directional selectivity
– Shift invariance

[Figure: DWT vs. CWT comparison. Panel labels: real; 90°, 45° (?), 0°; directionally selective; shift invariant]
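The shift-invariance argument can be illustrated with a toy case. The sketch below is not the DT-CWT; it only shows, under the assumption of a one-level decimated real Haar DWT (a stand-in chosen for brevity), how the detail energy of the same edge changes when the signal moves by one sample. The function names are hypothetical.

```python
import math


def haar_detail(signal):
    """One level of the decimated Haar DWT: return the detail coefficients."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal) - 1, 2)]


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


step = [0, 0, 0, 0, 1, 1, 1, 1]      # edge between samples 3 and 4
shifted = [0, 0, 0, 0, 0, 1, 1, 1]   # same edge moved by one sample

e_step = energy(haar_detail(step))        # edge falls between two pairs
e_shifted = energy(haar_detail(shifted))  # edge falls inside one pair
print(e_step, e_shifted)
```

The first energy is zero and the second is about 0.5, although the two inputs differ only by a one-sample shift. Texture features built on such coefficients would change with text position, which is the motivation for a transform that is approximately shift invariant.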

Demonstration

• Train images
– Printscreens
– No noise
– White background
– ~1900x750 px image size
– 168x480 px sample size
– One paragraph per font

• Test image
– Random image for “typewriter”
– Real noise
– Colored background
– 1169x1142 px image size
– 96x96 px sample size

Demonstration

• Smaller subsample size
– Different height/width ratio
• Noise
• Different background
• Not exact font
• 96% success rate (125/130)

[Figure legend: blue = Courier New Regular, red = Bookman Regular]

Demonstration

Test image

Train image for “Courier New regular”

Train image for “Bookman regular”

Feature extraction

Step 0
• Input image

Feature extraction


Step 1
• Convert the image to binary using Otsu’s method
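Step 1 can be sketched in a few lines. This is a minimal pure-Python version of Otsu's method (pick the threshold that maximizes the between-class variance of the gray-level histogram), assuming 8-bit pixel values; the function names `otsu_threshold` and `binarize` are illustrative and not taken from the slides.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the low class, the rest the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # intensity sum of the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map each pixel to 1 if it is above the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


image = [[10, 12, 200], [11, 210, 205]]  # toy image: dark text, light background
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
print(t, binary)
```

Because the threshold is derived from the histogram of each image, the same code applies to white and colored backgrounds without a hand-tuned constant.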

Feature extraction


Step 2
• Divide the image into subsamples
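A minimal sketch of Step 2, assuming non-overlapping tiles and dropping partial tiles at the right and bottom edges (the slides do not say how the edges or any overlap are handled). `split_into_subsamples` is a hypothetical helper name.

```python
def split_into_subsamples(image, tile_h, tile_w):
    """Cut a 2-D list of pixels into non-overlapping tile_h x tile_w tiles.

    Partial tiles at the right and bottom edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - tile_h + 1, tile_h):
        for left in range(0, cols - tile_w + 1, tile_w):
            tiles.append([row[left:left + tile_w]
                          for row in image[top:top + tile_h]])
    return tiles


# Toy 4x5 image whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(5)] for r in range(4)]
tiles = split_into_subsamples(image, 2, 2)
print(len(tiles), tiles[0], tiles[-1])
```

Each tile is then processed independently, so one page yields many training or test samples.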

Feature extraction

For each subsample
• 3-level DT-CWT

[Figure: a subsample and its DT-CWT subbands at levels 1, 2, and 3, shown at angles 75°, 45°, and 15° for each level]

Feature Extraction

Step 4
• Mean and std

0.082091 0.084891 0.060045 0.080689 0.085836 0.060873
0.14791 0.15201 0.11201 0.14617 0.15402 0.11424
0.22597 0.24064 0.11976 0.23731 0.24072 0.12753
0.36203 0.35692 0.17401 0.37765 0.34842 0.19024
0.49943 0.54883 0.35954 0.55623 0.56736 0.30949
0.6949 0.65361 0.46078 0.72141 0.68851 0.39779

Step 5
• Concatenate

Φ = [μ1, μ2, μ3, σ1, σ2, σ3] (1x36 feature vector)
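Steps 4 and 5 can be sketched as follows. The sketch assumes that the statistics are taken over coefficient magnitudes and that each level provides six orientation subbands as flat lists of complex numbers; both the input layout and the name `feature_vector` are assumptions for illustration, and the toy coefficients stand in for real DT-CWT output.

```python
import statistics


def feature_vector(subbands_per_level):
    """Build [mu_1, mu_2, mu_3, sigma_1, sigma_2, sigma_3].

    subbands_per_level[l][k] is a flat list of complex coefficients for
    level l and orientation k (assumed layout). Each mu_l and sigma_l
    holds one value per orientation.
    """
    means, stds = [], []
    for level in subbands_per_level:
        for band in level:
            magnitudes = [abs(c) for c in band]
            means.append(statistics.fmean(magnitudes))
            stds.append(statistics.pstdev(magnitudes))
    return means + stds  # all means first, then all standard deviations


# Toy stand-in for a 3-level transform with 6 orientations per level.
toy = [[[complex(3 * (l + 1), 4 * (l + 1)), 0j] for _ in range(6)]
       for l in range(3)]
phi = feature_vector(toy)
print(len(phi), phi[0], phi[6], phi[12], phi[18])
```

With 3 levels and 6 orientations this gives 18 means followed by 18 standard deviations, which matches the 1x36 length stated on the slide.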

Results: English Font Recognition

• Dataset
– Printscreen, small natural noise, artificial noise, large natural noise
– 1 paragraph per font/emphasis pair
– 8 fonts: Arial, Bookman, Century Gothic, Comic Sans, Courier, Computer Modern, Impact, Times New Roman

Results: English Font Recognition

• Competition

Proposed
– Preprocessing: Otsu’s method
– Subsampling: variable
– Feature: mean, std of CWT
– Classifier: SVM (one against one)

Aviles-Cruz
– Preprocessing: text line detection, normalization, texture formation
– Subsampling: 100 random 64x64 subsamples
– Feature: skewness & kurtosis
– Classifier: EM-trained Bayes classifier

Ramanathan
– Preprocessing: normalization, Otsu’s method
– Subsampling: 3x3 grid
– Feature: mean, std, max of Gabor responses
– Classifier: SVM (one against all)
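The two SVM schemes in the comparison differ in how a multiclass decision is assembled from binary classifiers. The sketch below shows one-against-one majority voting only; a hypothetical nearest-center rule on a single number stands in for each trained binary SVM, and the class centers are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, decide):
    """Majority vote over all pairwise classifiers.

    decide(x, a, b) must return either a or b.
    """
    votes = Counter(decide(x, a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical 1-D class centers standing in for trained binary SVMs.
centers = {"Arial": 0.0, "Bookman": 1.0, "Courier": 2.0, "Impact": 3.0}


def nearest_center(x, a, b):
    """Toy pairwise decision: pick the class whose center is closer to x."""
    return a if abs(x - centers[a]) <= abs(x - centers[b]) else b


fonts = list(centers)
n_pairs = len(list(combinations(fonts, 2)))
prediction = one_vs_one_predict(1.9, fonts, nearest_center)
print(n_pairs, prediction)

# For K classes, one-against-one needs K*(K-1)/2 binary classifiers,
# while one-against-all needs K.
k = 8
pairwise_for_8_fonts = k * (k - 1) // 2
```

For the 8 English fonts, one-against-one therefore trains 28 pairwise classifiers, against 8 for one-against-all.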


Low Natural Noise: recognition rates (%)

Font    Proposed    Aviles-Cruz    Ramanathan
A       96.88       81.75          100
B       100         87             100
CG      98.45       69.75          97.22
CS      100         75.5           100
C       100         96.25          100
I       100         99             100
M       100         97             100
T       100         91             100
Mean    99.41625    87.15625       99.6525
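The Mean row is the plain average of the eight per-font rates. A short check, using the low natural noise figures from the slide (fonts in the order A, B, CG, CS, C, I, M, T):

```python
from statistics import fmean

# Per-font recognition rates (%) under low natural noise.
rates = {
    "Proposed":    [96.88, 100, 98.45, 100, 100, 100, 100, 100],
    "Aviles-Cruz": [81.75, 87, 69.75, 75.5, 96.25, 99, 97, 91],
    "Ramanathan":  [100, 100, 97.22, 100, 100, 100, 100, 100],
}

# Unweighted mean over fonts, one value per method.
means = {name: fmean(values) for name, values in rates.items()}
for name, value in means.items():
    print(f"{name}: {value:.5f}")
```

The same averaging reproduces the Mean rows reported for the other noise conditions.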


Low Natural Noise + Artificial Noise: recognition rates (%)

Font    Proposed    Aviles-Cruz    Ramanathan
A       95.31       78.25          97.22
B       100         83             100
CG      98.44       67.5           97.22
CS      100         73             100
C       100         91.5           97.22
I       98.44       98.5           100
M       100         91.25          100
T       98.44       79.25          97.22
Mean    98.82875    82.78125       98.61


High Natural Noise: recognition rates (%)

Font    Proposed    Aviles-Cruz    Ramanathan
A       98.44       -              91.67
B       98.44       -              88.89
CG      92.19       -              94.44
CS      100         -              97.22
C       100         -              94.44
I       100         -              94.44
M       98.44       -              88.88
T       98.44       -              100
Mean    98.24375    -              93.7475


Recognition means (%)

Dataset                                 Proposed    Aviles-Cruz    Ramanathan
Printscreen                             100         100            100
Low Natural Noise                       99.41625    87.15625       99.6525
Low Natural Noise + Artificial Noise    98.82875    82.78125       98.61
High Natural Noise                      98.24375    -              93.7475

Results: Farsi Font Recognition

• Dataset
– Small natural noise
– 1 paragraph per font/emphasis pair
– 10 fonts: Homa, Lotus, Mitra, Nazanin, Tahoma, Times New Roman, Titr, Traffic, Yaghut, and Zar

[Figure: sample texts. a: Lotus italic, b: Homa bold italic, c: Times New Roman bold]

Results: Farsi Font Recognition

• Competition



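Proposed
– Classifier: SVM (one against one)

Khosravi and Kabir
– Preprocessing: text line detection, normalization, texture formation
– Subsampling: 4x4 grid
– Feature: mean, std of Sobel-Roberts
– Classifier: AdaBoost

Senobari and Khosravi
– Preprocessing: yes, but not explained
– Subsampling: 128x128 size subsamples
– Feature: PCA of Sobel, Roberts, Symlet wavelets
– Classifier: MLP classifier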

Results: Farsi Font Recognition

Low Natural Noise: recognition rates (%)

Font    Proposed    Khosravi    Senobari
L       92.2        92.2        90.7
M       95.3        93.4        93.7
N       90.6        85.2        92
TR      98.4        97.6        95.9
Y       96.9        97.6        98.5
Z       92.2        87.4        90.9
H       100         99.2        99.8
TI      100         95.2        97
T       100         96.6        98.3
TN      98.4        97.2        98.8
Mean    96.41       94.16       95.56

Results: Arabic Font Recognition

• Dataset
– ALPH-REGIM database
– 749 samples of different sizes and lengths
– 10 fonts: Ahsa, Andalus, Arabic_transparant, Badr, Buryidah, Dammam, Hada, Kharj, Koufi, Naskh

[Figure: sample texts. a: Ahsa, b: Badr, c: Naskh, d: Dammam]

Results: Arabic Font Recognition

• Competition



Proposed
– Classifier: SVM (one against all)

Ben Moussa
– Preprocessing: no
– Subsampling: no
– Feature: fractal based
– Classifier: NN

Results: Arabic Font Recognition

ALPH-REGIM database: recognition rates (%)

Font    Proposed    Ben Moussa
AH      99.633      94
AN      98.1595     94
AT      99.734      92
B       99.5968     100
BU      98.2955     100
D       99.8592     100
H       90.4424     100
K       90.4037     88
KO      99.3478     98
N       98.2418     98
Mean    97.3714     96.4

Results: Speed Test

Results: Ottoman Style Recognition

• Dataset
– Ottoman Archives
– 6 pages per style
– Different backgrounds
– 5 styles: Divani, Nesih, Matbu, Talik, Rika

[Figure: sample pages. a: Divani, b: Matbu, c: Nesih, d: Rika, e: Talik]

Results: Ottoman Font Recognition

Conclusion

• New feature for font recognition:
– Mean and std of 3-level CWT
– Higher accuracy than the state of the art on English, Farsi, and Arabic fonts
– Faster than the state of the art
– Robust to noise
– Performs well on Ottoman texts

References

[1] Abuhaiba, I., 2004. Arabic font recognition using decision trees built from common words. Journal of Computing and Information Technology 13 (3), 211–224.
[2] Amin, A., 1998. Off-line Arabic character recognition: the state of the art. Pattern Recognition 31 (5), 517–530.
[3] Aviles-Cruz, C., Rangel-Kuoppa, R., Reyes-Ayala, M., Andrade-Gonzalez, A., Escarela-Perez, R., 2005. High-order statistical texture analysis-font recognition applied. Pattern Recognition Letters 26 (2), 135–145.
[4] Ben Moussa, S., Zahour, A., Benabdelhafid, A., Alimi, A., 2008. Fractal-based system for Arabic/Latin, printed/handwritten script identification. In: Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, pp. 1–4.
[5] Borji, A., Hamidi, M., 2007. Support vector machine for Persian font recognition. International Journal of Intelligent Systems and Technologies, 184–187.
[6] Boser, B., Guyon, I., Vapnik, V., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, pp. 144–152.
[7] Cai, S., Li, K., Selesnick, I., n.d. Matlab implementation of wavelet transforms. Tech. rep., Polytechnic University.
[8] Chang, C., Lin, C., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 27.
[9] Chaudhuri, B., Garain, U., 1998. Automatic detection of italic, bold and all-capital words in document images. In: Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on. Vol. 1. IEEE, pp. 610–612.
[10] Cortes, C., Vapnik, V., Sep. 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
[11] Duan, K., Keerthi, S., 2005. Which is the best multiclass SVM method? An empirical study. Multiple Classifier Systems, 732–760.
[12] Hsu, C., Chang, C., Lin, C., et al., 2003. A practical guide to support vector classification.
[13] Jung, M., Shin, Y., Srihari, S., 1999. Multifont classification using typographical attributes. In: Document Analysis and Recognition, 1999. ICDAR’99. Proceedings of the Fifth International Conference on. IEEE, pp. 353–356.
[14] Khosravi, H., Kabir, E., 2010. Farsi font recognition based on Sobel-Roberts features. Pattern Recognition Letters 31 (1), 75–82.
[15] Kingsbury, N., 1997. Image processing with complex wavelets. Phil. Trans. Royal Society London A 357, 2543–2560.
[16] Kingsbury, N., 1998. The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement. In: Proc. EUSIPCO. Vol. 98. pp. 319–322.
[17] Kingsbury, N., 2000. A dual-tree complex wavelet transform with improved orthogonality and symmetry properties. In: Image Processing, 2000. Proceedings. 2000 International Conference on. Vol. 2. IEEE, pp. 375–378.
[18] Ma, H., Doermann, D., 2003. Gabor filter based multi-class classifier for scanned document images. In: 7th International Conference on Document Analysis and Recognition (ICDAR). pp. 968–972.
[19] Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9 (1), 62–66.
[20] Petkov, N., Wieling, M., 2008. Gabor filter for image processing and computer vision. Tech. rep., University of Groningen.
[21] Ramanathan, R., Soman, K., Thaneshwaran, L., Viknesh, V., Arunkumar, T., Yuvaraj, P., Oct. 2009. A novel technique for English font recognition using support vector machines. In: Advances in Recent Technologies in Communication and Computing, 2009. ARTCom ’09. International Conference on. pp. 766–769.
[22] Rashedi, E., Nezamabadi-pour, H., Saryzadi, S., 2007. Farsi font recognition using correlation coefficients (in Farsi). In: 4th Conf. on Machine Vision and Image Processing, Ferdosi Mashhad.
[23] Selesnick, I., Baraniuk, R., Kingsbury, N., 2005. The dual-tree complex wavelet transform. Signal Processing Magazine, IEEE 22 (6), 123–151.
[24] Villegas-Cortez, J., Aviles-Cruz, C., 2005. Font recognition by invariant moments of global textures. In: Proceedings of International Workshop VLBV05 (Very Low Bit-Rate Video-Coding 2005). pp. 15–16.
[25] Zhu, Y., Tan, T., Wang, Y., Oct. 2001. Font recognition based on global texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 23 (10), 1192–1200.
[26] Zramdini, A., Ingold, R., 1998. Optical font recognition using typographical features. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 877–882.
