
Shape Code Based Lexicon Reduction for Offline Handwritten Word Recognition

Roman Bertolami, Christoph Gutmann, Horst Bunke
Institute of Computer Science and Applied Mathematics
University of Bern, Neubruckstrasse 10, CH-3012 Bern, Switzerland
{bertolam,gutmann,bunke}@iam.unibe.ch

A. Lawrence Spitz
DocRec Ltd, 34 Strathaven Place
Atawhai, Nelson 7010, New Zealand
[email protected]

Abstract

A novel method to reduce the lexicon size in handwritten word recognition is proposed in this paper. Due to large lexica, the computational cost of current word recognisers is often too high for practical applications. The proposed lexicon reduction method is based on character shape codes. We examine four different shape code mappings based on machine printed character font, on handprint, and on cursive handwriting. Experimental evaluation shows that the proposed method can reduce the computational effort while keeping the recognition rate high.

1. Introduction

The automatic recognition of handwriting has been under research for many years [1, 8]. Whereas the recognition of isolated characters and digits is already quite mature, offline handwritten word recognition is still a very challenging problem. The main reasons for the rather low performance of current systems are the differences in individual handwriting styles and the rather large size of the underlying lexicon.

The size of the lexicon is a key aspect of a word recognition system, not only in terms of accuracy but also in terms of computational cost [5]. If the lexicon size increases, more words - some of which may be similar - must be distinguished by the recogniser, and the accuracy usually decreases. Also, because the input image must be matched to more word models, the computational cost increases.

A common strategy to reduce the computational cost is to apply a lexicon reduction strategy before the actual recognition is conducted. Lexicon reduction is typically based either on sources of knowledge about the underlying application or on the optical shape of the words to be recognised. The application environment sometimes provides important contextual knowledge that can dramatically reduce the lexicon. An example is bank cheque processing [2, 4], where the legal amount can be split into subwords that come from a rather small lexicon of only a few tens of words. Additionally, the courtesy amount can be used to further constrain the lexicon. Another example is address reading [13], where the ZIP code allows the reduction of a lexicon of thousands of city names to a few words. The second common strategy of lexicon reduction is based on the optical shape of the input image. For example, the length of a word is a simple criterion for lexicon reduction. Long words can easily be distinguished from short words [9]. More sophisticated strategies take the entire word shape into account. Examples of this approach include [6, 16].

The novel lexicon reduction method proposed in this paper follows the second strategy, i.e., it is based on the optical shape of the input image. It uses character shape codes [11] to determine which words of the lexicon are possible matching candidates. Even though the complexity of the proposed lexicon reduction method is rather high, we expect the accuracy of the shape code approach to be high as well, so that a speed-up results at only a small decrease in recognition rate.

The remaining part of the paper is organised as follows. Section 2 introduces the recognition system, the shape code approach, and the lexicon reduction procedure. Experimental evaluation is described in Sect. 3, and conclusions are drawn in the last section of the paper.

The Eighth IAPR Workshop on Document Analysis Systems

978-0-7695-3337-7/08 $25.00 © 2008 IEEE

DOI 10.1109/DAS.2008.18


Figure 1. Examples of image normalisation. On the left appear the original images, whereas the normalised versions are shown on the right hand side.

2. Methodology

In this section, we will describe the shape code based lexicon reduction. First, the underlying recognition system, including preprocessing and feature extraction, is described. Then we introduce the basic principles of character shape codes, before we finally combine these two methodologies to obtain a faster recognition system by lexicon reduction.

2.1. HMM based recognition system

In this paper we consider the task of recognising offline isolated cursively handwritten words. The system we use is similar to the recognition system described in [3]. Basically, the same recognition system is used to recognise words and character shape code sequences. The system can be divided into pre-processing and feature extraction, on the one hand, and hidden Markov model (HMM) based recognition, on the other hand. In this section, a brief description of this system is given.

To reduce the impact of different writing styles, a handwriting image is normalised with respect to skew, slant, baseline position, and average character width in the preprocessing phase. During skew correction, the image is rotated such that the line on which the word is written becomes horizontal. The slant correction brings the handwriting into upright position using a shearing transformation. The baseline positioning scales the three main areas of the handwriting (i.e. the ascender part, the middle part, and the descender part) to a predefined height. The average letter width is estimated and normalised to a predefined value by applying a horizontal scaling transformation. Some examples of this normalisation process appear in Fig. 1, where the original word image is shown on the left and the corresponding normalised word on the right side.

Figure 2. Example of baseline (solid) and x-line (dashed) which separate a text line into three zones, called ascender, basis, and descender. (The figure shows the sample text "Truth is what stands the test of experience".)

After these normalisation steps, the image is converted into a sequence of feature vectors. For this purpose a sliding window is used. The window has a width of one pixel and is moved from left to right over the image, one pixel at each step. At each position of the window nine geometrical features are extracted. The first three features contain the number of foreground pixels in the window and the first and the second order moment of the foreground pixels. Features four to seven represent the position of the upper and the lower contour and their first order derivatives. The last two features contain the number of vertical black-white transitions and the pixel density between the upper and the lower contour.
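As an illustration, the nine features above can be sketched as follows. This is a simplified re-implementation under our own assumptions, not the authors' code; in particular, the exact normalisation of the moments and contour positions is not specified in the paper.

```python
import numpy as np

def window_features(img):
    """img: 2-D array, nonzero = foreground ink. Returns one 9-dimensional
    feature vector per pixel column (window width = 1 pixel)."""
    h, w = img.shape
    feats = []
    prev_upper = prev_lower = 0.0
    for x in range(w):
        col = img[:, x]
        fg = np.flatnonzero(col)                 # row indices of foreground pixels
        if fg.size:
            n = float(fg.size)                   # 1: number of foreground pixels
            first = fg.mean() / h                # 2: first-order moment
            second = (fg.astype(float) ** 2).mean() / h ** 2  # 3: second-order moment
            upper = fg[0] / h                    # 4: upper contour position
            lower = fg[-1] / h                   # 5: lower contour position
            trans = float(np.count_nonzero(np.diff((col != 0).astype(int))))
            # 9: pixel density between upper and lower contour
            density = float((col[fg[0]:fg[-1] + 1] != 0).mean())
        else:
            n = first = second = upper = lower = trans = density = 0.0
        feats.append([n, first, second, upper, lower,
                      upper - prev_upper,        # 6: derivative of upper contour
                      lower - prev_lower,        # 7: derivative of lower contour
                      trans, density])           # 8: black-white transitions
        prev_upper, prev_lower = upper, lower
    return np.asarray(feats, dtype=float)
```

A word image of width w thus yields a sequence of w nine-dimensional vectors, which is the input to the HMM recogniser described next.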

For the recognition we apply an HMM based technique. Each character is modelled with a separate HMM. For all HMMs a linear topology is used, i.e. each state of an HMM has only two transitions, one to itself and one to the next state. Because the characters differ in length, the number of states is chosen individually for each character, as proposed in [15]. In the shape code recogniser, the average number of states of the corresponding characters is used. Twelve Gaussian mixture components model the output distribution in each state. Based on the lexicon, word models are built by concatenating character models. The Baum-Welch algorithm [10] is used for the training of the HMMs, and the recognition is performed by the Viterbi algorithm [14].

2.2. Character shape codes

Character shape codes have been developed in the field of machine printed text recognition [11]. They are used to classify characters based on their optical form into a few classes, called shape codes. Each character is assigned to exactly one shape code. For this purpose, we divide a text line horizontally into three writing zones, i.e. ascender, basis, and descender, by considering two horizontal lines, the baseline and the x-line (see Fig. 2 for an illustration). Above the x-line is the ascender part. The basis is between baseline and x-line, whereas the descender part is below the baseline.

Based on these three writing zones we define the following shape code classes:


Figure 3. Examples of writing a capital 'G'. In the upper line all three writing zones are covered, and in the second example only basis and ascender are used to write the same character.

A : basis and ascender
x : basis
e : basis and eastward concavity
f : basis, ascender, and descender
g : basis and descender
i : basis, ascender, and two segments
j : basis, ascender, descender, and two segments

Note that, whereas these shape code classes are rather clearly defined for machine printed characters [12], they are no longer unique for handwritten characters, due to different individual handwriting styles. An example of an ambiguity is shown in Fig. 3, where one person writes a capital 'G' over three regions while another writes it in only two regions.

In this paper we consider four different shape code mappings, listed in Tab. 1. The first two mappings, 5C and 6C, originate from [11] and are based on machine printed characters. Mapping 6C includes a separate class for the characters 'e' and 'c', which are written in the basis zone and have a large eastward concavity. The third mapping, Handprint, is based on block letter style handwriting. Because i-dots are rarely written exactly vertically, shape codes i and j are not used. The same is true for the last mapping, which we call Cursive. The difference from Handprint is that some upper case characters, i.e. 'G', 'J', 'Q', 'Y', and 'Z', are mapped to shape code f instead of A because in cursive handwriting they often additionally cover the descender region. The complete alphabets that have been used to decide to which shape code a character is mapped appear in Fig. 4.
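For illustration, the 5C and 6C mappings of Tab. 1 can be sketched as a small lookup. This is a hypothetical transcription of the table, not the authors' code:

```python
import string

# Class membership transcribed from Tab. 1 for the 5C and 6C mappings.
SHAPE_MAPS = {
    "5C": {
        "A": set(string.ascii_uppercase) | set("bdfhklt"),
        "x": set("acemnorsuvwxz"),
        "g": set("gpqy"),
        "i": set("i"),
        "j": set("j"),
    },
    "6C": {
        "A": set(string.ascii_uppercase) | set("bdfhklt"),
        "e": set("ce"),          # separate class for the eastward concavity
        "x": set("amnorsuvwxz"),
        "g": set("gpqy"),
        "i": set("i"),
        "j": set("j"),
    },
}

def to_shape_code(word, mapping):
    """Replace every character of `word` by its shape code class."""
    classes = SHAPE_MAPS[mapping]
    return "".join(next(c for c, members in classes.items() if ch in members)
                   for ch in word)

print(to_shape_code("demand", "5C"))  # AxxxxA
print(to_shape_code("demand", "6C"))  # AexxxA
```

The two outputs match the 'demand' example given in Sect. 2.3 below.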

2.3. Lexicon reduction

We now combine the HMM based recogniser with the idea of character shape codes to build a two-step recognition system. In the first step, a shape code sequence recogniser is used to reduce the lexicon size. In the second step, a traditional word recogniser selects the final result from the reduced lexicon. An overview of the proposed system appears in Fig. 5. The goal is to reduce the computational complexity and to keep the recognition accuracy as high as possible.

Table 1. Shape code (SC) mappings.

SC   5C              6C              Handprint       Cursive
A    A-Z bdfhklt     A-Z bdfhklt     A-Z bdhiklt     A-F HI K-P R-X bdhiklt
x    acemnorsuvwxz   amnorsuvwxz     acemnorsuvwxz   acemnorsuvwx
e    -               ce              -               -
f    -               -               fj              GJQYZfj
g    gpqy            gpqy            gpqy            gpqyz
i    i               i               -               -
j    j               j               -               -

Figure 5. System overview. The shape code recogniser uses the shape code lexicon to produce the n-best shape code sequences for the input image; the shape code decoder maps these back to the word lexicon, yielding a reduced word lexicon from which the word recogniser selects the final result (e.g. "demand").

First, we build the lexicon for the shape code recogniser. All words in the original word lexicon are transformed based on the shape code mappings of Tab. 1. For example, the mapping for the word 'demand' is 'AxxxxA' with 5C, Handprint, and Cursive, while it is 'AexxxA' with mapping 6C. Because multiple words will be mapped to the same shape code sequence, we expect the shape code lexicon to be substantially smaller than the original word lexicon. (This effect is negligible for very small lexica, but becomes more important for large or very large lexica.)

The shape code recogniser that decodes the input image is built in the same way as the original full word recogniser described in Sect. 2.1. The only difference is that it uses a smaller set of characters and a smaller lexicon. The output is an n-best list of the n best scoring shape code sequences. For example, for the word 'demand' this could be 'AxxxxA', 'AxxxgA', etc.

Figure 4. Alphabets based on which the shape code mappings are performed: the letters a-z and A-Z in machine printed text, handprint, and cursive handwriting.

Each shape code sequence of this n-best list is then mapped back to the original lexicon. In other words, a new temporary lexicon is compiled by selecting those words in the original lexicon that have been mapped to one of the n-best shape code sequences. The result is a reduced lexicon containing promising hypotheses for the final result. Finally, the word recogniser decodes the input image and matches it to all words in the reduced lexicon to determine the recognition result.

Note that if the correct word label is not in the reduced lexicon, the word recogniser will not be able to classify the input sample correctly. We can control the reduction factor by n, i.e. the length of the n-best list. Clearly, the smaller n is, the higher the speed-up achieved by means of lexicon reduction will be. However, with a smaller n, the risk of misclassification becomes larger because the correct lexicon word may not correspond to any of the n best shape code sequences.
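The reduction step itself amounts to inverting the word-to-shape-code mapping once and taking a union over the n-best list. A minimal sketch, where the function names and the toy shape function are ours rather than the paper's:

```python
from collections import defaultdict

def invert_lexicon(word_lexicon, shape_of):
    """Map each shape code sequence to the set of words producing it."""
    inv = defaultdict(set)
    for word in word_lexicon:
        inv[shape_of(word)].add(word)
    return inv

def reduce_lexicon(n_best, inv):
    """Union of the word sets of all n-best shape code sequences."""
    reduced = set()
    for seq in n_best:
        reduced |= inv.get(seq, set())
    return reduced

# Toy example: words of equal length share a 'shape' here.
inv = invert_lexicon({"demand", "depend", "truth"},
                     shape_of=lambda w: "s" * len(w))
print(sorted(reduce_lexicon(["ssssss"], inv)))  # ['demand', 'depend']
```

In the real system, `shape_of` is one of the mappings of Tab. 1 and `n_best` comes from the shape code recogniser; the inverted index is built once per lexicon, so only the union is paid per input image.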

3. Experimental evaluation

All experiments reported in this section make use of the offline handwritten text line recognition system described in Sect. 2. The handwriting image data originate from the IAM database [7], which is publicly available for download at http://www.iam.unibe.ch/~fki/iamDB.

A writer independent setup is considered, which implies that no writer in the training data has contributed any words to the test data. The training set contains 17,028 words written by 113 writers, while 109 persons have contributed 1,892 words to the test set. The underlying lexicon contains 3,999 words and is given by the union of the words in the training and test set.

The reference system does not include any lexicon reduction and attains a correct recognition rate of 81.71%. To reduce the lexicon, we examine each of the four shape code mappings listed in Tab. 1. For each mapping we build a shape code lexicon based on the mapping and the original word lexicon. The sizes of these shape code lexica appear in Tab. 2. As expected, the largest shape code lexicon is built with the 6C mapping because it contains six character shape codes, whereas the other mappings use only five or four.

Table 2. Sizes of the different shape code lexica. The original word lexicon contains 3,999 words.

           Lexicon size   Reduction
5C         1,995          50.1%
6C         2,734          31.6%
Handprint  1,819          54.5%
Cursive    1,860          53.5%

The coverage, i.e. the relative number of times the correct word class is present in the reduced lexicon, is shown in Fig. 6. The parameter n is varied to obtain the coverage curve. The best coverage is obtained by the 6C mapping. However, it is worth noting that lexicon reduction with the 6C mapping has a higher computational cost because its shape code lexicon is larger.
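The coverage measure can be sketched as follows (a hypothetical helper; the names are ours, not from the paper):

```python
def coverage(test_labels, reduced_lexica):
    """test_labels: the correct word for each test sample;
    reduced_lexica: the reduced lexicon produced for that sample.
    Returns the fraction of samples whose correct word survives reduction."""
    hits = sum(label in lex
               for label, lex in zip(test_labels, reduced_lexica))
    return hits / len(test_labels)

print(coverage(["demand", "truth"],
               [{"demand", "depend"}, {"test"}]))  # 0.5
```

A sample whose correct word is pruned away can never be recognised correctly, so the coverage is an upper bound on the recognition rate of the two-step system.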

The results on the test set using different values of n, i.e. the size of the n-best list provided by the shape code sequence recogniser (see Sect. 2.3 for details), appear in Fig. 7. The Significance line indicates the recognition rate a system must achieve so as not to be statistically significantly lower than the reference system. The statistical significance is calculated with a t-test at a significance level of 0.05. By selecting n ≥ 50, every shape code mapping achieves a recognition rate that is not significantly lower than that of the reference system. The mapping 6C mostly performs best at a given n. However, the value of n does not reflect the actual computational complexity of the entire system because, as shown in Tab. 2, the shape code lexicon sizes of the different mappings differ.

Figure 6. Coverage and size of the reduced lexica (coverage in % plotted against the average reduced lexicon size for the four mappings).

Therefore, we compute the trade-off between recognition rate and computational cost for the different shape code mappings as shown in Fig. 8. The computational cost is given in terms of the average number of word and shape code models that have to be matched to the input image. The number of matches to the shape code models is constant for each mapping method (i.e. the shape code lexicon size shown in Tab. 2). To obtain the trade-off curve we vary the size n of the n-best list. The reference system requires 3,999 matches (i.e. the size of the original lexicon). The largest reduction is achieved with the Handprint mapping. The computational cost is 1,826 matches, which is about 2.2 times faster than the reference system. However, the recognition rate drops dramatically to 43%. The mapping 6C has the largest shape code lexicon and thus the highest computational cost for reducing the lexicon. Considering again the Handprint mapping, we achieve a recognition rate of 78.3%, which is not statistically lower than the reference system, at a relative reduction of the computational cost of 52%.
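Assuming the total cost is simply the number of shape code matches plus the average number of word matches against the reduced lexicon, the reported speed-up can be checked back-of-the-envelope (the average reduced lexicon size of 7 below is an illustrative value of ours, not a figure from the paper):

```python
def speedup(shape_lexicon_size, avg_reduced_size, reference=3999):
    """Ratio of full-lexicon matches to shape-code + reduced-lexicon
    matches, under our additive cost assumption."""
    return reference / (shape_lexicon_size + avg_reduced_size)

# Handprint's 1,819 shape code matches plus a handful of word matches
# reproduce roughly the 2.2x speed-up reported above.
print(round(speedup(1819, 7), 1))
```

This also makes clear why 6C, with its 2,734-entry shape code lexicon, can never reach the same speed-up even at n = 1.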

4. Conclusions

We have proposed an offline handwritten word recognition system that implements a novel lexicon reduction method based on character shape codes.

The recognition system is based on hidden Markov models. After image normalisation, nine geometric features are extracted. Recognition is accomplished in two steps. First, a shape code sequence recogniser reduces the lexicon. Then, a word recogniser performs the recognition using the reduced lexicon. For the shape code sequence recogniser we examined four different shape code mappings: two based on machine printed character shapes, one based on the handprint handwriting style, and another one based on cursive handwriting.

Figure 7. Recognition performance depending on different values of n.

Experimental evaluation shows that the proposed method can substantially reduce the computational cost without significantly losing recognition performance. The best performing shape code mapping is based on handprint and achieves a relative reduction of the computational cost of 52% without significantly losing performance.

Future work will include the investigation of other shape code mappings and improvements of the shape code recogniser, for example by optimising the number of Gaussian mixture components or by using more specific features. Furthermore, we could consider the extension of this approach from isolated word recognition to handwritten text line recognition.

Acknowledgements

This research was supported by the Swiss National Science Foundation (Nr. 20-52087.97).

References

[1] H. Bunke. Recognition of cursive Roman handwriting - past, present and future. In Proc. 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, pages 448–459, 2003.

[2] D. Guillevic and C. Suen. Cursive script recognition applied to the processing of bank cheques. In Proc. 3rd International Conference on Document Analysis and Recognition, Montreal, Canada, pages 11–14, 1995.

Figure 8. Trade-off between computational cost and recognition performance.

[3] S. Gunter and H. Bunke. HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition, 37:2069–2079, 2004.

[4] S. Impedovo, P. Wang, and H. Bunke, editors. Automatic Bankcheck Processing. World Scientific, Singapore, 1997.

[5] A. L. Koerich, R. Sabourin, and C. Y. Suen. Large vocabulary off-line handwriting recognition: a survey. Pattern Analysis and Applications, 6(2):97–121, 2003.

[6] S. Madhvanath, V. Krpasundar, and V. Govindaraju. Syntactic methodology of pruning large lexicons in cursive script recognition. Pattern Recognition, 34(1):37–46, 2001.

[7] U.-V. Marti and H. Bunke. The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5:39–46, 2002.

[8] R. Plamondon and S. Srihari. On-line and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:63–84, 2000.

[9] R. Powalka, N. Sherkat, and R. Whitrow. Word shape analysis for a hybrid recognition system. Pattern Recognition, 30(3):421–445, 1997.

[10] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE, 77(2):257–286, 1989.

[11] A. L. Spitz. Moby Dick meets GEOCR: Lexical considerations in word recognition. In Proc. 4th International Conference on Document Analysis and Recognition, Ulm, Germany, pages 221–226, 1997.

[12] A. L. Spitz and J. P. Marks. Measuring the robustness of character shape coding. In DAS '98: Selected Papers from the Third IAPR Workshop on Document Analysis Systems, Nakano and Lee, eds., LNCS 1655, pages 1–12. Springer, 1998.

[13] S. N. Srihari. Handwritten address interpretation: a task of many pattern recognition problems. International Journal of Pattern Recognition and Artificial Intelligence, 14:663–674, 2000.

[14] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269, 1967.

[15] M. Zimmermann and H. Bunke. Hidden Markov model length optimization for handwriting recognition systems. In Proc. 8th International Workshop on Frontiers in Handwriting Recognition, Niagara-on-the-Lake, Canada, pages 369–374, 2002.

[16] M. Zimmermann and J. Mao. Lexicon reduction using key characters in cursive handwritten words. Pattern Recognition Letters, 20:1297–1304, 1999.
