
Pattern Recognition 44 (2011) 2184–2196

A spatially adaptive statistical method for the binarization of historical manuscripts and degraded document images

Rachid Hedjam*, Reza Farrahi Moghaddam, Mohamed Cheriet

Synchromedia Laboratory for Multimedia Communication in Telepresence, École de technologie supérieure, Montréal, QC, Canada H3C 1K3

Article info

Article history:

Received 15 March 2010

Received in revised form 16 December 2010
Accepted 18 February 2011
Available online 24 February 2011

Keywords:

Historical and degraded documents
Document image binarization
Adaptive local document image classification

0031-3203/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2011.02.021
*Corresponding author. Tel.: +1 514 396 8972; fax: +1 514 396 8595.
E-mail addresses: [email protected] (R. Hedjam), [email protected] (R. Farrahi Moghaddam), [email protected] (M. Cheriet).

Abstract

In this paper, we present an adaptive method for the binarization of historical manuscripts and degraded document images. The proposed approach is based on maximum likelihood (ML) classification and uses a priori information and the spatial relationship on the image domain. In contrast with many conventional methods that use a decision based on thresholding, the proposed method performs a soft decision based on a probabilistic model. The main idea is that, from an initialization map (under-binarization) containing only the darkest part of the text, the method is able to recover the main text in the document image, including low-intensity and weak strokes. To do so, fast and robust local estimation of text and background features is obtained using grid-based modeling and inpainting techniques; then, ML classification is performed to classify pixels into black and white classes. The advantage of the proposed method is that it preserves weak connections and provides smooth and continuous strokes, thanks to its correlation-based nature. Performance is evaluated both subjectively and objectively against standard databases. The proposed method outperforms the state-of-the-art methods presented in the DIBCO'09 binarization contest, although those other methods provide performance close to it.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Digital archiving of ancient and historical documents, some examples of which are shown in Fig. 1, is an expanding trend in heritage study and preservation [2,35,29,24]. It requires that the archived images be enhanced and restored, regardless of the quality of the acquired images. The enhancement techniques are usually based on the separation of text and background using various binarization algorithms. However, the binarization algorithms face many challenges. Accurate searching and "reading" through huge volumes of priceless historical documents, long neglected mainly because of the enormous labor associated with studying them in the traditional manual way [35], imposes new criteria on binarization methods, such as topology preservation, which is called for in word spotting [12,33,40], especially in skeleton-based techniques. For historical manuscripts, the performance of OCR techniques is still low; therefore, word spotting can be used for keyword indexing.


The degradation of historical document images is, in general, physical, and can take different forms, such as the fading of ink, the presence of interfering patterns (ink bleed-through, show-through, etc.), and the deterioration of the cellulose structure, among others. Therefore, before any process such as feature extraction is carried out, suitable preprocessing, such as enhancement or bleed-through removal, is essential in order to correct the degradation. Continuity of the strokes and smoothness of the edges are two main factors of the binarization process on which the skeletonization step strongly depends.

In this work, a maximum likelihood model is introduced to extract text from document images. The model is locally adaptive to the input image, and uses a sub-binarization map to estimate the parameters. This enables the method to recover the parts of text that have been lost in the sub-binarization map (see Fig. 2(b)). In other words, the proposed method adapts to different locations of the document image by considering spatial relations on the input image. For this purpose, the model parameters are interpolated and corrected across the image domain. The details of the method are presented in the subsequent sections.

The structure of the paper is as follows. In Section 2, related work on the binarization of document images is discussed. In Section 3, the problem statement is set out. The details of the model are given in Sections 4 and 5. In Section 6, the experimental results and evaluations are presented. Finally, in Section 7, we provide our conclusions and prospects for future extensions.

2. Related work

Many methods, such as global thresholding [34], local thresholding [36,3,18,15], statistical approaches [44,45], the entropy-based method [7], feature-based methods, such as edge-based methods [43], and multi-level classifiers [14], have been used for the enhancement and binarization of document images. Because of the presence of gray-level degradation over the image (i.e. shadows, non-uniform illumination, defects in some areas of the document), local thresholding methods are required to adapt to the degradation and text changes. However, these methods are ineffective when the extent of the degradation is less than the document average line height, the latter being the average distance between baselines [15]. Therefore, degradation and changes in intensity which are very local and confined to a small part of the strokes cannot be captured by traditional local and adaptive methods; they usually appear as holes or discontinuities on the final output. Therefore, to achieve a suitable binarization, a robust method that is capable of preserving very weak connections and strokes is needed.

Fig. 1. A sample set of historical document images: (a) the original image [1], which suffers from low-contrast degradation, (b) the original image [17], which suffers from a degraded background, (c) the original image [9], which suffers from bleed-through degradation, and (d) the original image [17], which suffers from bleed-through and show-through degradation.

Fig. 2. An example of rough initialization: (a) original image, (b) initialization map (Sauvola binarization), the arrows showing the lost parts of the text, and (c) ground-truth map.

One of the successful approaches to this problem is Bayesian–Markovian classification [45,20]; some of the works associated with this approach are discussed below. In Bayesian–Markovian classification, the Markov model controls the class of a pixel based on its spatial relationship to its neighbors. Although this approach is very effective and intuitive, its performance on highly degraded document images is limited. Even with an accurate initialization (which can be a rough binarization map), the final factor that determines the output is the complexity of the Markov model, which is lower than the complexity of the degradation. Therefore, the Markov-based methods may not, for example, be able to preserve small holes and loops while correcting the degraded text. In order to maximize the use of the a priori information and to avoid the complexity of the Markov model (as in the case of a posteriori parameter estimation), the spatial relationship is transferred to the proposed model via an initial binary map, together with certain a priori information such as the average stroke width and the average line height. This information is useful for estimating the local intensity average and variance, which are at the core of the clustering problem. In fact, our model uses a simple maximum likelihood (ML) approach based on an assumption of Gaussian models for the distribution of data pixels in each class. The ML approach, compared to other classification methods like neural networks [30] and support vector machines [25], remains a popular classification tool. However, it requires a large sample set with a normal distribution [10,22]. In addition, a major consideration in applying the ML rule is its approximation of the normal distribution and sample variances, which can be used to generate confidence bounds and hypothesis tests for the parameters.

Recently, an international contest on binarization methods (DIBCO'09 [17]) was held at the ICDAR'09 conference. The database used in the contest is available, and we use it here as a reference to compare our method with other methods. Among the binarization methods, Sauvola's is a standard document image binarization method. Its threshold value is locally adaptive, and is calculated based on a local gray level mean and standard deviation. The method has the advantage of being able to detect background regions and prevent noise pixels from appearing on the output. Although the original method is time-consuming, various computationally low-cost implementations, such as an integral image method [38] and a grid-based method [15], are available. An example of local and adaptive thresholding, which is based on the detection of edges and uses information around boundaries, is presented in [5]. Recently, in [19], a new adaptive approach has been proposed based on the combination of several state-of-the-art binarization methods, which considers the edge information as well.

Bayesian methods have also been used for document image enhancement and binarization [42,26,27,44,45,28]. In another work [14], a Bayesian model for the binarization of historical and degraded documents is presented, which uses the output of a PDE-based enhancement method developed in the same work as the training data. The trained model is then used to binarize other document images. This model has been successful, especially with historical manuscripts consisting of many pages in a similar state of degradation. Its drawbacks are its ignorance of spatial correlation (although the PDE-based part considers these data implicitly) and its inability to adapt to individual images. In this work, our model adapts to each input image and considers the spatial relation explicitly. In [42], a Bayesian model with a hidden Markov model has been proposed to binarize double-sided document images suffering from bleed-through degradation. The method is limited to linear mixtures, however, and suffers from noisy pixels on the output because of its global nature. In another work [26], an MAP-MRF model has been used to binarize non-uniformly illuminated document images. The model assumes a Gaussian distribution for the Bayes model with a fixed variance value, and uses the same variance value for both text and background, which limits the ability of the method to adapt to the different behaviors of text and background distributions. Also, it is not applicable to cases with strongly interfering patterns, such as bleed-through degradation; the main application of the model is to binarize non-uniformly illuminated document images. Another model based on MRF has been introduced in [27]; this one has limited performance in terms of preserving thin and weak strokes. A linear-mixture model has been presented in [44], which again suffers from noisy artifacts on the output because of its global and linear nature. In [45], another MRF model is presented that requires double-sided document images, which are not always available. The method has been successful on 18th century documents, although its performance decreases for large document images. Also, the small neighborhood used in the method limits the extent of its applicability. The disadvantage of the MRF-based models is that they use hand-crafted clique potentials and small neighborhood systems (4- or 8-connected systems), which are not enough to describe the complete spatial relations in the model [28]. Variations in text and background intensity over the document image domain can be very local, and therefore a simple MRF model cannot handle them. In this work, instead of using an MRF model, we locally adapt the model parameters in the spatial domain for each pixel in order to host the spatial relations within the model. The details of this adaptation are provided in Section 5.

3. Problem statement

A degraded gray-value document image, suffering from various degradation phenomena such as bleed-through, dark background, or weak strokes, is given: u(x), where x = [i, j]^T ∈ Ω ⊂ ℝ². The domain Ω is an open rectangle. We follow the BW01 representation [15]: the image values range from 0 to 1, where a value of 0 means that the pixel is black, and a value of 1 means that the pixel is white. Also, we assume that, as a priori information, a rough binarization map of the image, u_{BW,0}, is available. Although this approximate map may suffer from a large error, it is assumed that its recall value against the ground-truth binarization map is high. A high recall value is chosen to reduce the presence of interfering patterns (such as degraded background, bleed-through, and show-through) on the map. Although some low-intensity text will be lost on this map (as shown in Fig. 2(b)), these parts of the text will be recovered later in the succeeding steps using a local ML classification, which will be discussed in Section 4.

A set of parameters, such as the average stroke width and the average line height (distance between two successive baselines), is considered as a priori information [15]. Also, information about the presence of bleed-through on the document image is considered to be known. The goal is to binarize the true text regions on the image, and separate them from the background and possible interfering patterns. In other words, a binarized map of u, u_BW, will be the final output. Preserving weak connections and strokes is the main objective, as they are very important in the subsequent processing steps.

4. Formulation

The problem of text binarization can be considered as a binary hypothesis problem with two hypotheses, H0 and H1. The main hypothesis, H0, assigns a pixel to the text, while H1 rejects this assignment and labels the pixel as "background" (including degradation and interfering pattern pixels). Usually, in order to make a decision between H0 and H1, posterior probabilities calculated based on the data and a priori information are used. Let us denote these probabilities as P(H0 | u, u_{BW,0}) and P(H1 | u, u_{BW,0}), respectively. To make the model spatially adaptive, it is assumed that each pixel on the image has its own binary problem. This will be discussed later in Section 5.2.

There are several decision rules that can be used in Bayesian models: maximum likelihood (ML), minimum probability of error, maximum a posteriori, and Bayes risk decision [23]. Although ML does not consider the a priori information on the classes, this cannot be considered a drawback. Determination of the models for the classes is a very difficult task because of the complexity and variability of text and background on historical documents. At the same time, the local nature of the method enables it to adapt to possible variations on the input image. In particular, variations in text and background intensity over the document image domain can be very local, and therefore a simple model for P(H_i) is not able to handle them.

Considering the well-known behavior of text and background data in the feature space, two simple, basic models can be used to model each of them: histogram-based models [31] and Gaussian models. In histogram-based models, the model of each class is estimated according to its probability density function [16]:

\[
\mathrm{PDF}_\omega(k) = \frac{1}{\sqrt{2\pi}\, h\, n_\omega} \sum_{j} n_{\omega,j}\, \exp\left\{ -\frac{(k-j)^2}{h} \right\}
\]

where ω ∈ {t, b}, n_ω is the total number of pixels in class ω computed based on u_{BW,0}, j represents the bins of the histogram, n_{ω,j} is the number of pixels in class ω having their intensity in bin j, and h determines the level of smoothing. For example, PDF_t(u(x)) can be used as P(u(x) | H0).

In document image processing, a single-mode distribution for text and a multi-mode distribution for background are usually used [34,6]. This is because of the complex nature of the background on degraded document images. Normally, a Gaussian model is used for each mode in these distributions [34,6,37]. For the background distribution, we drop the assumption of a multi-mode distribution, because the proposed method adapts a separate model to each pixel on the document image. Therefore, the distributions are highly local, and so in most cases the background distribution of a pixel is a single-mode one. This allows us to assume two Gaussian models, one for text and one for background:


\[
P(u(x) \mid H_0) = P(u(x) \mid \mathrm{text}) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left( -\frac{1}{2}\,\frac{(u(x)-\mu_t)^2}{\sigma_t^2} \right) \tag{1}
\]

and

\[
P(u(x) \mid H_1) = P(u(x) \mid \mathrm{bkgd}) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\left( -\frac{1}{2}\,\frac{(u(x)-\mu_b)^2}{\sigma_b^2} \right) \tag{2}
\]

where μ_ω and σ_ω are the mean and standard deviation for ω ∈ {t, b} and should be determined, and u(x) represents the gray value of a pixel x. Applying the decision rule, the classification problem is equivalent to the following minimization problem:

\[
u_{BW}(x) = \arg\min_{\omega \in \{t,b\}} \left\{ \frac{(u(x)-\mu_\omega)^2}{2\sigma_\omega^2} \right\} \tag{3}
\]

The estimation of μ_t, μ_b, and σ_b will be discussed in the next section. As a special case, the parameter σ_t will be discussed in Section 5.4.
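To make the decision rule (3) concrete, the following Python sketch applies it with a single, global pair of class models estimated from a rough initial map. This is only an illustrative, non-adaptive baseline under assumed conventions (NumPy, BW01 maps with 0 = text, and a small constant to avoid division by zero); the function name is hypothetical, and the paper replaces the global parameters with the spatially adaptive fields of Section 5.

```python
import numpy as np

def ml_binarize_global(u, init_map):
    """Minimal sketch of the ML decision of Eqs. (1)-(3) with *global*
    class parameters.  `u` is a gray image in [0, 1] (BW01) and
    `init_map` is a rough binarization (0 = text, 1 = background)."""
    text = u[init_map == 0]            # pixels assumed to be text
    bkgd = u[init_map == 1]            # pixels assumed to be background

    mu_t, sig_t = text.mean(), text.std() + 1e-6
    mu_b, sig_b = bkgd.mean(), bkgd.std() + 1e-6

    # Eq. (3): assign the class with the smaller normalized squared distance.
    cost_t = (u - mu_t) ** 2 / (2.0 * sig_t ** 2)
    cost_b = (u - mu_b) ** 2 / (2.0 * sig_b ** 2)
    return np.where(cost_t < cost_b, 0.0, 1.0)
```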

5. Methodology

In most documents (degraded or not), the text is blacker than the degradation but, at the same time, it may contain some strokes that are degraded and have the same intensity as the degradation, or less (see Fig. 3). The proposed method is composed of two steps. The first step consists of under-binarization of the document image by eliminating, to the maximum extent possible, the degradation information and keeping only the blacker part of the text. In this step, interfering patterns, noise, and even weak and low-intensity strokes will be erased (see Fig. 2(b)). The grid-based Sauvola binarization method [15] is used for this purpose, because it can produce an under-binarized image when high k values are used; the k value should be close to one in the case of document images with severe degradation like bleed-through. The second step recovers the missing text strokes (lost in the previous step) using a local binarization based on ML classification. The details are provided in the subsequent subsections.

5.1. Sauvola binarization algorithm

Fig. 3. An example of degraded text. This is a part of one image of the dataset used in this work. It is zoomed to display the local variation in intensity, where some text pixels are brighter than the degraded background.

As the first step, the input image u is binarized with a simple binarization method; in our case, we used Sauvola's method, which processes each pixel locally and independently of the rest of the pixels. It is claimed to give good results in the binarization of document images in which the background contains large variations and non-uniform illumination. This algorithm calculates a threshold for each pixel according to the information contained in its local neighborhood. The idea of the method is to vary the threshold over the image based on the values of the local mean and local standard deviation in a window centered on each pixel:

\[
T_{G,\mathrm{Sau},u}(x) = m_G(x)\left[ 1 + k\left( \frac{s_G(x)}{R} - 1 \right) \right] \tag{4}
\]

where m and s are the mean and standard deviation of the whole local window, R is the maximum value of the standard deviation (R = 0.5 for a gray-scale document image), and k is a parameter that takes positive values. The subscript G refers to grid-based modeling [15]. The threshold is adapted using the mean and standard deviation according to the local contrast. If the contrast is low in some regions (s ≪ R), the threshold drops below the mean value, which allows the relatively dark regions of the background to be removed. The parameter k controls the threshold in a local neighborhood: if k is high (towards 1), then only the darkest objects remain.
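As an illustration of Eq. (4), the sketch below computes the Sauvola threshold with plain box filters rather than the grid-based implementation of [15]; the window size, the helper names, and the use of SciPy are assumptions made for the example, not details taken from the paper. With k close to 1, the resulting map keeps only the darkest strokes, which is the kind of under-binarization used as u_{BW,0}.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_threshold(u, window=25, k=0.5, R=0.5):
    """Sketch of the Sauvola threshold of Eq. (4) computed with box
    filters on a gray image `u` in [0, 1]."""
    mean = uniform_filter(u, size=window)
    sq_mean = uniform_filter(u * u, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return mean * (1.0 + k * (std / R - 1.0))

def under_binarize(u, window=25, k=0.9):
    """Initial map u_BW,0 in the BW01 convention: 0 = text, 1 = background."""
    return np.where(u <= sauvola_threshold(u, window, k), 0.0, 1.0)
```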

5.2. Spatially adaptive model

As discussed in the introduction, a large proportion of the data of a document image is in the form of spatial relations among pixel data. Unfortunately, the spatial relations behave differently on different scales (or distances). Therefore, the task of modifying the feature vector to include these data would be very different for small scales relative to large scales. In a typical processing patch, both small and large scales are present, which requires a multi-scale representation of spatial relations in the feature vector, a difficult task. This is why spatial data are ignored in many approaches, particularly the statistical ones. In brief, in our model, μ_ω and σ_ω are replaced with their localized equivalents μ_ω(x) and σ_ω(x), which are in turn computed in a neighborhood system using a sliding window (patch). Now, problem (3) can be rewritten as follows:

\[
u_{BW} = \arg\min_{\omega(x) \in \{t,b\}} \left\{ \sum_{x} \frac{(u(x)-\mu_{\omega(x)}(x))^2}{2\,\sigma_{\omega(x)}(x)^2} \right\} \tag{5}
\]

The exact definitions of the model parameters are as follows: μ_t(x) is the gray mean value of the text class at pixel x, σ_t(x) the gray standard deviation of the text class at pixel x, μ_b(x) the gray mean value of the background class at pixel x, and σ_b(x) the gray standard deviation of the background class at pixel x. The neighborhood size used to compute these parameters will be discussed in the subsequent subsections.

5.3. Computing the fields of μ_t, μ_b, and σ_b

In the first step, the input image u is binarized with a simple binarization method; in our case, we used the grid-based Sauvola method [15], which processes each pixel locally and independently of the rest of the pixels (Fig. 4(b)). Also, a few postprocessing steps, such as the removal of noise pixels, are applied to the output of the method. A priori information, such as the average stroke width, is used to make the postprocessing steps parameterless. The result is then called the initial map, u_{BW,0} (see Fig. 4(b)), and is used as a mask to generate two new maps: one containing only the text data, u_t (Fig. 4(c)), and the second, u_b, containing only the background data (Fig. 4(d)). The two maps are governed by the following equation:

\[
u_\omega =
\begin{cases}
u & \text{if } u_{BW,0} = \omega, \quad \omega \in \{t,b\} \\
\mathrm{NaN} & \text{otherwise}
\end{cases} \tag{6}
\]

For statistical purposes, the incomplete data are marked with the notation NaN, which means not a number [39].
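A minimal NumPy sketch of Eq. (6) is given below; the function name and the 0/1 coding of the initial map are illustrative assumptions.

```python
import numpy as np

def masked_maps(u, u_bw0):
    """Sketch of Eq. (6): split the gray image into a text map u_t and a
    background map u_b, filling the complementary pixels with NaN so that
    they are ignored by the local statistics.  `u_bw0` is the initial map
    (0 = text, 1 = background)."""
    u_t = np.where(u_bw0 == 0, u, np.nan)   # only text pixels kept
    u_b = np.where(u_bw0 == 1, u, np.nan)   # only background pixels kept
    return u_t, u_b
```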


Fig. 4. An example of how we produce the masked data (text and background): (a) original image, (b) initialization map (Sauvola binarization), (c) masked text data, the white area containing NaN values; this image is used to compute μ_t and σ_t, and (d) masked background data, the black areas containing NaN values; this image is used to compute μ_b and σ_b.

Fig. 5. An example showing how to compute the local parameters: (a) computing μ_b(x_j) and σ_b(x_j) for the background data. In practice, the black area contains NaN values, which are not concerned with the calculation of the parameters. The parameters of the pixel x_j in patch J are computed by using only the gray pixels (not the black ones) belonging to the corresponding patch. (b) The same applies to estimating the text parameters: for example, to estimate μ_t(x_j) in patch J, we take only the gray pixels in that patch and ignore the white pixels, which contain NaN values (painted in white to distinguish them from the gray text).

Fig. 6. An example of how to inpaint the data: (a) μ_t only for masked text data, the white area containing NaN values, and (b) inpainted mean text data values μ_t.

Now, we consider the problem of estimating the local parameters (local average and variance of intensity). Given the area delimited by patch J in Fig. 5(a), as an example of background data, we estimate μ_b(x_j) and σ_b(x_j) by including only the information belonging to the gray pixels in that patch (but not to the black ones, which contain NaN values), as in the following equations:

\[
\mu_\omega(x) = \frac{1}{|N_\omega(x)|} \sum_{\substack{x_k \in N_\omega(x) \\ u_\omega(x_k) \neq \mathrm{NaN}}} u(x_k), \qquad \omega \in \{t,b\} \tag{7}
\]

\[
\sigma_\omega^2(x) = \frac{1}{|N_\omega(x)|} \sum_{\substack{x_k \in N_\omega(x) \\ u_\omega(x_k) \neq \mathrm{NaN}}} \left[ u(x_k) - \mu_\omega(x) \right]^2, \qquad \omega \in \{b\} \tag{8}
\]

where |N_ω(x)| represents the number of neighborhood pixels x_k around the main pixel x, except those having NaN values. In another case of the same type, for the pixel x_i in Fig. 5(a), the estimation of μ_b(x_i) and σ_b(x_i) is not possible, because the patch I centered on pixel x_i (called a singular pixel) does not contain any background information. In the next paragraph, we explain how to estimate the local parameters of a singular pixel.
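The following sketch illustrates Eqs. (7) and (8) with a direct sliding-window loop that ignores NaN entries; patches with no valid pixel (singular pixels) remain NaN and are filled by the inpainting step described next. The patch size and the function name are illustrative assumptions, and the paper's faster grid-based implementation [15] is not reproduced here.

```python
import numpy as np

def local_masked_stats(u_masked, patch=31):
    """Local mean (Eq. (7)) and standard deviation (Eq. (8)) computed in a
    sliding patch while ignoring the NaN entries of the masked map.
    A direct loop is used here for clarity, not speed."""
    h, w = u_masked.shape
    r = patch // 2
    mu = np.full((h, w), np.nan)
    sigma = np.full((h, w), np.nan)
    for i in range(h):
        for j in range(w):
            win = u_masked[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            vals = win[~np.isnan(win)]
            if vals.size > 0:
                mu[i, j] = vals.mean()       # Eq. (7)
                sigma[i, j] = vals.std()     # Eq. (8)
    return mu, sigma
```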

Since the μ_t, μ_b, and σ_b fields contain a large number of unknown NaN values, an inpainting technique using the averaging method [4] is applied to fill in the empty areas (NaN values or singular pixels). The inpainting process permits propagation of the values over the empty area with control of the variation of the gray values while avoiding their amplification, as shown in Fig. 6. In order to ensure a stable output, we first replace all the border pixels of μ_t by the average of its known values, and then we apply the inpainting process, as in the following equation:

\[
\mu_t(x \mid x \in \partial\Omega) = M\big(\{\mu_t(x) \mid \forall x \in \Omega,\ u_t(x) \neq \mathrm{NaN}\}\big) \tag{9}
\]

where M({z}) is a function computing the mean value of the set {z}, and ∂Ω is the border of the domain Ω. For the other fields (μ_b and σ_b), a value of 0 is used on the border. By applying the inpainting procedure, smooth and complete fields become available. A main parameter to be set is the patch size for calculating μ_ω and σ_ω, ω ∈ {t, b}. As mentioned before, to reduce the computational cost, we use the grid-based approach [15], in which all functions and variables are represented by their equivalent variables on a grid of size d_c. In this approach, the patch size is represented by 2d_c + 1. We set d_c as a factor of W_s: d_c = W_s, where W_s is the average stroke width [15], one of the a priori parameters. Therefore, a patch containing enough information from both the text and the background classes will be used to estimate the local mean and standard deviation.
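A simple iterative version of the averaging-based inpainting described above is sketched below: known values are kept fixed while unknown values are repeatedly replaced by their neighborhood mean. The number of iterations, the 3x3 neighborhood, and the initial fill are assumptions made for the example; the border handling of Eq. (9) is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def inpaint_by_averaging(field, n_iter=200):
    """Sketch of averaging-based inpainting of a parameter field that
    contains NaN (singular) areas: propagate known values smoothly into
    the unknown regions."""
    known = ~np.isnan(field)
    out = np.where(known, field, np.nanmean(field))  # rough initial fill
    for _ in range(n_iter):
        smoothed = uniform_filter(out, size=3)       # 3x3 neighborhood average
        out = np.where(known, field, smoothed)       # re-impose known values
    return out
```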

5.4. Estimation of the σ_t field

To estimate σ_t, a global estimate of the standard deviation of the text intensities is first calculated. Then, this value is mapped onto the image domain using a spatial Euclidean distance. In this way, very local and sharp fluctuations in the σ_t field are avoided.

5.4.1. Estimation of the global σ_t: S_t

Local text variations are very small, due to the constant local intensity of the text. Therefore, any estimation of the text variance based on u_t will result in a small value, and will prevent the model from capturing the degraded parts of text strokes with low intensities in the subsequent classification process. This phenomenon leads to cuts and discontinuities in the strokes, which, as discussed above, have a severe impact on the performance of the subsequent understanding processes. To resolve this problem, we may use another map instead of u_{BW,0}, one which is able to catch the degraded parts of strokes and give a more accurate estimation of σ_t. This is achieved by dilating the text region in the initial map with a structuring element of radius r = W_s, in order to include low-intensity text pixels of the original image u in the calculation of the new text variance S_t². This will increase the estimated text variance, because of the presence of many low-intensity pixels. Let us call the dilated map u^D_{BW,0} [32]. The data in the new masked text field u^D_t, which is obtained using u^D_{BW,0}, consist of both text and background data. In order to remove the background data, Otsu's method is applied. Let us consider P as the set of pixels extracted from the image u by using u^D_{BW,0} as a mask:

\[
P = \{\, x \mid x \in \Omega,\ u^D_{BW,0}(x) = 0 \,\} \tag{10}
\]

P contains a subset of high-intensity text pixels (H), a subset of degraded low-intensity text pixels (L), and a subset of background pixels (B) (see Fig. 7):

\[
P = H \cup L \cup B \tag{11}
\]

Since the values in L are more similar to those of H, a simple Otsu classification allows the division of P into two subsets, T and B (see Fig. 9(b)), as follows:

\[
P = T \cup B \tag{12}
\]

where T = H ∪ L. The remaining data, u^{D'}_t, cover both strong and degraded text intensities, and are therefore suitable for calculating the standard deviation:

\[
u^{D'}_t(x) =
\begin{cases}
u(x) & \text{if } x \in T \\
\mathrm{NaN} & \text{otherwise}
\end{cases} \tag{13}
\]

Fig. 7. An example of text containing low-intensity pixels. For better visualization, only a part of the input image (Fig. 15) is shown.

Fig. 8. An example of how low-intensity text pixels are recovered: (a) part of the original image shown in Fig. 15, (b) the output of Sauvola's method (initialization map), (c) text strokes masked by (b), (d) text pixels masked by the dilated output of (b), and (e) the result of the binarization method. The output for the whole image is presented in Fig. 15.

Fig. 9. Estimation of the global S_t: (a) histogram of the pixels of Fig. 8(c) and (b) histogram of the pixels of Fig. 8(d). We can clearly see that S_t² > σ_t², because of the low-intensity text pixels.


Fig. 10. Example of how the grid works. Left: a sample image, the black squares representing the nodes of the grid, with the distance between two successive squares equal to d_c. Right: the values in the black squares represent the calculated values of the grid. Each of them is computed by the function applied to all values belonging to the windows shown on the left. In this example, the input function is the mean function.

The standard deviation value of the dilated text is given by the STD function:

\[
S_t = \mathrm{STD}\big(\{\, u^{D'}_t(x) \mid x \in \Omega,\ u^{D'}_t(x) \neq \mathrm{NaN} \,\}\big) \tag{14}
\]

Fig. 9(b) shows how Otsu's thresholding separates P into two subsets. The new variance S_t² is then greater than σ_t². The standard deviation values of the map σ_t are substituted by S_t, following the spatial correction described in the next section. It is worth noting that, although the intensity variation of the text pixels across the image domain may result in the loss of some of the text pixels, the remaining pixels in T provide an approximation of the text intensity variation induced by degradation. Because of the local nature of the method, an approximation of the text variance is enough to recover the degraded text pixels.
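The estimation of the global S_t (Eqs. (10)-(14)) can be sketched as follows, using a plain-NumPy Otsu threshold and a square structuring element as a stand-in for the disk of radius W_s; the function names and the histogram bin count are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def otsu_threshold(values, bins=256):
    """Plain-NumPy Otsu threshold on a 1-D array of gray values in [0, 1]."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist.astype(float) / max(hist.sum(), 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)
    w1 = 1.0 - w0
    cum_mean = np.cumsum(p * centers)
    m0 = cum_mean / np.maximum(w0, 1e-12)
    m1 = (cum_mean[-1] - cum_mean) / np.maximum(w1, 1e-12)
    between = w0 * w1 * (m0 - m1) ** 2       # between-class variance
    return centers[np.argmax(between)]

def global_text_std(u, u_bw0, stroke_width):
    """Sketch of Eqs. (10)-(14): dilate the text of the initial map,
    collect the covered pixels P, split them with Otsu into text T and
    background B, and return S_t, the standard deviation of T."""
    r = int(stroke_width)
    selem = np.ones((2 * r + 1, 2 * r + 1), dtype=bool)   # square ~ radius W_s
    dilated = binary_dilation(u_bw0 == 0, structure=selem)  # u^D_BW,0
    P = u[dilated]                                          # Eq. (10)
    thr = otsu_threshold(P)
    T = P[P <= thr]          # darker subset = strong + degraded text, Eq. (12)
    return T.std()           # Eq. (14)
```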

5.4.2. Spatial adaptation of σ_t

On some images, parts of the document background are very dark, and the intensities in these areas are very close to the text intensity. These pixels can appear as spurious patterns on the output because of misclassification. In order to prevent this effect, we define

\[
\sigma_t(x) = S_t\, e^{-d(x)} \tag{15}
\]

where d(x) is the Euclidean distance [11] of a background (white) pixel x from the black text region, computed on the initial map u_{BW,0}. When a degraded pixel x is far from the main text, e^{-d(x)} becomes very small (depending on its distance d), and therefore the degraded pixels will automatically be labeled as background in the classification step.
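A sketch of Eq. (15) using a Euclidean distance transform is shown below; the use of scipy.ndimage.distance_transform_edt is an implementation choice made for the example, not necessarily the one used by the authors.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def sigma_t_field(u_bw0, S_t):
    """Eq. (15): sigma_t(x) = S_t * exp(-d(x)), where d(x) is the Euclidean
    distance of pixel x from the text region of the initial map.  Far from
    the text, sigma_t shrinks and dark background is rejected."""
    # distance_transform_edt gives the distance to the nearest zero element,
    # so pass the *background* indicator (text pixels -> 0).
    d = distance_transform_edt(u_bw0 != 0)
    return S_t * np.exp(-d)
```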

5.5. Estimation of u_BW

Having all four text and background feature fields ready on the image domain, the decision rule can be applied to the minimization problem (5) to estimate the final binarization. The classification problem can be described as follows:

\[
u_{BW}(x) =
\begin{cases}
0 & \text{if } \dfrac{(u(x)-\mu_t(x))^2}{2\sigma_t^2(x)} < \dfrac{(u(x)-\mu_b(x))^2}{2\sigma_b^2(x)} \\[2ex]
1 & \text{otherwise}
\end{cases} \tag{16}
\]

Although direct calculation can be used to compute μ_ω(x) and σ_ω(x), grid-based modeling [15] is used in order to reduce the computational cost and also to smooth these functions. In this approach, all functions and variables are calculated from their equivalent variables on a grid of size d_c. In this way, while the pixel-wise nature of the model is preserved, functions and variables can be transferred easily to any suitable scale. Fig. 10 shows the grid operation [15] for estimating the parameters on a few nodes and then interpolating the remaining parameter values. A schematic algorithm of the proposed method is provided in Algorithm 1.

Algorithm 1. Principle of the proposed algorithm.

1 Input: the original image u;
2 Under-binarization of u:
3   Calculate the under-binarized map u_{BW,0} using Sauvola's algorithm; cf. Fig. 2(b);
4   Get the masked text u_t and masked background u_b maps; cf. Eq. (6);
5 Parameter estimation:
6   Estimate μ_ω, ω ∈ {t, b}, and σ_b; Eqs. (7) and (8);
7   Use the inpainting technique to propagate the estimated values into the empty areas;
8 Global estimation of σ_t:
9   Dilate the text of u_{BW,0} to get u^D_{BW,0};
10  Get the masked dilated text P from u using u^D_{BW,0}; Eq. (10);
11  Separate P into two subsets, T and B; T is the part of P that is associated with text;
12  Compute the standard deviation of T, S_t; Eq. (14);
13 Spatial adaptation of σ_t:
14  σ_t(x) = S_t e^{-d(x)}; Eq. (15);
15 Calculate the binarized image of u; Eq. (16);
16 Output: the binarized image u_BW;
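To make step 15 of Algorithm 1 concrete, the following sketch applies the per-pixel decision of Eq. (16) once the four parameter fields have been computed and inpainted; the small regularization constant and the function name are illustrative assumptions.

```python
import numpy as np

def classify(u, mu_t, sig_t, mu_b, sig_b):
    """Per-pixel ML decision of Eq. (16) with the locally adapted parameter
    fields (all arguments are arrays of the image size).  Returns 0 for
    text and 1 for background (BW01)."""
    eps = 1e-6                                   # avoid division by zero
    cost_t = (u - mu_t) ** 2 / (2.0 * sig_t ** 2 + eps)
    cost_b = (u - mu_b) ** 2 / (2.0 * sig_b ** 2 + eps)
    return np.where(cost_t < cost_b, 0.0, 1.0)
```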

6. Experimental results and discussion

The performance of the proposed method is evaluated on different datasets. The first dataset consists of samples from the Google Book Search dataset [21] which have suffered degradation. The Google Book Search dataset contains the scanned, resized, and flattened images of several books; the DVD of the dataset contains data for 68 books. The second dataset is from the Juma Al Majid Center (Dubai) [1], and contains 85 images (about 160 document pages) of very old and degraded text. The document images were acquired by the center using camera imaging (a 10.3 megapixel CCD camera), at an acquisition resolution of 150 DPI. The content varies from normal handwritten text to very complex maps. The third dataset is from the Institute of Islamic Studies (IIS) at McGill University (http://www.mcgill.ca/islamicstudies), and consists of 280 images corresponding to a 260-page manuscript. The document images were acquired using camera imaging (a 21 megapixel CCD camera), at an acquisition resolution of 300 DPI. Finally, the fourth dataset is a Latin and Arabic manuscript dataset [8], which contains a large number of ancient documents, including Arabic manuscripts from Tombouctou, Mali. Below, the performance of the method is evaluated both subjectively and objectively.

Fig. 11. An example of degraded shapes with variable intensities: (a) part of a degraded image (selected for better visualization), (b) the output of Otsu's method, (c) the output of Sauvola's method, and (d) the output of the proposed method, which is continuous and smooth.

Fig. 12. A second example of degraded shapes with variable intensities: (a) the original image, (b) the output of Otsu's method, (c) the output of Sauvola's method, and (d) the output of the proposed method, which is continuous and smooth.

Fig. 13. A third example of degraded shapes with strong interference of bleed-through: (a) the original image, (b) the output of Otsu's method, (c) the output of Sauvola's method, (d) the output of Drira's method [9], and (e) the output of the proposed method.

6.1. Subjective evaluation

Fig. 11 shows the performance of the method for a subimage from the Google dataset. The results of two standard methods are also shown in the figure for comparison purposes: Otsu's method [34], which is a global thresholding method, and the local-thresholding method of Sauvola [36] are used to binarize the input image. Because of the variations in text intensity, neither the global nor the adaptive method is able to preserve weak connections and provide smooth, continuous output. As can be seen from the figure, the outputs of both methods suffer from cuts and false holes, which have a highly negative effect on the performance of subsequent skeletonization processes.

It is worth noting that holes and loops are very important in the understanding steps, such as word spotting, and should be preserved. Because of the presence of loops and holes in the correct shapes, morphological opening/closing operators cannot be applied in many cases. At the same time, the output of the proposed method is continuous, and very weak connections are preserved thanks to the very local and correlation-based nature of the method. In Fig. 12, the same behavior can be observed for a sample from the McGill dataset. Not only do the global and local thresholding methods suffer from false discontinuities and holes, but some parts of the background also appear on their outputs. Again, the proposed method provides a very smooth, continuous output.

The final example, shown in Fig. 13, suffers from the bleed-through effect. As can be seen from the input image (Fig. 13(a)), the degree of the bleed-through effect is high, and in some regions the intensity of the interfering patterns is very close to the true text intensity. Also, it is worth noting that only the image of one side of the document is available, so the double-sided bleed-through removal techniques [44,13] cannot be applied in this case. The outputs of the Otsu and Sauvola methods are shown in Figs. 13(b) and (c). Neither of them is successful in removing the bleed-through, and the segmented texts suffer from a high degree of pattern interference. The output of the proposed method is shown in Fig. 13(e). The method is not only able to remove the bleed-through interfering patterns, but also provides continuous strokes which are ready for the subsequent processes. Because of suitable initialization using the recursive Sauvola method [15] and the spatially adaptive classification of the proposed method, the interfering patterns are largely separated from the true text, even though their intensity is very close to the text intensity. This example confirms the discussion in the introduction on the importance of the a priori information. The initial binarized map obtained using the recursive Sauvola method is the key to separating the bleed-through patterns. With a simple Markov model, the degradation pixels could easily be added to the output. We are currently working on a combined model which contains a Markov part adapted to the a priori information.

In Fig. 1, a set of sample degraded document images from various datasets is provided. The documents suffer from degraded background and bleed-through. For example, Fig. 1(a) has both degraded background and degraded text. The output of the proposed method for this image is shown in Fig. 14(a): the background is clean and weak strokes are recovered. The outputs of the proposed method for the other samples in Fig. 1 are also shown in Fig. 14. The method is able to remove strong bleed-through interfering patterns. It is worth noting that for Fig. 1(c) the recursive Sauvola method [15] has been used to obtain u_{BW,0}.

Fig. 14. Binarization results of the images in Fig. 1.

Fig. 15. Subjective comparison with the state of the art: (a) original image, (b) Sauvola thresholding result, (c) Su and Tan [41] result, and (d) the proposed method result.

Another subjective comparison with the state of the art is shown in Fig. 15. Fig. 15(a), which is from the DIBCO'09 dataset, contains a very degraded background and text. Fig. 15(b) is the result of Sauvola's thresholding method [36], which is able to recover the text but cannot remove the strong background degradation; the output also suffers from noise. Fig. 15(c), which is taken from [41], performs well in extracting text from background, but its ability to recover the degraded text strokes is limited. The result of the proposed method is shown in Fig. 15(d): the degraded strokes of the text are recovered, while the degraded background is eliminated. However, the proposed method fails to recover very small loops in characters, especially when they belong to a highly degraded background. We are working to generalize the model to cover this case of degradation as well.

6.2. Objective evaluation against DIBCO’09 [17]

The objective evaluation has been performed using the DIBCO'09 dataset. The results of the proposed method are compared to the top three methods of the DIBCO contest [17], which involved 43 participants from 22 laboratories worldwide. It is worth noting that the dataset was not available to the participants during the contest. We used the available input images simply to determine their class: low-contrast images, images with a degraded background, and images with the bleed-through problem, as will be defined in Section 6.2.1. Before conducting the objective evaluation, a brief description of the binarization methods used for this purpose is provided. The first algorithm, which was proposed by Lu and Tan [17] and won first place in the competition, consists of four parts, dealing with document background extraction, stroke edge detection, local thresholding, and postprocessing. The local threshold is estimated by averaging the detected edge pixels within a local neighborhood window. The second algorithm considered was proposed by Fabrizio and Marcotegui [17], and is based on the toggle mapping operator. The image is first mapped onto the corresponding morphological erosion and dilation. Then, if the pixel value is closer to the erosion, it is marked as background; otherwise it is marked as foreground. To avoid salt-and-pepper noise, pixels whose erosion and dilation are too close are excluded from the analysis. Pixels are then classified into three classes: foreground, background, and homogeneous regions. Finally, the homogeneous regions are assigned to the foreground or background class, according to the class of their boundaries. The third algorithm, proposed by Rivest-Henault et al. [17], uses a curve evolution approach implemented in the level-set framework to erode a stroke map [14] estimated from statistical information obtained from the content of the input image. The evolution of the curve is based on an energy term measuring the fitness of the stroke pixels with respect to the stroke gray-level image. The fourth algorithm used in the comparison task is the well-known Sauvola method, which is based on a local estimation of variance and mean to obtain the threshold by which each pixel is classified as either text or background [36].

Table 1. Performance in terms of precision, recall, and F-measure of the proposed algorithm.

Image no.  Image  Recall   Precision  F-measure
1          H01    94.873   88.571     91.614
2          H02    92.574   92.349     92.461
3          H03    89.143   89.972     89.555
4          H04    95.997   81.150     87.952
5          H05    85.924   83.670     84.782
6          P01    87.883   95.709     91.630
7          P02    94.247   97.833     96.007
8          P03    95.298   98.854     97.043
9          P04    95.847   87.800     91.647
10         P05    89.213   91.343     90.266
           Average 92.100  90.725     91.296

6.2.1. Evaluation setup

In all the experiments, we have considered the following internal parameters for our image binarization model. The first step consists of creating a black and white initialization map, u_{BW,0}. The Sauvola binarization algorithm integrated on a grid function [15] is used to generate this map by fixing the line height parameter h_l to 100 pixels; the parameter k is 0.2, 0.4, 0.5, and 0.9 for the low-contrast, faded-text, degraded-background, and bleed-through degradation types, respectively. The size of the grid cell d_c and the radius of the structuring element r are proportional to the stroke width: d_c = λ_{d_c} W_s and r = λ_r W_s, where W_s is the stroke width, and λ_{d_c} and λ_r are proportionality factors. In our experiments, λ_{d_c} = 2 and λ_r = 1/4. The stroke width is estimated based on a statistical method described in [15].

Table 2. Comparison of the performance of the proposed algorithm and the others against DIBCO'09. For Sauvola's method, the performance reported in [15] has been used.

Algorithm                              Method number in [17]  F-measure
The proposed algorithm                 –                      91.29
Lu and Tan [17]                        26                     91.24
Fabrizio and Marcotegui [17]           14                     90.06
Rivest-Henault et al. [17]             24                     89.34
Sauvola's [36]                         –                      87.27
Multi-scale grid-based Sauvola [15]    –                      89.26

Fig. 16. Printed and handwritten ink degradation: (a) handwritten document and (b) printed document. The arrows indicate the missing ink or areas with little ink.

6.2.2. Performance measures

We measure the similarity between the output of the proposed algorithm and the ground truth made available by the DIBCO'09 organizers. The performance measures used to evaluate the proposed algorithm are the well-known recall (R), precision (P), and F-measure:

\[
R = \frac{TP}{TP + FN} \tag{17}
\]

\[
P = \frac{TP}{TP + FP} \tag{18}
\]

\[
F\text{-measure} = \frac{2 \times R \times P}{R + P} \tag{19}
\]

where TP, FP, TN, and FN denote the true positive, false positive, true negative, and false negative counts, respectively.
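For completeness, a small sketch of Eqs. (17)-(19) for two binary maps in the BW01 convention is given below; the function name is illustrative, and text (value 0) is taken as the positive class.

```python
import numpy as np

def binarization_scores(output, ground_truth):
    """Recall, precision, and F-measure of Eqs. (17)-(19) for binary maps
    in the BW01 convention (0 = text is the positive class)."""
    out_text = (output == 0)
    gt_text = (ground_truth == 0)
    tp = np.sum(out_text & gt_text)
    fp = np.sum(out_text & ~gt_text)
    fn = np.sum(~out_text & gt_text)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure
```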

We have replicated the scenario used in the evaluation of the state-of-the-art binarization methods described in the DIBCO contest [17]. The dataset consists of five machine-printed images (referred to as PR images and labeled P0X, where X is from 1 to 5) and five handwritten images (referred to as HW images and labeled H0X, where X is from 1 to 5), resulting in a total of ten images for which the associated ground truths are available. One of the images is shown in Fig. 17(a). These images suffer from the kinds of degradation frequently found in old documents (e.g. variable background intensity, shadows, smearing, smudging, low contrast, bleed-through, and show-through).

Table 1 shows the recall, precision, and F-measure results obtained by the proposed algorithm, and Table 2 compares the average F-measures of the various methods. In Table 1, we observe that, for the HW images, the recall is generally higher than the precision. This means that the algorithm is able to recover the weak strokes by reconnecting their discontinuous parts; these gaps are mostly created by low-intensity text caused by ink shortage on some parts due to the rapid movement of the pen on the paper during writing. It is worth noting that the dilation operator of the algorithm may result in the addition of some undesirable pixels, especially on the border of a stroke, which lowers the precision. Generally, writers move the pen horizontally over the paper, so the amount of ink is reduced if the writer does not lift the pen from the paper before moving to the next piece (see Fig. 16(a)). Therefore, in the gap areas, the intensity is closer to the local neighboring text than to the local neighboring background, which helps the proposed algorithm classify these gaps as text. In contrast, printing and typesetting devices inject ink vertically onto the paper, so the discontinuities of the text are sharp and the missed parts of strokes are more similar to the background (see Fig. 16(b)). This is why, for the PR images, the precision is higher than the recall: the algorithm classifies these areas as background, although they are more likely to be text. From the same table, we find that the average F-measure of the PR images (93.318%) is greater than the average F-measure of the HW images (89.273%). The explanation for this is that most of the HW images are more degraded, and many parts of the text have a low intensity, close to the intensity of the background. The smallest F-measure (84.782%) is given by image H05, which is a very degraded document in which many parts of the strokes are quite close to the background in terms of intensity. For this type of document image, because some strokes may be missed on the initialization map, achieving a good initialization map for the Bayesian classification process is not easy; this is why the precision measures of such images are mostly lower. Among the HW images, H02 has the best F-measure (92.461%), because the text in this image is more regular and does not contain low-intensity pixels. Also, we have a priori information about the existence of bleed-through, so the initialization map (obtained using the recursive Sauvola method [15]) is more stable and helps the Bayesian classification to separate the text from the interfering objects accurately. For the PR images, the highest F-measure corresponds to image P03, because it has less degradation; thus, the classification is more accurate. Image P05 contains many interfering objects and many parts of its text are interrupted, which reduces its F-measure in comparison to that of the other PR images.

Fig. 17. Visual comparison of the binarization algorithms: (a) original image, (b) Sauvola's binarization, (c) Lu and Tan's binarization, (d) Rivest-Henault, Farrahi and Cheriet's binarization, and (e) the proposed binarization.

Table 3. Performance in terms of F-measure of the proposed algorithm and other methods. For the Otsu and Sauvola methods, the performances reported in [15] have been used.

Image    Otsu [34]  Sauvola [36]  Multi-scale grid-based Sauvola [15]  Our method
H01      90.850     82.026        91.410                               91.614
H02      86.145     76.919        77.622                               92.461
H03      84.114     88.195        86.551                               89.555
H04      40.557     88.946        85.673                               87.952
H05      28.038     84.325        83.134                               84.782
P01      89.809     90.159        91.809                               91.630
P02      96.548     94.580        95.926                               96.007
P03      96.780     89.589        96.999                               97.043
P04      82.714     91.882        94.069                               91.647
P05      88.368     86.073        89.426                               90.266
Average  78.392     87.269        89.262                               91.296

6.2.3. Comparison with the state of the art

As explained above, we have evaluated the proposed algorithm on the DIBCO'09 dataset [17], which consists of 10 ancient document images (five handwritten and five machine-printed). The images are of different sizes; nine of them are gray-scale, and one is in color. The proposed algorithm is compared against the four algorithms discussed previously in Section 6.2. These four algorithms were chosen because of the availability of their implementation code (the third and fourth algorithms) or of their performance measures (the first and second ones) [17]. Table 2 provides the average F-measure values of the various methods [17]. The proposed method outperforms the others, although the results are close. The proposed method also has a better performance in terms of subjective evaluation (see Fig. 17). In addition, because of its ability to preserve weak connections and recover important holes, the method is more suitable for the skeletonization problem. Sauvola's algorithm gives a good result, but it suffers from some noise in the output image, especially in the presence of strong interference (Fig. 17(b)). It is worth noting that Rivest-Henault et al.'s algorithm is also competitive; however, it suffers from interfering patterns and fails to separate some strokes.

In order to obtain a more specific analysis of the performance of the proposed method, its scores are compared with those of other methods in Table 3. It is worth noting that we have access to the performance of just three methods, because the results of other works are not available to us. From Table 3, we can conclude that global thresholding methods (for example, Otsu's method) work well in the case of uniform document images (images P03 and H01) and also in the case of light bleed-through (image P02) [15]. Local methods, such as Sauvola's, work well in the case of a non-uniform background (images H04 and H05). In the case of bleed-through, for example image H02, these methods fail, while the proposed method, given a good initialization, achieves a good result for this type of document image. The multi-scale grid-based Sauvola method [15] performs well compared to the Otsu and Sauvola methods in almost all cases; only in the case of bleed-through in image H02 does it fail to extract the true text. In contrast, the proposed method achieves a good performance in all cases.
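To make the global/local distinction concrete, the following minimal Python sketch applies a global (Otsu) and a local (Sauvola) threshold using the scikit-image implementations. It is only an illustration of the two families of methods compared in Table 3, not the code evaluated in this paper; the file name, window size, and k value are illustrative choices.

from skimage import io, filters

# Load a grayscale document image (the path is only an example).
gray = io.imread('document.png', as_gray=True)

# Global thresholding (Otsu): a single threshold for the whole image.
t_otsu = filters.threshold_otsu(gray)
binary_otsu = gray > t_otsu            # True = background, False = text (text is darker)

# Local thresholding (Sauvola): one threshold per pixel, estimated from its neighborhood.
t_sauvola = filters.threshold_sauvola(gray, window_size=25, k=0.2)
binary_sauvola = gray > t_sauvola

A single global threshold behaves well when the background is uniform, whereas the per-pixel Sauvola threshold adapts to a non-uniform background, which matches the behavior reported above.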

6.3. Computational cost and complexity of the method

The proposed binarization algorithm, implemented in Matlab 7.0, takes 36 s per image on average over the dataset on an AMD Athlon(TM) 64 X2 Dual Core 6000+ processor (3.01 GHz, 3.25 GB of RAM) running Windows XP. Fig. 18 shows that the processing time per pixel is roughly constant, i.e., the time cost of the algorithm grows linearly with the number of pixels in the image.


Fig. 18. Time cost of the proposed algorithm as a function of the square root of the image size. The continuous line indicates the processing time per pixel (left axis, in units of 10^-5 s), while the dashed line shows the total image processing time (right axis, in s).


The complexity of the algorithm can be expressed as $O(n^2 \times m^2/m^2) = O(n^2)$, where $n$ is the size of the image and $m = 2d_c$ is the size of a grid patch.
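One way to read this bound, under the interpretation that the grid contains $n^2/m^2$ patches and the parameter estimation on each patch costs $O(m^2)$, is

$\dfrac{n^2}{m^2} \times O(m^2) = O(n^2),$

i.e., the total cost is linear in the number of pixels, which is consistent with the roughly constant per-pixel time observed in Fig. 18.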

7. Conclusions and future prospects

In this paper, we have introduced a new approach to the binarization of degraded ancient document images. The main idea is to spatially adapt a two-class (text and non-text) ML classifier to the pixels by a local estimation of the parameters of each class. The parameters of a class for each pixel are computed from the gray-level distribution of its neighbors. In order to eliminate the strongly interfering background while preserving the weak parts of low-intensity text, low-intensity text pixels are considered in estimating the variance. This helps to improve the extraction of these weak parts. A simple morphological operator is applied on an initialization map, followed by a simple Otsu thresholding process, to include the original low-intensity text pixels and separate them from the background pixels. The high-intensity and low-intensity pixels are then used to estimate the variance. In addition, the variance on each pixel is weighted based on its distance from the main text.
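The core idea of the classifier can be summarized by the following minimal Python sketch. It is not the implementation described in this paper (which relies on grid-based modeling, inpainting of missing local parameters, and distance-based variance weighting); it only illustrates per-pixel two-class Gaussian ML classification with locally estimated means and variances, starting from an under-binarized initialization map. The window size and the uniform local averaging are illustrative assumptions.

import numpy as np
from scipy.ndimage import uniform_filter

def ml_binarize(gray, init_text_mask, win=31, eps=1e-6):
    # gray: float image in [0, 1]; init_text_mask: boolean under-binarization
    # map (True = darkest, most reliable text pixels).
    g = gray.astype(float)
    t = init_text_mask.astype(float)   # text indicator from the initialization
    b = 1.0 - t                        # background indicator

    def local_class_mean(values, weights):
        # Weighted local average over a win x win neighborhood.
        num = uniform_filter(values * weights, size=win)
        den = uniform_filter(weights, size=win) + eps
        return num / den

    mu_t = local_class_mean(g, t)                        # local text mean
    mu_b = local_class_mean(g, b)                        # local background mean
    var_t = local_class_mean((g - mu_t) ** 2, t) + eps   # local text variance
    var_b = local_class_mean((g - mu_b) ** 2, b) + eps   # local background variance
    # Note: windows with no text pixels are only protected by eps here; the
    # paper instead fills such missing parameters by inpainting.

    # Gaussian log-likelihood of each pixel's gray level under each class.
    ll_t = -0.5 * np.log(var_t) - (g - mu_t) ** 2 / (2.0 * var_t)
    ll_b = -0.5 * np.log(var_b) - (g - mu_b) ** 2 / (2.0 * var_b)

    return ll_t > ll_b                 # True = classified as text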

The main advantage of the proposed algorithm is that it preserves weak connections and provides smooth and continuous text strokes, thanks to its local and correlation-based nature. Furthermore, an under-binarization map is used as the initialization, and the local ML classification helps, on the one hand, to remove noise from the output images and, on the other hand, to recover text with very low-intensity strokes in order to preserve the topology of the text. This leads to an increase in the recall and precision of the output results, which is very important for subsequent processing. Another advantage of the proposed algorithm is that it is a soft decision method that requires no thresholding parameters. In addition, it is simple to implement and of low complexity. However, the proposed method fails to recover very small loops in characters, especially when they belong to a highly degraded background. We are working to generalize the model to cover this case of degradation as well. The proposed algorithm has been tested on the DIBCO'09 dataset. The results show that it outperforms the other algorithms developed in the literature, in terms of both F-measure and subjective evaluation. It has also been noted that the types of degradation differ depending, in particular, on whether the document is handwritten or printed. This information can be helpful in future work, in which we will investigate the integration of a Markov random field in order to smooth the edges and preserve the connectivity between the broken parts of the text, using both the direction and the intensity gradient of the image pixels.

Acknowledgment

The authors would like to thank the NSERC of Canada for their financial support.

References

[1] Juma Al Majid Center for Culture and Heritage, 55106 Dubai, UAE.
[2] A. Antonacopoulos, A. Downton, On the analysis of historical documents, International Journal on Document Analysis and Recognition 9 (2) (2007) 75–77 (special issue).
[3] J. Bernsen, Dynamic thresholding of grey-level images, in: Eighth International Conference on Pattern Recognition, 1986.
[4] M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in: ACM SIGGRAPH 2000, ACM Press/Addison-Wesley Publishing Co., 2000, pp. 417–424.
[5] Q. Chen, Q.-S. Sun, P.A. Heng, D.-S. Xia, A double thresholding image binarization method based on edge detector, Pattern Recognition 41 (2008).
[6] M. Cheriet, J.N. Said, C.Y. Suen, A recursive thresholding technique for image segmentation, IEEE Transactions on Image Processing 7 (6) (1998) 918–921.
[7] J. Marcelo Monte da Silva, R. Dueire Lins, F. Mario Junqueira Martins, R. Wachenchauzer, A new and efficient algorithm to binarize document images removing back-to-front interference, Journal of Universal Computer Science 14 (2) (2008) 299–313.
[8] R. Deriche, O. Faugeras, Les EDP en traitement des images et vision par ordinateur (PDEs in image processing and computer vision), Technical Report 2697, INRIA, 1996.
[9] F. Drira, Contribution à la Restauration des Images de Documents Anciens (Contribution to the restoration of ancient document images), Ph.D. Thesis, Ecole Doctorale Informatique et Information pour la Societe (EDIIS), LIRIS, UMR 5205 CNRS, December 2007.
[10] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley, 2001.
[11] R. Fabbri, L.D.F. Costa, J.C. Torelli, O.M. Bruno, 2D Euclidean distance transform algorithms: a comparative survey, ACM Computing Surveys 40 (1) (2008) 1–44.
[12] R.F. Moghaddam, M. Cheriet, Application of multi-level classifiers and clustering for automatic word-spotting in historical document images, in: ICDAR'09, Barcelona, Spain, July 26–29, 2009, pp. 511–515.
[13] R.F. Moghaddam, M. Cheriet, Low quality document image modeling and enhancement, International Journal on Document Analysis and Recognition 11 (4) (2009) 183–201.
[14] R.F. Moghaddam, M. Cheriet, RSLDI: restoration of single-sided low-quality document images, Pattern Recognition 42 (2009) 3355–3364.
[15] R.F. Moghaddam, M. Cheriet, A multi-scale framework for adaptive binarization of degraded document images, Pattern Recognition 43 (6) (2010) 2186–2198.
[16] R.F. Moghaddam, D. Rivest-Henault, M. Cheriet, Restoration and segmentation of highly degraded characters using a shape-independent level set approach and multi-level classifiers, in: ICDAR'09, Barcelona, Spain, July 26–29, 2009, pp. 828–832.
[17] B. Gatos, K. Ntirogiannis, I. Pratikakis, ICDAR 2009 document image binarization contest (DIBCO 2009), in: ICDAR'09, 2009, pp. 1375–1382.
[18] B. Gatos, I. Pratikakis, S.J. Perantonis, Adaptive degraded document image binarization, Pattern Recognition 39 (3) (2006) 317–327.
[19] B. Gatos, I. Pratikakis, S.J. Perantonis, Improved document image binarization by using a combination of multiple binarization techniques and adapted edge information, in: ICPR'08, 2008, pp. 1–4.
[20] S. Geman, D. Geman, Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (1984) 721–741.
[21] Google, Book Search Dataset, Version V edition, 2007.
[22] X. Jia, J.A. Richards, Efficient maximum likelihood classification for imaging spectrometer data sets, IEEE Transactions on Geoscience and Remote Sensing 32 (2) (1994) 274–281.
[23] X. Jiang, S. Mahadevan, Bayesian risk-based decision method for model validation under uncertainty, Reliability Engineering & System Safety 92 (6) (2007) 707–718.
[24] I. Joosten, Applications of microanalysis in the cultural heritage field, Microchimica Acta 161 (3) (2008) 295–299.
[25] I.-K. Kim, D.-W. Jung, R.-H. Park, Document image binarization based on topographic analysis using a water flow model, Pattern Recognition 35 (1) (2002) 265–277.
[26] J.G. Kuk, N.I. Cho, K.M. Lee, MAP-MRF approach for binarization of degraded document image, in: 15th IEEE International Conference on Image Processing (ICIP 2008), 2008, pp. 2612–2615.
[27] T. Lelore, F. Bouchara, Document image binarisation using Markov field model, in: ICDAR'09, Barcelona, Spain, July 26–29, 2009, pp. 551–555.


[28] M. Lettner, R. Sablatnig, Higher order MRF for foreground-background separation in multi-spectral images of historical manuscripts, in: DAS'10, ACM, Boston, MA, 2010, pp. 317–324.
[29] M. Manso, M.L. Carvalho, Application of spectroscopic techniques for the study of paper documents: a survey, Spectrochimica Acta Part B: Atomic Spectroscopy 64 (6) (2009) 482–490.
[30] S. Marinai, M. Gori, G. Soda, Artificial neural networks for document analysis and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (1) (2005) 23–35.
[31] N. Mezghani, A. Mitiche, M. Cheriet, Bayes classification of online Arabic characters by Gibbs modeling of class conditional densities, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (7) (2008) 1121–1131.
[32] S. Mukhopadhyay, B. Chanda, Multiscale morphological segmentation of gray-scale images, IEEE Transactions on Image Processing 12 (5) (2003) 533–549.
[33] K. Nakayama, H. Hasegawa, C.A. Hernandez, Handwritten alphabet and digit character recognition using skeleton pattern mapping with structural constraints, in: Proc. ICANN'93, Amsterdam, September 1993, p. 941.
[34] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics 9 (1979) 62–66.
[35] Recent advances in applications to visual cultural heritage, IEEE Signal Processing Magazine 25 (4), July 2008 (special issue).
[36] J. Sauvola, M. Pietikainen, Adaptive document image binarization, Pattern Recognition 33 (2) (2000) 225–236.
[37] M. Sezgin, B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, Journal of Electronic Imaging 13 (1) (2004) 146–168.
[38] F. Shafait, D. Keysers, T.M. Breuel, Efficient implementation of local adaptive thresholding techniques using integral images, in: Document Recognition and Retrieval XV, San Jose, CA, January 2008.
[39] D.B. Solit et al., BRAF mutation predicts sensitivity to MEK inhibition, Nature 439 (7074) (2006) 358–362.
[40] T. Steinherz, N. Intrator, E. Rivlin, A special skeletonization algorithm for cursive words, in: IWFHR'00, 2000, pp. 529–534.
[41] B. Su, S. Lu, C.L. Tan, Binarization of historical document images using the local maximum and minimum, in: DAS'10, ACM, Boston, MA, June 9–11, 2010, pp. 159–166.
[42] F. Su, A. Mohammad-Djafari, Bayesian separation of document images with hidden Markov model, in: Second International Conference on Computer Vision Theory and Applications, Barcelona, Spain, 2007.
[43] C.L. Tan, R. Cao, P. Shen, Q. Wang, J. Chee, J. Chang, Removal of interfering strokes in double-sided document images, in: Fifth IEEE Workshop on Applications of Computer Vision, Palm Springs, California, 2000, pp. 16–21.
[44] A. Tonazzini, E. Salerno, L. Bedini, Fast correction of bleed-through distortion in grayscale documents by a blind source separation technique, International Journal on Document Analysis and Recognition 10 (1) (2007) 17–25.
[45] C. Wolf, Document ink bleed-through removal with two hidden Markov random fields and a single observation field, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2009) 431–447.

Rachid Hedjam received his M.Sc. degree in Computer Science and Operations Research from the University of Montreal (Quebec) in 2009. Since 2009, he has been with the Synchromedia Laboratory for Multimedia Communication in Telepresence, where he is pursuing his Ph.D. research in historical document image processing under the supervision of Professor Mohamed Cheriet. His research interests include improving and enhancing the quality of old document images, statistical image modeling, and multispectral imaging.

Reza Farrahi Moghaddam received his B.Sc. degree in Electrical Engineering and his Ph.D. degree in Physics from the Shahid Bahonar University of Kerman, Iran, in 1995 and 2003, respectively. Since 2003, he has been with Vali-Asr University of Rafsandjan, Iran. Since 2007, he has been a Postdoctoral Fellow with the Synchromedia Laboratory for Multimedia Communication in Telepresence, Ecole de Technologie Superieure (University of Quebec), Montreal, QC, Canada. His research interests include mathematical modeling for image processing, and document image processing and understanding.

Mohamed Cheriet was born in Algiers (Algeria) in 1960. He received his B.Eng. from USTHB University (Algiers) in 1984 and his M.Sc. and Ph.D. degrees in Computer Science from the University of Pierre et Marie Curie (Paris VI) in 1985 and 1988, respectively. Since 1992, he has been a professor in the Automation Engineering department at the Ecole de Technologie Superieure (University of Quebec), Montreal, and was appointed full professor there in 1998. He co-founded the Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) at the University of Quebec, and was its director from 2000 to 2006. He also founded the SYNCHROMEDIA Consortium (Multimedia Communication in Telepresence) there, and has been its director since 1998. His interests include document image analysis, OCR, mathematical models for image processing, pattern classification models and learning algorithms, as well as perception in computer vision. Dr. Cheriet has published more than 250 technical papers in the field, and has served as chair or co-chair of the following international conferences: VI'1998, VI'2000, IWFHR'2002, and ICFHR'2008. He currently serves on the editorial boards, as associate editor, of several international journals: IJPRAI, IJDAR, and Pattern Recognition. He co-authored a book entitled ''Character Recognition Systems: A Guide for Students and Practitioners,'' John Wiley and Sons, Spring 2007. Dr. Cheriet is a senior member of the IEEE and the chapter chair of the IEEE Montreal Computational Intelligent Systems (CIS) chapter.