
Scribble Based Interactive Page Layout Segmentation Using Gabor Filter

Majeed Kassis

Department of Computer Science, Ben-Gurion University of the Negev, Israel

[email protected]

Jihad El-Sana

Department of Computer Science, Ben-Gurion University of the Negev, Israel

[email protected]

Abstract: This paper presents an interactive approach for fast and accurate page layout segmentation. It is a scribble-based interactive segmentation approach, where the user draws scribbles on the various regions and the system performs page layout segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles. To classify the various regions of the page, we apply a bank of Gabor filters, in several orientations and multiple frequencies, to capture the orientation, the stroke width, and the size of the text. These properties also implicitly encode the writing style of the document. After combining the responses of the Gabor filters into a feature matrix, we classify the various regions of the document by applying graph cuts, while taking into account the user-made scribbles. The presented approach is very fast, easy to use, robust to user interaction, and provides accurate results.

Keywords: interactive system, page segmentation, Gabor filter, scribble-based

I. INTRODUCTION

The study of the past attracts the interest of scholars and ordinary people alike. Historical documents are among the main sources that shed light on the structure of past societies and the relations among them. Advances in scanning, storage, and communication have aided the creation of digital copies of a large fraction of these documents, which enables the use of computers to simplify and accelerate the process of extracting knowledge from them.

Page layout analysis, which aims to segment a page into regions, is among the first steps applied when processing historical document images. Page segmentation is often done in a hierarchical manner: a page is first segmented into main text and marginal note regions, which are further segmented into text and non-text. Text regions are then segmented into paragraphs and text lines. The absence of a well-defined page layout complicates the development of efficient and accurate algorithms for page layout analysis. Nevertheless, page layout segmentation has attracted the interest of researchers, and a great body of work has been developed. However, these approaches fail to handle the complex layouts of handwritten historical documents [1]. Currently, fully automated approaches cannot provide sufficiently accurate segmentation for many datasets. Thus, human interaction is needed to produce high-quality segmentation for pages with complex layout. This observation led to the development of various semi-automatic approaches for image segmentation in general [2], [3], [4].

Scribble-based interactive segmentation approaches are applied for image segmentation [5] and are widely used in image editing [6], [7], [8], [9], [10], [11]. Users specify sparse scribbles, which define segments by propagating the properties of the selected pixels to other pixels in the image. This can be seen as a form of soft segmentation.

Recently, Garz et al. [12] proposed a semi-automatic, user-assisted interactive system to support historical document annotation. They binarize the document and connect the resulting components to generate a graph representation, which provides a sparse page representation that guides the selection of the regions determined by the drawn scribbles. Their approach provides an elegant and easy way to generate ground truth. However, it requires intensive interaction, since many small scribbles must be drawn to improve the results toward quality ground truth. In addition, their graph representation does not encode the properties of the text, such as text line orientation and writing style, which are essential to obtain a coherent segmentation.

In this work, we present an interactive approach for fast and accurate page layout segmentation. It is a scribble-based interactive segmentation approach, where the user draws scribbles on the various regions of the page. According to the position of these scribbles and the characteristics of the marked pixels, the system performs page segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles.

Gabor filters are particularly appropriate for capturing texture, and the smoothing parameters of the filter's Gaussian envelope have been shown to play a major role in texture classification [13]. The Gabor filter is used in many phases of historical document processing: it has been applied to document binarization [14], layout analysis [15], [16], document classification [17], writer identification [18], and even noise reduction [19].

To apply scribble-based interactive segmentation to document images, we need to define a metric that separates main text, marginal notes, and figures from each other. It should take into account the orientation of the text lines, the stroke width, the size of the font, and the writing style. To achieve that, we apply a bank of Gabor filters, which aims to capture the orientation, the stroke width, and the size of the characters. These properties also implicitly encode the writing style of the document. The responses of the Gabor filter bank are combined into a feature matrix, which is used to guide the segmentation procedure.

2016 15th International Conference on Frontiers in Handwriting Recognition

2167-6445/16 $31.00 © 2016 IEEE

DOI 10.1109/ICFHR.2016.13

To classify the different regions of the document, we adopt a graph-cut algorithm [6], which has been used in various algorithms for historical document analysis [20], [21], [22]. We apply graph cut to the feature matrix and, as a result, obtain a segmentation of the document image into a background section and a foreground section, which are refined by applying additional scribbles.

In the upcoming sections, we give an overview of the proposed system, the generation of the feature matrix, and the use of graph cuts. We then explain in detail the features of the interactive system, mainly the use of the scribbles and their effect on the results, and provide usage examples of the system. Finally, we draw conclusions and suggest directions for future work.

II. INTERACTIVE SEGMENTATION

Fully automatic image segmentation algorithms provide satisfactory results in many cases, but human interaction is necessary to obtain high-quality segmentation for challenging images. Scribble-based interactive segmentation approaches utilize foreground and, optionally, background scribbles to classify the pixels into foreground and background. These algorithms rely on the location, color, structure, and texture of the pixels marked by the scribbles.

The information used in typical image processing to guide the propagation of the scribbled pixels' properties to the rest of the image is often not valid for document images. For example, the color of pixels does not provide any information for separating the main text from side notes, and treating text as texture is not a straightforward procedure. To apply the scribble-based interactive segmentation technique to document images, we need to define a metric that can separate figures from text, side notes from main text, and side notes from each other. Toward this goal, we apply a bank of Gabor filters and extract features that capture these differences.

A. Gabor Filter Bank

We generate a bank of filters using the two-dimensional Gabor transform, which consists of a sinusoidal plane wave of some frequency and orientation, modulated by a two-dimensional Gaussian. The Gabor filter in the spatial domain is given by the real component of the filter, as shown in equation 1, where ψ is the phase offset, λ is the wavelength of the cosine factor, θ is the orientation of the Gabor function, σ is the standard deviation of the Gaussian, and γ is the spatial aspect ratio of the Gabor function.

$$
g(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \psi\right),
\qquad
\begin{aligned}
x' &= x\cos\theta + y\sin\theta \\
y' &= y\cos\theta - x\sin\theta
\end{aligned}
\tag{1}
$$

For each pair (θ, λ), we apply the Gabor filter to the given document image; we then superimpose the filter responses for each θ to combine them.
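To make equation 1 concrete, the following is a minimal Python sketch that samples the real Gabor filter on a small square grid. The function name `gabor_kernel`, the grid size, and the example parameter values are illustrative assumptions and are not taken from the system described here; only the formula itself follows equation 1.

```python
import math

def gabor_kernel(size, lam, theta, sigma, gamma=1.0, psi=0.0):
    """Sample the real Gabor filter of equation 1 on a size x size grid.

    `size` is assumed to be odd so that the grid has a central pixel at (0, 0).
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = y * math.cos(theta) - x * math.sin(theta)
            # Gaussian envelope modulating a cosine carrier of wavelength lam.
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Hypothetical example values: a 9x9 kernel oriented at pi/4.
kernel = gabor_kernel(size=9, lam=4.0, theta=math.pi / 4, sigma=2.0)
```

With ψ = 0 the kernel is even-symmetric and its central value equals 1, which gives a quick sanity check on any implementation. Convolving the document image with one such kernel per (θ, λ) pair yields the filter responses discussed above.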

B. Feature Matrix

A filter response is the result of applying the Gabor transform, for one orientation and one wavelength of the sinusoidal factor, to the original image. To each filter response we apply a nonlinear sigmoidal function, which saturates the output, and extract features as formulated in equation 2, where α is a parameter and t is the variable.

$$
\tanh(\alpha t) = \frac{1 - e^{-2\alpha t}}{1 + e^{-2\alpha t}} \tag{2}
$$

We then apply Gaussian smoothing with $\sigma = a\,\pi^{-1}\sqrt{\frac{\log 2}{2}}\,\frac{2^{b}+1}{2^{b}-1}\,\lambda$ to each extracted feature, where λ corresponds to the one used to generate the feature. Feature matrix examples can be seen in figure 2.
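The two per-response steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the default α = 1/4 and a = 3, b = 1 are the values reported in the experimental study, and the smoothing width assumes the usual bandwidth relation with a square root, σ = a π⁻¹ √(log 2 / 2) (2^b + 1)/(2^b − 1) λ.

```python
import math

def saturate(t, alpha=0.25):
    """Sigmoidal nonlinearity of equation 2; math.tanh avoids overflow for large |t|."""
    return math.tanh(alpha * t)

def smoothing_sigma(lam, a=3.0, b=1.0):
    """Width of the Gaussian smoothing for a feature generated with wavelength lam.

    Assumes the square-root form of the bandwidth relation (see the lead-in).
    """
    bandwidth_term = (2 ** b + 1) / (2 ** b - 1)
    return a / math.pi * math.sqrt(math.log(2) / 2) * bandwidth_term * lam

# Hypothetical filter response value and wavelength.
feature = saturate(2.0)
sigma = smoothing_sigma(4.0)
```

Under these assumptions the smoothing width grows linearly with the wavelength, roughly 1.69 λ for a = 3 and b = 1, so features produced by coarser filters are smoothed over proportionally larger neighborhoods.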

The next step is to incorporate this feature matrix into a graph cut algorithm. By utilizing the drawn scribbles, we can then interactively segment the document into its different regions.

III. INTERACTIVE SYSTEM

Upon loading a document image, we maintain two image representations: one is the original image, and the other is the feature matrix, which is generated using the Gabor filter bank. At each iteration, the user applies two types of scribbles: foreground and background scribbles. The foreground scribble defines a new segment, which is extracted from the image, and the background scribble defines the remainder of the image. The progress of the segmentation is maintained in a tree structure. An interaction step is applied to a region r_i, which is represented by the node n_i in the layout tree, and generates two new child nodes for n_i, one representing the foreground region and the other representing the background region. The whole page is represented by the root of the tree, and the leaves of the tree represent the final segmentation.
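The layout tree can be sketched as a small Python data structure. This is a hypothetical illustration: the class name `LayoutNode`, the representation of a region as a set of pixel coordinates, and the `split` method are assumptions made for the example, and the foreground set stands in for the output of the graph-cut step.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)

@dataclass
class LayoutNode:
    region: Set[Pixel]
    children: List["LayoutNode"] = field(default_factory=list)

    def split(self, foreground: Set[Pixel]) -> Tuple["LayoutNode", "LayoutNode"]:
        """One interaction step: create a foreground child and a background child."""
        fg = LayoutNode(self.region & foreground)
        bg = LayoutNode(self.region - foreground)
        self.children = [fg, bg]
        return fg, bg

    def leaves(self) -> List["LayoutNode"]:
        """The leaves of the tree form the current segmentation."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

# A toy 2x4 page: the root holds the whole page.
root = LayoutNode({(r, c) for r in range(2) for c in range(4)})
fg, bg = root.split({(0, 0), (0, 1)})   # first interaction step
bg.split({(1, 3)})                       # second step, applied to the remainder
segments = root.leaves()
```

Because every step partitions the region of the node it is applied to, the leaves always cover the page without overlap, and earlier steps can be inspected by walking up the tree.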

A. Scribble Processing

The pixels marked by a scribble define the properties of the marked segment and form the core pixels of the segment. The features corresponding to these pixels are extracted from the feature matrix. Our pixel classification scheme is based on the GrabCut algorithm [7], with one major modification. To segment the image into foreground and background regions, we use Gaussian mixture models (GMMs), a practice successfully used in many image segmentation approaches. We use two GMMs: the first for the foreground region and the second for the background region. We initialize the algorithm's parameters by using the full image as the bounding box, and estimate the GMMs of the foreground and background pixels by first assigning to each pixel its most likely component and then computing the parameters of each component using maximum likelihood estimation. We then estimate a new label for each pixel using a min-cut/max-flow algorithm [23]. This iterative process is terminated once the GMMs converge.

Figure 1: Illustration of one application of graph-cut on an image graph.

$$
D_{KL}(F, B) = \sum_i f_i \log\frac{f_i}{b_i} + \sum_i b_i \log\frac{b_i}{f_i} \tag{3}
$$

Instead of using an iterative procedure that alternates between estimation and parameter learning and stops at the convergence of the Gibbs energy, as stated in [7], we terminate as soon as the Gaussian mixture models of the foreground and background are maximally separated at a local level. This termination criterion provides adequate results in images with little variation in background values and a relatively high difference between the foreground and background sections, and it allows for faster termination in each interactive step than the original algorithm. Specifically, we terminate when the criterion in equation 3 reaches a local maximum, where F is the normalized Gaussian mixture model probability matrix for the foreground region, B is the normalized Gaussian mixture model probability matrix for the background region, and D_KL(F, B) is the symmetric form of the discrete Kullback-Leibler divergence. The Kullback-Leibler divergence is a widely used tool in statistics and pattern recognition, and applying it to two Gaussian mixture models is common in the fields of speech and image recognition. An illustration of the process and the application of the graph cut can be seen in figure 1.
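The termination criterion can be sketched as follows. This is a minimal Python illustration, assuming that F and B are flattened into equal-length sequences of normalized probabilities and that zero entries are clamped to a small `eps` to keep the logarithms finite; the function names and the per-iteration values in the example are hypothetical.

```python
import math

def symmetric_kl(f, b, eps=1e-12):
    """Symmetric discrete Kullback-Leibler divergence of equation 3."""
    total = 0.0
    for fi, bi in zip(f, b):
        fi = max(fi, eps)  # clamp to avoid log(0); an assumption of this sketch
        bi = max(bi, eps)
        total += fi * math.log(fi / bi) + bi * math.log(bi / fi)
    return total

def first_local_maximum(values):
    """Index of the first value that is followed by a smaller one (last index if none)."""
    for i in range(len(values) - 1):
        if values[i + 1] < values[i]:
            return i
    return len(values) - 1

# Hypothetical divergence values recorded after successive GMM/graph-cut iterations.
history = [0.1, 0.4, 0.9, 0.7]
stop_at = first_local_maximum(history)
```

In an interactive step, the divergence would be recomputed after each re-estimation of the two GMMs, and the loop would stop at the iteration where the value first fails to increase, rather than waiting for the Gibbs energy to converge.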

IV. EXPERIMENTAL STUDY

We used our system on 38 documents that contain side notes of different sizes and orientations. These images were extracted from different books written by different writers. Figure 3 and Figure 4 illustrate drawing foreground and background scribbles, respectively. Recall that foreground and background scribbles determine foreground and background regions. The user only needs to choose the scribble type and roughly draw over the region she wishes to remove or keep. Figure 5 presents a full segmentation procedure: it shows consecutive iterations that remove the regions determined by the drawn scribbles. It also shows, in different colors, the mask containing the segmented regions resulting from each iteration.

We evaluate the performance of the system by measuring the time, in seconds, needed to complete an interactive round for the foreground and background scribbles, that is, the time to propagate the scribbles and complete the segmentation process. On average, it took less than 5 seconds to complete one iteration. In comparison, the GrabCut algorithm, which uses the Gibbs energy as the termination criterion, requires about 12 iterations to converge. Adopting the KL divergence as the termination criterion reduces the number of iterations per interactive step to 4, at which point a local maximum is reached.

$$
F_L(k) = \frac{1}{4} - \frac{2^{k-\frac{1}{2}}}{w_i},
\qquad
F_H(k) = \frac{1}{4} + \frac{2^{k-\frac{1}{2}}}{w_i} \tag{4}
$$

We also studied the appropriate values for the parameters of the system. To generate the Gabor filter bank, the values of λ are set according to equation 4, as recommended in [24], where $k = 1, 2, \ldots, w_i/8$, and $w_i$ is the width of the image. We also set ψ = 0 and γ = 1. For θ, we used three orientations: $\theta_1 = \pi/4$, $\theta_2 = \pi/2$, and $\theta_3 = 3\pi/4$. This results in 14 different values for λ and a total of 42 Gabor responses for a given input image. We extract the required features as formulated in equation 2, using α = 1/4. Finally, we apply Gaussian smoothing to the results, where a = 3 and b = 1 octave.
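The size of the filter bank can be checked with a short Python sketch of equation 4. Two assumptions are made here and are not stated in the text: the index k is taken to run up to log2(w_i / 8), which is the reading under which the reported count of 14 values is reproduced, and the image width is taken to be 1024 pixels. The function name `frequency_bank` is hypothetical.

```python
import math

def frequency_bank(width):
    """Low and high values of equation 4 for k = 1..log2(width / 8) (assumed range)."""
    k_max = int(math.log2(width / 8))
    low = [0.25 - 2 ** (k - 0.5) / width for k in range(1, k_max + 1)]
    high = [0.25 + 2 ** (k - 0.5) / width for k in range(1, k_max + 1)]
    return low + high

# Assumed image width of 1024 pixels and the three orientations used above.
values = frequency_bank(1024)
thetas = [math.pi / 4, math.pi / 2, 3 * math.pi / 4]
bank = [(v, theta) for v in values for theta in thetas]
```

Under these assumptions there are 7 values of k, giving 14 values from equation 4 and 14 × 3 = 42 filter parameter pairs, which agrees with the number of Gabor responses reported for a given input image.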

V. CONCLUSIONS AND FUTURE WORK

In this paper, we presented an interactive approach for fast and accurate page layout segmentation. The user is assisted by two types of scribbles: background scribbles and foreground scribbles. A background scribble notifies the system that the region should be removed, while a foreground scribble notifies the system that the region should never be removed and should stay in the foreground after each iteration.

Once the user draws scribbles on the various regions of the document image, the system performs page layout segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles. We apply a bank of Gabor filters, in several orientations and multiple frequencies, which implicitly encode the writing style of the document. We combine the responses of the Gabor filters into a feature matrix and classify the various regions of the document by applying graph cuts, while taking into account the user-made scribbles. The presented approach is very fast, easy to use, robust to user interaction, and provides accurate results.

Figure 2: Image pairs, each showing an image and its corresponding Gabor feature matrix. Panels: (a) image, (b) feature matrix of (a), (c) image, (d) feature matrix of (c).

Figure 3: Interactive process where the user decides a region is a foreground region. Panels: (a) original image, (b) initial segmentation, (c) scribble as foreground, (d) scribble result.

Figure 4: Interactive process where the user decides a region is a background region. Panels: (a) original image, (b) initial segmentation, (c) scribble as background, (d) scribble result.

Figure 5: Interactive process in which the user iteratively marks regions as background until all regions are segmented. Panels: (a) original image, (b) feature matrix, (c) segmentation illustration, (d) initial segmentation, (e) first scribble results, (f) second scribble results, (g) third scribble results.

In the future, we plan to improve the segmentation resolution and to explore a scribble-based interactive approach for segmenting text lines.

REFERENCES

[1] A. Antonacopoulos and A. C. Downton, “Special issue on the analysis of historical documents,” International Journal on Document Analysis and Recognition, vol. 9, no. 2, pp. 75–77, 2007.

[2] A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, “Interactive image segmentation using an adaptive GMMRF model,” in Computer Vision-ECCV 2004. Springer, 2004, pp. 428–441.

[3] A. Protiere and G. Sapiro, “Interactive image segmentation via adaptive weighted distances,” IEEE Transactions on Image Processing, vol. 16, no. 4, pp. 1046–1057, April 2007.

[4] B. L. Price, B. Morse, and S. Cohen, “Geodesic graph cut for interactive image segmentation,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 3161–3168.

[5] Y. Boykov and G. Funka-Lea, “Graph cuts and efficient N-D image segmentation,” International Journal of Computer Vision, vol. 70, no. 2, pp. 109–131, 2006.

[6] Y. Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 1. IEEE, 2001, pp. 105–112.

[7] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive foreground extraction using iterated graph cuts,” in ACM Transactions on Graphics (TOG), vol. 23, no. 3. ACM, 2004, pp. 309–314.


[8] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum, “Lazy snapping,” ACM Transactions on Graphics (ToG), vol. 23, no. 3, pp. 303–308, 2004.

[9] X. Bai and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, 2007, pp. 1–8.

[10] A. Criminisi, T. Sharp, and A. Blake, “GeoS: Geodesic image segmentation,” in Computer Vision–ECCV 2008. Springer, 2008, pp. 99–112.

[11] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen, “iCoseg: Interactive co-segmentation with intelligent scribble guidance,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3169–3176.

[12] A. Garz, M. Seuret, F. Simistira, A. Fischer, and R. Ingold, “Creating ground truth for historical manuscripts with document graphs and scribbling interaction,” in Proc. 12th Int. Workshop on Document Analysis Systems, 2016, pp. 126–131.

[13] F. Bianconi and A. Fernandez, “Evaluation of the effects of Gabor filter parameters on texture classification,” Pattern Recognition, vol. 40, no. 12, pp. 3325–3335, 2007.

[14] A. Sehad, Y. Chibani, and M. Cheriet, “Gabor filters for degraded document image binarization,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 702–707.

[15] A. Asi, R. Cohen, K. Kedem, J. El-Sana, and I. Dinstein, “A coarse-to-fine approach for layout analysis of ancient manuscripts,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 140–145.

[16] A. Asi, R. Cohen, K. Kedem, and J. El-Sana, “Simplifying the reading of historical manuscripts,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 826–830.

[17] V. A. Pavlov and D. S. Shalymov, “Arabic handwritten texts clusterization based on feature relation graph (FRG),” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 941–945.

[18] B. Helli and M. E. Moghadam, “Persian writer identification using extended Gabor filter,” in Image Analysis and Recognition. Springer, 2008, pp. 579–586.

[19] V. Eglin, S. Bres, and C. Rivero, “Hermite and Gabor transforms for noise reduction and handwriting classification in ancient manuscripts,” International Journal of Document Analysis and Recognition (IJDAR), vol. 9, no. 2-4, pp. 101–122, 2007.

[20] F. Wahlberg and A. Brun, “Graph based line segmentation on cluttered handwritten manuscripts,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 1570–1573.

[21] J. Chen and D. Lopresti, “Model-based tabular structure detection and recognition in noisy handwritten documents,” in Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on. IEEE, 2012, pp. 75–80.

[22] R. Cohen, A. Asi, K. Kedem, J. El-Sana, and I. Dinstein, “Robust text and drawing segmentation algorithm for historical documents,” in Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing. ACM, 2013, pp. 110–117.

[23] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, pp. 1124–1137, 2004.

[24] J. Zhang, T. Tan, and L. Ma, “Invariant texture segmentation via circular Gabor filters,” in Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 2. IEEE, 2002, pp. 901–904.
