Scribble Based Interactive Page Layout Segmentation Using Gabor Filter
Majeed Kassis
Department of Computer Science, Ben-Gurion University of the Negev, Israel
Jihad El-Sana
Department of Computer Science, Ben-Gurion University of the Negev, Israel
Abstract—This paper presents an interactive approach for fast and accurate page layout segmentation. It is a scribble-based interactive segmentation approach, where the user draws scribbles on the various regions and the system performs page layout segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles. To classify the various regions of the page, we apply a bank of Gabor filters, in several orientations and multiple frequencies, to capture the orientation, the stroke width, and the size of the text. These properties also implicitly encode the writing style of the document. After combining the responses of the Gabor filters into a feature matrix, we classify the various regions of the document by applying graph cuts, while taking into account the user-made scribbles. The presented approach is very fast, easy to use, robust to user interaction, and provides accurate results.
Keywords-interactive system, page segmentation, Gabor filter, scribble-based
I. INTRODUCTION
The study of the past attracts the interest of scholars and ordinary people alike. Historical documents are among the main sources that shed light on the structure of past societies and the relations among them. Advances in scanning, storage, and communication have aided the creation of digital copies of a large fraction of these documents, which enables the use of computers to simplify and accelerate the process of extracting knowledge from them.
Page layout analysis, which aims to segment a page into regions, is among the first steps applied when processing historical document images. Page segmentation is often done hierarchically: a page is first segmented into main text and marginal note regions, which are further segmented into text and non-text. Text regions are then segmented into paragraphs and text lines. The absence of a well-defined page layout complicates the development of efficient and accurate algorithms for page layout analysis. Nevertheless, page layout segmentation has attracted the interest of researchers, and a large body of work has been developed. However, these approaches fail to handle the complex layouts of handwritten historical documents [1]. Currently, fully automated approaches cannot provide sufficiently accurate segmentation for many datasets. Thus, human interaction is needed to produce high-quality segmentation for pages with complex layouts. This observation led to the development of various semi-automatic approaches for image segmentation in general [2], [3], [4].
Scribble-based interactive segmentation approaches are
applied for image segmentation [5] and widely used in
image editing [6], [7], [8], [9], [10], [11]. Users specify
sparse scribbles, which define segments by propagating the
property of the selected pixels to other pixels in the image.
This can be seen as a form of soft segmentation.
Recently, Garz et al. [12] proposed a semi-automatic, user-assisted interactive system to support historical document annotation. They binarize the document and connect the resulting components to generate a graph representation, which provides a sparse page representation that guides the selection of the regions determined by the drawn scribbles. Their approach provides an elegant and easy way to generate ground truth. However, it requires intensive interaction, since many small scribbles must be drawn to improve the results toward ground-truth quality. In addition, the graph representation does not encode the properties of the text, such as text line orientation and writing style, which are essential for obtaining a coherent segmentation.
In this work we present an interactive approach for fast
and accurate page layout segmentation. It is a scribble-based
interactive segmentation approach, where the user draws
scribbles on the various regions of the page. According to
the position of these scribbles and the characteristics of the
marked pixels, the system performs page segmentation. The
user can correct and refine the resulting segmentation by
drawing new scribbles.
Gabor filters are particularly appropriate for capturing texture. Since the smoothing terms of the filter’s Gaussian envelope have been shown to play a major role in texture classification [13], the Gabor filter is used in many phases of historical document processing. It has been used in document binarization [14], layout analysis [15], [16], document classification [17], writer identification [18], and even noise reduction [19].
To apply scribble-based interactive segmentation to document images, we need to define a metric that separates main text, marginal notes, and figures from each other. It should take into account the orientation of the text lines, the stroke width, the size of the font, and the writing style. To achieve that, we apply a bank of Gabor filters, which aims to capture the orientation, the stroke width, and the size of the characters. These properties also implicitly encode the writing style of the document. The responses of the Gabor filter bank are combined into a feature matrix, which is used to guide the segmentation procedure.

2016 15th International Conference on Frontiers in Handwriting Recognition
2167-6445/16 $31.00 © 2016 IEEE
DOI 10.1109/ICFHR.2016.13
To classify the different regions of the document, we adopt a graph-cut algorithm [6], which has been used in various algorithms for historical document analysis [20], [21], [22]. We apply graph cut to the feature matrix and, as a result, obtain a segmentation of the document image into a foreground section and a background section, which are refined by applying additional scribbles.
In the following sections, we give an overview of the proposed system, the generation of the feature matrix, and the use of graph cuts. We then explain in detail the features of the interactive system, mainly the use of scribbles and their effect on the results, and provide usage examples of the system. Finally, we draw conclusions and suggest directions for future work.
II. INTERACTIVE SEGMENTATION
Fully automatic image segmentation algorithms provide satisfactory results in many cases, but human interaction is necessary to obtain high-quality segmentation for challenging images. Scribble-based interactive segmentation approaches utilize foreground and, optionally, background scribbles to classify the pixels into foreground and background. These algorithms rely on the location, color, structure, and texture of the pixels marked by the scribbles.
The information typically used in image processing to guide the propagation of scribbled pixel properties to the rest of the image is often not valid for document images. For example, the color of pixels provides no information for separating the main text from side notes, and treating text as texture is not a straightforward procedure.
To apply the scribble-based interactive segmentation tech-
nique to document images, we need to define a metric that
has the ability to separate figure from text, side notes from
main text, and side notes from each other. Toward this goal
we apply a bank of Gabor filters and extract features that
capture these differences.
A. Gabor Filter Bank
We generate a bank of filters using the two-dimensional Gabor transform, which consists of a sinusoidal plane wave of some frequency and orientation, modulated by a two-dimensional Gaussian. The Gabor filter in the spatial domain is given by the real component of the filter, as shown in equation 1, where ψ is the phase offset, λ is the wavelength of the cosine factor, θ is the orientation of the Gabor function, σ is the standard deviation of the Gaussian, and γ is the spatial aspect ratio of the Gabor function.
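$$
g(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \psi\right) \tag{1}
$$

where

$$
x' = x\cos(\theta) + y\sin(\theta), \qquad y' = y\cos(\theta) - x\sin(\theta)
$$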
For each pair (θ, λ), we apply the Gabor filter to the given document image; we then superimpose the filter responses for each θ to combine them.
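Equation 1 can be evaluated directly on a pixel grid. The following is a minimal plain-Python sketch rather than the implementation used in this work: the names `gabor_kernel` and `filter_bank`, the odd kernel size, and the default σ = 0.56λ (roughly a one-octave bandwidth) are assumptions made for illustration. Convolving each kernel with the image and superimposing the responses is left to an array library.

```python
import math

def gabor_kernel(size, lam, theta, psi=0.0, sigma=None, gamma=1.0):
    """Sample the real Gabor function of equation 1 on a size x size grid
    centred on the origin. `size` must be odd (assumption of this sketch)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    if sigma is None:
        sigma = 0.56 * lam  # assumed default, about one octave of bandwidth
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotated coordinates x' and y' from equation 1.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = y * math.cos(theta) - x * math.sin(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_bank(size, wavelengths, thetas):
    """One kernel per (wavelength, orientation) pair."""
    return {(lam, theta): gabor_kernel(size, lam, theta)
            for lam in wavelengths for theta in thetas}

bank = filter_bank(9, [4.0, 8.0], [math.pi / 4, math.pi / 2, 3 * math.pi / 4])
```

With ψ = 0 the kernel peaks at its centre, where both the Gaussian envelope and the cosine carrier equal one.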
B. Feature Matrix
A filter response is the result of applying the Gabor transform, for one orientation and one wavelength of the sinusoidal factor, to the original image. To each filter response we apply a nonlinear sigmoidal function, which saturates the output, and extract features as formulated in equation 2, where α is a parameter and t is the variable.
$$
\tanh(\alpha t) = \frac{1 - e^{-2\alpha t}}{1 + e^{-2\alpha t}} \tag{2}
$$
We then apply Gaussian smoothing with $\sigma = a\,\pi^{-1}\sqrt{\frac{\log 2}{2}}\,\frac{2^b + 1}{2^b - 1}\,\lambda$ to each extracted feature, where λ corresponds to the wavelength used to generate that feature. Examples of feature matrices can be seen in figure 2.
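As an illustration of the two post-processing steps, the sketch below evaluates the saturating nonlinearity of equation 2 and a smoothing width per wavelength. It assumes the standard half-response bandwidth relation σ = a π⁻¹ √(ln 2 / 2) (2ᵇ + 1)/(2ᵇ − 1) λ, with b in octaves; the function names and default values are hypothetical.

```python
import math

def saturate(t, alpha=0.25):
    """Equation 2: tanh(alpha * t), equal to (1 - e^(-2*alpha*t)) / (1 + e^(-2*alpha*t))."""
    return math.tanh(alpha * t)

def smoothing_sigma(lam, a=3.0, b=1.0):
    """Smoothing width for a feature generated with wavelength `lam`.
    Assumed relation: a / pi * sqrt(ln 2 / 2) * (2^b + 1) / (2^b - 1) * lam."""
    bandwidth_term = (2.0 ** b + 1.0) / (2.0 ** b - 1.0)
    return a / math.pi * math.sqrt(math.log(2.0) / 2.0) * bandwidth_term * lam

response = [-8.0, -1.0, 0.0, 1.0, 8.0]          # toy filter responses
features = [saturate(t) for t in response]      # values squeezed into (-1, 1)
sigma_for_lambda_8 = smoothing_sigma(8.0)
```

Under this relation the smoothing width grows linearly with the wavelength, so coarse features are smoothed over wider neighbourhoods than fine ones.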
The next step is to incorporate this feature matrix into a graph-cut algorithm. By utilizing the drawn scribbles, we can interactively segment the document into its different regions.
III. INTERACTIVE SYSTEM
Upon loading a document image, we maintain two image representations: one is the original image, and the other is the feature matrix, which is generated using the Gabor filter bank. At each iteration, the user applies two types of scribbles: foreground and background. The foreground scribble defines a new segment, which is extracted from the image, and the background scribble defines the remainder of the image. The progress of the segmentation is maintained in a tree structure. An interaction step is applied to a region $r_i$, which corresponds to the node $n_i$ in the layout tree, and generates two new child nodes for $n_i$, one representing the foreground region and the other representing the background region. The whole page is represented by the root of the tree, and the leaves of the tree represent the final segmentation.
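The layout tree can be sketched as a small recursive data structure. This is an illustrative sketch only: the class name `LayoutNode`, the representation of a region as a set of pixel coordinates, and the `split` method are assumptions, and the foreground set is supplied by hand here instead of being computed by graph cut.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]

@dataclass
class LayoutNode:
    pixels: Set[Pixel]                      # region covered by this node
    label: str = "page"
    children: List["LayoutNode"] = field(default_factory=list)

    def split(self, foreground: Set[Pixel]) -> Tuple["LayoutNode", "LayoutNode"]:
        """One interaction step: create a foreground and a background child."""
        fg = LayoutNode(self.pixels & foreground, "foreground")
        bg = LayoutNode(self.pixels - foreground, "background")
        self.children = [fg, bg]
        return fg, bg

    def leaves(self) -> List["LayoutNode"]:
        """Leaves of the tree, i.e. the current final segmentation."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

root = LayoutNode({(0, 0), (0, 1), (1, 0), (1, 1)})
fg1, bg1 = root.split({(0, 0), (0, 1)})     # first interaction on the page
fg2, bg2 = bg1.split({(1, 0)})              # second interaction on the remainder
segments = root.leaves()
```

Because each step only partitions the pixels of the node it is applied to, the leaves always form a partition of the whole page.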
A. Scribble Processing
The pixels marked by a scribble define the properties of the marked segment and form the core pixels of the segment. The features corresponding to these pixels are extracted from the feature matrix. Our pixel classification scheme is based on the GrabCut algorithm [7], with one major modification. To segment the image into foreground and background regions, we use Gaussian mixture models (GMMs), a practice successfully used in many image segmentation approaches. We use two GMMs: the first for the foreground region and the second for the background region. We initialize the algorithm’s parameters by using the full image as the bounding box, and estimate the GMMs of the foreground and background pixels by first assigning to each pixel its most likely component and then computing the parameters of each component using maximum likelihood estimation. We then estimate a new label for each pixel using a min-cut/max-flow algorithm [23]. This iterative process terminates once the GMMs converge.

Figure 1: Illustration of one application of graph-cut on an image graph.
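The estimation step described above, assigning each pixel to its most likely component and then re-estimating each component by maximum likelihood, can be sketched as follows. The sketch makes simplifying assumptions that are not part of the described system: features are scalars instead of Gabor feature vectors, each component is a (weight, mean, variance) triple, and the min-cut relabelling step is omitted. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hard_gmm_step(values, components):
    """One estimation step for a mixture given as (weight, mean, variance) triples."""
    # 1) Assign every value to its most likely component.
    groups = [[] for _ in components]
    for v in values:
        scores = [w * gaussian_pdf(v, m, s) for (w, m, s) in components]
        groups[scores.index(max(scores))].append(v)
    # 2) Re-estimate each component from its assigned values (maximum likelihood).
    updated = []
    for old, group in zip(components, groups):
        if not group:
            updated.append(old)             # keep an empty component unchanged
            continue
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / len(group)
        updated.append((len(group) / len(values), mean, max(var, 1e-6)))
    return updated

values = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
initial = [(0.5, 0.3, 0.1), (0.5, 0.7, 0.1)]
updated = hard_gmm_step(values, initial)
```

In this toy example the two component means move from 0.3 and 0.7 toward the two clusters of values, near 0.1 and 1.0.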
$$
D_{KL}(F, B) = \sum_i f_i \log\frac{f_i}{b_i} + \sum_i b_i \log\frac{b_i}{f_i} \tag{3}
$$
Instead of using an iterative procedure that alternates between estimation and parameter learning and stops at the convergence of the Gibbs energy, as stated in [7], we terminate as soon as the Gaussian mixture models of the foreground and background are maximally separated at a local level. This termination criterion provides adequate results for images with little variation in background values and a relatively high difference between the foreground and background sections, and it allows faster termination in each interactive step than the original algorithm. Specifically, we terminate when we reach a local maximum of the termination criterion stated in equation 3, where F is the normalized Gaussian mixture model probability matrix for the foreground region, B is the normalized Gaussian mixture model probability matrix for the background region, f_i and b_i denote their respective entries, and D_KL(F, B) is the symmetric form of the discrete Kullback-Leibler divergence. The Kullback-Leibler divergence is a widely used tool in statistics and pattern recognition, and applying it to two Gaussian mixture models is common in the fields of speech and image recognition. An illustration of the process and the application of the graph cut can be seen in figure 1.
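The termination criterion can be sketched in a few lines. The sketch assumes that F and B have been flattened into sequences of probabilities that each sum to one, and that the divergence is recorded after every iteration; the names `symmetric_kl` and `first_local_maximum` and the clamping constant `eps` are hypothetical.

```python
import math

def symmetric_kl(f, b, eps=1e-12):
    """Equation 3: sum_i f_i log(f_i / b_i) + sum_i b_i log(b_i / f_i).
    Entries are clamped to `eps` to avoid taking the logarithm of zero."""
    total = 0.0
    for fi, bi in zip(f, b):
        fi, bi = max(fi, eps), max(bi, eps)
        total += fi * math.log(fi / bi) + bi * math.log(bi / fi)
    return total

def first_local_maximum(scores):
    """Index of the first score that is followed by a strictly lower one."""
    for i in range(len(scores) - 1):
        if scores[i + 1] < scores[i]:
            return i
    return len(scores) - 1

separation = symmetric_kl([0.9, 0.1], [0.1, 0.9])
stop_at = first_local_maximum([0.1, 0.4, 0.9, 0.7, 1.2])   # toy divergence history
```

The measure is zero when the two distributions coincide and is unchanged when F and B are swapped, which is what makes it usable as a separation score.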
IV. EXPERIMENTAL STUDY
We used our system on 38 documents that contain side notes of different sizes and orientations. These images were extracted from different books written by different writers. Figure 3 and Figure 4 illustrate drawing foreground and background scribbles, respectively. Recall that foreground and background scribbles determine the foreground and background regions. The user only needs to choose the scribble type and draw a rough scribble over the region she wishes to remove or keep. Figure 5 presents a full segmentation procedure. It shows consecutive iterations that remove the regions determined by the drawn scribbles, together with the mask containing the segmented regions resulting from each iteration, in different colors.
We evaluate the performance of the system by measuring the time, in seconds, needed to complete an interactive round for the foreground and background scribbles, that is, the time to propagate the scribbles and complete the segmentation process. On average, it took less than 5 seconds to complete one iteration. In comparison, the GrabCut algorithm, which uses the Gibbs energy as the termination criterion, requires about 12 iterations to converge. Adopting the KL divergence as the termination criterion reduces the number of iterations per interactive step to 4, at which point a local maximum is reached.
$$
F_L(k) = \frac{1}{4} - \frac{2^{k - \frac{1}{2}}}{w_i}, \qquad F_H(k) = \frac{1}{4} + \frac{2^{k - \frac{1}{2}}}{w_i} \tag{4}
$$
We also studied the appropriate values for the parameters of the system. To generate the Gabor filter bank, the values of λ are set according to equation 4, as recommended in [24], where $k = 1, 2, \ldots, \log_2 \frac{w_i}{8}$ and $w_i$ is the width of the image. We also set ψ = 0 and γ = 1. For θ, we used three orientations: θ_1 = π/4, θ_2 = π/2, and θ_3 = 3π/4. This results in 14 different values for λ and a total of 42 Gabor responses for a given input image. We extract the required features as formulated in equation 2, using α = 1/4. Finally, we apply Gaussian smoothing to the results, with a = 3 and b = 1 octave.
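The size of the filter bank can be checked with a short calculation. The sketch below rests on assumptions that are not stated explicitly in the text: the index k runs from 1 to log2(w_i / 8) as in [24], each wavelength is the reciprocal of a frequency from equation 4 (λ = 1/F), and the image width is 1024 pixels, which is the width for which this range yields 14 wavelengths. The function name `radial_frequencies` is hypothetical.

```python
import math

def radial_frequencies(width):
    """Frequencies F_L(k) and F_H(k) of equation 4 for k = 1 .. log2(width / 8)."""
    n = int(math.log2(width / 8))
    freqs = []
    for k in range(1, n + 1):
        offset = 2.0 ** (k - 0.5) / width
        freqs.append(0.25 - offset)         # F_L(k)
        freqs.append(0.25 + offset)         # F_H(k)
    return freqs

freqs = radial_frequencies(1024)            # assumed image width
wavelengths = [1.0 / f for f in freqs]      # assumed relation lambda = 1 / F
thetas = [math.pi / 4, math.pi / 2, 3 * math.pi / 4]
bank_size = len(wavelengths) * len(thetas)
```

For this width there are 7 values of k, hence 14 wavelengths and, with three orientations, 42 filters, which matches the counts reported above.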
V. CONCLUSIONS AND FUTURE WORK
In this paper, we presented an interactive approach for fast and accurate page layout segmentation. The user is assisted by two types of scribbles: background scribbles and foreground scribbles. A background scribble notifies the system that the region should be removed, while a foreground scribble notifies the system that the region should never be removed and should stay in the foreground after each iteration.
Once the user draws scribbles on the various regions of the document image, the system performs page layout segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles. We apply a bank of Gabor filters, in several orientations and multiple frequencies, which implicitly encodes the writing style of the document. We combine the responses of the Gabor filters into a feature matrix and classify the various regions of the document by applying graph cuts, while taking into account the user-made scribbles. The presented approach is very fast, easy to use, robust to user interaction, and provides accurate results.

In the future, we plan to improve the segmentation resolution and explore a scribble-based interactive approach to segmenting text lines.

Figure 2: Image pairs, each consisting of an image and its corresponding Gabor feature matrix. (a) image, (b) feature matrix of (a), (c) image, (d) feature matrix of (c).
Figure 3: Interactive process where the user decides a region is a foreground region. (a) original image, (b) initial segmentation, (c) scribble as foreground, (d) scribble result.
Figure 4: Interactive process where the user decides a region is a background region. (a) original image, (b) initial segmentation, (c) scribble as background, (d) scribble result.
Figure 5: Interactive process where the user decides a region is a background region, iteratively until all regions are segmented. (a) original image, (b) feature matrix, (c) segmentation illustration, (d) initial segmentation, (e) first scribble results, (f) second scribble results, (g) third scribble results.
REFERENCES
[1] A. Antonacopoulos and A. C. Downton, “Special issue on the analysis of historical documents,” International Journal on Document Analysis and Recognition, vol. 9, no. 2, pp. 75–77, 2007.
[2] A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, “Interactive image segmentation using an adaptive GMMRF model,” in Computer Vision-ECCV 2004. Springer, 2004, pp. 428–441.
[3] A. Protiere and G. Sapiro, “Interactive image segmentation via adaptive weighted distances,” IEEE Transactions on Image Processing, vol. 16, no. 4, pp. 1046–1057, April 2007.
[4] B. L. Price, B. Morse, and S. Cohen, “Geodesic graph cut for interactive image segmentation,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 3161–3168.
[5] Y. Boykov and G. Funka-Lea, “Graph cuts and efficient N-D image segmentation,” International Journal of Computer Vision, vol. 70, no. 2, pp. 109–131, 2006.
[6] Y. Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 1. IEEE, 2001, pp. 105–112.
[7] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive foreground extraction using iterated graph cuts,” in ACM Transactions on Graphics (TOG), vol. 23, no. 3. ACM, 2004, pp. 309–314.
[8] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum, “Lazy snapping,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 303–308, 2004.
[9] X. Bai and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, 2007, pp. 1–8.
[10] A. Criminisi, T. Sharp, and A. Blake, “GeoS: Geodesic image segmentation,” in Computer Vision–ECCV 2008. Springer, 2008, pp. 99–112.
[11] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen, “iCoseg: Interactive co-segmentation with intelligent scribble guidance,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3169–3176.
[12] A. Garz, M. Seuret, F. Simistira, A. Fischer, and R. Ingold, “Creating ground truth for historical manuscripts with document graphs and scribbling interaction,” in Proc. 12th Int. Workshop on Document Analysis Systems, 2016, pp. 126–131.
[13] F. Bianconi and A. Fernandez, “Evaluation of the effects of Gabor filter parameters on texture classification,” Pattern Recognition, vol. 40, no. 12, pp. 3325–3335, 2007.
[14] A. Sehad, Y. Chibani, and M. Cheriet, “Gabor filters for degraded document image binarization,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 702–707.
[15] A. Asi, R. Cohen, K. Kedem, J. El-Sana, and I. Dinstein, “A coarse-to-fine approach for layout analysis of ancient manuscripts,” in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014, pp. 140–145.
[16] A. Asi, R. Cohen, K. Kedem, and J. El-Sana, “Simplifying the reading of historical manuscripts,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 826–830.
[17] V. A. Pavlov and D. S. Shalymov, “Arabic handwritten texts clusterization based on feature relation graph (FRG),” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 941–945.
[18] B. Helli and M. E. Moghadam, “Persian writer identification using extended Gabor filter,” in Image Analysis and Recognition. Springer, 2008, pp. 579–586.
[19] V. Eglin, S. Bres, and C. Rivero, “Hermite and Gabor transforms for noise reduction and handwriting classification in ancient manuscripts,” International Journal of Document Analysis and Recognition (IJDAR), vol. 9, no. 2-4, pp. 101–122, 2007.
[20] F. Wahlberg and A. Brun, “Graph based line segmentation on cluttered handwritten manuscripts,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 1570–1573.
[21] J. Chen and D. Lopresti, “Model-based tabular structure detection and recognition in noisy handwritten documents,” in Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on. IEEE, 2012, pp. 75–80.
[22] R. Cohen, A. Asi, K. Kedem, J. El-Sana, and I. Dinstein, “Robust text and drawing segmentation algorithm for historical documents,” in Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing. ACM, 2013, pp. 110–117.
[23] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, pp. 1124–1137, 2004.
[24] J. Zhang, T. Tan, and L. Ma, “Invariant texture segmentation via circular Gabor filters,” in Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 2. IEEE, 2002, pp. 901–904.