text extraction from natural scene image, a survey

Text extraction from natural scene image: A survey

Honggang Zhang, Kaili Zhao, Yi-Zhe Song, Jun Guo

Neurocomputing 122 (2013)

Natural images everywhere

We want to detect text from natural images

Overview

Input Images -> Pre-processing -> Text Detection & Localization

Detect text locations and bounding boxes

Overview

Text Enhancement & Segmentation -> Text Recognition (OCR) -> Text

Text regions are often low-resolution and noisy

Segment text from the background

Text detection & localization

a. Edge based methods

b. Texture based methods

c. Connected component (CC) based methods

d. Stroke based methods

e. Others

Edge based text detection

Idea : Scene texts are designed to be easily read, thus have strong edges

Methods : An edge detector (e.g. Canny operator) and a binarization method are used to extract text and to eliminate non-text regions

+ Efficient and simple !

- Sensitive to the influence of shadow or highlight

N. Ezaki, M. Bulacu, and L. Schomaker, “Text detection from natural scene images: Towards a system for visually impaired persons,” in Int. Conf. on Pattern Recognition, Cambridge, UK, Aug. 2004, pp. 683–686
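A minimal sketch of the edge-based idea, assuming a grayscale image stored as a list of rows. Sobel gradients stand in for a full Canny detector here, and the function names and the threshold of 100 are illustrative assumptions, not values from the cited paper. Windows with a high density of strong edges are text candidates.

```python
# Sketch only: Sobel gradient magnitude + fixed threshold instead of Canny.

def sobel_magnitude(img):
    """Return the gradient magnitude of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def edge_map(img, threshold=100.0):
    """Binarize the gradient magnitude: 1 marks a strong edge pixel."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(img)]


def edge_density(edges, top, left, height, width):
    """Fraction of edge pixels in a window; text windows tend to score high."""
    count = sum(edges[y][x]
                for y in range(top, top + height)
                for x in range(left, left + width))
    return count / float(height * width)
```

A fixed global threshold is exactly where shadows and highlights cause trouble, which matches the weakness listed above.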

Texture based text detection

Idea : Text regions have textural properties that are distinct from non-text regions (background)

Methods : Gaussian filtering, Histogram of oriented gradients (HOG), Wavelet decomposition, Fourier transform, Discrete Cosine Transform (DCT), Local Binary Pattern (LBP)

Extract features over a certain region

Identify the existence of text by classifier

+ Can detect and localize texts accurately even from noisy images

- Relatively slow, sensitive to text alignment & orientation

Some advanced techniques:

Coarse-to-fine strategy -> fast

Local Haar Binary Pattern (LHBP) -> preserves and normalizes inconsistent text-background contrasts

(a) input image (640 × 480) (b) texture classification result

Kim, Kwang In, Keechul Jung, and Jin Hyung Kim. "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm." Pattern Analysis and Machine Intelligence, IEEE Transactions on 25.12 (2003): 1631-1639.
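As an illustration of "extract features over a certain region", here is a sketch of the basic 8-neighbor Local Binary Pattern (LBP) named in the methods list. The function names are hypothetical, and the input is assumed to be a grayscale region stored as a list of rows. The resulting histogram is the kind of feature vector that would be passed to a classifier.

```python
# Sketch only: plain 3x3 LBP, not the LHBP variant mentioned above.

def lbp_code(img, y, x):
    """8-neighbor LBP code of pixel (y, x): a bit is 1 when neighbor >= center."""
    center = img[y][x]
    # Neighbors visited clockwise, starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
    total = float(sum(hist)) or 1.0  # avoid division by zero on tiny regions
    return [v / total for v in hist]
```

Computing such a histogram for every sliding window is what makes texture-based methods relatively slow.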

Connected component-based text detection

Idea : Segment candidate text components by edge detection or color clustering, and prune non-text components with classifiers

Methods :

Group small components into successively larger components until all regions in the image are identified (bottom-up approach)

Identify text components and group them to localize text regions

Block adjacency graph (BAG) - connected component extraction

Priority adaptive segmentation (PAS) - character segmentation

+ Low computational cost, output can be used directly for text recognition

- Cannot segment accurately without prior knowledge (text position, scale)

- Designing a fast and reliable connected component analyzer is difficult because of the many confusing non-text regions
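The bottom-up idea can be sketched as follows, assuming a binary image (1 = candidate text pixel) stored as a list of rows. Simple geometric rules stand in for the trained classifiers mentioned above. The function names and thresholds (`min_area`, `max_aspect`) are illustrative assumptions.

```python
from collections import deque


def connected_components(binary):
    """Collect 4-connected foreground regions; return a list of pixel lists."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(sy, sx)]), []
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right), all inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def prune_components(components, min_area=3, max_aspect=5.0):
    """Keep boxes of components that are neither tiny nor very elongated.
    The thresholds are assumptions for illustration, not tuned values."""
    kept = []
    for pixels in components:
        top, left, bottom, right = bounding_box(pixels)
        height, width = bottom - top + 1, right - left + 1
        aspect = max(height, width) / float(min(height, width))
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append((top, left, bottom, right))
    return kept
```

Hand-written rules like these are easily fooled by cluttered backgrounds, which is why reliable component analysis is listed as the hard part.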

Stroke based text detection

Idea : Text = a combination of stroke components

Methods :

1) Text stroke candidates are extracted by segmentation (Gabor filter, Stroke Width Transform (SWT))

2) Verification by feature extraction and classification

3) Grouping by clustering

+ Provides robust and nearly constant stroke features (e.g. stroke width)

+ Intuitive & simple, therefore easy to implement

- Complex backgrounds can be a problem

Text tends to maintain fixed stroke width

Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. "Detecting text in natural scenes with stroke width transform." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.

An example of SWT based text detection
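To make the "nearly constant stroke width" cue concrete, here is a deliberately simplified sketch. It is not the SWT of Epshtein et al., which follows gradient directions between opposing edges. Instead it approximates the stroke width of each foreground pixel by the shorter of its horizontal and vertical run lengths in a binary image. All names are hypothetical.

```python
# Sketch only: run-length approximation of stroke width, not the real SWT.

def _runs_1d(line):
    """Map each foreground cell of a 1-D line to the length of its run."""
    out = [0] * len(line)
    start = None
    for i, v in enumerate(list(line) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            for k in range(start, i):
                out[k] = i - start
            start = None
    return out


def stroke_widths(binary):
    """Approximate per-pixel stroke width as min(horizontal run, vertical run)."""
    rows = [_runs_1d(r) for r in binary]
    cols = [_runs_1d(c) for c in zip(*binary)]  # indexed as cols[x][y]
    return [[min(rows[y][x], cols[x][y]) if binary[y][x] else 0
             for x in range(len(binary[0]))]
            for y in range(len(binary))]


def width_variation(widths):
    """Coefficient of variation of nonzero widths; low values suggest text."""
    vals = [w for row in widths for w in row if w > 0]
    if not vals:
        return 0.0
    mean = sum(vals) / float(len(vals))
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return (var ** 0.5) / mean
```

A component whose width variation is low would be kept as a stroke candidate. Stroke junctions and corners inflate this approximation, which the gradient-based SWT handles better.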

Others

1. Some hybrid approaches to deal with many variations in text

2. Detect texts of arbitrary orientations with rotation-invariant features based on SWT

3. Color reduction method: reduce the total number of colors in each RGB component

4. Small letter detection in images, limited to some standard font sizes (remove less than 10 pixels) …

Yao, Cong, et al. "Detecting texts of arbitrary orientations in natural images." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

Kumar, Manoj, Young Chul Kim, and Guee Sang Lee. "Text detection using multilayer separation in real scene images." Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.
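The color reduction item can be illustrated with uniform per-channel quantization. This is an assumed scheme for illustration. The cited work may reduce colors differently, and `reduce_colors` and `levels` are hypothetical names.

```python
# Sketch only: uniform quantization of each RGB channel.

def reduce_colors(pixels, levels=4):
    """Quantize each RGB channel (0-255) to `levels` evenly spaced values."""
    step = 256 // levels

    def quantize(value):
        # Map the value to the center of its bin.
        return min(value // step, levels - 1) * step + step // 2

    return [tuple(quantize(c) for c in px) for px in pixels]
```

With `levels=4` an image has at most 4 × 4 × 4 = 64 distinct colors, so pixels of similar color collapse into the same value and can be separated into a small number of color layers.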

Text enhancement & segmentation

Traditional OCR software struggles with low-resolution natural scene images

Enhancing and segmenting text with complex background (noisy images)

Many advanced binarization algorithms have been proposed for text enhancement

e.g. transform the gray level of each pixel to a new domain

(a) Badly illuminated document images (b) binarization

Valizadeh, M., et al. "A novel hybrid algorithm for binarization of badly illuminated document images." Computer Conference, 2009. CSICC 2009. 14th International CSI. IEEE, 2009.
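As a baseline for what binarization means here, below is a sketch of Otsu's global threshold. It is a classic method, not the hybrid algorithm of the cited paper. The input is assumed to be a grayscale image with integer values 0-255 stored as a list of rows, and the function names are illustrative.

```python
# Sketch only: Otsu's global threshold, a common binarization baseline.

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance
    (class 0: pixels <= t, class 1: pixels > t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in class 0 yet
        w1 = total - w0
        if w1 == 0:
            break  # class 1 is empty from here on
        mu0 = sum0 / float(w0)
        mu1 = (sum_all - sum0) / float(w1)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t=None):
    """Return a binary image: 1 where the pixel is brighter than t."""
    if t is None:
        t = otsu_threshold(gray)
    return [[1 if v > t else 0 for v in row] for row in gray]
```

A single global threshold fails under uneven lighting, which is the case that motivates the more advanced algorithms for badly illuminated images.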

Further survey - OCR with deep learning

OCR with Convolutional Neural Network (CNN) on some challenging images

8 datasets (from sports video, Google Street View, Google image search, natural scene images, news images) - 9 million images in total (900k validation set)

Outperforms existing state-of-the-art approaches (90~98% accuracy)

e.g. BBC news text search

Jaderberg, Max, et al. "Reading Text in the Wild with Convolutional Neural Networks." arXiv preprint arXiv:1412.1842 (2014).

Result sample

Many word bounding box proposals -> reduce false positives (FP) with a random forest classifier

http://zeus.robots.ox.ac.uk/textsearch/#/search/

Public dataset

A. 2003/2005 ICDAR Text Localization Contest trial test database

251 images, ground truth of the word bounding boxes

Most widely used database

- Most of the texts are horizontal.

- All the texts are in English

B. KAIST Scene Text Database

3000 images in different environments (outdoors, indoors, under different lighting conditions)

Captured either by high-resolution camera or low-resolution mobile phone camera

Scene texts are in Korean, English, and mixed language

C. The Street View Text (SVT) dataset

Google street view images

High variation, low resolution

D. NEOCR (Natural Environment OCR dataset)

659 real world images with 5238 annotated bounding boxes


Applications

Google Goggles : translate the world into text information

Baidu translation

Thank you !

Q & A
