BL Demo Day - July 2011 - (3) Image Enhancement for OCR

Post on 24-May-2015

DESCRIPTION

Slides from Niall Anderson's 1st presentation on Image Enhancement and OCR at the British Library Demo-day on the 12th July 2011.

TRANSCRIPT

Image Enhancement and OCR

Niall Anderson, The British Library, 12 July 2011

2

What is Image Enhancement?

Image enhancement is a suite of technical solutions to improve display or delivery of digital images – particularly text-based images

Main areas of improvement

• Removing noise and other digital artefacts

• Geometric correction for skewed images

• Geometric correction for warped pages in paper original

3

Example of an enhanced image

[Figure: the same page image shown warped and dewarped]

4

Why Image Enhancement?

To increase quality of image for display

To increase quality of image for printing (especially for Print On Demand)

To increase quality of Optical Character Recognition results

5

OCR and Image Enhancement

OCR will produce its best results on material with the following characteristics

• The layout of the text is simple, with no tables or illustrations;

• The text itself is in a modern, computer-generated typeface;

• The digital image preserves a high contrast between the text block and non-text detail (including blank space)

• The image has been created from a perfectly flat and straight scan (where it is a digital copy of an analogue source)

• The text of the analogue source is clear, well aligned and consistently presented

• The basic material of the analogue source is undamaged

• The text is in a single language

• The image has been taken from the original physical source and not a degraded surrogate (such as microfilm)

6

IMPACT Image Enhancement toolkit

7

Types of image enhancement in toolkit

Binarisation
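
A minimal sketch of what binarisation does, assuming a greyscale page held as a list of rows of 0-255 values. It uses Otsu's global threshold purely as an illustration; it is not the method implemented in the IMPACT toolkit, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0    # pixels at or below the candidate threshold
    sum_dark = 0  # sum of their grey levels
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        if n_dark == 0:
            continue
        n_light = total - n_dark
        if n_light == 0:
            break
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        var_between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(image):
    """image: rows of grey levels. Returns rows of 0 (ink) and 1 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 1 for p in row] for row in image]


# Toy 2x3 "page": three dark pixels (text) and three light pixels (paper).
page = [[20, 30, 200],
        [25, 210, 220]]
binary_page = binarise(page)
```

A global threshold of this kind tends to struggle with uneven lighting or stained paper, which is why adaptive methods are usually preferred for degraded historical material.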

8

Types of image enhancement in toolkit

Border removal
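
As an illustration only, border removal on an already binarised image can be sketched as trimming edge rows and columns that are almost entirely dark. This assumes the border is a dark frame around the page; the function name and the `min_dark_fraction` parameter are hypothetical, and the toolkit's own method is not described in these slides.

```python
def crop_dark_border(binary, min_dark_fraction=0.9):
    """binary: rows of 0 (dark) and 1 (light).

    Trim rows and columns from each edge while they are almost entirely dark.
    """
    def is_border(line):
        dark = sum(1 for p in line if p == 0)
        return dark / len(line) >= min_dark_fraction

    # Trim from the top and bottom first.
    top, bottom = 0, len(binary)
    while top < bottom and is_border(binary[top]):
        top += 1
    while bottom > top and is_border(binary[bottom - 1]):
        bottom -= 1
    rows = binary[top:bottom]
    if not rows:
        return []

    # Then trim from the left and right, looking only at the remaining rows.
    left, right = 0, len(rows[0])
    while left < right and is_border([r[left] for r in rows]):
        left += 1
    while right > left and is_border([r[right - 1] for r in rows]):
        right -= 1
    return [r[left:right] for r in rows]


# Toy scan: a one-pixel dark frame around a 3x3 page with one ink pixel.
scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_dark_border(scan)
```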

9

Types of image enhancement in toolkit

Page splitting
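
One simple way to picture page splitting, sketched under the assumption that the gutter of a double-page scan shows up as the column with the least ink near the centre of the binarised image. Real scans often have a dark gutter shadow instead, so this is not a description of the toolkit's method, and `split_pages` is a hypothetical name.

```python
def split_pages(binary):
    """binary: rows of 0 (ink) and 1 (paper). Returns (left_page, right_page)."""
    width = len(binary[0])
    # Vertical projection profile: number of ink pixels in each column.
    ink_per_column = [sum(1 for row in binary if row[x] == 0)
                      for x in range(width)]
    # Search only the middle third of the image for the gutter.
    lo, hi = width // 3, 2 * width // 3 + 1
    gutter = min(range(lo, hi), key=lambda x: ink_per_column[x])
    left = [row[:gutter] for row in binary]
    right = [row[gutter + 1:] for row in binary]
    return left, right


# Toy double-page spread: column 3 is blank and acts as the gutter.
spread = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1],
]
left_page, right_page = split_pages(spread)
```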

10

Types of image enhancement in toolkit

Dewarping

11

Using the IMPACT Image Enhancement toolkit - 1

Select the directory that contains your images, or copy your images to the directory

12

Using the IMPACT Image Enhancement toolkit - 2

Select the directory for saving the results

13

Using the IMPACT Image Enhancement toolkit - 3

Select one or more document images

14

Using the IMPACT Image Enhancement toolkit - 4

Define a processing workflow

15

Using the IMPACT Image Enhancement toolkit - 5

Select the method for every processing module

16

Using the IMPACT Image Enhancement toolkit - 6

Execute workflow by pressing "Apply Processes"
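
The idea of defining a workflow, choosing a method for each module and then executing it can be sketched as an ordered list of steps applied one after another. Everything here is hypothetical: the toolkit is operated through its graphical interface, and `run_workflow` and the two toy methods are illustrative stand-ins, not part of it.

```python
def run_workflow(image, workflow):
    """Apply each (module_name, method) step in order.

    Returns a list of (module_name, result) pairs so that the output of
    every module can be inspected, not just the final image.
    """
    results = []
    current = image
    for module_name, method in workflow:
        current = method(current)
        results.append((module_name, current))
    return results


# Toy methods standing in for real processing modules.
def fixed_threshold(image):
    return [[0 if p < 128 else 1 for p in row] for row in image]


def trim_one_pixel(image):
    return [row[1:-1] for row in image[1:-1]]


workflow = [
    ("binarisation", fixed_threshold),
    ("border removal", trim_one_pixel),
]
image = [[0, 0, 0],
         [0, 200, 0],
         [0, 0, 0]]
results = run_workflow(image, workflow)
```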

17

Using the IMPACT Image Enhancement toolkit - 7

View results in the preview window, or right-click any module on the workflow line and select "View Result".

18

Indicative results – Border Removal

22383 images to test border removal

BL: 7%, BNE: 34%, BNF: 34%, BSB: 11%, JSI: 6%, NLB: 2%, ONB: 6%

Only images with borders

38718 images to test border removal

BL: 9%, BNE: 29%, BNF: 32%, BSB: 12%, JSI: 11%, NLB: 2%, ONB: 5%
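
The percentages in each list sum to 100, so they appear to give each contributing library's share of the test set; that reading is an assumption. Under it, approximate image counts per library can be recovered as follows. Because the published shares are rounded, the counts are estimates, not figures from the slides.

```python
def approximate_counts(total_images, shares_percent):
    """Estimate images per library from rounded percentage shares."""
    return {library: round(total_images * pct / 100)
            for library, pct in shares_percent.items()}


# Shares as listed for the 22383-image and 38718-image test sets.
shares_22383 = {"BL": 7, "BNE": 34, "BNF": 34, "BSB": 11,
                "JSI": 6, "NLB": 2, "ONB": 6}
shares_38718 = {"BL": 9, "BNE": 29, "BNF": 32, "BSB": 12,
                "JSI": 11, "NLB": 2, "ONB": 5}

counts_22383 = approximate_counts(22383, shares_22383)
counts_38718 = approximate_counts(38718, shares_38718)
```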

19

Indicative results – Page splitting

458 images from BNF to test page split

3009 images to test page split

BL: 72% BSB: 10% JSI: 18%

20

Indicative results - Dewarping

IMPACT Page Curl Correction v.4

87.78% (81.98% with coarse correction only)

BookRestorer

80.87%

21

Research and references

N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, Goal-oriented Rectification of Camera-Based Document Images, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011.

N. Stamatopoulos, B. Gatos, T. Georgiou, Page frame detection for double page document images, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), pp. 401-408, Cambridge, MA, USA, June 2010.

B. Gatos, I. Pratikakis and S.J. Perantonis, Adaptive Degraded Document Image Binarization, Pattern Recognition, vol. 39, pp. 317-327, 2006.

22

Questions?
