Compression of the image Adolf Knoll National Library of the Czech Republic

Upload: cory-mccoy

Post on 28-Jan-2016

Page 1: Compression of the image Adolf Knoll National Library of the Czech Republic

Compression of the image

Adolf Knoll

National Library of the Czech Republic

Page 2: Compression of the image Adolf Knoll National Library of the Czech Republic

General schemes for application of compression

The schemes adapt to the character of the represented objects:

Bitonal image (1-bit, black-and-white)
Colour photorealistic image
Mixed document (both of the above components)

Page 6: Compression of the image Adolf Knoll National Library of the Czech Republic

Trends

Bitonal: from CCITT Fax Group 3 and 4 to JBIG variants

Photorealistic:
  Lossless compression: PNG, TIFF/LZW
  Lossy: from JPEG DCT to wavelet

Mixed document: both applied (Mixed Raster Content, usually vertically)

Page 7: Compression of the image Adolf Knoll National Library of the Czech Republic

How is it built into formats?

Attempts are made to keep everything within ISO TIFF (even with JPEG, LZW, or PNG), but this is not enough because tools for conversion and display are lacking.

That is why other, more suitable formats are used: JPEG, PNG

That is also why there is a lot of development in the area of mixed formats; they do not aim to become ISO standards

Page 8: Compression of the image Adolf Knoll National Library of the Czech Republic

Relevant directions

Bitonal image: JBIG2 (ISO), no support (except Xerox), but many similar activities

Photorealistic image: wavelet, JPEG2000 and many other non-ISO initiatives (WI, LWF, IW44, SID, Imagepower IW, …)

Mixed content: DjVu, LDF, Imagepower MRC

Page 9: Compression of the image Adolf Knoll National Library of the Czech Republic

Aims

Image archiving: standardized archival format (TIFF, JPEG, PNG, …)

Image delivery: more efficient modern format (JB2, MrSID, DjVu, LDF, …)

What will the relationship between the two be? It will be defined by the goal of the project.

Page 10: Compression of the image Adolf Knoll National Library of the Czech Republic

Around compression

Pre-processing of the image
Compression
Encoding in a format
Decoding from the format
Decompression
Display, print-out

Page 11: Compression of the image Adolf Knoll National Library of the Czech Republic

Pre-processing of the bitonal image - I

Efficient schemes build on dictionaries (vocabularies) of pixel chunks or groups.

For example, a text page is an image that can be interpreted as several dozen images of letters. Each repeated occurrence of a letter is represented by its coordinates (x, y) and a reference to a dictionary that holds a single bitmap for all similar letters (digitized only once).

This method is called PATTERN MATCHING, but…
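The dictionary idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the letters are taken as already segmented, bitmaps must match exactly, and the names encode_page and decode_page are made up for the example, not taken from any real codec.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # one small bitmap: rows of '0'/'1' characters


def encode_page(glyphs: List[Tuple[int, int, Glyph]]):
    """Replace each glyph bitmap by (x, y, dictionary index).

    `glyphs` is a hypothetical list of already-segmented letter images
    with their page coordinates; the segmentation step is not shown.
    """
    dictionary: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    placements: List[Tuple[int, int, int]] = []
    for x, y, bitmap in glyphs:
        if bitmap not in index_of:  # first occurrence: store the bitmap once
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((x, y, index_of[bitmap]))
    return dictionary, placements


def decode_page(dictionary, placements):
    """Rebuild the (x, y, bitmap) list from the dictionary references."""
    return [(x, y, dictionary[i]) for x, y, i in placements]


# Two toy letter shapes; the first one occurs three times on the "page".
A = ("010", "101", "111", "101")
B = ("110", "101", "110", "111")
page = [(0, 0, A), (4, 0, B), (8, 0, A), (12, 0, A)]

dictionary, placements = encode_page(page)
print(len(dictionary), placements)
```

Four letter occurrences are stored as two bitmaps plus four short references, which is where the saving comes from.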

Page 12: Compression of the image Adolf Knoll National Library of the Czech Republic

Pre-processing of the bitonal image - II

However, scanned texts carry a lot of noise in the individual pixel chunks that represent, for instance, letters in a text

Therefore, it is convenient to reduce the differences between chunks that should be identified as identical:
smoothing
pixel flipping
noise removal
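As a rough illustration of noise removal by pixel flipping, the sketch below flips every pixel whose neighbours all have the opposite value. The rule and the name remove_isolated_pixels are assumptions chosen for the example; real pre-processing filters are usually more elaborate.

```python
def remove_isolated_pixels(image):
    """Flip each pixel whose neighbours (up to 8) all have the opposite value."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if neighbours and all(v != image[y][x] for v in neighbours):
                cleaned[y][x] = 1 - image[y][x]
    return cleaned


noisy = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # isolated black speck
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
cleaned = remove_isolated_pixels(noisy)
for row in cleaned:
    print(row)
```

Only the isolated speck is changed; the solid black area at the bottom is left untouched.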

Page 13: Compression of the image Adolf Knoll National Library of the Czech Republic

Smoothing and pixel flipping

Page 14: Compression of the image Adolf Knoll National Library of the Czech Republic

Problems in pattern matching

Česká republika

Low quality original and/or scan + inappropriate processing

Page 15: Compression of the image Adolf Knoll National Library of the Czech Republic

Soft pattern matching

Better work with dictionaries: a pixel chunk is replaced by a dictionary entry only where the threshold value for a match is satisfied

If not, the whole small bitmap is stored

Tuning these mechanisms is the key to successful lossy compression of a bitonal image.
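The role of the threshold can be shown with a small Python sketch. It assumes a plain count of differing pixels (Hamming distance) as the similarity measure and uses made-up names (hamming, soft_encode); the matching used by JB2 or JBIG2 encoders is more refined than this.

```python
def hamming(a, b):
    """Number of differing pixels between two bitmaps of the same shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def soft_encode(glyphs, threshold):
    """Reuse a dictionary entry if it differs in at most `threshold` pixels."""
    dictionary = []
    placements = []
    for x, y, bitmap in glyphs:
        match = None
        for i, entry in enumerate(dictionary):
            same_shape = (len(entry) == len(bitmap)
                          and len(entry[0]) == len(bitmap[0]))
            if same_shape and hamming(entry, bitmap) <= threshold:
                match = i
                break
        if match is None:  # no close enough entry: store the whole bitmap
            dictionary.append(bitmap)
            match = len(dictionary) - 1
        placements.append((x, y, match))
    return dictionary, placements


A = ("010", "101", "111", "101")
A_NOISY = ("010", "101", "111", "100")  # same letter, one pixel flipped
B = ("110", "101", "110", "111")        # a different letter
page = [(0, 0, A), (4, 0, A_NOISY), (8, 0, B)]

for threshold in (0, 1, 3):
    dictionary, _ = soft_encode(page, threshold)
    print(threshold, len(dictionary))
```

With threshold 0 every variant is stored, with threshold 1 the noisy copy is merged with its clean original, and with threshold 3 the different letter is wrongly merged as well. That last case is the kind of substitution error that careful tuning has to avoid.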

Page 16: Compression of the image Adolf Knoll National Library of the Czech Republic

How to know…

Libraries have documents of varying quality, including very poor ones

These documents are more difficult to process than the good samples presented by software producers

Tests… tests… tests… on typical materials

Page 17: Compression of the image Adolf Knoll National Library of the Czech Republic

Bitonal compression

Lossless: LZW, PNG, …, CCITT Fax Group 3 and 4, JB2, JBIG, JBIG2, Algo Vision/Luratech (1-bit LDF component)

Lossy modern schemes:
AT&T (Lizardtech) (JB2): soft pattern matching
ImagePower Inc. JBIG2 (JB2): only pattern matching
Summus Inc. (Lightning Strike), ...

Page 18: Compression of the image Adolf Knoll National Library of the Czech Republic

GIF would be slightly worse than PNG

Page 19: Compression of the image Adolf Knoll National Library of the Czech Republic

Květy české – 19th century Czech journal

Page 21: Compression of the image Adolf Knoll National Library of the Czech Republic

Impact of the quality of digitized originals on performance of compression schemes

Page 22: Compression of the image Adolf Knoll National Library of the Czech Republic

JB2

The most efficient compression scheme: JB2 from the DjVu format (AT&T).

It enables compression that is:
lossless
lossy
aggressive, while preserving high quality

Page 23: Compression of the image Adolf Knoll National Library of the Czech Republic

JB2 as a component part of the DjVu format

Several files can be merged and saved into one (as in PDF); they share a common dictionary, so the merged file is smaller than the sum of the individual files

Several files can be virtually joined (they are called one after another from the server)

Further advantages: display, references, OCR, … (DjVu plug-in)

Software is either expensive or free (for Linux or Solaris)
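Why a common dictionary makes a merged file smaller can be seen from a toy count. In the sketch below, characters stand in for letter bitmaps and the function name dictionary_entries is made up for the illustration; it only counts dictionary entries and says nothing about real DjVu file sizes.

```python
def dictionary_entries(pages):
    """Count dictionary entries with per-page dictionaries and with one shared one."""
    separate = sum(len(set(page)) for page in pages)
    shared = len(set().union(*pages))
    return separate, shared


# Each string is a hypothetical page; each distinct character is one glyph shape.
pages = ["compression", "of the image", "national library"]
separate, shared = dictionary_entries(pages)
print(separate, shared)
```

Shapes that recur on several pages are stored once in the shared dictionary instead of once per page.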

Page 24: Compression of the image Adolf Knoll National Library of the Czech Republic

Samples and résumé

Monitor and test new approaches for image processing

They can be very suitable for:
Document delivery services
Image servers
Scanned content

Page 25: Compression of the image Adolf Knoll National Library of the Czech Republic

Which formats to use for bitonal image?

If you have no special tools: GIF

If you wish smaller files, use PNG
Both are recommended for the WWW
However, TIFF/CCITT Fax Gr. 4 is better
Use DjVu if you wish very small files

Page 26: Compression of the image Adolf Knoll National Library of the Czech Republic

Problems

Good image editing software does not support TIFF with Gr. 4 encoding

Display is possible with standard Windows tools

GIF and PNG also support higher bit depths (8-bit / 24-bit); take care not to save a bi-level image at a higher bit depth

DjVu: the authoring software problem needs to be solved
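The cost of saving a bi-level image at a higher bit depth is easy to estimate. The sketch below computes uncompressed sizes only; the page dimensions (about A4 at 300 dpi) and the name raw_size_bytes are assumptions for the example.

```python
def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed size, with each row padded to a whole number of bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Hypothetical A4 page scanned at 300 dpi: about 2480 x 3508 pixels.
for depth in (1, 8, 24):
    print(f"{depth:>2}-bit: {raw_size_bytes(2480, 3508, depth):>10} bytes")
```

Before any compression is applied, the 8-bit version is 8 times and the 24-bit version 24 times larger than the 1-bit original.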

Page 27: Compression of the image Adolf Knoll National Library of the Czech Republic

Lossy compression – bitonal image

Page 28: Compression of the image Adolf Knoll National Library of the Czech Republic

Compression of colour images

Lossless
  LZW
    GIF (8-bit only)
    TIFF (5.0)
  PNG
  Wavelet
    JPEG2000 (JP2)
    …

Lossy
  DCT (JPEG)
  Fractals
  Wavelet
    IW44
    LWF, WI
    JPEG2000 (JP2)
    MrSID, …

Classical (LZW, RLE, DCT) versus wavelet approaches.

Page 30: Compression of the image Adolf Knoll National Library of the Czech Republic

True colour image

DCT

wavelet

Page 31: Compression of the image Adolf Knoll National Library of the Czech Republic

Testing compression efficiency

Sample
Reference
Full-colour (JPEG, wavelet)
1-bit (establish thresholds: Paint Shop Pro, LuraWave)
MRC (same sample: DjVu Solo)
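A comparison of this kind comes down to measuring each result against the same reference. The sketch below shows the measurement with Python's standard zlib module (Deflate, the same family of compression that PNG uses) on a synthetic 1-bit bitmap. The test data and the helper name compression_ratio are assumptions; this is not the test set used in the presentation.

```python
import zlib


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio of the reference size to the compressed size."""
    return original_size / compressed_size


# Synthetic stand-in for a 1-bit scan: 200 rows of 90 bytes (720 pixels),
# white margins (0x00) around a repeating dark pattern.
row = b"\x00" * 40 + b"\xff\x0f" * 5 + b"\x00" * 40
reference = row * 200

compressed = zlib.compress(reference, 9)
assert zlib.decompress(compressed) == reference  # lossless round trip

ratio = compression_ratio(len(reference), len(compressed))
print(f"{len(reference)} -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
```

Synthetic data like this compresses far better than a real scan, which is one more reason to run such tests on typical materials.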

Page 32: Compression of the image Adolf Knoll National Library of the Czech Republic

Compression efficiency – bitonal image

Page 33: Compression of the image Adolf Knoll National Library of the Czech Republic

Compression efficiency

True colour

Page 34: Compression of the image Adolf Knoll National Library of the Czech Republic

How to apply compression?

It depends on the character of objects in the image:
Photorealistic image (JPEG, wavelet)
Text and simple black-and-white graphics (Fax Group 4, JB2, …)
Colour graphics (a problem to compress with losses; better lossless PNG or GIF; an application area of vector graphics, SVG)
Mixed content (composed solutions: DjVu, LDF, …)

Page 35: Compression of the image Adolf Knoll National Library of the Czech Republic

The most efficient solution

To segment images into two or more groups of objects:

1. Objects good for bitonal conversion

2. Objects good for true colour representation

To compress each group separately and then merge them into one format.
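The split-and-merge idea can be sketched as follows. The sketch assumes a single global threshold on a greyscale image and made-up names (segment, merge); the segmenters in DjVu or LDF tools are considerably more sophisticated, and the separate compression of each layer is left out.

```python
def segment(gray, threshold=128):
    """Split a greyscale image into a 1-bit mask plus foreground and background layers."""
    mask = [[1 if pixel < threshold else 0 for pixel in row] for row in gray]
    foreground = [[p if m else None for p, m in zip(row, mrow)]
                  for row, mrow in zip(gray, mask)]
    background = [[None if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(gray, mask)]
    return mask, foreground, background


def merge(mask, foreground, background):
    """Recombine the layers: the mask selects foreground or background per pixel."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


page = [
    [250, 248, 30, 251],   # dark pixels (text) on light paper
    [249, 20, 25, 247],
    [200, 190, 180, 170],  # a grey area that stays in the background
]
mask, foreground, background = segment(page)
for row in mask:
    print(row)
```

The mask is a candidate for bitonal compression, while the remaining layers are candidates for a photorealistic scheme.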

Page 36: Compression of the image Adolf Knoll National Library of the Czech Republic

Horizontal segmentation/zoning

Horizontally:
Text
Graphics
Photographs

Imagepower Inc.

Page 37: Compression of the image Adolf Knoll National Library of the Czech Republic

Vertical segmentation/zoning

Vertically:
Foreground
Background

Lizardtech Inc. (AT&T), Luratech GmbH

DjVu, LDF

Page 38: Compression of the image Adolf Knoll National Library of the Czech Republic

Comparison of DjVu and LDF

DjVu

6 layers

Foreground: JB2 IW44

Background: 4 layers IW44

LDF

3 layers

Foreground: LDF 1-bit comp., LWF

Background: 1 layer LWF, JP2

Page 39: Compression of the image Adolf Knoll National Library of the Czech Republic

Bitonal versus composed image

Page 40: Compression of the image Adolf Knoll National Library of the Czech Republic

Grey level

Page 41: Compression of the image Adolf Knoll National Library of the Czech Republic

Other DjVu properties

Several images in one file: as in TIFF, PDF, LDF, …, with use of the common dictionary of pixel chunks

Virtually: pages remain on the server and only the page that is called is delivered

Page 42: Compression of the image Adolf Knoll National Library of the Czech Republic

Multiresolution image

MrSID
One file holds several (up to 8) images in various resolutions
Sample
Efficient with an image server
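What a set of resolutions in one file looks like can be sketched by listing the image sizes. The sketch assumes that each level halves the previous one, as is usual for wavelet decompositions; the function name resolution_levels and the sample dimensions are made up, and the actual MrSID level structure may differ.

```python
def resolution_levels(width, height, levels=8):
    """List image sizes from full resolution down, halving at each level."""
    sizes = []
    for _ in range(levels):
        sizes.append((width, height))
        if width == 1 and height == 1:
            break
        width, height = max(1, (width + 1) // 2), max(1, (height + 1) // 2)
    return sizes


for level, (w, h) in enumerate(resolution_levels(4096, 2048)):
    print(level, w, h)
```

An image server can then deliver only the level that matches the requested zoom, instead of sending the full-resolution image every time.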

Page 43: Compression of the image Adolf Knoll National Library of the Czech Republic

SAMPLES

Samples of various compression solutions