Download - The DCT domain and JPEG

The DCT domain and JPEGCSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2009 – Week 3

Dr Hans Georg Schaathun The DCT domain and JPEG Spring 2009 – Week 3 1 / 47

Learning Objectives

Be able to work with and JPEG images and other representationsin the transform domain.Understand what happens during JPEG compression, and itspotential consequence to watermarking and steganography.Be able to apply simple LSB embedding in the JPEG domain.

Overview

Outline

1 Overview

2 Image Representation

3 Compression

4 Steganography in JPEG

5 Matlab

Overview

The elements of JPEG

Operates on luminence and chrominance (YCbCr) (not on RGB)Grayscale images have luminence component only.

DownsamplingWorks in the DCT domain (not the spatial domain)QuantisationEntropy coding (lossless compression)

Overview

JPEG is not a file format

JPEG is a compression systemThe system employs three different compression techniques

JPEG is not a file format.Files with extension .jpeg are often JFIF or EXIF.

JFIF is traditionally the most common file format for JPEG.EXIF is made for digital cameras and contain extra metainformation.

Overview

Reading

Core ReadingDigital Image Processing Using MATLAB.

Chapter 6: colour imagesRepresentationProcessingConversion

Chapter 8.5: JPEG compressionChapter 4: Frequency domain processing

Image Representation

Outline

1 Overview

2 Image RepresentationAlternatives to RGBThe blockwise DCT domain

3 Compression

5 Matlab

Image Representation Alternatives to RGB

The RGB colour representations

RGB : A colour is a vector (R, G, B)

R is amount of red light.G is amount of green light.B is amount of blue light.

Each pixel can be eitherA colour vector (R, G, B); ora reference to an array of colour vectors (the palette)

Each coefficient can be∈ [0, 1]; floating point (double in matlab)∈ {0, 1, . . . , 255}; 8-bit integer (uint8 in matlab)∈ {0, 1, . . . , 216 − 1}; 16-bit integer (uint16 in matlab)

Image Representation Alternatives to RGB

Alternatives to RGB

NTSC : (Y , I, Q)YIQ

=

0.299 0.587 0.1140.596 −0.274 −0.3220.211 −0.523 0.312

RGB

where R, G, B ∈ [0, 1].YCbCr : (Y , Cb, Cr) Y

CbCr

=

16128128

+

65.481 128.553 24.966−37.797 −74.203 112.000112.000 −93.786 −18.214

RGB

where R, G, B ∈ [0, 1] and Y , Cb, Cr ∈ [0, 255].

Image Representation The blockwise DCT domain

Block-wise

Each colour-channel (Y,Cb,Cr) considered separatelyM × N matrix divided into 8× 8 blocksEach block is handled separately

The DCT transform

Several different DCT transform.We use the following.

Tf (u, v) =M−1∑x=0

N−1∑y=0

f (x , y)

√α(u)α(v)

MNcos

(2x + 1)uπ

2Mcos

(2y + 1)vπ

2N

where

α(a) =

{1, if a = 0,

2, otherwise.

M = N = 8,

The DCT transform

The inverse is similar

f (x , y) =M−1∑u=0

N−1∑v=0

Tf (u, v)

√α(u)α(v)

MNcos

(2x + 1)uπ

2Mcos

(2y + 1)vπ

2N

where

α(a) =

{1, if a = 0,

2, otherwise.

M = N = 8,

Matlab

Matlab functionsdct2 (2D DCT transform)idct2 (Inverse)blkproc ( X, [M N], FUN )

For instanceblkproc ( X, [8 8], @dct2 )

Use help system for detailsDo not use imread for JPEG images

converts images to the spatial domain.you don’t even know the compression parameters...

Transform image

Linear combination of patterns (seeright)DC (upper left) gives average colourintensityLow frequency: coarse structureHigh frequency: fine details

Compression

Outline

1 Overview

3 CompressionDownsamplingQuantisationSource Coding

5 Matlab

Compression Downsampling

What is sampling?

FactThe human eye is more sensitive to changes in luminance than inchrominance.

To sample is to collect measurements.Each pixel is a sample (measuring the colour of the image).Lower resolution means fewer samples.Reducing resolution = downsampling

Basic M × N image: N ·M samples per component (Y , Cb, Cr ).Y is more useful than Cb and Cr .Therefore we can downsample Cb and Cr

M/2× N/2 is common for Cb and CrStill use M ×M for Y

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What is sampling?

What do we save?

Original: M × N pixels ×3 components .Compressed:

2× M2× N

2+ M × N = 1

12

M × N

RatioCompressed

Original=

112MN

3MN=

12.

We just saved 50%

What do we save?

2× M2× N

2+ M × N = 1

12

M × N

RatioCompressed

Original=

112MN

3MN=

12.

We just saved 50%

What do we save?

2× M2× N

2+ M × N = 1

12

M × N

RatioCompressed

Original=

112MN

3MN=

12.

We just saved 50%

What do we save?

2× M2× N

2+ M × N = 1

12

M × N

RatioCompressed

Original=

112MN

3MN=

12.

We just saved 50%

Chrominance versus Luminence

Watermarking tend to embed in Y (luminence)Embedding in Cb and Cr would more easily be destroyed byJPEG

Downsampling in JPEG

1 Translation to YCbCr.2 Downsampling3 DCT transform

Each downsampled component matrix Y , Cb, Cr isDivided into 8× 8 blocksDCT transformed blockwise

An 8× 8 block in Cb can be associated with 1,2 or 4 Y blocksdepending on downsampling.

Compression Quantisation

What is quantisation?Rounding in general

Rounding numbers is quantisation.Measuring gives continuous numbers

Whether you measure pixel luminence, or the length ofyour garage.No matter how close to points are,

there is a point in between.

However, our precision is limited.We give lengths to the nearest unit.Luminence is categorised into 256 intervals (8bitintegers).

Computer memory is finite,256 different possibilites for a byte

Quantisation in JPEG

Quantisation in the DCT domainEach coefficient is divided by the quantisation constant.The result is rounded to nearest integer.Different quantisation constants for each coefficient in the block.

ExampleQuantisation in JPEG

2666666666664

DCT matrix−415 −30 −61 27 56 −20 −2 0

4 −22 −61 10 13 −7 −9 5−47 7 77 −25 −29 10 5 −6−49 12 34 −15 −10 6 2 212 −7 −13 −4 −2 2 −3 3−8 3 2 −6 −2 1 4 2−1 0 0 −2 −1 −3 4 −10 0 −1 −4 −1 0 1 2

3777777777775./

2666666666664

Quantisation matrix16 11 10 16 24 40 51 6112 12 14 19 26 58 60 5514 13 16 24 40 57 69 5614 17 22 29 51 87 80 6218 22 37 56 68 109 103 7724 35 55 64 81 104 113 9249 64 78 87 103 121 120 10172 92 95 98 112 100 103 99

3777777777775

≈

2666666666664

Quantised DCT matrix−26 −3 −6 2 2 −1 0 0

0 −2 −4 1 1 0 0 0−3 1 5 −1 −1 0 0 0−4 1 2 −1 0 0 0 01 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

3777777777775Example from Wikipedia.

Compression Source Coding

Entropy Coding

Recall

−26 −3 −6 2 2 −1 0 00 −2 −4 1 1 0 0 0−3 1 5 −1 −1 0 0 0−4 1 2 −1 0 0 0 01 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 00 0 0 0 0 0 0 0

Observe

0 is extremely common±1 is commonTwo-digit numbers are very rare

This is typical

Compression Source Coding

Entropy Coding

In order to compress the dataUse few bits (short codewords) for frequent symbolsMany bits (long codewords) only for rare symbols

Usually, JPEG uses a simple Huffman code.it can use other codes (saving space, but computionally costly)

For instance, a single short codeword to say‘the rest of the block is zero’

Steganography in JPEG

Outline

1 Overview

3 Compression

4 Steganography in JPEGInto the compressed domainJSteg and OutGuess 0.1

5 Matlab

Steganography in JPEG Into the compressed domain

Before or after compression

pixmapLSB

embedding// JPEG

compression// stego-

image//

message��

pixmapJPEG

compression// LSB

embedding// stego-

image//

message��

Fragility of LSB

LSB embedding is criticised for being fragileJPEG removes insignificant information

... such as the LSB

JPEG compression after embedding (probably) ruins the messageWhen is this a problem?

When fragility is a problem

It is a problem in robust watermarkingJPEG compression is common-placemost applications need robustness

If the purpose is steganography,and Alice and Bob are allowed to exchange pixmaps,then it is not a problem.

Obviously, if your steganogram is supposed to be JPEG... Do not do LSB in the pixmap.

Double compression

Common bug in existing softwareRead an arbitrary image file

JPEG is decompressed on reading... → pixmap

Embedding works on JPEGimage is compressed to produce JPEG signalquality factor (QF) either default or supplied by user

A JPEG steganogram has now been compressed twicedifferent QF produces an artifact

Is there any reason for de- and recompressing?

Important lessons

1 Do not make unnecessary image conversions.2 Many techniques apply to any format

LSB applies to JPEG signals... but it is called Jsteg

3 Use a technique which fits the target (stego-) formati.e. the format you are allowed to use on the channel.

Main developmentThe past at a glance

Core Reading

Hide and Seek: An Introduction to Steganography by Niels Provos andPeter Honeyman, in IEEE Security & Privacy 2003.

1 JSteg was published2 JSteg was broken3 OutGuess was published4 OutGuess was broken5 F5 was published6 F5 was broken

Core Reading

Core Reading

Core Reading

Core Reading

Core Reading

Steganography in JPEG JSteg and OutGuess 0.1

The JSteg algorithm

JSteg denotes a software package.Approach: simple LSB in the DCT domain.No different from LSB in the spatial domainχ2 analysis applies

PseudocodeThe JSteg algorithm

Input: Image I, Message ~mOutput: Image J

for each bit b of ~mc := next DCT coefficient from Iwhile c = 0 or c = 1,

c := next DCT coefficient from Iend whilec := c − c mod 2 + breplace coefficient in I by c

end for

May ignore high and/or low frequency coefficients

Bitflips in JSteg

0 +1 +2 +3 +4 +5−1−2−3−4−5Cover

0 +1 +2 +3 +4 +5−1−2−3−4−5Stego��

��

��....

....

�� ....

....

�� ....

....

�� ....

....

��

Typical JPEG. Typical JSteg.

Pros and Cons

Why is JSteg important?First publicly available solution.Simple solution

What are the disadvantages?Similar to LSB in Spatial domain.Histogramme analysis appliesχ2/pairs of values applies

Pros and Cons

Pros and Cons

Pros and Cons

Pros and Cons

Pros and Cons

Pros and Cons

Outguess 0.1

How can we improve JSteg?How did we improve LSB in the Spatial Domain?

SolutionChoose random coefficients from the entire image.

Outguess 0.1

Outguess 0.1

PseudocodeOutguess 0.1

Input: Image I, Message ~m, Key kOutput: Image J

Seed PRNG with kfor each bit b of ~m

c := pseudo-random DCT coefficient from Iwhile c = 0 or c = 1,

c := pseudo-random DCT coefficient from Iend whilec := c − c mod 2 + breplace coefficient in I by c

end for

May ignore high and/or low frequency coefficients

Histogramme analysisSteganalysis

1 Pairs of valuesGeneralised χ2 works

2 SymmetryEmbedding exchange +2 ↔ +3 and −2 ↔ −1.The DCT histogramme is expected to be symmetricOutguess/JSteg destroy the symmetry.

Matlab

Outline

1 Overview

3 Compression

5 MatlabThe JPEG toolboxRandomnessMasking

Matlab The JPEG toolbox

JPEG ToolboxReading a JPEG image

» im = jpeg_read ( ’Kerckhoffs.jpg’ )

im =image_width: 180image_height: 247

image_components: 3image_color_space: 2jpeg_components: 3jpeg_color_space: 3

comments: {}coef_arrays: {[248x184 double] [128x96 double] [128x96 double]}quant_tables: {[8x8 double] [8x8 double]}

ac_huff_tables: [1x2 struct]dc_huff_tables: [1x2 struct]optimize_coding: 0

comp_info: [1x3 struct]progressive_mode: 0

»

JPEG ToolboxJPEG data

» Xy = im.coef_arrays{im.comp_info(1).component_id} ;» whos XyName Size Bytes Class AttributesXy 248x184 365056 double

» Q = im.quant_tables{im.comp_info(1).quant_tbl_no}

Q =6 4 4 6 10 16 20 245 5 6 8 10 23 24 226 5 6 10 16 23 28 226 7 9 12 20 35 32 257 9 15 22 27 44 41 3110 14 22 26 32 42 45 3720 26 31 35 41 48 48 4029 37 38 39 45 40 41 40

»

w/o the toolboxLoad/Save workspace

jpeg_read and jpeg_write are compiled functions,Available on web page for Intel/Linux and Intel/WindowsCompilation required for other systems.

If you have trouble with the toolbox, it is possible to load/save filesas Matlab workspaces.

» load ’image.mat’» whosName Size Bytes Class Attributesim 1x1 575836 struct

There is one mat-file for each image on the web page

Other toolbox functions

If you want to write your own (de)compressor, the followingfunctions are useful.

bdct, ibdct, quantize, dequantize

Matlab Randomness

Pseudo-random number generators

A pseudo-random number generator (PRNG)finite state machinestarting state given by user input – the seedeach transition outputs a numberobserving a sequence of number,

it is computationally infeasible to predict the next number

looks randomIt is deterministic

if sender and receiver share the seed (a key)they can generate the same sequence

Matlab Randomness

Random locations

Many ways to do itbut some makes it hard to avoid using the same pixel twice

A simple approach1 randomly permute the pixels2 embed in the first pixels in the permuted sequence3 reverse the permutation

If the sender and receiver use the same PRNG seed,they get the same pseudo-random permutation

Matlab Randomness

Matlab functions

Many functions using a PRNGrand, randperm, randn

And one dedicated for performing random permutationsnew = randintrlv ( old, seed )the seed is given directly when the permutation is usedreverse: old = randdeintrlv ( new, seed )

new/old should be a 1D vectorserialise the image into a vector (im(:))

before randintrlv

reshape ( vector, [ N, M ] ) to recover the 2D imageafter randdeintrlv

Matlab Masking

MaskingBoolean matrix as index

Matlab allows Boolean (logical) matrices as indicesSay A is an n ×m matrixB = logical(ones(n,m))B(1,1) = 0A(B) will be all elements of A except (1, 1)

repmat(B,m,n) replicates B to form a larger matrixm times the heighn times the width

How do you extract the AC coefficients in a JPEG image?Make an 8× 8 mask of onesSet the DC entry to 0, e.g. (1, 1)Replicate the mask using repmat

Matlab Masking

Matlab Masking

Download - The DCT domain and JPEG

Top Related