
CS/EE 5590 / ENG 401 Multimedia Communication, Spring 2017

Lec 06

Transforms and Quantization I

Zhu Li

Course Web: http://l.web.umkc.edu/lizhu/

Outline

Lecture 05 Arithmetic Coding Re-Cap

Signal Transform

KLT/PCA

DCT


Arithmetic Coding Recap

Arithmetic coding:

Symbols A={a1, a2, …, an}, with prob={p1, p2, …, pn}

For a sequence of symbols X = [x1, x2, …, xm], compute the sequence probability P(X); the binary code is the first

  l(X) = ⌈log₂(1/P(X))⌉ + 1  bits

of the binary representation of F(X).

Sequence-probability based: more efficient than Huffman codes

Example: Arithmetic Encoding

Encoding with CDF F_X(·) = [0.7, 0.8, 1]:

T(X) = (0.546 + 0.560)/2 = 0.553

P(X) = 0.014; l(X) = 6 bits

Code: (0.553)₂ = 0.100011011…, so the codeword is the first 6 bits: 100011
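The binary expansion is easy to check with a few lines of Matlab (a sketch; bits(k) is the k-th fractional binary digit of T(X)):

  TX = 0.553; l = 6;          % T(X) and the code length l(X)
  bits = zeros(1, l);
  for k = 1:l                 % peel off fractional binary digits
      TX = 2 * TX;
      bits(k) = floor(TX);
      TX = TX - bits(k);
  end
  disp(bits)                  % 1 0 0 0 1 1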


LOW = 0.0; HIGH = 1.0;
while (not EOF) {
    n = ReadSymbol();                 // next symbol index
    RANGE = HIGH - LOW;
    HIGH  = LOW + RANGE * CDF(n);     // upper end of symbol n's sub-interval (uses the old LOW)
    LOW   = LOW + RANGE * CDF(n-1);   // lower end; CDF(0) = 0
}
output LOW;                           // any value in [LOW, HIGH) identifies X


Encoding of Binary Arithmetic Coding

Prob(0) = 0.6. Sequence: 0110

  Start:            LOW = 0,     HIGH = 1       → interval [0, 1)
  After encoding 0: LOW = 0,     HIGH = 0.6     → [0, 0.6)
  After encoding 1: LOW = 0.36,  HIGH = 0.6     → [0.36, 0.6)
  After encoding 1: LOW = 0.504, HIGH = 0.6     → [0.504, 0.6)
  After encoding 0: LOW = 0.504, HIGH = 0.5616  → [0.504, 0.5616)

Only need to update LOW or HIGH for each symbol:

  LOW  = LOW + RANGE · CDF(n − 1)
  HIGH = LOW + RANGE · CDF(n)
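A minimal Matlab sketch of this update, run on the example sequence above (floating point only; a real coder would add integer arithmetic and renormalization):

  cdf = [0.6 1.0];               % CDF(0) = P(0) = 0.6, CDF(1) = 1 for alphabet {0, 1}
  seq = [0 1 1 0];
  LOW = 0; HIGH = 1;
  for n = seq
      RANGE = HIGH - LOW;
      hiCdf = cdf(n+1);                          % CDF(n)
      loCdf = 0; if n > 0, loCdf = cdf(n); end   % CDF(n-1); 0 for the first symbol
      HIGH = LOW + RANGE * hiCdf;                % uses the old LOW
      LOW  = LOW + RANGE * loCdf;
  end
  fprintf('LOW = %.4f, HIGH = %.4f\n', LOW, HIGH)   % 0.5040, 0.5616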


Adaptive Arithmetic Coding

Observation:

The partition of [0, 1) can be different from symbol to symbol

The bit stream can be decoded perfectly as long as the encoder and decoder stay synchronized (use the same probabilities).

Only need to update the probabilities

Adaptive Huffman coding, by contrast, has to redesign the codebook each time the probabilities change, which is time-consuming.

[Diagram: X(n) → binary arithmetic encoder, driven by p0(n) → bitstream → binary arithmetic decoder, driven by the same p0(n) → X(n)]


Example

Binary sequence: 01111

Initial counters for 0's and 1's: C(0) = C(1) = 1, so P(0) = P(1) = 0.5. Interval: [0, 1)

  After encoding 0:     interval [0, 0.5)        C(0)=2, C(1)=1 → P(0)=2/3, P(1)=1/3
  After encoding 01:    interval [0.3333, 0.5)   C(0)=2, C(1)=2 → P(0)=1/2, P(1)=1/2
  After encoding 011:   interval [0.4167, 0.5)   C(0)=2, C(1)=3 → P(0)=2/5, P(1)=3/5
  After encoding 0111:  interval [0.45, 0.5)     C(0)=2, C(1)=4 → P(0)=1/3, P(1)=2/3
  After encoding 01111: interval [0.4667, 0.5)

Encode 0.4667.
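The same loop with count-based probabilities reproduces these intervals; a sketch:

  seq = [0 1 1 1 1];
  C = [1 1];                   % counts of 0s and 1s
  LOW = 0; HIGH = 1;
  for b = seq
      p0 = C(1) / sum(C);      % current P(0)
      RANGE = HIGH - LOW;
      if b == 0
          HIGH = LOW + RANGE * p0;
      else
          LOW  = LOW + RANGE * p0;
      end
      C(b+1) = C(b+1) + 1;     % update the model after coding the bit
      fprintf('after %d: [%.4f, %.4f)\n', b, LOW, HIGH);
  end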


Decoding

Input: 0.4667.

Initial counters for 0's and 1's: C(0) = C(1) = 1, P(0) = P(1) = 0.5. Interval: [0, 1)

  0.4667 ∈ [0, 0.5)       → decode 0;  C(0)=2, C(1)=1 → P(0)=2/3, P(1)=1/3
  0.4667 ∈ [0.3333, 0.5)  → decode 1;  C(0)=2, C(1)=2 → P(0)=1/2, P(1)=1/2
  0.4667 ∈ [0.4167, 0.5)  → decode 1;  C(0)=2, C(1)=3 → P(0)=2/5, P(1)=3/5
  0.4667 ∈ [0.45, 0.5)    → decode 1;  C(0)=2, C(1)=4 → P(0)=1/3, P(1)=2/3
  0.4667 ∈ [0.4667, 0.5)  → decode 1

Decoded sequence: 01111.
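A matching decoder sketch, which re-runs the same model and picks the sub-interval containing the received value:

  v = 0.4667;                  % received value
  C = [1 1]; LOW = 0; HIGH = 1;
  out = zeros(1, 5);
  for k = 1:5
      p0 = C(1) / sum(C);
      RANGE = HIGH - LOW;
      split = LOW + RANGE * p0;
      if v < split             % v falls in the 0-interval
          b = 0; HIGH = split;
      else
          b = 1; LOW = split;
      end
      C(b+1) = C(b+1) + 1;     % same model update as the encoder
      out(k) = b;
  end
  disp(out)                    % 0 1 1 1 1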


Context-adaptive Arithmetic Coding

In many cases, a sample has strong correlation with its near neighbors.

Idea:

Collect conditional probability distribution of a symbol for given neighboring symbols (context):

P(x(n) | x(n-1), x(n-2), … x(n-k))

Use this conditional probability to encode the next symbol

A more skewed probability distribution can be obtained (desirable for arithmetic coding)

[Diagram: 1-D context template (previous symbols on the same row) and 2-D context template (neighboring symbols from the current and previous rows)]


Example

Binary source: symbols {0, 1}

Neighborhood (template) size: 3

2³ = 8 possible combinations (contexts) of the 3 neighbors (x(n−1), x(n−2), x(n−3))

Collect frequencies of 0's and 1's under each context


  Context    C(0)  C(1)
  (0, 0, 0)    9     2
  (0, 0, 1)    3     6
  (0, 1, 0)   10    10
  (0, 1, 1)    …     …
  (1, 0, 0)    …     …
  (1, 0, 1)    …     …
  (1, 1, 0)    …     …
  (1, 1, 1)    …     …

Each symbol is coded with the probability distribution associated with its context.
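A sketch of this bookkeeping for a binary source with a 3-symbol causal template (the sequence x is an arbitrary illustration; counts are seeded at 1, as in the adaptive example):

  x = [0 1 1 0 1 0 1 1 0 0 1];     % example binary sequence
  C = ones(8, 2);                  % counts of 0s/1s for each of the 2^3 contexts
  for n = 4:length(x)
      ctx = x(n-1)*4 + x(n-2)*2 + x(n-3) + 1;   % context index, 1..8
      p0 = C(ctx, 1) / sum(C(ctx, :));          % P(x(n) = 0 | context)
      % ... feed p0 and x(n) to the binary arithmetic coder ...
      C(ctx, x(n)+1) = C(ctx, x(n)+1) + 1;      % update that context's counts
  end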


Modeling Large Context

Conditioning reduces entropy: H(Y|X1) ≥ H(Y|X1, X2, …)

How to model a very large context (x1, x2, …)? Use a (shallow/deep) neural network to model the context

Large window of 552 input nodes

7 sets of 3080 hidden nodes

Potential project: deeper neural network for HEVC CABAC context modeling


CABAC in H.264

Definition of context indices in H.264:

399 contexts in total (indices 0-398).

0-72 are for mode decisions

73-398: coding of residual data.

Pure frame or field coding mode uses only 0-277.

278-398 are for field coding.

Example: syntax element mb_skip_flag

0: the macroblock (MB) is skipped (no nonzero coefficients)

1: the macroblock is not skipped and has nonzero coefficients to encode

The context index used to encode the mb_skip_flag of a macroblock is defined by

  Index = 11 + mb_skip_flag(A) + mb_skip_flag(B)

where A is the left neighboring MB and B is the top neighboring MB, so the index is 11, 12, or 13 depending on the neighbors' flags.


Coefficient Coding: HEVC vs H.264


Summary of CABAC in H.264 and HEVC

Context Modeling is the major bottleneck

HEVC did a good job in reducing context dependency: it significantly reduced the number of contexts while maintaining the compression performance.


Outline

Lecture 05 Arithmetic Coding Re-Cap

Signal Transform

KLT/PCA

DCT


Transform - Objective

Objective: to find an alternative representation of the image/signal that is more compact


[Figure: an image block is transformed and divided by a quantization table, giving sparse quantized coefficients:]

    8  24  -2   0   0   0   0   0
  -31  -4   6  -1   0   0   0   0
    0 -12  -1   2   0   0   0   0
    0   0  -2  -1   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0

Block Transform

Divide input data into blocks, encode each block separately

Matrix representation:  y_{N×1} = T_{N×N} x_{N×1}

Inverse transform:  x = T⁻¹ y

Orthogonal (orthonormal) transform:  T⁻¹ = Tᵀ

Biorthogonal transform:  T⁻¹ ≠ Tᵀ


Property of Orthogonal Transform

y = T x, T orthogonal.

1. An orthogonal transform preserves the energy: ‖y‖² = ‖x‖²

Proof:  ‖y‖² = yᵀy = xᵀTᵀT x = xᵀx = ‖x‖²


Property of Orthogonal Transform

y = T x, T orthogonal. Let e be the noise added to y (e.g., by quantization), and x̂ = Tᵀ(y + e) the reconstruction.

2. An orthogonal transform does not amplify the noise: ‖x̂ − x‖² = ‖e‖²

Proof: let v = Tᵀe; then x̂ = x + v, and ‖x̂ − x‖² = ‖v‖² = eᵀT Tᵀ e = ‖e‖²


Degree of Freedom in Orthogonal Transforms

Given a set of orthonormal basis vectors A = [a1, a2, …, ak] in Rᵈ, what is the DoF of the matrix A?

Not k·d: the constraint AᵀA = I_k removes k(k+1)/2 degrees, giving DoF(A) = kd − k(k+1)/2 (the Stiefel manifold).

Still not the answer for subspaces: rotations of the basis should be treated as equivalent, i.e., if span(A1) = span(A2), then A1 and A2 represent the same subspace.

The true DoF is characterized by the Grassmann manifold: DoF(A) = kd − k² = k(d − k).


Karhunen-Loève Transform (KLT)

Also known as the Hotelling transform, or the Principal Component method (energy-preserving interpretation).

Goal: decorrelate the input with an orthogonal transform y = T x, i.e., make R_yy = E[yyᵀ] diagonal:

  R_yy = E[yyᵀ] = E[(Tx)(Tx)ᵀ] = T E[xxᵀ] Tᵀ = T R_xx Tᵀ

Recall eigen-decomposition: if A is square, A = U D U⁻¹, i.e., A U = U D, where D is a diagonal matrix with the eigenvalues on the diagonal and the columns of U are the eigenvectors. If A is symmetric, U is orthonormal.

Since R_xx is symmetric, the KLT transform should be chosen as T = Uᵀ. Then:

1. The rows of T are the eigenvectors of R_xx.

2. The variances of {yᵢ} are the eigenvalues of R_xx.
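A quick Matlab check of both facts on an AR(1) covariance (ρ = 0.95 chosen for illustration):

  N = 8; rho = 0.95;
  Rxx = toeplitz(rho .^ (0:N-1));   % covariance of an AR(1) source
  [U, D] = eig(Rxx);                % columns of U = eigenvectors of Rxx
  T = U';                           % KLT: rows of T are the eigenvectors
  Ryy = T * Rxx * T';               % covariance of y = T x
  norm(Ryy - diag(diag(Ryy)))       % ~0: the outputs are decorrelated
  % diag(Ryy) = eigenvalues of Rxx: the variances of the y_i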


Principal Component Analysis (PCA)

• Given: N data points x1, … ,xN in Rd

• We want to find a new set of features that are linear combinations of the original ones:

  u(xi) = uᵀ(xi − µ)   (µ: mean of the data points)

• Choose the unit vector u in Rᵈ that captures the most data variance (maximum energy preservation)


Principal Component Analysis

• Direction that maximizes the variance of the projected data:

  var(u) = (1/N) Σᵢ₌₁ᴺ (uᵀ(xᵢ − µ))² = uᵀ Σ u

  where uᵀ(xᵢ − µ) is the projection of a data point and Σ is the covariance matrix of the data.

• Maximize uᵀ Σ u subject to ‖u‖ = 1.

• The direction that maximizes the variance is the eigenvector associated with the largest eigenvalue of Σ.


PCA- Principal Component Analysis

Formulation: find projections such that the information/energy of the data is maximally preserved:

  max_W E{‖Wᵀx‖²},  s.t.  WᵀW = I

Matlab: [A, s, eigv] = princomp(X);

PCA algorithm

PCA algorithm (X, k): top k eigenvalues/eigenvectors

% X = N x m data matrix,
% … each data point xi = a column vector, i = 1..m

• Subtract the mean x̄ = (1/m) Σᵢ₌₁ᵐ xᵢ from each column vector xᵢ in X

• Σ ← X Xᵀ … covariance matrix of X

• {λᵢ, uᵢ}ᵢ₌₁..N = eigenvalues/eigenvectors of Σ, with λ1 ≥ λ2 ≥ … ≥ λN

• Return {λᵢ, uᵢ}ᵢ₌₁..k  % top k principal components


PCA Algorithm

Center the data:

  X = X - repmat(mean(X), [n, 1]);

Principal component #1 points in the direction of the largest variance.

Each subsequent principal component

  is orthogonal to the previous ones, and

  points in the direction of the largest variance of the residual subspace.

Solved by finding the eigenvectors of the scatter/covariance matrix of the data:

  S = cov(X); [A, eigv] = eig(S);
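Putting the steps together in Matlab (a sketch on toy data; princomp is the older built-in name, newer releases use pca):

  X = randn(100, 5) * diag([5 4 3 2 1]);    % toy data: 100 points in R^5
  k = 2;                                    % number of components to keep
  Xc = X - repmat(mean(X), [size(X,1) 1]);  % center the data
  S = cov(Xc);                              % 5 x 5 covariance matrix
  [V, E] = eig(S);
  [eigv, idx] = sort(diag(E), 'descend');   % sort by decreasing variance
  A = V(:, idx(1:k));                       % top-k principal directions
  Y = Xc * A;                               % projected (compressed) data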


PCA projection of Images

• Divide the original 372x492 image into patches:

• Each patch is an instance that contains 12x12 pixels on a grid

• View each patch as a 144-D vector


Eigen Values (energy) and PCA dim


PCA compression: 144D → 60D


PCA compression: 144D → 16D


16 most important eigenvectors

[Figure: the 16 most important eigenvectors, displayed as 12x12 patches]

PCA compression: 144D → 6D


SVD

PCA pulls the n x m image blocks into vectors in R^(n·m); SVD works on the matrix directly.

The Singular Value Decomposition (SVD) of an m×n matrix A is

  A = U S Vᵀ = Σᵢ σᵢ uᵢ vᵢᵀ,   with A (m×n) = U (m×m) · S (m×n) · Vᵀ (n×n)

where the diagonal of S holds the singular values [σ₁, σ₂, …, σₙ], the square roots of the eigenvalues of AAᵀ. The columns of U are eigenvectors of AAᵀ, and the columns of V are eigenvectors of AᵀA; the outer products uᵢvᵢᵀ are the basis of A in the reconstruction.

The 1st-order SVD approximation of A is σ₁ · U(:,1) · V(:,1)ᵀ.

SVD approximation of an image

Very easy:

function [x] = svd_approx(x0, k)
% rank-k SVD approximation of the matrix x0
dbg = 0;
if dbg                              % self-test with a random matrix
    x0 = fix(100*randn(4,6));
    k = 2;
end
[u, s, v] = svd(x0);
[m, n] = size(s);
x = zeros(m, n);
sgm = diag(s);                      % singular values
for j = 1:k                         % accumulate the top-k rank-1 terms
    x = x + sgm(j) * u(:,j) * v(:,j)';
end
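Usage, for instance on a grayscale image (im here is assumed to be a double-valued matrix):

  xk = svd_approx(im, 20);    % keep the top 20 singular values
  imshow(uint8(xk));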


Outline

Lecture 05 Arithmetic Coding Re-Cap

Signal Transform

KLT/PCA

DCT


2-D Block Transform

Problem with SVD/PCA

Data dependent

Non-separable Transform

Separable approach:

Apply transform to each row, then to each column

Matrix form, for an N x N block X:

  Y = T X Tᵀ   (Tᵀ applied on the right: row transform; T on the left: column transform)


2-D Transform via Kronecker Product

Kronecker product: the result is a large matrix formed by taking all possible products between the elements of A and those of B. Matlab function: kron().

Example: if A is a 2x3 matrix, then

  A ⊗ B = [ A(1,1)·B  A(1,2)·B  A(1,3)·B
            A(2,1)·B  A(2,2)·B  A(2,3)·B ]

Note: A ⊗ B ≠ B ⊗ A.

Example:

  A = [1 0; 0 1],  B = [1 1; 1 1]

  A ⊗ B = [ 1 1 0 0        B ⊗ A = [ 1 0 1 0
            1 1 0 0                  0 1 0 1
            0 0 1 1                  1 0 1 0
            0 0 1 1 ]                0 1 0 1 ]


2-D Transform via Kronecker Product

Def: stack the columns into vectors:

  x = [x₁ᵀ, x₂ᵀ, …, x_Nᵀ]ᵀ, where xᵢ is the i-th column of X
  y = [y₁ᵀ, y₂ᵀ, …, y_Nᵀ]ᵀ, where yᵢ is the i-th column of Y

A 2-D separable transform can then be implemented as a 1-D transform via the Kronecker product:

  Y = T X Tᵀ  ⟺  y = (T ⊗ T) x

A separable n x n filter costs O(n) per output instead of O(n²) (e.g., the 2-D DCT; the DoG filtering in SIFT).
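The identity is easy to verify in Matlab with a random block, using the DCT (introduced below) as T; X(:) stacks the columns of X, matching the definition above:

  N = 4;
  T = dct(eye(N));          % 4-point DCT as the transform
  X = randn(N);
  Y = T * X * T';           % 2-D separable transform
  y = kron(T, T) * X(:);    % 1-D transform of the stacked columns
  norm(Y(:) - y)            % ~0: identical coefficients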


2-D Transform via Kronecker Product

Example: T = [t₀₀ t₀₁; t₁₀ t₁₁], X = [a b; c d]:

  Y = T X Tᵀ = T [a b; c d] [t₀₀ t₁₀; t₀₁ t₁₁]

Equivalently, with the stacked columns x = [a, c, b, d]ᵀ:

  y = (T ⊗ T) x,  where  T ⊗ T = [ t₀₀·T  t₀₁·T
                                   t₁₀·T  t₁₁·T ]

Each entry of Y (and of y) is a sum of terms t_{ij} t_{kl} · {a, b, c, d}, so the two computations give the same coefficients.


Discrete Cosine Transform (DCT)

DCT approximates the KLT of an AR(1) signal when its correlation coefficient ρ is close to 1 (e.g., 0.95).

Definition:

  C(i, j) = a(i) · cos( (2j + 1) i π / (2N) ),   i, j = 0, …, N−1

  a(0) = √(1/N);  a(i) = √(2/N) for i = 1, …, N−1

Matlab function:

dct(eye(N));
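A sketch checking this claim numerically: the eigenvectors of an AR(1) covariance with ρ = 0.95 are close to the DCT basis vectors, up to sign and ordering:

  N = 8; rho = 0.95;
  C = dct(eye(N));                      % DCT matrix: rows are the basis vectors
  Rxx = toeplitz(rho .^ (0:N-1));       % AR(1) covariance
  [U, D] = eig(Rxx);
  [~, idx] = sort(diag(D), 'descend');  % KLT basis, largest variance first
  U = U(:, idx);
  abs(C * U)                            % close to the identity matrix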


DCT

Definition (same as the previous slide):

  C(i, j) = a(i) · cos( (2j + 1) i π / (2N) )

N = 2: DCT = Haar transform:

  C₂ = (1/√2) [ 1  1
                1 -1 ]

  y = C₂ x:  y₀ = (x₀ + x₁)/√2,  y₁ = (x₀ − x₁)/√2

y₀ captures the mean of x₀ and x₁ (low-pass):

  x₀ = x₁ = 1  →  y₀ = √2 (DC), y₁ = 0

y₁ captures the difference of x₀ and x₁ (high-pass):

  x₀ = 1, x₁ = −1  →  y₀ = 0, y₁ = √2


4-point DCT Freq Response

Four subbands: 2 symmetric filters, 2 anti-symmetric.

  C₄ = [ 0.5000  0.5000  0.5000  0.5000
         0.6533  0.2706 -0.2706 -0.6533
         0.5000 -0.5000 -0.5000  0.5000
         0.2706 -0.6533  0.6533 -0.2706 ]

[Figure: magnitude responses (dB) of the four subband filters vs. normalized frequency (x 2π). DC Att. 406.0206, Mirr Att. 324.2604, Stopband 8.3456, Coding Gain = 7.5701 dB]


8-point DCT

Eight subbands: 4 symmetric, 4 anti-symmetric.

[Figure: magnitude responses (dB) of the eight subband filters vs. normalized frequency (x 2π). DC Att. 409.0309, Mirr Att. 320.1639, Stopband 9.9559, Coding Gain = 8.8259 dB]


Example: 1-D DCT

x = [100 110 120 130 140 150 160 170]ᵀ

8-point DCT:

  [381.8377, -64.4232, 0.0, -6.7345, 0.0, -2.0090, 0.0, -0.5070]

Most of the energy is in the first 2 coefficients.

[Figure: the ramp signal x(t) and its DCT coefficients dct_x(t)]
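Reproduced in Matlab (dct is the orthonormal DCT-II, Signal Processing Toolbox):

  x = (100:10:170)';    % the ramp signal above
  y = dct(x)'           % 381.8377 -64.4232 0 -6.7345 0 -2.0090 0 -0.5070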


2-D 8-point DCT Example

Original data:

   89  78  76  75  70  82  81  82
  122  95  86  80  80  76  74  81
  184 153 126 106  85  76  71  75
  221 205 180 146  97  71  68  67
  225 222 217 194 144  95  78  82
  228 225 227 220 193 146 110 108
  223 224 225 224 220 197 156 120
  217 219 219 224 230 220 197 151

2-D DCT coefficients (after rounding to integers):

  1155  259  -23    6   11    7    3    0
  -377  -50   85  -10   10    4    7   -3
    -4 -158  -24   42  -15    1    0    1
    -2    3  -34  -19    9   -5    4   -1
     1    9    6  -15  -10    6   -5   -1
     3   13    3    6   -9    2    0   -3
     8   -2    4   -1    3   -1    0   -2
     2    0   -3    2   -2    0    0   -1

Most energy is in the upper-left corner.
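The coefficient table can be reproduced with the separable form Y = T X Tᵀ (X is the 8x8 block above):

  T = dct(eye(8));           % 8-point DCT matrix
  X = [ 89  78  76  75  70  82  81  82;
       122  95  86  80  80  76  74  81;
       184 153 126 106  85  76  71  75;
       221 205 180 146  97  71  68  67;
       225 222 217 194 144  95  78  82;
       228 225 227 220 193 146 110 108;
       223 224 225 224 220 197 156 120;
       217 219 219 224 230 220 197 151];
  Y = round(T * X * T')      % matches the table above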


Matlab Exercise: SVD, PCA, and DCT approximation

For the lena image, process it as 8x8 blocks

Compute m-coefficients reconstructions
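A possible starting sketch, assuming the image is available as lena.png (the file name and the keep-largest-m rule are illustrative choices):

  img = double(imread('lena.png'));   % assumed grayscale test image
  T = dct(eye(8));  m = 8;            % keep m coefficients per 8x8 block
  rec = zeros(size(img));
  for i = 1:8:size(img,1)-7
      for j = 1:8:size(img,2)-7
          B = img(i:i+7, j:j+7);
          Y = T * B * T';                        % forward 2-D DCT
          [~, idx] = sort(abs(Y(:)), 'descend');
          Y(idx(m+1:end)) = 0;                   % keep the m largest coefficients
          rec(i:i+7, j:j+7) = T' * Y * T;        % inverse transform
      end
  end
  psnr_db = 10*log10(255^2 / mean((img(:) - rec(:)).^2))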


Summary

Adaptive Arithmetic Coding

Binary AC has wide applications

AC can easily adapt to the changing probabilities of the symbols

CABAC: Context-Adaptive Binary AC is widely used in AVC/HEVC (one project idea: a deep dive and review of CABAC in HEVC)

Transforms

KL Transform/PCA: de-correlation and energy-preserving formulations

SVD: approximates n x m matrix data directly

DCT: a separable transform, much faster, with good de-correlation and energy-preservation properties
