MPEG-2 to H.264/AVC Transcoding Techniques

Jun Xin

Xilient Inc., Cupertino, CA

Digital Video Transcoding, November 07

Digital Video Transcoder

Coded digital video bit-stream “A” → Transcoder → Coded digital video bit-stream “B”

“A” and “B” may differ in many aspects:
• coding formats: e.g. MPEG-2 to H.264/AVC
• bit-rate, frame rate, resolution …
• features: error resilience features
• contents: e.g. logo insertion


Applications

Media Storage
• Transcode broadcast MPEG-2 video to H.264/AVC format: enables long-time recording
• Effective for multi-channel recording

Home Gateway
• Provide connection to IPTV set-top box
• Box only supports H.264/AVC
• Over wireless network with bandwidth limitation

Other potential uses:
• Export to mobile
• Internet streaming
• …


Goals and Challenges

H.264/AVC: latest video compression standard
• Promises same quality as MPEG-2 at half the bit-rate
• Is being widely adopted
  • HD Consumer Storage, e.g., HD-DVD and Blu-Ray
  • Mobile Devices, e.g., Apple iPod, iPhone, Sony PSP

Convert MPEG-2 video to H.264/AVC format
• More efficient storage, export to mobile devices, etc.

Challenges
• Yield similar quality as full re-encoding, but with much lower cost
• Key to lower-cost/high-quality: how to intelligently reuse available information from the incoming bitstream
• May be loosely considered as a “two-pass coder”
  • Could achieve better quality than full re-encoding given same complexity


Outline

Intra-only transcoding techniques
• Efficient compressed domain processing

Inter transcoding techniques
• Motion mapping / motion reuse

Intra Transcoding Techniques


Intra Transcoder – Pixel Domain

[Block diagram: Input MPEG-2 bitstream → VLD/IQ → IDCT → pixel-domain intra prediction and mode decision → HT → Q → H.264 entropy coding, with a reconstruction loop through Inverse Q and Inverse HT into the pixel buffer.]

VLD: variable length decoding
(I)Q: (inverse) quantization
IDCT: inverse discrete cosine transform
HT: H.264/AVC 4x4 transform


Compressed Domain Processing?

[Block diagram blocks: VLD/IQ, Intra Prediction (Comp-domain), Mode decision, Q, Inverse Q, Coeff Buffer.]



AVC 4x4 Transform

Motivation:
• DCT requires real-number operations, which may cause inaccuracies in inversion
• Better prediction means less spatial correlation – no strong need for real-number operations

H.264 uses a simple integer 4x4 transform
• Approximation to 4x4 DCT
• Transform and inverse transform

Note: the ½ in the inverse transform represents a right shift, so it is non-linear
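As a small illustration of the integer transform, the sketch below applies the forward core matrix of the H.264/AVC 4x4 transform to a block using only integer arithmetic. The helper names (`matmul`, `transpose`, `forward_ht`) and the sample block are made up for this example; the post-scaling that the standard folds into quantization is deliberately left out.

```python
# Forward core matrix of the H.264/AVC 4x4 integer transform (HT).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def forward_ht(block):
    """Compute CF * block * CF^T with integer operations only."""
    return matmul(matmul(CF, block), transpose(CF))


# Illustrative 4x4 block of sample values (invented for this example).
example_block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
example_coeffs = forward_ht(example_block)
```

Because the first row of the matrix is all ones, the top-left coefficient is simply the sum of the 16 samples. The rows are mutually orthogonal but do not have unit norm, which is why a scaling step is needed elsewhere in the codec.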


Intra Prediction in H.264/AVC

• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
• Pixels in intra-coded frames are predicted based on previously-coded ones
• Prediction can be based on 4x4 blocks or 16x16 macroblocks (or 8x8 blocks for high profile)
• An encoded mode specifies which neighbor pixels should be used to predict, and how
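To make "which neighbor pixels, and how" concrete, here is a minimal sketch of three of the 4x4 luma prediction modes (vertical, horizontal, DC). It assumes both the top and left neighbors are available; the function names are illustrative, and the remaining directional modes are omitted.

```python
def predict_vertical(top):
    """Each column repeats the reconstructed pixel directly above the block."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Each row repeats the reconstructed pixel directly left of the block."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(top, left):
    """Every pixel is the rounded mean of the four top and four left neighbors."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


# Illustrative neighbor values (invented for this example).
top_neighbors = [10, 20, 30, 40]
left_neighbors = [1, 2, 3, 4]
vertical_block = predict_vertical(top_neighbors)
horizontal_block = predict_horizontal(left_neighbors)
dc_block = predict_dc(top_neighbors, left_neighbors)
```

The encoder forms the residual by subtracting one of these prediction blocks from the source block; only the mode index and the coded residual are transmitted.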


4x4 Intra Prediction Example

[Figure: a current 4x4 block and its prediction blocks for the Vertical, Horizontal and Diagonal_Down_Right modes.]



Challenges
• Different transforms
  • MPEG-2 uses DCT, floating point
  • H.264/AVC uses an integer transform
• New prediction modes in H.264/AVC
  • Can prediction be performed in compressed domain?

Goals
• Simpler computation and architecture






Intra Transcoder – Proposed

[Block diagram: Input MPEG-2 bitstream → VLD/IQ → DCT-to-HT conversion (S-Transform) → HT-domain intra prediction and mode decision → Q → entropy coding, with a reconstruction loop through Inverse Q and Inverse HT into the pixel buffer.]



Techniques
• DCT-to-HT conversion
• Compressed (HT) domain prediction
  • Very simple for some prediction modes
• Compressed domain distortion calculation in mode decision

Advantages
• Lower computational complexity
• No quality loss


DCT-to-HT Conversion


DCT-to-HT Conversion: Transform Kernel Matrix
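One way to derive a kernel matrix of this kind from first principles: an 8x8 DCT block Y corresponds to pixels X = T8ᵀ·Y·T8, and the four 4x4 HT blocks of X are K·X·Kᵀ with K = diag(CF, CF), so S = K·T8ᵀ maps Y directly to HT coefficients as S·Y·Sᵀ. The sketch below builds S this way under the assumption of an orthonormal 8-point DCT; it is an illustration of the idea, not necessarily the exact matrix or scaling shown on the slide, and all names are hypothetical.

```python
import math

N = 8

# Orthonormal 8-point DCT-II matrix: row u holds the u-th basis function.
T8 = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
       * math.cos((2 * x + 1) * u * math.pi / (2 * N))
       for x in range(N)] for u in range(N)]

# Forward core matrix of the H.264/AVC 4x4 transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# K is block-diagonal: it applies CF to the first and last four samples.
K = [[CF[i % 4][j % 4] if i // 4 == j // 4 else 0 for j in range(N)]
     for i in range(N)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


# Kernel matrix: S = K * T8^T.
S = matmul(K, transpose(T8))


def dct_to_ht(dct_block):
    """Map an 8x8 DCT block to four 4x4 HT blocks, laid out as one 8x8 array."""
    return matmul(matmul(S, dct_block), transpose(S))
```

The conversion never materializes the pixel block, which is what removes the intermediate rounding of a separate IDCT stage.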


Fast Algorithm (1D)


Complexity Analysis

Transform-domain DCT-to-HT (S-Transform): 704 operations
• 352 multiplications
• 352 additions

Pixel-domain mapping (IDCT* followed by HT): 992 operations
• 256 multiplications
• 64 shifts
• 672 additions

Advantage
• 29% saving in total operations
• Two-stage vs. six-stage implementation
• Better performance: no intermediate rounding

* W.H. Chen, C.H. Smith, and S.C. Fralick, "A Fast Computational Algorithm for the Discrete Cosine Transform," IEEE Trans. on Communications, Vol. COM-25, pp. 1004-1009, 1977





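Conventional Mode Decisions

Given all possible prediction modes, the encoder needs to decide which one to use.

Low-complexity mode decision rule (RDO_Off):

$$\arg\min_{m} \left\| s - \hat{s}(m) \right\|_{k}$$

or

$$\arg\min_{m} \left\| T\big(s - \hat{s}(m)\big) \right\|_{1} \qquad \text{(SATD Cost)}$$

High-complexity mode decision rule with rate distortion optimization (RDO_On):

$$\arg\min_{m} \left\| s - \tilde{s}(m) \right\|_{2}^{2} + \lambda_{MODE} \cdot R(m) \qquad \text{(RD Cost)}$$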
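The low-complexity rule can be sketched in a few lines. The slides do not say which transform T is used in the SATD cost; the sketch assumes the 4x4 Hadamard transform, a common choice in H.264/AVC encoders, and omits any normalization or mode-bit penalty. Function names and sample data are illustrative.

```python
# 4x4 Hadamard matrix, assumed here to be the transform T in the SATD cost.
HAD = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd(source, prediction):
    """Sum of absolute values of the Hadamard-transformed prediction residual."""
    diff = [[source[r][c] - prediction[r][c] for c in range(4)]
            for r in range(4)]
    transformed = matmul(matmul(HAD, diff), HAD)  # HAD is symmetric
    return sum(abs(v) for row in transformed for v in row)


def choose_mode_rdo_off(source, predictions):
    """Return the mode whose prediction block gives the smallest SATD cost.

    predictions maps a mode name to its 4x4 prediction block.
    """
    return min(predictions, key=lambda mode: satd(source, predictions[mode]))


# Illustrative data: the source matches the vertical prediction exactly.
source_block = [[10, 20, 30, 40] for _ in range(4)]
candidate_predictions = {
    "vertical": [[10, 20, 30, 40] for _ in range(4)],
    "dc": [[25] * 4 for _ in range(4)],
}
best_mode = choose_mode_rdo_off(source_block, candidate_predictions)
```

Unlike the RD cost, this rule needs no quantization, reconstruction or entropy coding, which is why it is cheap but less accurate.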


Conventional RD Cost Computation

Entire encoding/decoding needs to be performed for every mode

[Block diagram: for each prediction mode, intra prediction from the pixel buffers gives the prediction p; the residual e = s − p passes through HT to give E, then through Q to compute the rate R; Inverse Q gives Ẽ and Inverse HT gives ẽ, from which the reconstruction s̃ is formed; the distortion D is determined from s and s̃, and the cost is computed as J = D + λ×R.]


Motivation & Previous Approaches

• RD_Cost based mode decision gives the best performance, but is very expensive to compute
• Previous efforts in fast intra mode decisions
  • Directional field
  • Edge histogram
  • Other pixel-domain approaches
  • They all lead to lower coding performance
• Our approach is based on transform domain processing – no loss in coding performance


Transform Domain RD Cost Computation

• No inverse transform
• Transformations of some prediction signals are easy to compute
• Distortion calculated in transform domain

[Block diagram: for each prediction mode, the HT of the intra prediction (formed from the pixel buffer) is combined with the S-Transform output to give the HT-domain residual E; Q determines the rate R; Inverse Q gives Ẽ; the distortion D is computed in the HT domain from E and Ẽ.]


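HT of DC Prediction

$$p_{dc} = \begin{bmatrix} d & d & d & d \\ d & d & d & d \\ d & d & d & d \\ d & d & d & d \end{bmatrix} \;\xrightarrow{\ \text{HT}\ }\; P_{dc} = \begin{bmatrix} 16d & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

• No HT needs to be performed
• Pdc has only one non-zero element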


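HT of Horizontal Prediction

$$P_h = \mathbf{H}\,p_h\,\mathbf{H}^T = \mathbf{H} \begin{bmatrix} h_1 & h_1 & h_1 & h_1 \\ h_2 & h_2 & h_2 & h_2 \\ h_3 & h_3 & h_3 & h_3 \\ h_4 & h_4 & h_4 & h_4 \end{bmatrix} \mathbf{H}^T = \begin{bmatrix} H_1 & 0 & 0 & 0 \\ H_2 & 0 & 0 & 0 \\ H_3 & 0 & 0 & 0 \\ H_4 & 0 & 0 & 0 \end{bmatrix}, \qquad [H_1\ H_2\ H_3\ H_4]^T = 4\,\mathbf{H}\,[h_1\ h_2\ h_3\ h_4]^T$$

• Only one 1-D HT is needed
• Ph has only four non-zero elements (the first column)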


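HT of Vertical Prediction

$$P_v = \mathbf{H}\,p_v\,\mathbf{H}^T = \mathbf{H} \begin{bmatrix} v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \end{bmatrix} \mathbf{H}^T = \begin{bmatrix} V_1 & V_2 & V_3 & V_4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad [V_1\ V_2\ V_3\ V_4]^T = 4\,\mathbf{H}\,[v_1\ v_2\ v_3\ v_4]^T$$

• Only one 1-D HT is needed
• Pv has only four non-zero elements (the first row)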
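The sparsity of the DC, horizontal and vertical predictions in the HT domain can be checked numerically. The sketch below compares a full 2-D forward transform against the shortcuts (no transform for DC, a single 1-D transform for horizontal and vertical). It uses the forward core matrix without scaling; the function names are illustrative.

```python
# Forward core matrix of the H.264/AVC 4x4 transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def forward_ht(block):
    """Full 2-D transform CF * block * CF^T."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def ht_1d(vec):
    """1-D transform CF * vec."""
    return [sum(CF[i][k] * vec[k] for k in range(4)) for i in range(4)]


def ht_of_dc(d):
    """HT of a constant block: a single coefficient 16*d at the top-left."""
    out = [[0] * 4 for _ in range(4)]
    out[0][0] = 16 * d
    return out


def ht_of_horizontal(left):
    """HT of a horizontal prediction: only the first column is non-zero."""
    col = ht_1d(left)
    return [[4 * col[r], 0, 0, 0] for r in range(4)]


def ht_of_vertical(top):
    """HT of a vertical prediction: only the first row is non-zero."""
    row = ht_1d(top)
    return [[4 * v for v in row]] + [[0] * 4 for _ in range(3)]


# Illustrative neighbor values (invented for this example).
neighbors = [3, 9, 4, 6]
dc_value = 7
```

The factor 4 comes from transforming the constant direction of the block: the transform of four equal samples is four times their value in the first coefficient and zero elsewhere.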


Calculate Distortion in Transform Domain

Distortion in pixel domain:

Distortion in transform domain:
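A sketch of why distortion can be measured without an inverse transform: the rows of the forward HT matrix are mutually orthogonal with squared norms 4, 10, 4, 10, so the sum of squared pixel differences equals a weighted sum of squared coefficient differences. The weights and function names below are derived here for illustration and are not taken from the slides; the identity is checked for the forward core transform only, ignoring the rounding in the standard's inverse transform.

```python
from fractions import Fraction

# Forward core matrix of the H.264/AVC 4x4 transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# Squared norm of each row of CF: [4, 10, 4, 10].
ROW_NORM_SQ = [sum(v * v for v in row) for row in CF]


def forward_ht(block):
    """Full 2-D transform CF * block * CF^T."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pixel_ssd(block_a, block_b):
    """Sum of squared differences computed on pixel values."""
    return sum((block_a[r][c] - block_b[r][c]) ** 2
               for r in range(4) for c in range(4))


def transform_ssd(coeffs_a, coeffs_b):
    """The same distortion computed from HT coefficients with per-position weights."""
    return sum(Fraction((coeffs_a[r][c] - coeffs_b[r][c]) ** 2,
                        ROW_NORM_SQ[r] * ROW_NORM_SQ[c])
               for r in range(4) for c in range(4))


# Illustrative blocks (invented for this example).
block_a = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
block_b = [[6, 10, 8, 9], [9, 9, 5, 12], [2, 10, 10, 4], [18, 7, 15, 8]]
```

Exact rational arithmetic is used so the two distortions can be compared for equality rather than within a tolerance.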


Ranking-based Fast Mode Decision

• Two cost functions: SATD_Cost & RD_Cost
• Observation: the best mode according to RD_Cost usually has smaller SATD_Cost
• Proposed algorithm (mode reduction): rank different modes using SATD_Cost, then calculate RD_Cost for the top several modes
• Algorithm can be conducted in transform domain
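The mode-reduction idea can be sketched as a two-stage selection: a cheap cost ranks every mode, and the expensive cost is evaluated only on the shortlist. The function name, the callable interface and the cost tables are hypothetical and serve only to illustrate the control flow.

```python
def ranked_mode_decision(modes, satd_cost, rd_cost, k=3):
    """Shortlist the k modes with the lowest SATD cost, then pick by RD cost.

    satd_cost and rd_cost are callables that map a mode to a number.
    """
    shortlist = sorted(modes, key=satd_cost)[:k]
    return min(shortlist, key=rd_cost)


# Illustrative cost tables (invented numbers, not measured data).
satd_table = {"V": 40, "H": 12, "DC": 15, "DDR": 90}
rd_table = {"V": 300, "H": 210, "DC": 190, "DDR": 100}

rd_evaluations = []


def counted_rd_cost(mode):
    """RD cost lookup that records which modes were actually evaluated."""
    rd_evaluations.append(mode)
    return rd_table[mode]


chosen = ranked_mode_decision(list(satd_table), satd_table.get, counted_rd_cost, k=3)
```

In this made-up example the mode with the lowest RD cost has a poor SATD rank, so it is missed with k = 3. That is the failure case a verification experiment needs to quantify.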


Verification Experiment

• Count the percentage of times when the best mode according to RD_Cost is within the best k modes ranked by SATD_Cost
• k fixed as 3 in all simulations

[Figure: "Fast Mode Decision Algorithm Verification". Mode prediction accuracy (0.75 to 1) versus number of modes (1 to 8).]


Simulation Conditions

Three transcoders
• PDT – reference pixel domain transcoder, with fast IDCT implemented
• TDT – transform domain transcoder
• TDT-R – transform domain transcoder with ranking-based mode decision

Test sequences
• 100 frames, CIF size, 30 fps
• Input: MPEG-2 all-I at 6 Mbps


Simulation – “Mobile”

[Figure: PSNR-Y (dB), 26.5 to 30.5, versus bitrate (Mbps), 3.0 to 10.0, for PDT, TDT, TDT-R and PDT-RDOoff.]


Simulation – “Stefan”

[Figure: PSNR-Y (dB), 29.0 to 36.0, versus bitrate (Mbps), 2.0 to 6.5, for PDT, TDT, TDT-R and PDT-RDOoff.]


Complexity: Run-time Results


Summary of Intra Transcoding

• Efficient transcoder architecture
• Efficient mode decision
  • Transform domain distortion calculation
  • Ranking-based mode decision
• Achieved virtually same quality as reference transcoder with significantly lower complexity

Inter Transcoding Techniques


Transcoder Architecture

[Block diagram: an MPEG-2 decoder outputs decoded picture and macroblock data; motion/mode mapping supplies motion and modes to prediction; the H.264/AVC encoding loop consists of HT/Q, entropy coding, Inverse Q/Inverse HT, deblocking filter and pixel buffers.]


Assumptions

Input
• MPEG-2 frame pictures

Output
• H.264/AVC baseline profile (no B slices) and main profile
• Frame pictures, MBAFF not considered
• Block partition sizes considered for motion compensation: 16x16, 16x8, 8x16 and 8x8


Motion Mapping: Problems

MPEG-2                            | H.264/AVC
Frame/field motion vector         | Frame motion vector
B, P pictures                     | Baseline profile has no B picture support
One motion vector per macroblock  | Motion vectors for different partition sizes: 16x16, 16x8, 8x16, 8x8


Motion Mapping Algorithm

1. Field-to-frame mapping: convert MPEG-2 field motion vectors (if any) to frame vectors

2. Reference picture mapping: for B to P frame type conversion

3. Block size mapping: map the MPEG-2 motion vectors to target H.264/AVC motion vectors of different block sizes
   • Algorithm: distance weighted average (DWA)

4. Motion refinement: (1+1/2+1/4) around estimated motion vectors for all block partitions

Note: for B slice output, the above mapping is performed for motion vectors of both directions


Field-to-frame Conversion


Reference Picture Mapping

[Figure: input picture sequence I B B P mapped to output sequence I P P P, shown for two cases. Labels: ti=3, to=1, MVi,forw, MVi,back, MVcol, MVo.]
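If ti and to in the figure are read as the temporal distances (in pictures) spanned by the input and output motion vectors, a common way to re-reference a vector is linear scaling by to/ti. The slides do not spell out the rule, so both that reading and the function `scale_motion_vector` below are assumptions made for illustration; rounding to the target sub-pel grid is left out.

```python
def scale_motion_vector(mv, t_in, t_out):
    """Linearly rescale a motion vector from temporal distance t_in to t_out.

    mv is an (x, y) pair; t_in and t_out are non-zero picture distances.
    This assumes roughly constant motion over the interval.
    """
    if t_in == 0:
        raise ValueError("t_in must be non-zero")
    factor = t_out / t_in
    return (mv[0] * factor, mv[1] * factor)


# Illustrative values matching the figure labels ti=3 and to=1.
input_mv = (6, -3)
output_mv = scale_motion_vector(input_mv, t_in=3, t_out=1)
```

A scaled vector of this kind is only an estimate, which is one reason a refinement search around the mapped vector is still useful.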


Block Size Mapping: 16x8, 8x16

[Figure: partitions A and B for the 16x8 split and for the 8x16 split, with labels a1 to a6 for A and b1 to b6 for B.]


Block Size Mapping: 8x8

[Figure: 8x8 partitions A, B, C and D, with labels a1 to a4, b1 to b4, c1 to c4 and d1 to d4.]
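The exact weighting used by the distance weighted average (DWA) is not recoverable from the figures. As a hedged sketch, the code below assumes inverse-distance weights: each candidate MPEG-2 motion vector contributes in proportion to 1/distance, where the distance is measured from the target partition to the candidate's position. The function name, the input format and the weighting itself are assumptions.

```python
def distance_weighted_average(candidates):
    """Combine candidate motion vectors using inverse-distance weights.

    candidates is a list of ((mv_x, mv_y), distance) pairs with distance > 0.
    Returns the weighted average as an (x, y) pair of floats.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    weights = [1.0 / distance for _, distance in candidates]
    total = sum(weights)
    avg_x = sum(w * mv[0] for w, (mv, _) in zip(weights, candidates)) / total
    avg_y = sum(w * mv[1] for w, (mv, _) in zip(weights, candidates)) / total
    return (avg_x, avg_y)


# Illustrative candidates (invented values): the nearer vector dominates.
example_candidates = [((4, 0), 1.0), ((0, 4), 3.0)]
mapped_mv = distance_weighted_average(example_candidates)
```

With equal distances this reduces to a plain average; as one candidate gets closer, the result moves toward that candidate's vector.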


Simulation Conditions

Test sequences:
• 1920x1080i, 30fps, 450 frames

MPEG-2 input:
• 30 Mbps, (30,3)

H.264/AVC output:
• UVLC, output bit-rate of interest ~10 Mbps
• Baseline profile (needs to convert B pictures to P slices) & Main profile

Comparison points
• Mapping algorithm
• B slices
• RD optimization


Baseline output: no B slices

[Figure: HarborScene. PSNR-Y (dB), 29.0 to 32.5, versus bit rate (Mbps), 6 to 20, for DWA+RDO, DWA+ranking, DWA and REF+RDO.]


Baseline output: no B slices

[Figure: StreetCar. PSNR-Y (dB), 34.5 to 36.0, versus bit rate (Mbps), 4 to 13, for DWA+RDO, DWA+Ranking, DWA and REF+RDO.]


Main Output: with B slices

[Figure: HarborScene. PSNR-Y (dB), 29.0 to 32.5, versus bit rate (Mbps), 4 to 20, for DWA, DWA_IPB and DWA_IPB+RDO.]


Main Output: with B slices

[Figure: StreetCar. PSNR-Y (dB), 34.5 to 36.0, versus bit rate (Mbps), 4 to 13, for DWA, DWA_IPB and DWA_IPB+RDO.]


Complexity: Run-time Results


Conclusions

• Efficient motion mapping schemes that directly map MPEG-2 motion vectors to H.264/AVC motion vectors
• Evaluated the complexity-performance tradeoff of B-slices and RD optimization
• Achieved good rate-distortion performance with low complexity

Thank you
