MPEG-2 to H.264/AVC Transcoding Techniques
Jun Xin
Xilient Inc., Cupertino, CA
Digital Video Transcoding, November 07
Digital Video Transcoder
Coded digital video bit-stream “A” → Transcoder → Coded digital video bit-stream “B”
“A” and “B” may differ in many aspects:
• Coding format: e.g. MPEG-2 to H.264/AVC
• Bit-rate, frame rate, resolution, …
• Features: e.g. error resilience features
• Content: e.g. logo insertion
Applications
• Media storage
  • Transcode broadcast MPEG-2 video to H.264/AVC format: enables long-time recording
  • Effective for multi-channel recording
• Home gateway
  • Provides the connection to an IPTV set-top box
  • The box only supports H.264/AVC
  • Over a wireless network with limited bandwidth
• Other potential uses: export to mobile devices, Internet streaming, …
Goals and Challenges
• H.264/AVC: latest video compression standard
  • Promises the same quality as MPEG-2 at half the bit-rate
  • Is being widely adopted: HD consumer storage (e.g. HD-DVD and Blu-ray) and mobile devices (e.g. Apple iPod, iPhone, Sony PSP)
• Goal: convert MPEG-2 video to H.264/AVC format
  • More efficient storage, export to mobile devices, etc.
• Challenges
  • Yield quality similar to full re-encoding, but at much lower cost
  • Key to low cost and high quality: how to intelligently reuse the information available from the incoming bitstream
  • The transcoder may be loosely considered a “two-pass coder”
  • It could achieve better quality than full re-encoding at the same complexity
Outline
• Intra-only transcoding techniques: efficient compressed-domain processing
• Inter transcoding techniques: motion mapping / motion reuse
Intra Transcoding Techniques
Intra Transcoder – Pixel Domain
Forward path: input MPEG-2 bitstream → VLD/IQ → IDCT → intra prediction and mode decision (pixel domain) → HT → Q → H.264 entropy coding
Reconstruction loop: Q → inverse Q → inverse HT → pixel buffer → intra prediction
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
Compressed Domain Processing?
Forward path: input MPEG-2 bitstream → VLD/IQ → intra prediction and mode decision (compressed domain) → Q → H.264 entropy coding
Reconstruction loop: Q → inverse Q → coefficient buffer → intra prediction
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
AVC 4x4 Transform
• Motivation
  • The DCT requires real-number operations, which may cause inaccuracies in inversion
  • Better prediction means less spatial correlation, so there is no strong need for real-number operations
• H.264 uses a simple integer 4x4 transform
  • An approximation to the 4x4 DCT
  • Transform and inverse transform
  • Note: the ½ in the inverse transform represents a right shift, so it is non-linear
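As an illustration of the integer arithmetic involved, here is a minimal sketch of the forward 4x4 core transform, assuming the widely published H.264/AVC core matrix H. The helper names (`matmul`, `transpose`, `forward_ht`) are illustrative and not taken from the slides, and the scaling that accompanies quantization is left out.

```python
# Forward core matrix of the H.264/AVC 4x4 integer transform (assumed here).
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def forward_ht(block):
    """Return H * block * H^T using integer arithmetic only."""
    return matmul(matmul(H, block), transpose(H))

# A flat block of value 10 puts all its energy into the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
coeffs = forward_ht(flat)
print(coeffs[0])  # [160, 0, 0, 0]
```

Because every entry of H is a small integer, the forward transform needs only additions and shifts in a practical implementation.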
Intra Prediction in H.264/AVC
• Motivation: intra-frames are natural images, so they exhibit strong spatial correlation
• Pixels in intra-coded frames are predicted from previously coded ones
• Prediction can be based on 4x4 blocks or 16x16 macroblocks (or 8x8 blocks for High profile)
• An encoded mode specifies which neighboring pixels are used for prediction, and how
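To make the idea of a prediction mode concrete, the following sketch builds the 4x4 prediction block for three of the modes (vertical, horizontal, DC). It assumes all eight neighboring pixels are available; the function name `predict_4x4` and the string mode labels are illustrative choices, not part of the slides.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 prediction block from reconstructed neighbor pixels.

    top:  the 4 pixels directly above the block (left to right)
    left: the 4 pixels directly left of the block (top to bottom)
    """
    if mode == "vertical":      # copy the row above into every row
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column into every column
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbors
        d = (sum(top) + sum(left) + 4) >> 3
        return [[d] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")

top = [1, 2, 3, 4]
left = [5, 6, 7, 8]
print(predict_4x4("dc", top, left)[0])  # [5, 5, 5, 5]
```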
4x4 Intra Prediction Example
[Figure: the current block and its prediction blocks for the Vertical, Horizontal and Diagonal_Down_Right modes]
Compressed Domain Processing?
• Challenges
  • Different transforms: MPEG-2 uses the DCT (floating point), H.264/AVC uses an integer transform
  • New prediction modes in H.264/AVC: can prediction be performed in the compressed domain?
• Goals
  • Simpler computation and architecture
Intra Transcoder – Proposed
Forward path: input MPEG-2 bitstream → VLD/IQ → DCT-to-HT conversion (S-Transform) → intra prediction and mode decision (HT domain) → Q → entropy coding
Reconstruction loop: Q → inverse Q → inverse HT → pixel buffer → intra prediction (HT domain)
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
Techniques
• DCT-to-HT conversion
• Compressed (HT) domain prediction: very simple for some prediction modes
• Compressed-domain distortion calculation in mode decision
• Advantages: lower computational complexity, no quality loss
DCT-to-HT Conversion
DCT-to-HT Conversion: Transform Kernel Matrix
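One way to obtain such a kernel, sketched here under stated assumptions: if X is the 8x8 DCT block of pixels x (X = T8 x T8^T with an orthonormal T8), then the four 4x4 HT blocks are Y = S X S^T with S = HB T8^T, where HB applies H to each half of an 8-sample vector. The names `T8`, `HB`, `S` and `dct_to_ht` are illustrative; the matrix shown on the original slide may be scaled or arranged differently.

```python
import math

H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

# Orthonormal 8x8 DCT-II matrix, so that X = T8 * x * T8^T.
T8 = [[(math.sqrt(0.125) if k == 0 else 0.5)
       * math.cos((2 * n + 1) * k * math.pi / 16)
       for n in range(8)] for k in range(8)]

# Block-diagonal matrix that applies H to each half of an 8-sample vector.
HB = [[H[r % 4][c % 4] if r // 4 == c // 4 else 0 for c in range(8)]
      for r in range(8)]

# Assumed conversion kernel: inverse DCT followed by the HT, fused into one matrix.
S = matmul(HB, transpose(T8))

def dct_to_ht(dct_block):
    """Convert one 8x8 DCT block into four 4x4 HT blocks (as one 8x8 array)."""
    return matmul(matmul(S, dct_block), transpose(S))

# Check against the pixel-domain route on an arbitrary test block.
pixels = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]
dct_block = matmul(matmul(T8, pixels), transpose(T8))
direct = matmul(matmul(HB, pixels), transpose(HB))
converted = dct_to_ht(dct_block)
max_err = max(abs(a - b) for ra, rb in zip(direct, converted) for a, b in zip(ra, rb))
```

The check confirms that, up to floating-point error, converting in the transform domain gives the same HT coefficients as decoding to pixels first.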
Fast Algorithm (1D)
Complexity Analysis
• Transform-domain DCT-to-HT (S-Transform): 704 operations (352 multiplications, 352 additions)
• Pixel-domain mapping (IDCT* followed by HT): 992 operations (256 multiplications, 64 shifts, 672 additions)
• Advantages
  • 29% saving in total operations
  • Two-stage vs. six-stage implementation
  • Better performance: no intermediate rounding
* W. H. Chen, C. H. Smith, and S. C. Fralick, “A Fast Computational Algorithm for the Discrete Cosine Transform,” IEEE Trans. on Communications, vol. COM-25, pp. 1004-1009, 1977.
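The 29% figure follows directly from the operation counts quoted above; a short check (variable names are illustrative):

```python
s_transform_ops = 352 + 352          # multiplications + additions
pixel_domain_ops = 256 + 64 + 672    # multiplications + shifts + additions
saving = 1 - s_transform_ops / pixel_domain_ops
print(f"{s_transform_ops} vs {pixel_domain_ops}: {saving:.1%} fewer operations")
# 704 vs 992: 29.0% fewer operations
```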
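Conventional Mode Decisions
• Given all possible prediction modes, the encoder needs to decide which one to use
• Low-complexity mode decision rule (RDO_Off):
  $m^* = \arg\min_m \| s - \hat{s}(m) \|_1$
  or
  $m^* = \arg\min_m \| T(s - \hat{s}(m)) \|_1$  (SATD cost)
• High-complexity mode decision rule with rate-distortion optimization (RDO_On):
  $m^* = \arg\min_m \| s - \tilde{s}(m) \|_2^2 + \lambda_{MODE} \cdot R(m)$  (RD cost)
• Here s is the original block, $\hat{s}(m)$ its prediction in mode m, $\tilde{s}(m)$ its reconstruction in mode m, T a transform, R(m) the rate and $\lambda_{MODE}$ the Lagrange multiplier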
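A minimal sketch of the low-complexity rule based on SATD cost. It takes T to be the 4x4 HT, which is an assumption made for this example (encoders often use a Hadamard transform here), and the names `satd` and `best_mode_satd` are illustrative.

```python
H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def satd(block, pred):
    """Sum of absolute transformed differences, with the 4x4 HT assumed as T."""
    resid = [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, pred)]
    coeffs = matmul(matmul(H, resid), transpose(H))
    return sum(abs(c) for row in coeffs for c in row)

def best_mode_satd(block, candidates):
    """Return the mode whose prediction gives the lowest SATD cost.

    candidates maps a mode name to its 4x4 prediction block.
    """
    return min(candidates, key=lambda m: satd(block, candidates[m]))

block = [[1, 2, 3, 4] for _ in range(4)]
candidates = {
    "vertical": [[1, 2, 3, 4] for _ in range(4)],  # matches the block exactly
    "dc": [[2] * 4 for _ in range(4)],
}
print(best_mode_satd(block, candidates))  # vertical
```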
Conventional RD Cost Computation
• The entire encoding/decoding loop needs to be performed for every prediction mode
• Data flow for each prediction mode:
  1. Intra prediction from the pixel buffers gives the prediction p
  2. Residual: e = s − p
  3. HT: e → E
  4. Q, then compute the rate R
  5. Inverse Q gives Ẽ
  6. Inverse HT: Ẽ → ẽ
  7. Reconstruction: s̃ = p + ẽ
  8. Determine the distortion D between s and s̃
  9. Compute the cost J = D + λ×R
Motivation & Previous Approaches
• RD_Cost-based mode decision gives the best performance, but is very expensive to compute
• Previous efforts in fast intra mode decision
  • Directional field
  • Edge histogram
  • Other pixel-domain approaches
  • They all lead to lower coding performance
• Our approach is based on transform-domain processing: no loss in coding performance
Transform Domain RD Cost Computation
• No inverse transform
• Transforms of some prediction signals are easy to compute
• Distortion is calculated in the transform domain
• Data flow for each prediction mode:
  1. Intra prediction from the pixel buffer, followed by the HT, gives the prediction in the HT domain
  2. Subtracting it from the S-Transform output gives the residual E
  3. Q, then determine the rate R
  4. Inverse Q gives Ẽ
  5. Determine the distortion D from E and Ẽ (HT domain)
  6. Compute the cost J = D + λ×R
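HT of DC Prediction
$$p_{dc} = \begin{bmatrix} d & d & d & d \\ d & d & d & d \\ d & d & d & d \\ d & d & d & d \end{bmatrix} \;\xrightarrow{\;HT\;}\; P_{dc} = \begin{bmatrix} 16d & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
• No HT needs to be performed
• P_dc has only one non-zero element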
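The shortcut for DC prediction can be checked against a full transform. This is a small sketch assuming the standard forward core matrix H; `ht_of_dc_prediction` and `full_ht` are illustrative names.

```python
H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def full_ht(block):
    """Reference path: H * block * H^T."""
    return matmul(matmul(H, block), [list(col) for col in zip(*H)])

def ht_of_dc_prediction(d):
    """HT coefficients of a 4x4 block whose 16 pixels all equal d."""
    coeffs = [[0] * 4 for _ in range(4)]
    coeffs[0][0] = 16 * d   # the only non-zero coefficient
    return coeffs

dc_block = [[7] * 4 for _ in range(4)]
print(ht_of_dc_prediction(7) == full_ht(dc_block))  # True
```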
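HT of Horizontal Prediction
With $h = [h_1\ h_2\ h_3\ h_4]^T$ and H the HT matrix:
$$P_h = H\,p_h\,H^T = H \begin{bmatrix} h_1 & h_1 & h_1 & h_1 \\ h_2 & h_2 & h_2 & h_2 \\ h_3 & h_3 & h_3 & h_3 \\ h_4 & h_4 & h_4 & h_4 \end{bmatrix} H^T = H\,h\,[1\ 1\ 1\ 1]\,H^T = 4\,[\,Hh \;\; 0 \;\; 0 \;\; 0\,]$$
• Only one 1-D HT is needed
• P_h has only four non-zero elements (the first column)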
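HT of Vertical Prediction
With $v = [v_1\ v_2\ v_3\ v_4]^T$ and H the HT matrix:
$$P_v = H\,p_v\,H^T = H \begin{bmatrix} v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \\ v_1 & v_2 & v_3 & v_4 \end{bmatrix} H^T = H\,[1\ 1\ 1\ 1]^T\,v^T H^T = 4 \begin{bmatrix} (Hv)^T \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
• Only one 1-D HT is needed
• P_v has only four non-zero elements (the first row)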
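Both directional shortcuts can be sketched with a single 1-D transform and verified against the full 2-D HT. The forward core matrix H is assumed to be the standard one; the function names are illustrative.

```python
H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def full_ht(block):
    """Reference path: H * block * H^T."""
    return matmul(matmul(H, block), [list(col) for col in zip(*H)])

def ht_1d(vec):
    """Apply the 4-point HT to a single 4-sample vector."""
    return [sum(h * x for h, x in zip(row, vec)) for row in H]

def ht_of_horizontal_prediction(left):
    """left = (h1..h4); every pixel in row r of the prediction equals left[r]."""
    col = ht_1d(left)
    return [[4 * col[r], 0, 0, 0] for r in range(4)]       # first column only

def ht_of_vertical_prediction(top):
    """top = (v1..v4); every pixel in column c of the prediction equals top[c]."""
    row = ht_1d(top)
    return [[4 * x for x in row]] + [[0] * 4 for _ in range(3)]  # first row only

left = [3, 1, 4, 1]
top = [2, 7, 1, 8]
p_h = [[v] * 4 for v in left]
p_v = [list(top) for _ in range(4)]
print(ht_of_horizontal_prediction(left) == full_ht(p_h))  # True
print(ht_of_vertical_prediction(top) == full_ht(p_v))     # True
```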
Calculate Distortion in Transform Domain
Distortion in pixel domain:
Distortion in transform domain:
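The principle behind a transform-domain distortion measure can be sketched as follows. The rows of H are orthogonal with squared norms 4, 10, 4, 10, so the pixel-domain sum of squared errors equals a weighted sum of squared HT-domain errors. This is an assumption-laden illustration: it uses the exact inverse of H, whereas the H.264/AVC inverse transform includes shifts and rounding, so the formula on the original slide may differ. All names are illustrative.

```python
from fractions import Fraction

H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
ROW_ENERGY = [4, 10, 4, 10]   # squared norms of the rows of H

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def pixel_ssd(err):
    """Sum of squared errors computed on pixel-domain differences."""
    return sum(x * x for row in err for x in row)

def transform_ssd(err_coeffs):
    """The same quantity computed from HT-domain differences."""
    return sum(Fraction(c * c, ROW_ENERGY[i] * ROW_ENERGY[j])
               for i, row in enumerate(err_coeffs)
               for j, c in enumerate(row))

# Arbitrary pixel-domain error block and its HT.
err = [[1, -2, 0, 3], [0, 1, 1, -1], [2, 0, -3, 1], [-1, 1, 0, 2]]
err_coeffs = matmul(matmul(H, err), [list(col) for col in zip(*H)])
print(pixel_ssd(err), transform_ssd(err_coeffs))  # 37 37
```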
Ranking-based Fast Mode Decision
• Two cost functions: SATD_Cost and RD_Cost
• Observation: the best mode according to RD_Cost usually also has a small SATD_Cost
• Proposed algorithm (mode reduction): rank the modes by SATD_Cost, then calculate RD_Cost only for the top few modes
• The algorithm can be conducted in the transform domain
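The mode-reduction step can be sketched generically. The two cost functions are passed in as callables, and the toy cost tables below are made-up numbers for illustration only; `ranked_mode_decision` is a hypothetical name.

```python
def ranked_mode_decision(modes, satd_cost, rd_cost, k=3):
    """Evaluate rd_cost only on the k modes with the lowest satd_cost."""
    shortlist = sorted(modes, key=satd_cost)[:k]
    return min(shortlist, key=rd_cost)

# Toy cost tables (hypothetical values, not measurements).
satd_table = {"vertical": 5, "horizontal": 1, "dc": 3, "diag_down_left": 9}
rd_table = {"vertical": 2, "horizontal": 4, "dc": 3, "diag_down_left": 1}

chosen = ranked_mode_decision(satd_table, satd_table.get, rd_table.get, k=3)
print(chosen)  # vertical
```

In this toy case the mode with the lowest RD cost overall is ranked last by SATD and is never evaluated, which shows the trade-off: fewer RD cost evaluations in exchange for occasionally missing the best mode.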
Verification Experiment
• Count the percentage of times the best mode according to RD_Cost is within the best k modes ranked by SATD_Cost
• k is fixed at 3 in all simulations
[Figure: Fast Mode Decision Algorithm Verification. Mode prediction accuracy (0.75 to 1) versus number of modes (1 to 8)]
Simulation Conditions
• Three transcoders
  • PDT: reference pixel-domain transcoder, with fast IDCT implemented
  • TDT: transform-domain transcoder
  • TDT-R: transform-domain transcoder with ranking-based mode decision
• Test sequences
  • 100 frames, CIF size, 30 fps
  • Input: MPEG-2 all-I at 6 Mbps
Simulation – “Mobile”
[Figure: PSNR-Y (dB) versus bitrate (Mbps) for PDT, TDT, TDT-R and PDT-RDOoff]
Simulation – “Stefan”
[Figure: PSNR-Y (dB) versus bitrate (Mbps) for PDT, TDT, TDT-R and PDT-RDOoff]
Complexity: Run-time Results
Summary of Intra Transcoding
• Efficient transcoder architecture
• Efficient mode decision
  • Transform-domain distortion calculation
  • Ranking-based mode decision
• Achieved virtually the same quality as the reference transcoder with significantly lower complexity
Inter Transcoding Techniques
Transcoder Architecture
[Figure: block diagram with an MPEG-2 decoder (outputs: decoded picture and macroblock data, motion and modes), motion/mode mapping, prediction, HT/Q, entropy coding, inverse Q/inverse HT, deblocking filter and pixel buffers]
Assumptions
• Input: MPEG-2 frame pictures
• Output: H.264/AVC baseline profile (no B slices) and main profile
  • Frame pictures, MBAFF not considered
  • Block partition sizes considered for motion compensation: 16x16, 16x8, 8x16 and 8x8
Motion Mapping: Problems
MPEG-2 | H.264/AVC
Frame/field motion vectors | Frame motion vectors
B and P pictures | Baseline profile has no B picture support
One motion vector per macroblock | Motion vectors for different partition sizes: 16x16, 16x8, 8x16, 8x8
Motion Mapping Algorithm
1. Field-to-frame mapping: convert MPEG-2 field motion vectors (if any) to frame motion vectors
2. Reference picture mapping: for B-to-P frame type conversion
3. Block size mapping: map the MPEG-2 motion vectors to target H.264/AVC motion vectors of different block sizes
   Algorithm: distance weighted average (DWA)
4. Motion refinement: a small search (1 + 1/2 + 1/4 pixel) around the estimated motion vectors for all block partitions
Note: for B slice output, the above mapping is performed for motion vectors of both directions
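As a rough sketch of what a distance weighted average in step 3 could look like: the estimate for a target partition blends the motion vectors of nearby MPEG-2 macroblocks, giving nearer ones more weight. The weighting function 1 / (1 + distance), the choice of candidate macroblocks and the name `distance_weighted_average` are all assumptions made for this example; the slides do not give the exact formula.

```python
import math

def distance_weighted_average(target_center, candidates):
    """Blend candidate motion vectors, weighting nearer macroblocks more.

    target_center: (x, y) center of the target H.264/AVC partition
    candidates:    non-empty list of ((cx, cy), (mvx, mvy)) pairs, one per
                   nearby MPEG-2 macroblock (center, motion vector)
    """
    tx, ty = target_center
    total = sum_x = sum_y = 0.0
    for (cx, cy), (mvx, mvy) in candidates:
        # Hypothetical weighting: decreases with center-to-center distance.
        weight = 1.0 / (1.0 + math.hypot(cx - tx, cy - ty))
        total += weight
        sum_x += weight * mvx
        sum_y += weight * mvy
    return (sum_x / total, sum_y / total)

# Two macroblocks at equal distance from the target contribute equally.
mv = distance_weighted_average((8, 0), [((0, 0), (4, 0)), ((16, 0), (0, 4))])
```

The resulting vector would then serve as the starting point for the motion refinement in step 4.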
Field-to-frame Conversion
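Reference Picture Mapping
[Figure: an input I B B P sequence is converted to an output I P P P sequence. Temporal distances to the reference picture: ti = 3 for the input, to = 1 for the output. Motion vectors shown: MVi,forw, MVi,back, MVcol and MVo]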
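One plausible reading of the temporal distances in the figure is that an input motion vector spanning ti pictures is rescaled to span to pictures. The sketch below assumes constant motion between pictures and simple linear scaling; this is an illustration, not necessarily the mapping used in the slides, and `scale_motion_vector` is a hypothetical name.

```python
def scale_motion_vector(mv_in, t_in, t_out):
    """Rescale an MV that spans t_in pictures so that it spans t_out pictures.

    Assumes constant motion, so displacement is proportional to temporal distance.
    """
    return (mv_in[0] * t_out / t_in, mv_in[1] * t_out / t_in)

# Input P picture references the I picture 3 pictures back (ti = 3);
# the output P picture references the previous picture (to = 1).
mv_out = scale_motion_vector((12, -6), t_in=3, t_out=1)
print(mv_out)  # (4.0, -2.0)
```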
Block Size Mapping: 16x8, 8x16
[Figure: 16x8 partitions A and B and 8x16 partitions A and B, each with candidate blocks labeled a1 to a6 and b1 to b6]
Block Size Mapping: 8x8
[Figure: 8x8 partitions A, B, C and D with candidate blocks labeled a1 to a4, b1 to b4, c1 to c4 and d1 to d4]
Simulation Conditions
• Test sequences: 1920x1080i, 30 fps, 450 frames
• MPEG-2 input: 30 Mbps, (30,3)
• H.264/AVC output
  • UVLC, output bit-rate of interest ~10 Mbps
  • Baseline profile (needs to convert B pictures to P slices) and Main profile
• Comparison points
  • Mapping algorithm
  • B slices
  • RD optimization
Baseline output: no B slices
[Figure: HarborScene. PSNR-Y (dB) versus bit rate (Mbps) for DWA+RDO, DWA+ranking, DWA and REF+RDO]
Baseline output: no B slices
[Figure: StreetCar. PSNR-Y (dB) versus bit rate (Mbps) for DWA+RDO, DWA+Ranking, DWA and REF+RDO]
Main Output: with B slices
[Figure: HarborScene. PSNR-Y (dB) versus bit rate (Mbps) for DWA, DWA_IPB and DWA_IPB+RDO]
Main Output: with B slices
[Figure: StreetCar. PSNR-Y (dB) versus bit rate (Mbps) for DWA, DWA_IPB and DWA_IPB+RDO]
Complexity: Run-time Results
Conclusions
• Efficient motion mapping schemes that directly map MPEG-2 motion vectors to H.264/AVC motion vectors
• Evaluated the complexity-performance tradeoff of B slices and RD optimization
• Achieved good rate-distortion performance with low complexity
Thank you