Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
Kai-Chao Yang
2007/8 Kai-Chao Yang, NTHU, Taiwan
Outline
• Introduction
  • Problems
  • Definition
  • Functionality
  • Goal
  • Competition
  • Applications
  • Targets
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions
Introduction - problem
• Non-scalable video streaming
  • Multiple video streams are needed for heterogeneous clients
[Figure: separate streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 Kb/s for different clients]
Introduction - definition
• Scalable video stream
• Scalability
  • Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
[Figure: a bit-stream made of sub-streams 1, 2, ..., n; reconstruction from selected sub-streams k1, k2, ..., ki yields results ranging from high quality to low quality]
Introduction - functionality
• Functionality of SVC
  • Graceful degradation when the “right” parts of the bit-stream are lost
  • Bit-rate adaptation to match the channel throughput
  • Format adaptation for backward-compatible extension
  • Power adaptation for a trade-off between runtime and quality
Introduction - mode
• Scalability mode
  • Fidelity reduction (SNR scalability)
  • Picture size reduction (spatial scalability)
  • Frame rate reduction (temporal scalability)
  • Sharpness reduction (frequency scalability)
  • Selection of content (ROI or object-based scalability)
[Figure: example with the bit planes of a residual. Labels: most significant bit, base layer, enhancement layer, Enhancement 1 to Enhancement 5]
Structure of SVC
[Figure: encoder block diagram. Blocks: spatial decimation, temporal scalable coding, prediction, base layer coding, SNR scalable coding, multiplex]
Temporal Scalability
• Hierarchical prediction structures
  • Hierarchical B pictures
  • Non-dyadic hierarchical prediction
  • Hierarchical prediction with zero delay
[Figure: the three prediction structures drawn over picture indices 0 to 16 (0 to 18 for the non-dyadic case), with the GOP marked]
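The dyadic hierarchical B picture structure can be sketched in a few lines. This is a minimal illustration, assuming a GOP size that is a power of two and pictures numbered in display order; the function names `temporal_id` and `extract_temporal_substream` are hypothetical and not taken from the slides or from any SVC software.

```python
def temporal_id(poc: int, gop_size: int) -> int:
    """Temporal level of picture `poc` (display order) in a dyadic hierarchy.

    Assumption: gop_size is a power of two and key pictures sit at
    multiples of gop_size.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = poc % gop_size
    if pos == 0:
        return 0  # key picture (I or P)
    levels = gop_size.bit_length() - 1            # log2(gop_size)
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def extract_temporal_substream(pocs, gop_size: int, max_tid: int):
    """Keep only pictures whose temporal level does not exceed max_tid."""
    return [p for p in pocs if temporal_id(p, gop_size) <= max_tid]


if __name__ == "__main__":
    gop = 8
    print([temporal_id(p, gop) for p in range(9)])
    print(extract_temporal_substream(range(17), gop, 1))
```

Dropping the highest remaining temporal level halves the frame rate in this dyadic case, which is the frame rate reduction that temporal scalability provides.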
Temporal Scalability
• Video coding experiment with H.264/MPEG4-AVC
  • Foreman, CIF 30 Hz @ 1320 kbps
  • Performance as a function of N
• Cascaded QP assignment
  • QP(P) ≈ QP(B0) - 3 ≈ QP(B1) - 4 ≈ QP(B2) - 5
[Figure: prediction structures for N = 1, N = 2, N = 4, and N = 8, built from I, P, B0, B1, and B2 pictures]
This slide is copied from JVT-W132-Talk
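The cascaded QP assignment can be written as a small helper. This sketch assumes the relation on the slide means that B0, B1, and B2 pictures use the key-picture QP plus 3, 4, and 5 respectively; the function name `cascaded_qp` and the clamp to 51 (the largest QP for 8-bit H.264/AVC) are illustrative choices, not part of the slide.

```python
def cascaded_qp(qp_key: int, temporal_level: int) -> int:
    """QP for a picture at the given temporal level.

    Assumption: level 0 is the key picture (I or P); level 1 (B0) adds 3,
    and each further level (B1, B2, ...) adds one more.
    """
    if temporal_level < 0:
        raise ValueError("temporal_level must be non-negative")
    if temporal_level == 0:
        return qp_key
    return min(51, qp_key + 3 + (temporal_level - 1))


if __name__ == "__main__":
    for level, name in enumerate(["P", "B0", "B1", "B2"]):
        print(name, cascaded_qp(28, level))
```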
Spatial Scalability
[Figure: spatial scalable encoder. Blocks: spatial decimation; H.264/AVC MCP & intra-prediction; hierarchical MCP & intra-prediction; base layer coding (texture, motion) for each layer; inter-layer prediction (intra, motion, residual); multiplex producing the scalable bit-stream. An H.264/AVC compatible coder produces the H.264/AVC compatible base layer bit-stream]
Spatial Scalability
• Similar to MPEG-2, H.263, and MPEG-4
• Arbitrary resolution ratio
• The same coding order in all spatial layers
• Combination with temporal scalability
• Inter-layer prediction
[Figure labels: Intra, Spatial 0, Spatial 1, Temporal 0, Temporal 1, Temporal 2]
Spatial Scalability
• The prediction signals are formed by
  • MCP inside the enhancement layer (temporal), for small motion and high spatial detail
  • Up-sampling from the lower layer (spatial)
  • Average of the above two predictions (temporal + spatial)
• Inter-layer prediction
  • Three kinds of inter-layer prediction
    • Inter-layer motion prediction
    • Inter-layer residual prediction
    • Inter-layer intra prediction
  • Base mode MB
    • Only the residual is transmitted, with no additional side information
Spatial Scalability
• Inter-layer motion prediction
  • base_mode_flag = 1
    • The reference layer is inter-coded
    • Data are derived from the reference layer: MB partitioning, reference indices, MVs
  • motion_pred_flag
    • 1: MV predictors are obtained from the reference layer
    • 0: MV predictors are obtained by conventional spatial predictors
[Figure: 8×8 blocks of the reference layer with motion vectors (x1,y1) and (x2,y2) map to 16×16 blocks with motion vectors (2x1,2y1) and (2x2,2y2)]
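The dyadic case shown in the figure, where an 8×8 block with motion vector (x1,y1) becomes a 16×16 block with motion vector (2x1,2y1), can be sketched as follows. The `Partition` type and `upscale_partition` function are hypothetical names for illustration; the sketch assumes a resolution ratio of exactly 2 in both directions and that the reference index is copied unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Partition:
    x: int                 # top-left corner, in luma samples
    y: int
    width: int
    height: int
    ref_idx: int           # reference picture index
    mv: Tuple[int, int]    # motion vector (horizontal, vertical)


def upscale_partition(p: Partition, scale: int = 2) -> Partition:
    """Derive enhancement-layer motion data from a reference-layer partition.

    Assumption: dyadic spatial scalability, so position, size, and motion
    vector are all multiplied by the same integer factor.
    """
    return Partition(
        x=p.x * scale,
        y=p.y * scale,
        width=p.width * scale,
        height=p.height * scale,
        ref_idx=p.ref_idx,
        mv=(p.mv[0] * scale, p.mv[1] * scale),
    )


if __name__ == "__main__":
    base = Partition(x=8, y=0, width=8, height=8, ref_idx=0, mv=(3, -5))
    print(upscale_partition(base))
```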
Spatial Scalability
• Inter-layer residual prediction
  • residual_pred_flag = 1
  • Predictor
    • Block-wise up-sampling by a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
    • Transform block basis
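Block-wise bi-linear up-sampling of a residual block can be illustrated generically. This is only a sketch: the sample phase (output sample centres aligned with input sample centres) and the clamping at block edges are assumptions made for the example, not the exact filter positions of the standard, and the function names are hypothetical. Clamping at the edges is what keeps the operation block-wise, since no samples from neighbouring blocks are read.

```python
def upsample_1d(row, factor=2):
    """Linear interpolation of one row, clamped at the block edges."""
    n = len(row)
    out = []
    for i in range(n * factor):
        s = (i + 0.5) / factor - 0.5       # source position of output sample i
        s = min(max(s, 0.0), n - 1)        # stay inside the block
        i0 = int(s)
        i1 = min(i0 + 1, n - 1)
        f = s - i0
        out.append((1 - f) * row[i0] + f * row[i1])
    return out


def upsample_2d(block, factor=2):
    """Separable bi-linear up-sampling: rows first, then columns."""
    rows = [upsample_1d(r, factor) for r in block]
    cols = [upsample_1d(list(c), factor) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    residual = [[0, 4], [8, 12]]
    for line in upsample_2d(residual):
        print(line)
```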
Spatial Scalability
• Inter-layer intra prediction
  • base_mode_flag = 1
  • The reference layer is intra-coded
  • Up-sampling from the reference layer
    • Luma: one-dimensional 4-tap FIR filter
    • Chroma: bi-linear filter
Spatial Scalability
• Past spatial scalable video
  • Inter-layer intra prediction requires complete decoding of the base layer
  • Multiple motion compensation and deblocking filter operations are needed
  • Full decoding + inter-layer prediction: complexity > simulcast
• Single-loop decoding
  • Inter-layer intra prediction is restricted to MBs for which the co-located base layer is intra-coded
Spatial Scalability
• Single-loop vs. multi-loop decoding
[Figure: comparison of single-loop and multi-loop decoding. Labels: Inter, I, B, P]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
Spatial Scalability
• Generalized spatial scalability in SVC
  • Arbitrary ratio
    • Neither the horizontal nor the vertical resolution can decrease from one layer to the next
  • Cropping
    • Containing new regions
    • Higher quality of interesting regions
Spatial Scalability
• Encoder control (JSVM)
  • Base layer
    • p0’ is optimized for the base layer
\[ p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda_0 R_0(p_0) \} \]
  • Enhancement layer
    • p1’ is optimized for the enhancement layer
\[ p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) \} \]
    • Decisions of p1 depend on p0
    • Efficient base layer coding but inefficient enhancement layer coding
Spatial Scalability
• Encoder control (optimization)
  • Base layer
    • Considering enhancement layer coding
    • Eliminating choices of p0 that disadvantage enhancement layer coding
  • Enhancement layer
    • No change
\[ \{p_0', p_1'\} = \arg\min_{\{p_0, p_1 | p_0\}} \{ (1 - w) [ D_0(p_0) + \lambda_0 R_0(p_0) ] + w [ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) ] \} \]
  • w
    • w = 0: JSVM encoder control
    • w = 1: Single-loop encoder control (base layer is not controlled)
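The weighted encoder control can be illustrated with a toy search over candidate parameters. This is a sketch under stated assumptions: the cost tables hold made-up Lagrangian costs J = D + λR for hypothetical mode names, and the tie-break on the enhancement layer cost is an addition so that w = 0 reproduces the sequential JSVM behaviour (base layer first, then the best enhancement layer choice given that base layer decision).

```python
def joint_control(base_costs, enh_costs, w):
    """Pick (p0, p1) minimizing (1 - w) * J0(p0) + w * J1(p1 | p0).

    base_costs: {p0: J0(p0)}
    enh_costs:  {p0: {p1: J1(p1 | p0)}}
    Ties are broken by the smaller enhancement layer cost (an assumption
    of this sketch, needed to make the choice of p1 well defined at w = 0).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    candidates = (
        (p0, p1, j0, j1)
        for p0, j0 in base_costs.items()
        for p1, j1 in enh_costs[p0].items()
    )
    p0, p1, _, _ = min(
        candidates, key=lambda c: ((1 - w) * c[2] + w * c[3], c[3])
    )
    return p0, p1


if __name__ == "__main__":
    # Hypothetical costs, chosen only to show how the decision moves with w.
    base_costs = {"skip": 10.0, "inter16x16": 12.0}
    enh_costs = {
        "skip": {"base_mode": 30.0, "inter": 25.0},
        "inter16x16": {"base_mode": 15.0, "inter": 20.0},
    }
    for w in (0.0, 0.5, 1.0):
        print(w, joint_control(base_costs, enh_costs, w))
```

With these numbers the base layer choice that is cheapest on its own leads to an expensive enhancement layer, so any nonzero weight on the enhancement layer cost shifts the base layer decision.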
Quality Scalability
• Coarse-grain quality scalability (CGS)
  • A special case of spatial scalability
    • Identical sizes for base and enhancement layers
    • Smaller quantization step sizes for higher enhancement residual layers
  • Designed for only several selected bit-rate points
    • Supported bit-rate points = number of layers
  • Switching can only occur at IDR access units
Quality Scalability
• Medium-grain quality scalability (MGS)
  • More enhancement layers are supported
    • Refinement quality layers of residual
  • Key pictures
    • Drift control
  • Switching can occur at any access unit
  • CGS + key pictures + refinement quality layers
Quality Scalability
• Drift control
  • Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
  • Trade-off of MCP in quality SVC
    • Coding efficiency vs. drift
Quality Scalability
• MPEG-4 quality scalability with FGS
  • Base layer is stored and used for MCP of following pictures
  • Drift: drift free
  • Complexity: low
  • Efficiency: efficient base layer but inefficient enhancement layer
    • Refinement data are not used for MCP
[Figure: base layer and refinement (possibly lost or truncated)]
Quality Scalability
• MPEG-2 quality scalability (without FGS)
  • Only 1 reference picture is stored and used for MCP of following pictures
  • Drift: both base layer and enhancement layer
    • Frequent intra updates are necessary
  • Complexity: low
  • Efficiency: efficient enhancement layer but inefficient base layer
[Figure: base layer and refinement (possibly lost or truncated)]
Quality Scalability
• 2-loop prediction
  • Several closed encoder loops run at different bit-rate points in a layered structure
  • Drift: enhancement layer
  • Complexity: high
  • Efficiency: efficient base layer and medium efficient enhancement layer
[Figure: base layer and refinement (possibly lost or truncated)]
Quality Scalability
• SVC concepts
  • Key picture
    • Trade-off between coding efficiency and drift
    • MPEG-4 FGS: all key pictures
    • MPEG-2 quality scalability: no key pictures
[Figure: base layer and refinement (possibly lost or truncated)]
Quality Scalability
• Drift control with hierarchical prediction
  • Key pictures
    • Base layer is stored and used for the MCP of following pictures
  • Other pictures
    • Enhancement layer is stored and used for the MCP of following pictures
  • GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
[Figure: base layer and refinement (possibly lost or truncated) over a hierarchical structure of P, B1, and B2 pictures]
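The key picture rule can be stated as a tiny helper. This is a simplified sketch, assuming key pictures sit at multiples of the GOP size in display order; the function names are hypothetical.

```python
def is_key_picture(poc: int, gop_size: int) -> bool:
    """Assumption: key pictures are the pictures at multiples of gop_size."""
    return poc % gop_size == 0


def stored_reference_quality(poc: int, gop_size: int) -> str:
    """Which reconstruction of this picture is stored for MCP of following pictures."""
    return "base" if is_key_picture(poc, gop_size) else "enhancement"


if __name__ == "__main__":
    gop = 4
    for poc in range(9):
        print(poc, stored_reference_quality(poc, gop))
```

Because key pictures are predicted only from base layer reconstructions, lost refinement data cannot propagate past the next key picture; a larger GOP gives fewer such resynchronization points but lets more pictures use the higher-quality enhancement layer reference.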
Combined Scalability
• SVC encoder structure
[Figure: SVC encoder structure. Labels: dependency layer; temporal decomposition; dependency and quality refinement layers; the same motion/prediction information]
Combined Scalability
[Figure: scalable bit-stream organized into dependency layers D = 0, D = 1, and D = 2, each with quality layers Q = 0, Q = 1, and Q = 2]
Combined Scalability
• Bit-stream format
[Figure: NAL unit layout: NAL unit header, NAL unit header extension (including the fields P, T, D, Q), NAL unit payload]
  • P (priority_id): indicates the importance of a NAL unit
  • T (temporal_id): indicates temporal level
  • D (dependency_id): indicates spatial/CGS layer
  • Q (quality_id): indicates MGS/FGS layer
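Sub-stream extraction based on the T, D, and Q identifiers can be sketched as a filter over NAL units. This is a simplified illustration: the `NalUnit` class is a hypothetical container rather than a parser for the real header bits, and the rule of keeping all quality layers of lower dependency layers is an assumption made here because the target layer may have been predicted from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""


def extract_substream(nal_units, max_t: int, max_d: int, max_q: int):
    """Keep the NAL units needed for the operating point (max_t, max_d, max_q).

    Assumption: quality layers are only limited at the target dependency
    layer; lower dependency layers are kept at full quality.
    """
    return [
        n for n in nal_units
        if n.temporal_id <= max_t
        and n.dependency_id <= max_d
        and (n.dependency_id < max_d or n.quality_id <= max_q)
    ]


if __name__ == "__main__":
    stream = [
        NalUnit(priority_id=0, temporal_id=0, dependency_id=0, quality_id=0),
        NalUnit(priority_id=0, temporal_id=0, dependency_id=0, quality_id=1),
        NalUnit(priority_id=0, temporal_id=1, dependency_id=0, quality_id=0),
        NalUnit(priority_id=0, temporal_id=0, dependency_id=1, quality_id=0),
        NalUnit(priority_id=0, temporal_id=0, dependency_id=1, quality_id=1),
        NalUnit(priority_id=0, temporal_id=2, dependency_id=1, quality_id=0),
    ]
    for unit in extract_substream(stream, max_t=1, max_d=1, max_q=0):
        print(unit)
```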
Combined Scalability
• Bit-stream switching
  • Inside a dependency layer
    • Switching everywhere
  • Outside a dependency layer
    • Switching up only at IDR access units
    • Switching down everywhere if using multiple-loop decoding
Profiles of SVC
• Scalable Baseline
  • For conversational and surveillance applications requiring low decoding complexity
  • Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  • Temporal and quality scalability: arbitrary
  • No interlaced coding tools
  • B-slices, weighted prediction, CABAC, and 8x8 luma transform
  • The base layer conforms to the Baseline profile of H.264/AVC
Profiles of SVC
• Scalable High
  • For broadcast, streaming, and storage
  • Spatial, temporal, and quality scalability: arbitrary
  • The base layer conforms to the High profile of H.264/AVC
• Scalable High Intra
  • Scalable High + all IDR pictures
References
• H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT 2007.
• T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
• T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available on http://iphome.hhi.de/wiegand/dic.htm)
• H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.