
Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Kai-Chao Yang

2007/8 Kai-Chao Yang, NTHU, Taiwan

Outline
  Introduction
    Problems
    Definition
    Functionality
    Goal
    Competition
    Applications
    Targets
  History of SVC
  Structure of SVC
  Temporal Scalability
  Spatial Scalability
  Quality Scalability
  Combined Scalability
  Profiles of SVC
  Conclusions


Introduction - problem

Non-Scalable Video Streaming
  Multiple video streams are needed for heterogeneous clients

[Figure: clients served at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]

Introduction - definition

Scalable video stream

Scalability
  Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions

[Figure: a scalable stream made of sub-streams 1 to n; a selected subset (sub-streams k1, k2, ..., ki) is used for reconstruction, at a quality between low and high]


Introduction - functionality

Functionality of SVC
  Graceful degradation when “right” parts of the bit-stream are lost
  Bit-rate adaptation to match the channel throughput
  Format adaptation for backwards-compatible extension
  Power adaptation for trade-off between runtime and quality


Introduction - mode

Scalability mode
  Fidelity reduction (SNR scalability)
  Picture size reduction (spatial scalability)
  Frame rate reduction (temporal scalability)
  Sharpness reduction (frequency scalability)
  Selection of content (ROI or object-based scalability)

[Figure: Example. Residual values 01101, 10110, 01010, 01001, 10100, 01011, with labels: most significant bit, base layer, enhancement layer, Enhancement 1 to Enhancement 5]

Structure of SVC


[Figure: block diagram with the stages spatial decimation, temporal scalable coding, prediction, base layer coding, and SNR scalable coding (shown for two layers), followed by a multiplex stage]

Temporal Scalability

Hierarchical prediction structures

[Figure: three prediction structures with the GOP marked: hierarchical B pictures (pictures 0 to 16), non-dyadic hierarchical prediction (pictures 0 to 18), and hierarchical prediction with zero delay (pictures 0 to 16)]
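A minimal sketch of how a dyadic hierarchical-B structure yields temporal layers, assuming a power-of-two GOP size and key pictures at multiples of the GOP size. The names `temporal_level` and `keep_up_to_level` are illustrative only and come from neither the standard nor the JSVM.

```python
def temporal_level(frame_idx: int, gop_size: int) -> int:
    """Temporal level of a picture in a dyadic hierarchy (0 = key picture)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = frame_idx % gop_size
    if pos == 0:
        return 0
    num_levels = gop_size.bit_length() - 1        # log2(gop_size)
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_levels - trailing_zeros


def keep_up_to_level(frames, gop_size: int, max_level: int):
    """Frame-rate reduction: discard every picture above max_level."""
    return [f for f in frames if temporal_level(f, gop_size) <= max_level]


if __name__ == "__main__":
    print([temporal_level(i, 8) for i in range(9)])
    print(keep_up_to_level(range(17), 8, 2))
```

With a GOP of 8, dropping the highest level halves the frame rate, and keeping only level 0 leaves the key pictures.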


Temporal Scalability


[Figure: temporal scalability. Coding structures for N = 1, 2, 4, and 8, built from I and P pictures and hierarchical B pictures B0, B1, and B2]

Video coding experiment with H.264/MPEG4-AVC
  Foreman, CIF 30 Hz @ 1320 kbps
  Performance as a function of N
  Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5

This slide is copied from JVT-W132-Talk
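A small sketch of the cascaded QP assignment, reading the slide as QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5. That reading is an interpretation, since the extracted text lost its operators. The table `QP_OFFSETS` and the function `cascaded_qp` are hypothetical names. The result is clamped to 0..51, the QP range of 8-bit H.264/AVC.

```python
# Offsets relative to QP(P), under the interpretation stated above.
QP_OFFSETS = {"P": 0, "B0": 3, "B1": 4, "B2": 5}


def cascaded_qp(qp_p: int, picture_type: str) -> int:
    """QP for a picture type, given the QP used for P (key) pictures."""
    qp = qp_p + QP_OFFSETS[picture_type]
    return max(0, min(51, qp))  # keep the result in the valid QP range


if __name__ == "__main__":
    for kind in ("P", "B0", "B1", "B2"):
        print(kind, cascaded_qp(28, kind))
```

Pictures higher in the hierarchy are referenced by fewer pictures, so they are given a coarser quantizer.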

Spatial Scalability

[Figure: spatial scalable encoder. The lowest layer uses H.264/AVC MCP and intra-prediction in an H.264/AVC compatible coder and produces an H.264/AVC compatible base layer bit-stream. Each higher layer uses hierarchical MCP and intra-prediction, with inter-layer prediction (intra, motion, residual) from the layer below. Every layer has base layer coding of texture and motion, the layers are related by spatial decimation, and all outputs are multiplexed into the scalable bit-stream]

Spatial Scalability

  Similar to MPEG-2, H.263, and MPEG-4
  Arbitrary resolution ratio
  The same coding order in all spatial layers
  Combination with temporal scalability
  Inter-layer prediction

[Figure: two spatial layers (Spatial 0, Spatial 1) with temporal levels Temporal 0, Temporal 1, and Temporal 2, and intra pictures marked]

Spatial Scalability
  The prediction signals are formed by
    MCP inside the enhancement layer (temporal) (small motion and high spatial detail)
    Up-sampling from the lower layer (spatial)
    Average of the above two predictions (temporal + spatial)
  Inter-layer prediction
    Three kinds of inter-layer prediction
      Inter-layer motion prediction
      Inter-layer residual prediction
      Inter-layer intra prediction
    Base mode MB
      Only the residual is transmitted, with no additional side information


Spatial Scalability
  Inter-layer motion prediction
    base_mode_flag = 1
      The reference layer is inter-coded
      Data are derived from the reference layer
        MB partitioning
        Reference indices
        MVs
    motion_pred_flag
      1: MV predictors are obtained from the reference layer
      0: MV predictors are obtained by conventional spatial predictors

[Figure: an 8×8 block of the reference layer with motion vectors (x1,y1) and (x2,y2) corresponds to a 16×16 block with motion vectors (2x1,2y1) and (2x2,2y2)]
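A minimal sketch of deriving enhancement-layer motion data from the reference layer for the dyadic case in the figure (an 8×8 block becomes 16×16 and its motion vector is doubled). The `Partition` type and `upscale_partition` are hypothetical names, and the sketch ignores non-dyadic ratios and cropping.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Partition:
    x: int                  # top-left position in luma samples
    y: int
    width: int
    height: int
    mv: Tuple[int, int]     # motion vector (horizontal, vertical)
    ref_idx: int            # reference picture index


def upscale_partition(p: Partition, factor: int = 2) -> Partition:
    """Scale position, size, and motion vector; keep the reference index."""
    return Partition(
        x=p.x * factor,
        y=p.y * factor,
        width=p.width * factor,
        height=p.height * factor,
        mv=(p.mv[0] * factor, p.mv[1] * factor),
        ref_idx=p.ref_idx,
    )


if __name__ == "__main__":
    base = Partition(x=8, y=0, width=8, height=8, mv=(3, -1), ref_idx=0)
    print(upscale_partition(base))
```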

Spatial Scalability

Inter-layer residual prediction
  residual_pred_flag = 1
  Predictor
    Block-wise up-sampling by a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
    Transform block basis
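A rough sketch of block-wise bi-linear up-sampling of a residual block by a factor of 2. It uses generic separable linear interpolation with edge clamping, so no samples outside the block are used. It is not the normative SVC filter: the sample phases and integer rounding of the standard are not reproduced, and `upsample_1d` and `upsample_block` are illustrative names.

```python
def upsample_1d(row, factor=2):
    """Linearly interpolate one row; edges are clamped to the block."""
    n = len(row)
    out = []
    for i in range(n * factor):
        # Centre-aligned source position of output sample i.
        pos = (i + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), n - 1.0)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1 - frac) * row[left] + frac * row[right])
    return out


def upsample_block(block, factor=2):
    """Separable bi-linear up-sampling: rows first, then columns."""
    rows = [upsample_1d(r, factor) for r in block]
    cols = [upsample_1d(list(c), factor) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    print(upsample_1d([0, 4]))
```

Clamping at the block edge mirrors the "transform block basis" constraint: the interpolation never reaches across a block boundary.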


Spatial Scalability

Inter-layer intra prediction
  base_mode_flag = 1
  The reference layer is intra-coded
  Up-sampling from the reference layer
    Luma: one-dimensional 4-tap FIR filter
    Chroma: bi-linear filter


Spatial Scalability
  Past spatial scalable video:
    Inter-layer intra prediction requires complete decoding of the base layer.
    Multiple motion compensation and deblocking filter operations are needed.
    Full decoding + inter-layer prediction: complexity > simulcast.
  Single-loop decoding
    Inter-layer intra prediction is restricted to MBs for which the co-located base layer is intra-coded


Spatial Scalability

Single-loop vs. multi-loop decoding

This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

[Figure: single-loop vs. multi-loop decoding, with pictures labeled I, B, and P and a region labeled Inter]

Spatial Scalability

Generalized spatial scalability in SVC
  Arbitrary ratio
    Neither the horizontal nor the vertical resolution can decrease from one layer to the next.
  Cropping
    Containing new regions
    Higher quality of interesting regions
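The resolution constraint can be checked in a few lines. This is a sketch only: `valid_spatial_layers` is a hypothetical helper, layers are given as (width, height) pairs from the base layer upward, and cropping windows are not modelled.

```python
def valid_spatial_layers(layers):
    """True if neither width nor height decreases from one layer to the next."""
    return all(
        w_next >= w and h_next >= h
        for (w, h), (w_next, h_next) in zip(layers, layers[1:])
    )


if __name__ == "__main__":
    print(valid_spatial_layers([(176, 144), (352, 288), (704, 576)]))  # dyadic
    print(valid_spatial_layers([(320, 240), (480, 360)]))              # ratio 1.5
    print(valid_spatial_layers([(352, 288), (320, 288)]))              # width shrinks
```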


Spatial Scalability

Encoder control (JSVM)
  Base layer
    p0’ is optimized for the base layer
  Enhancement layer
    p1’ is optimized for the enhancement layer
    Decisions of p1 depend on p0
  Efficient base layer coding but inefficient enhancement layer coding

$$p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda_0 R_0(p_0) \}$$

$$p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0) \}$$

Spatial Scalability

Encoder control (optimization)
  Base layer
    Considering enhancement layer coding
    Eliminating choices of p0 that disadvantage enhancement layer coding
  Enhancement layer
    No change
  Weighting factor w
    w = 0: JSVM encoder control
    w = 1: Single-loop encoder control (base layer is not controlled)

$$p_0' = \arg\min_{\{p_0,\, p_1 | p_0\}} \{ (1 - w)\,[D_0(p_0) + \lambda_0 R_0(p_0)] + w\,[D_1(p_1 | p_0) + \lambda_1 R_1(p_1 | p_0)] \}$$
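A toy sketch of the weighted encoder control. It assumes each candidate parameter set comes with a known (distortion, rate) pair, and that the cost of a layer is the Lagrangian D + λR. The base-layer choice p0 minimizes (1 - w) times the base-layer cost plus w times the best enhancement-layer cost reachable from that p0. The enhancement layer is then chosen as usual, given p0. All names and numbers are made up for illustration.

```python
def lagrangian(distortion, rate, lam):
    """Rate-distortion cost D + lambda * R."""
    return distortion + lam * rate


def choose_parameters(base, enh, lam0, lam1, w):
    """base: {p0: (D0, R0)}; enh: {p0: {p1: (D1, R1)}}. Returns (p0, p1)."""

    def best_enh(p0):
        # Enhancement-layer decision given p0 (unchanged by the weighting).
        return min(enh[p0].items(), key=lambda kv: lagrangian(*kv[1], lam1))

    def joint_cost(p0):
        j0 = lagrangian(*base[p0], lam0)
        j1 = lagrangian(*best_enh(p0)[1], lam1)
        return (1 - w) * j0 + w * j1

    p0 = min(base, key=joint_cost)
    return p0, best_enh(p0)[0]


# Made-up candidates: "a" is cheaper for the base layer alone,
# "b" allows a much cheaper enhancement layer.
BASE = {"a": (10.0, 4.0), "b": (12.0, 5.0)}
ENH = {
    "a": {"x": (8.0, 6.0), "y": (9.0, 4.0)},
    "b": {"x": (3.0, 3.0), "y": (5.0, 2.0)},
}

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, choose_parameters(BASE, ENH, 1.0, 1.0, w))
```

With w = 0 the base layer is optimized on its own, and with w = 1 only the enhancement-layer cost drives the base-layer choice.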

Quality Scalability

Coarse-grain quality scalability (CGS)
  A special case of spatial scalability
    Identical sizes for base and enhancement layers
    Smaller quantization step sizes for higher enhancement residual layers
  Designed for only several selected bit-rate points
    Supported bit-rate points = number of layers
  Switching can only occur at IDR access units


Quality Scalability

Medium-grain quality scalability (MGS)
  More enhancement layers are supported
    Refinement quality layers of residual
  Key pictures
    Drift control
  Switching can occur at any access unit
  CGS + key pictures + refinement quality layers


Quality Scalability

Drift control
  Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
  Trade-off of MCP in quality SVC
    Coding efficiency vs. drift


Quality Scalability

MPEG-4 quality scalability with FGS
  Base layer is stored and used for MCP of following pictures
  Drift: drift-free
  Complexity: low
  Efficiency: efficient base layer but inefficient enhancement layer
    Refinement data are not used for MCP

[Figure: base layer with refinement data (possibly lost or truncated)]

Quality Scalability
  MPEG-2 quality scalability (without FGS)
    Only 1 reference picture is stored and used for MCP of following pictures
    Drift: both base layer and enhancement layer
      Frequent intra updates are necessary
    Complexity: low
    Efficiency: efficient enhancement layer but inefficient base layer

[Figure: prediction structure, with the base layer labeled]


Quality Scalability
  2-loop prediction
    Several closed encoder loops run at different bit-rate points in a layered structure
    Drift: enhancement layer
    Complexity: high
    Efficiency: efficient base layer and medium-efficient enhancement layer

[Figure: prediction structure, with the base layer labeled]



Quality Scalability

SVC concepts
  Key picture
    Trade-off between coding efficiency and drift
    MPEG-4 FGS: all key pictures
    MPEG-2 quality scalability: no key pictures

[Figure: prediction structure, with the base layer labeled]



Quality Scalability

Drift control with hierarchical prediction
  Key pictures
    Base layer is stored and used for the MCP of following pictures
  Other pictures
    Enhancement layer is stored and used for the MCP of following pictures
  GOP size adjusts the trade-off between enhancement layer coding efficiency and drift

[Figure: hierarchical prediction structure with P, B1, and B2 pictures, with the base layer labeled]

Combined Scalability

SVC encoder structure

[Figure: dependency layers D = 0, 1, and 2, each with quality refinement layers Q = 0, 1, and 2, combined into the scalable bit-stream. Labels: dependency layer, temporal decomposition, dependency and quality refinement layers, the same motion/prediction information]

[Figure: example scalable bit-stream with temporal levels T0, T2, T1, T2, T0, dependency layers D0 and D1, and quality layers Q0 and Q1]

Bit-stream format

[Figure: NAL unit layout: NAL unit header, NAL unit header extension (including the P, T, D, and Q fields), NAL unit payload]

P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates temporal level
D (dependency_id): indicates spatial/CGS layer
Q (quality_id): indicates MGS/FGS layer
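A simplified sketch of sub-stream extraction driven by these identifiers. It assumes the header fields have already been parsed into records; the `NalUnit` type and `extract_substream` are hypothetical names. Real extractors must also respect inter-layer dependencies, for example which quality layers a higher dependency layer predicts from, and this sketch ignores them.

```python
from typing import List, NamedTuple


class NalUnit(NamedTuple):
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes


def extract_substream(units: List[NalUnit], max_t: int, max_d: int,
                      max_q: int) -> List[NalUnit]:
    """Keep NAL units at or below the target temporal, dependency, quality ids."""
    return [
        u for u in units
        if u.temporal_id <= max_t
        and u.dependency_id <= max_d
        and u.quality_id <= max_q
    ]


if __name__ == "__main__":
    stream = [
        NalUnit(0, 0, 0, 0, b"base"),
        NalUnit(1, 0, 0, 1, b"quality refinement"),
        NalUnit(1, 1, 0, 0, b"higher frame rate"),
        NalUnit(2, 0, 1, 0, b"higher resolution"),
    ]
    print([u.payload for u in extract_substream(stream, 0, 0, 0)])
```

Because the decision only reads header fields, an adaptation node can thin the stream without decoding any payload.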


Bit-stream switching
  Inside a dependency layer
    Switching everywhere
  Outside a dependency layer
    Switching up only at IDR access units
    Switching down everywhere if using multiple-loop decoding


Profiles of SVC

Scalable Baseline
  For conversational and surveillance applications requiring low decoding complexity
  Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  Temporal and quality scalability: arbitrary
  No interlaced coding tools
  B-slices, weighted prediction, CABAC, and 8x8 luma transform
  The base layer conforms to the Baseline profile of H.264/AVC


Profiles of SVC

Scalable High
  For broadcast, streaming, and storage
  Spatial, temporal, and quality scalability: arbitrary
  The base layer conforms to the High profile of H.264/AVC
Scalable High Intra
  Scalable High + all IDR pictures


References
  H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT 2007.
  T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
  T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available on http://iphome.hhi.de/wiegand/dic.htm)
  H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.

