
Multimedia Communications
Lecture 10: Video Standards
Part I. Videophone and video conferencing: H.261/H.263

Dr. Tian-Sheuan [email protected]
Dept. Electronics Engineering
National Chiao-Tung University


Adapted from Prof. Hang’s slides

Introduction to Video Standard



Institute of Electronics, National Chiao Tung University

Behind the Scene

• Why can we do compression?
  – Observation
    • Significant amount of statistical and subjective redundancy within and between frames
  – Statistical redundancy
    • Lossless compression
    • e.g. 000000000000000… -> run-length coding, arithmetic coding, Huffman coding
  – Subjective redundancy
    • Lossy compression
    • Exploit characteristics of the Human Visual System
      – Not sensitive to high-frequency components
    • Spatial redundancy
      – DCT transform, quantized high-frequency components
    • Temporal redundancy
      – Motion estimation


The Scope of Picture and Video Coding Standardization

• Only the syntax and decoder are standardized:
  – Permits optimization beyond the obvious
  – Permits complexity reduction for implementability
  – Provides no guarantees of quality

[Figure: processing chain Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination, with the "Scope of Standard" marked on it]


Development of Coding Tools and Standards

Coding tools:
• DPCM: 1952-1980
• Transform Coding: 1965-1980
• Motion Compensated Prediction: 1972-1989
• Entropy Coding: 1949-1976

Standards:
• H.261: 1984-1990
• MPEG1: 1988-1992
• MPEG2: 1991-1994
• JPEG: 1984-1992
• MPEG4 and H.263: shown without dates

(Timeline axis: 1950s to 1990s)


ITU/MPEG Standards

• H.261
  – ITU H.261
  – Optimized for CIF @ 384 kbps, focus on video phone over ISDN
  – First design (late ‘90) embodying the typical structure that dominates today
    • 16x16 macroblock motion compensation, 8x8 DCT, scalar quantization, and variable-length coding
• MPEG-1
  – ISO/IEC 11172
  – 1993 IS, design focus on VHS quality (352x240) @ 1.5 Mbps
• MPEG-2
  – ISO/IEC 13818
  – 1994 IS, optimized for “NTSC quality” CCIR601 video @ 6-10 Mbps
• H.263
  – ITU H.263
  – Focus on video phone over phone lines/wireless
• MPEG-4
  – Officially ISO/IEC 14496
  – Part 2, Video: 2001 IS, content-based video coding, interactive video
  – Part 10, Advanced Video Coding (AVC), also ITU H.264
    • 2004 IS, about 50% bit-rate reduction compared with earlier video standards

H.261



ITU-T Video Standard: H.261 - History

• CCITT Study Group (SG) XV: Videophone and videoconferencing at bit rates from ~40 kb/s to 2 Mb/s
• Defines only the decoder; a reference encoder model was developed to test the decoder.
• History:
  – Dec. 1984: The specialists group established.
  – 1984~1988: Algorithm developed for n x 384 kb/s, n = 1, …, 5.
  – 1989: Modified for p x 64 kb/s, p = 1, …, 30.
  – Dec. 1990: Standard approved.


ITU-T Multimedia Communications Standards


H.324 Terminal (multimedia communication over PSTN)


H.261 Overall Codec System


Quick View of H.261

• ITU-T H.261: The basis of modern video compression
  – The first widespread practical success
  – First design (late ‘90) embodying the typical structure that dominates today
    • 16x16 macroblock motion compensation, 8x8 DCT, scalar quantization, and variable-length coding
  – Other key aspects
    • Loop filter, integer-pel motion compensation accuracy, 2-D VLC for coefficients
  – Operated at 64-2048 kbps
  – Still in use
    • although mostly as a backward-compatibility feature, overtaken by H.263


Picture Partition (1)

• Picture size: CIF, QCIF
• Macroblock (MB): Contains six 8x8 blocks (motion compensation, quantizer adjustment, …)

  Block numbering within an MB:
    Y:   1 2
         3 4
    Cb:  5
    Cr:  6

• Group of Blocks (GOB): Contains 33 MBs (synchronization, quantizer adjustment, …)

  MB numbering within a GOB (3 rows of 11):
     1  2  3  4  5  6  7  8  9 10 11
    12 13 14 15 16 17 18 19 20 21 22
    23 24 25 26 27 28 29 30 31 32 33


Picture Partition (2)

• Picture: Contains 3 GOBs (QCIF) or 12 GOBs (CIF) (picture sync., time reference, …)
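The 3-or-12 GOB count follows from the picture size. The sketch below assumes the usual luma dimensions for CIF (352x288) and QCIF (176x144), which the slides do not state explicitly; the function and dictionary names are illustrative only.

```python
# Assumed luma sizes (not given on the slide): CIF 352x288, QCIF 176x144.
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}
MB_SIZE = 16        # a macroblock covers 16x16 luma pixels
MBS_PER_GOB = 33    # a GOB holds 33 macroblocks (3 rows of 11)


def partition(width, height):
    """Count macroblocks and GOBs for a picture of the given luma size."""
    mb_cols = width // MB_SIZE
    mb_rows = height // MB_SIZE
    mbs = mb_cols * mb_rows
    return {"mb_cols": mb_cols, "mb_rows": mb_rows,
            "mbs": mbs, "gobs": mbs // MBS_PER_GOB}


for name, (w, h) in FORMATS.items():
    print(name, partition(w, h))
```

Under these assumptions CIF gives 22 x 18 = 396 macroblocks (12 GOBs) and QCIF gives 11 x 9 = 99 macroblocks (3 GOBs).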


Syntax

• Picture Layer: Picture Start Code (PSC: 20 bits), Temporal Reference (TR: 5 bits), Picture Type (PTYPE), Picture Extra Insertion (PEI), Picture Spare (PSPARE)

• GOB Layer: Group Start Code (GBSC), Group Number (GN), Quantizer (GQUANT), Extra Insertion (GEI), ...


Syntax (cont.)

• Macroblock (MB) Layer: MB Address, differential (MBA), MB Type (MTYPE), Quantizer (MQUANT), Motion Vector Data, differential (MVD), Coded Block Pattern (CBP)


Syntax (cont.)

• Block Layer: (DCT) Transform Coefficients (TCOEFF), End of Block (EOB: ‘10’)


H.261 Frame Sequence


H.261 Frame Sequence

• Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames):
  – I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence “intra”.
  – P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
  – Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
  – To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video (intra refresh).
• Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15.


Intra-frame (I-frame) Coding

• Macroblocks are of size 16x16 pixels for the Y frame, and 8x8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8x8 blocks.
• For each 8x8 block a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
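As an illustration of the zigzag step, the sketch below generates the conventional zigzag visiting order for an n x n block by walking the anti-diagonals in alternating directions. The function names are illustrative; the DCT and quantization stages are not modeled here.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        d = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (row ascending),
        # even diagonals run bottom-left to top-right (column ascending).
        return (d, r if d % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zigzag-ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order(8)[:10])
```

Low-frequency coefficients come first in this order, so after quantization the non-zero values tend to cluster at the start of the list and long zero runs at the end.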


P-Frame (Inter-frame) Coding


P-Frame (Inter-frame) Coding

• P-frame coding encodes the difference macroblock (not the target macroblock itself).
• Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
  – The MB itself is then encoded (treated as an Intra MB); in this case it is termed a non-motion-compensated MB.
• For the motion vector, the difference MVD is sent for entropy coding:
  – MVD = MV_Preceding − MV_Current
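A minimal sketch of differential MV coding, using the sign convention written on the slide (MVD = MV_Preceding − MV_Current). The zero starting predictor and the function names are assumptions for illustration; the rules for when the predictor is reset are not covered here.

```python
def encode_mvds(mvs):
    """Turn a sequence of (x, y) motion vectors into differences."""
    prev = (0, 0)  # assumption: predictor starts at zero
    mvds = []
    for mv in mvs:
        # Slide convention: MVD = MV_Preceding - MV_Current
        mvds.append((prev[0] - mv[0], prev[1] - mv[1]))
        prev = mv
    return mvds


def decode_mvds(mvds):
    """Invert encode_mvds: MV_Current = MV_Preceding - MVD."""
    prev = (0, 0)
    mvs = []
    for d in mvds:
        mv = (prev[0] - d[0], prev[1] - d[1])
        mvs.append(mv)
        prev = mv
    return mvs


mvs = [(3, -1), (4, -1), (4, 0)]
print(encode_mvds(mvs))
```

Neighbouring macroblocks often move together, so the differences are small and cost fewer bits under a variable-length code than the raw vectors.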


H.261 Encoder (Nonstandard)

[Figure: H.261 encoder block diagram, including the loop filter]

H.261 Decoder (Standard)


A Glance at Syntax of H.261 Video Bitstream


Parameter Selection and Rate Control

• MTYPE (intra vs. inter, zero vs. non-zero MV in inter, loop filter on/off)

• CBP (which blocks in a MB have non-zero DCT coefficients)

• MQUANT (allows the quantizer step size to change at the MB level)
  – should be varied to satisfy the rate constraint

• MV (ideally should be determined not only by prediction error but also the total bits used for coding MV and DCT coefficients of prediction error)


Quantization

• 8x8 DCT; zig-zag scan
• Uniform quantizer with a dead-zone:

  Odd QUANT:
    REC = QUANT•(2•level + 1), for level > 0
    REC = QUANT•(2•level – 1), for level < 0
  Even QUANT:
    REC = QUANT•(2•level + 1) – 1, for level > 0
    REC = QUANT•(2•level – 1) + 1, for level < 0
  Any QUANT:
    REC = 0, for level = 0

• QUANT value: 1 to 31 (5 bits); may be changed for every MB and/or GOB
• Exception: Intra-block DC coefficient uses step size = 8 (fixed) and no dead-zone
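The reconstruction rules above can be written as one small function. This is a sketch of those formulas only: the function name is illustrative, and any clipping of the reconstructed value or the separate intra DC rule is not modeled.

```python
def reconstruct(level, quant):
    """Inverse-quantize one coefficient level with quantizer QUANT (1..31)."""
    if not 1 <= quant <= 31:
        raise ValueError("QUANT must be in 1..31")
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    # QUANT*(2*level + 1) for level > 0, QUANT*(2*level - 1) for level < 0
    rec = quant * (2 * level + sign)
    if quant % 2 == 0:
        # Even QUANT: move the result one step toward zero
        rec -= sign
    return rec


for q in (4, 5):
    print(q, [reconstruct(level, q) for level in (-2, -1, 0, 1, 2)])
```

With QUANT = 5, level ±1 reconstructs to ±15 while level 0 reconstructs to 0, which shows the dead-zone: the gap around zero is wider than the spacing 2•QUANT = 10 between the other reconstruction points.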


Quantization

• The quantization in H.261 uses a constant step size, for all DCT coefficients within a macroblock.

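• If we use DCT and QDCT to denote the DCT coefficients before and after quantization, then for DC coefficients in Intra mode:

$$QDCT = \mathrm{round}\left(\frac{DCT}{step\_size}\right) = \mathrm{round}\left(\frac{DCT}{8}\right)$$

  – For other coefficients (floor function for center deadzone):

$$QDCT = \left\lfloor \frac{DCT}{step\_size} \right\rfloor = \left\lfloor \frac{DCT}{2 \times scale} \right\rfloor$$

  – scale: an integer in the range [1, 31].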


DCT Coefficient Quantization

Deadzone: avoids coding too many small coefficients, which are typically due to noise


Variable Length Coding

• DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
  – Symbol: (Zero run-length, non-zero value range)
• Other information is also coded using VLC (Huffman coding)

Code length in bits for each (run, absolute level) pair; rows and entries elided on the slide are left as dots:

            Absolute level →
  Run ↓     1    2    3    4    5    6    7    8   …   15   16   …   128
    0     2(3)   5    6    8    9    9   11   13   …   14   20   …    20
    1       4    7    9   11   13   14   14   20   …
    2       5    8   11   13   14   20   …
    3       6    9   13   14   20   …
    4       6   11   13   20   …
    5       7   11   14   20   …
    6       7   13   20   …
    7       7   13   20   …
    8       8   13   20   …
    .       .    .    .    .   20   …
   11       9   20   …
   12       9    .   20   …
    .       .   20   …
   27      20   …
    .
   63      20

Entries of 20 bits are fixed-length codes: Escape (6 bits) + Run (6) + Level (8)
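The run-length representation and the 20-bit escape format can be sketched as follows. The function names are illustrative, the Huffman table itself is not reproduced, and the escape prefix 000001 and the two's-complement level field are stated from memory of H.261, so treat them as assumptions.

```python
def run_level_events(zigzag_coeffs):
    """Convert zigzag-ordered quantized coefficients to (run, level) pairs."""
    events = []
    run = 0
    for c in zigzag_coeffs:
        if c == 0:
            run += 1          # count zeros preceding the next non-zero value
        else:
            events.append((run, c))
            run = 0
    events.append("EOB")      # trailing zeros are implied by End of Block
    return events


def escape_bits(run, level):
    """Fixed-length fallback: Escape (6 bits) + Run (6 bits) + Level (8 bits)."""
    if not 0 <= run <= 63:
        raise ValueError("run must fit in 6 bits")
    if level == 0 or not -127 <= level <= 127:
        raise ValueError("level must be non-zero and fit in 8 bits")
    escape = "000001"                      # assumed escape prefix
    return escape + format(run, "06b") + format(level & 0xFF, "08b")


print(run_level_events([12, 0, 0, -3, 1, 0, 0, 0]))
print(escape_bits(2, -3))
```

Common pairs such as (0, 1) get the short codes in the table above; rare pairs fall back to the fixed 20-bit form.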


Motion Estimation and Compensation

• Integer-pel accuracy in the range [-15, 15]
• Methods for generating the MVs are not specified in the standard
  – (Standards only define the bitstream syntax, or the decoder operation)
• MVs are coded differentially (DMV)
• Encoder and decoder both use the decoded MVs to perform motion compensation
• Loop filtering can be applied to suppress temporal propagation of coding noise
  – Separable filter [1/4, 1/2, 1/4]
  – Loop filter can be turned on or off
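A sketch of the separable [1/4, 1/2, 1/4] loop filter applied to one square block. Leaving block-edge samples unfiltered in the direction that would reach outside the block, and rounding once after both passes, are assumptions made here for illustration; the function name is also illustrative.

```python
def loop_filter(block):
    """Apply a separable [1/4, 1/2, 1/4] filter to a square block of ints."""
    n = len(block)

    def taps(i):
        # Assumed edge rule: pass-through (0, 1, 0) at the block border,
        # written as (0, 4, 0) so both cases share a scale factor of 4.
        return (0, 4, 0) if i in (0, n - 1) else (1, 2, 1)

    # Horizontal pass, kept at full precision (values scaled by 4).
    horiz = []
    for row in block:
        out = []
        for x in range(n):
            a, b, c = taps(x)
            left = row[x - 1] if x > 0 else 0
            right = row[x + 1] if x < n - 1 else 0
            out.append(a * left + b * row[x] + c * right)
        horiz.append(out)

    # Vertical pass (values scaled by 16), then round to nearest integer.
    result = []
    for y in range(n):
        a, b, c = taps(y)
        out = []
        for x in range(n):
            up = horiz[y - 1][x] if y > 0 else 0
            down = horiz[y + 1][x] if y < n - 1 else 0
            out.append((a * up + b * horiz[y][x] + c * down + 8) // 16)
        result.append(out)
    return result


impulse = [[0] * 8 for _ in range(8)]
impulse[3][3] = 16
filtered = loop_filter(impulse)
print(filtered[2][2:5], filtered[3][2:5], filtered[4][2:5])
```

A flat block passes through unchanged, while an isolated spike is spread over its 3x3 neighbourhood, which is how the filter damps coding noise in the prediction.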

H.263



Very Low Bit Rate Coding

• ITU-T Study Group (SG) 15/16: Very Low Bit-Rate Visual Telephony (LBC)
• History:
  – Sept. 1993: Started new work item.
  – Near-term: Improving H.261
    • Nov. 1995: H.263 decided
    • Jan. 1998: H.263+ (H.263 Ver. 2) decided
    • 2000: Finished H.263++ (H.263 Ver. 3)
  – Long-term: Draft H.26L, later H.264 (2003)
    • Different from H.261 (H.263)
    • Collaboration with MPEG-4 (JVT = AVC)
• Goal: Improved quality at lower rates
• Result: Significantly better quality at lower rates
  – Better video at 18-24 kbps than H.261 at 64 kbps
  – Enables video phone over regular phone lines (28.8 kbps) or wireless modem


Different From H.261

• A combination of H.261 and MPEG
• Various picture formats such as sub-QCIF, 4CIF, etc.
• Half-pel motion compensation (~MPEG)
• No loop filter
• No macroblock addressing (included in MB header)
• Quantizer step size: 5 bits in picture and GOB layers; differential MQUANT step size: 2 bits in MB layer
• 3-D VLC for transform coefficients
• Four negotiable options


Video Format and Picture Partition in H.263

• Wider application range from sub-QCIF to 16CIF


H.263 Typical Encoder (Nonstandard)

• A general source coder model


DCT, Quantization and 3-D VLC

• DCT and zig-zag scan: same as H.261 (JPEG)
• Inverse quantization: same as that of H.261
• At the MB layer, the QUANT value can only be increased or decreased by 1 or 2
• 3-D VLC: An event (symbol) is made of (Last, Run, Level).
  – ‘Last’ = 1 indicates the last non-zero coefficient in the block.


3-D VLC

  Last   Run   Level   Bits   VLC Code
   0      0      1       3    10s
   0      0      2       5    1111s
   0      0      3       7    0101 01s
   …
   1      0      1       5    0111s
   1      0      2      10    0000 1100 1s
   1      0      3      12    0000 0000 101s
   …
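To show how (Last, Run, Level) events differ from the H.261 (Run, Level) pairs, the sketch below builds the events for one zigzag-ordered block. The function name is illustrative and the mapping of events to code words is not modeled.

```python
def last_run_level_events(zigzag_coeffs):
    """Build (last, run, level) events from zigzag-ordered coefficients."""
    nonzero = [i for i, c in enumerate(zigzag_coeffs) if c != 0]
    events = []
    prev = -1
    for k, i in enumerate(nonzero):
        last = 1 if k == len(nonzero) - 1 else 0  # flag the final non-zero value
        run = i - prev - 1                        # zeros skipped before this value
        events.append((last, run, zigzag_coeffs[i]))
        prev = i
    return events


print(last_run_level_events([12, 0, 0, -3, 1, 0, 0, 0]))
```

Because the final event carries Last = 1, no separate End of Block symbol is needed in this representation.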


Motion Estimation: Median Prediction for MV

1. Horizontal and vertical components are calculated separately.
2. The difference between the MV and the predictor is VLC-coded.
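A minimal sketch of the two steps above: a component-wise median over three candidate MVs, then the difference that would be VLC-coded. Which neighbouring blocks supply the three candidates is shown in the slide figure and is not reproduced here, so the arguments `mv1`, `mv2`, `mv3` are placeholders.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3):
    """Component-wise median of three candidate (x, y) motion vectors."""
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))


def mv_difference(mv, mv1, mv2, mv3):
    """Difference between the actual MV and its median predictor."""
    px, py = predict_mv(mv1, mv2, mv3)
    return (mv[0] - px, mv[1] - py)


print(predict_mv((2, 0), (4, -2), (3, 5)))
print(mv_difference((4, 1), (2, 0), (4, -2), (3, 5)))
```

Since each component is taken separately, the predictor need not equal any one of the three candidate vectors.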


Motion Estimation: Half-Pel Precision

• Half-pixel prediction by bilinear interpolation
  – to reduce the prediction error
  – the default range of the MV components (u, v) is now [−16, 15.5]
  – half pels are generated by bilinear interpolation
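Bilinear interpolation at half-pel positions can be sketched as below. Coordinates are given in half-pel units, the rounding (add 1 before dividing by 2, add 2 before dividing by 4) is an assumed convention, and the function name is illustrative; border handling is left out.

```python
def sample_half_pel(frame, x2, y2):
    """Sample a frame at half-pel position (x2 / 2, y2 / 2).

    frame is a list of rows of integer pels; x2 and y2 are non-negative
    coordinates in half-pel units.
    """
    x, y = x2 // 2, y2 // 2
    half_x, half_y = x2 % 2, y2 % 2
    a = frame[y][x]
    if not half_x and not half_y:
        return a                                   # integer position
    if half_x and not half_y:
        return (a + frame[y][x + 1] + 1) // 2      # between two horizontal pels
    if half_y and not half_x:
        return (a + frame[y + 1][x] + 1) // 2      # between two vertical pels
    b, c, d = frame[y][x + 1], frame[y + 1][x], frame[y + 1][x + 1]
    return (a + b + c + d + 2) // 4                # centre of four pels


frame = [[10, 20], [30, 41]]
print([sample_half_pel(frame, x2, y2) for y2 in (0, 1) for x2 in (0, 1)])
```

A motion vector of (0.5, 0.5) then corresponds to reading every predicted pel at an odd (x2, y2) position.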


H.263 Negotiable Options

Negotiable between encoder and decoder:

• Unrestricted motion vectors (UMV) mode: Motion vectors are allowed to point outside the picture
• Syntax-based arithmetic coding (SAC) mode: VLC is replaced by arithmetic coding
• Advanced prediction (AP) mode: One MV for each 8x8 block
• PB-frame (PB) mode: Introduces a ‘constrained version’ of the (MPEG) B-frame


Advanced Prediction Mode

• Four MVs can be used in a MB: the 1st (differential) MV is MVD and the rest are MVD2-4.
• The MV predictor for each 8x8 block is formed by using 3 nearby MVs (illustrated by a figure on the original slide).


AP Mode: Overlapped Motion Compensation

• Each pel in the current 8x8 luminance block is predicted using the weighted sum of the pels of three previous-frame predictors: current, left (or right), top (or bottom). For example, the upper-left 4x4 corner uses the current, top, and left predictors; the upper-right 4x4 corner uses the current, top, and right predictors; etc.
• The current predictor is the previous-frame pels displaced using the current MV, the left predictor is displaced using the left block MV, etc.
• Four MVs enable a more accurate MV for each block. Overlapped compensation achieves a smooth transition between nearby blocks.


Overlapped Motion Compensation

$$p(i,j) = \left( q(i,j) \times H_0(i,j) + r(i,j) \times H_r(i,j) + s(i,j) \times H_s(i,j) + 4 \right) / 8$$

where $q(i,j)$ is the pel displaced by the current MV ($MV_0$), $r(i,j)$ is the pel displaced by $MV_r$ (the MV of the top or the bottom block), and $s(i,j)$ is the pel displaced by $MV_s$ (the MV of the left or the right block). $H_0$, $H_r$, and $H_s$ are the corresponding weights.
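The weighted sum can be sketched directly from the formula p = (q×H + r×H + s×H + 4) / 8, with one weight table per predictor. The 2x2 weight tables in the example are made-up values chosen only so that the three weights sum to 8 at every position; they are not the tables from the standard, and the function name is illustrative.

```python
def obmc_block(q, r, s, h0, hr, hs):
    """Blend three predictor blocks with per-pel integer weights.

    q, r, s: pels displaced by the current, top/bottom, and left/right MVs.
    h0, hr, hs: weight tables of the same shape (assumed to sum to 8).
    """
    rows, cols = len(q), len(q[0])
    return [
        [(q[i][j] * h0[i][j] + r[i][j] * hr[i][j] + s[i][j] * hs[i][j] + 4) // 8
         for j in range(cols)]
        for i in range(rows)
    ]


# Illustrative 2x2 weights only (each position sums to 8).
h0 = [[4, 5], [5, 6]]
hr = [[2, 2], [1, 1]]
hs = [[2, 1], [2, 1]]
q = [[100, 100], [100, 100]]
r = [[80, 80], [80, 80]]
s = [[120, 120], [120, 120]]
print(obmc_block(q, r, s, h0, hr, hs))
```

The "+ 4" term rounds to the nearest integer before the division by 8, and when all three predictors agree the output equals their common value.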


Overlapped MC (cont.)


Motion Estimation: PB-Picture Mode

PB-picture mode codes two pictures as a group. The second picture (P) is coded first, then the first picture (B) is coded using both the P-picture and the previously coded picture. This avoids the reordering of pictures required in the normal B-mode, but it still adds more coding delay than using P-frames only.

In a B-block, forward prediction (predicted from the previous frame) can be used for all pixels; backward prediction (from the future frame) is used only for those pels where the backward motion vector aligns with pels of the current MB. Pixels in the “white area” use only forward prediction.

Under large motions, PB-frames do not compress as well as B-frames. An improved PB-frame mode, defined in H.263+, removes this restriction.


Performance of H.261 and H.263

[Figure: performance comparison on the Foreman sequence, QCIF, 12.5 Hz. Curves: Integer MC, +/- 16; Half-pel MC, +/- 32; Integer MC, +/- 16, loop filter; Integer MC, +/- 32; OBMC, 4 MVs, etc.]


Advantages of Options

(Girod et al., Performance of the H.263 Video Compression Standard, VLSI Signal Proc., 1997)

• Test conditions: 64 kbps, QCIF pictures, ~12.5 frames/sec
• H.261 vs. H.263: (1) w/o options, ~2 dB PSNR improvement; (2) with all options, ~3 dB
• Key factor: Half-pel motion estimation
• H.263 SAC option: 0.2 dB improvement (vs. w/o)
• H.263 AP option: 1.2 dB (vs. w/o)
• H.263 PB option: P-picture PSNR is higher but B-picture PSNR is lower; better subjective quality


H.263+ (H.263 v2)

Enhances H.263 with additional options (Draft 20, Sept. ‘97)

Coding efficiency:
– Advanced intra coding mode
– Deblocking filter mode
– Improved PB-frames mode
– Reference picture resampling mode
– Alternative inter VLC mode
– Modified quantization mode


H.263+ (cont.)

Error robustness:
– Slice structured mode
– Reference picture selection mode
– Independently segmented decoding mode

Enhanced communication:
– Temporal, SNR, and spatial scalability mode
– Reduced-resolution update mode
