Multimedia Communications
Lecture 10: Video Standards
Part I. Videophone and video conferencing: H.261/H.263
Dr. Tian-Sheuan [email protected]
Dept. Electronics Engineering, National Chiao-Tung University
Adapted from Prof. Hang’s slides
Introduction to Video Standard
Institute of Electronics, National Chiao Tung University
Behind the Scene
• Why can we do compression?
  – Observation
    • Significant amount of statistical and subjective redundancy within and between frames
  – Statistical redundancy
    • Lossless compression
    • e.g. 000000000000000… -> run-length coding, arithmetic coding, Huffman coding
  – Subjective redundancy
    • Lossy compression
    • Exploit characteristics of the Human Visual System
      – Not sensitive to high-frequency components
    • Spatial redundancy
      – DCT transform, quantized high-frequency components
    • Temporal redundancy
      – Motion estimation
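The run-length idea mentioned for the all-zero example can be sketched in a few lines. This is only an illustration of statistical (lossless) redundancy removal; the function names are hypothetical and the real standards code (run, level) pairs with VLC tables rather than plain counts.

```python
from itertools import groupby


def run_length_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(sym, len(list(group))) for sym, group in groupby(symbols)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original string (lossless)."""
    return "".join(sym * count for sym, count in pairs)


source = "0" * 15 + "1" * 3
encoded = run_length_encode(source)
decoded = run_length_decode(encoded)
```

Fifteen zeros collapse into a single pair, and decoding restores the input exactly, which is what makes this a lossless step.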
The Scope of Picture and Video Coding Standardization
• Only the Syntax and Decoder are standardized:
  – Permits optimization beyond the obvious
  – Permits complexity reduction for implementability
  – Provides no guarantees of quality
[Figure: Source -> Pre-Processing -> Encoding -> Decoding -> Post-Processing & Error Recovery -> Destination, with the decoding stage marked "Scope of Standard"]
Development of Coding Tools and Standards
DPCM 1952-1980
Transform Coding 1965-1980
Motion Compensated Prediction 1972-1989
Entropy Coding 1949-1976
H.261 1984-1990
MPEG1 1988-1992
MPEG2 1991-1994
JPEG 1984-1992
MPEG4
H.263
1950s 1960s 1970s 1980s 1990s
ITU/MPEG Standards
• H.261
  – ITU H.261
  – Optimized for CIF@384 kbps, focus on video phone over ISDN
  – First design (late 1990) embodying the typical structure that dominates today
    • 16x16 macroblock motion compensation, 8x8 DCT, scalar quantization, and variable-length coding
• MPEG-1
  – ISO/IEC 11172
  – 1993 IS, design focus on VHS quality (352x240)@1.5 Mbps
• MPEG-2
  – ISO/IEC 13818
  – 1994 IS, optimized for “NTSC quality” CCIR 601 video@6-10 Mbps
• H.263
  – ITU H.263
  – Focus on video phone over phone lines/wireless
• MPEG-4
  – Officially ISO/IEC 14496
  – Part 2, Video: 2001 IS, content-based video coding, interactive video
  – Part 10, Advanced Video Coding (AVC), also ITU H.264
    • 2004 IS, about 50% bit-rate reduction relative to earlier video standards
H.261
ITU-T Video Standard: H.261 - History
• CCITT Study Group (SG) XV: videophone and videoconferencing at bit rates from ~40 kb/s to 2 Mb/s
• Defines only the decoder; a reference encoder model was developed to test the decoder.
• History:
  – Dec. 1984: The specialists group established.
  – 1984~1988: Algorithm developed for n x 384 kb/s, n = 1, …, 5.
  – 1989: Modified for p x 64 kb/s, p = 1, …, 30.
  – Dec. 1990: Standard approved
ITU-T Multimedia Communications Standards
H.324 Terminal(multimedia communication over PSTN)
H.261 Overall Codec System
Quick View of H.261
• ITU-T H.261: The basis of modern video compression
  – The first widespread practical success
  – First design (late 1990) embodying the typical structure that dominates today
    • 16x16 macroblock motion compensation, 8x8 DCT, scalar quantization, and variable-length coding
  – Other key aspects
    • Loop filter, integer-pel motion compensation accuracy, 2-D VLC for coefficients
  – Operates at 64-2048 kbps
  – Still in use
    • although mostly as a backward-compatibility feature, overtaken by H.263
Picture Partition (1)
• Picture size: CIF, QCIF
• Macroblock (MB): Contains six 8x8 blocks (motion compensation, quantizer adjustment, …)
  – Blocks 1-4: Y (arranged as 1 2 over 3 4); block 5: Cb; block 6: Cr
• Group of Blocks (GOB): Contains 33 MBs (synchronization, quantizer adjustment, …), numbered in 3 rows of 11:
   1  2  3  4  5  6  7  8  9 10 11
  12 13 14 15 16 17 18 19 20 21 22
  23 24 25 26 27 28 29 30 31 32 33
Picture Partition (2)
• Picture: Contains 3 (QCIF) or 12 (CIF) GOBs (picture sync., time reference, …)
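The picture / GOB / macroblock counts can be checked with a little arithmetic. The sketch below assumes the usual luminance sizes of 352x288 for CIF and 176x144 for QCIF (these dimensions are not stated on the slides); the 16x16 macroblock and 33 MBs per GOB come from the partition described above.

```python
MB_SIZE = 16          # luminance macroblock is 16x16 pels
MBS_PER_GOB = 33      # one GOB holds 33 macroblocks (3 rows of 11)

# Assumed luminance picture sizes (width, height).
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def partition(width, height):
    """Return (macroblocks per picture, GOBs per picture)."""
    mbs_wide = width // MB_SIZE
    mbs_high = height // MB_SIZE
    macroblocks = mbs_wide * mbs_high
    return macroblocks, macroblocks // MBS_PER_GOB


counts = {name: partition(*size) for name, size in FORMATS.items()}
```

Under these assumptions CIF gives 22 x 18 = 396 macroblocks (12 GOBs) and QCIF gives 11 x 9 = 99 macroblocks (3 GOBs), matching the "3 or 12 GOBs" on the slide.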
Syntax
• Picture Layer: Picture Start Code (PSC: 20 bits), Temporal Reference (TR: 5 bits), Picture Type (PTYPE), Picture Extra Insertion (PEI), Picture Spare (PSPARE)
• GOB Layer: Group Start Code (GBSC), Group Number (GN), Quantizer (GQUANT), Extra Insertion (GEI), ...
Syntax (cont.)
• Macroblock (MB) Layer: MB Address, differential (MBA), MB Type (MTYPE), Quantizer (MQUANT), Motion Vector Data, differential (MVD), Coded Block Pattern (CBP)
Syntax (cont.)
• Block Layer: (DCT) Transform Coefficients (TCOEFF), End of Block (EOB: ‘10’)
H.261 Frame Sequence
H.261 Frame Sequence
• Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames):
  – I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence “intra”.
  – P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
  – Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
  – To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video (intra refresh).
• Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15.
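To make the full-pixel search range p concrete, here is a minimal sketch of exhaustive block matching. The standard does not specify how an encoder finds motion vectors, so the SAD criterion, the function names, and the convention that the vector points from the current block to its match in the reference frame are all assumptions made for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n=16, p=15):
    """Exhaustive integer-pel search over displacements in [-p, p]."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Frames are plain lists of rows here; a real encoder would use faster search patterns, but the exhaustive version shows exactly what "range p = 15" bounds.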
Intra-frame (I-frame) Coding
• Macroblocks are of size 16x16 pixels for the Y frame, and 8x8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8x8 blocks.
• For each 8x8 block a DCT is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
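The zigzag scan step can be sketched as follows. The helper names are hypothetical; the scan order generated is the conventional JPEG-style zigzag that walks the anti-diagonals of the block, so low-frequency coefficients come first and the (mostly zero) high-frequency ones end up at the tail.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of an n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked bottom-left to top-right,
        # odd diagonals top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Feeding the scanned list to a run-length stage is what produces the long zero runs that the entropy coder exploits.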
P-Frame (Inter-frame) Coding
P-Frame (Inter-frame) Coding
• P-frame coding encodes the difference macroblock (not the Target macroblock itself).
• Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
  – The MB itself is then encoded (treated as an Intra MB), and in this case it is termed a non-motion-compensated MB.
• For the motion vector, the difference MVD is sent for entropy coding:
  – MVD = MV_Current − MV_Preceding (the vector of the preceding macroblock is subtracted from the current one)
H.261 Encoder (Nonstandard)
[Figure: encoder block diagram, with the loop filter inside the prediction loop]
H.261 Decoder (Standard)
A Glance at Syntax of H.261 Video Bitstream
Parameter Selection and Rate Control
• MTYPE (intra vs. inter, zero vs. non-zero MV in inter, loop filter on/off)
• CBP (which blocks in a MB have non-zero DCT coefficients)
• MQUANT (allow the changes of the quantizer step size at the MB level)– should be varied to satisfy the rate constraint
• MV (ideally should be determined not only by prediction error but also the total bits used for coding MV and DCT coefficients of prediction error)
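The last point, choosing the MV by both prediction error and bit cost, is commonly expressed as a Lagrangian cost J = D + λR. The sketch below is one possible way to write that trade-off; the cost form, the function names, and the candidate numbers are assumptions for illustration, not something the slides or the standard prescribe.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def pick_candidate(candidates, lam):
    """Pick the (label, distortion, bits) tuple with the lowest cost J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: mv_a predicts better, mv_b is cheaper to code.
candidates = [("mv_a", 100, 40), ("mv_b", 120, 10)]
best_ignoring_rate = pick_candidate(candidates, lam=0.0)
best_with_rate = pick_candidate(candidates, lam=1.0)
```

With λ = 0 only the prediction error counts and mv_a wins; once bits are weighted, the cheaper mv_b becomes the better choice.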
Quantization
• 8x8 DCT; zig-zag scan
• Uniform quantizer with a dead-zone:
  – Odd QUANT:
      REC = QUANT•(2•level + 1); for level > 0
      REC = QUANT•(2•level – 1); for level < 0
  – Even QUANT:
      REC = QUANT•(2•level + 1) – 1; for level > 0
      REC = QUANT•(2•level – 1) + 1; for level < 0
  – REC = 0; for level = 0
• QUANT value: 1 – 31 (5 bits); may be changed for every MB and/or GOB
• Exception: Intra-block DC coeff., step size = 8 (fixed) and no dead-zone
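The reconstruction rules above translate directly into code. This is a minimal sketch with hypothetical function names; the intra DC helper simply applies the fixed step size of 8 and ignores any special code words the standard may define for that coefficient.

```python
def reconstruct(level, quant):
    """Inverse-quantize one coefficient level using the REC rules."""
    if not 1 <= quant <= 31:
        raise ValueError("QUANT must be in 1..31")
    if level == 0:
        return 0
    if quant % 2 == 1:  # odd QUANT
        if level > 0:
            return quant * (2 * level + 1)
        return quant * (2 * level - 1)
    # even QUANT: pull the magnitude in by 1
    if level > 0:
        return quant * (2 * level + 1) - 1
    return quant * (2 * level - 1) + 1


def reconstruct_intra_dc(level):
    """Intra DC uses a fixed step size of 8 and no dead-zone."""
    return 8 * level
```

Note that the smallest non-zero reconstruction magnitude is 3•QUANT (or 3•QUANT – 1 for even QUANT), which is where the dead-zone around zero comes from.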
Quantization
• The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
• If we use DCT and QDCT to denote the DCT coefficients before and after quantization, then for DC coefficients in Intra mode:
  $QDCT = \mathrm{round}(DCT / step\_size) = \mathrm{round}(DCT / 8)$
  – For other coefficients (floor function for center deadzone):
  $QDCT = \lfloor DCT / step\_size \rfloor = \lfloor DCT / (2 \times scale) \rfloor$
  – Scale: an integer in the range of [1, 31].
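A small sketch of the forward quantizer described on this slide. Two assumptions are made here and are not taken from the slides: the intra DC rounding sends halves upward, and for the other coefficients the floor is applied to the magnitude with the sign restored afterwards, so that small negative values also fall into the dead-zone (a literal floor would map -7 to -1 instead of 0).

```python
import math


def quantize_intra_dc(dct):
    """Intra DC: fixed step size of 8, rounded to the nearest integer."""
    return math.floor(dct / 8 + 0.5)


def quantize_other(dct, scale):
    """All other coefficients: step size 2 * scale with a center dead-zone."""
    if not 1 <= scale <= 31:
        raise ValueError("scale must be in 1..31")
    magnitude = int(abs(dct) // (2 * scale))
    return magnitude if dct >= 0 else -magnitude
```

With scale = 4 the step size is 8, so every coefficient in the open interval (-8, 8) quantizes to zero.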
DCT Coefficient Quantization
Deadzone: avoids coding too many small coefficients, which are typically due to noise
Variable Length Coding
• DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
  – Symbol: (zero run-length, non-zero value range)
• Other information is also coded using VLC (Huffman coding)
[Table: VLC code length in bits, indexed by zero run (rows, 0 to 63) and absolute level (columns, 1 to 128). The shortest codes (2 or 3 bits) go to run 0, level 1; lengths grow with run and level up to 14 bits; all remaining (run, level) combinations use a 20-bit fixed-length code: Escape (6 bits) + Run (6) + Level (8).]
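The (run, level) conversion and the 20-bit escape layout can be sketched as follows. The 6 + 6 + 8 bit split comes from the slide; the escape prefix value `000001`, the two's-complement level field, and the function names are assumptions made for this illustration and should be checked against the H.261 VLC table.

```python
ESCAPE = "000001"  # assumed 6-bit escape prefix


def run_level_pairs(scanned):
    """Convert zigzag-ordered coefficients into (zero run, level) pairs.
    Trailing zeros produce no pair; they are covered by the EOB code."""
    pairs, run = [], 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs


def escape_code(run, level):
    """Build the 20-bit fixed-length code: Escape(6) + Run(6) + Level(8)."""
    if not 0 <= run <= 63:
        raise ValueError("run must fit in 6 bits")
    if level == 0 or not -127 <= level <= 127:
        raise ValueError("level must be non-zero and fit in 8 signed bits")
    return ESCAPE + format(run, "06b") + format(level & 0xFF, "08b")
```

Frequent pairs would be looked up in the variable-length table first; only pairs without a table entry fall back to `escape_code`.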
Motion Estimation and Compensation
• Integer-pel accuracy in the range [-15, 15]
• Methods for generating the MVs are not specified in the standard
  – (Standards only define the bitstream syntax, or the decoder operation)
• MVs are coded differentially (DMV)
• Encoder and decoder use the decoded MVs to perform motion compensation
• Loop filtering can be applied to suppress the temporal propagation of coding noise
  – Separable filter [1/4, 1/2, 1/4]
  – Loop filter can be turned on or off
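A sketch of the separable [1/4, 1/2, 1/4] loop filter applied to one block of the prediction. Two details are assumptions here rather than statements from the slides: pels on the block edge are left unfiltered in the direction where a tap would fall outside the block, and the result is rounded once, after both passes.

```python
def filter_1d(samples):
    """Apply integer taps [1, 2, 1] inside the row and [0, 4, 0] at its two
    ends, so every output is scaled by 4."""
    last = len(samples) - 1
    out = []
    for k, value in enumerate(samples):
        if k == 0 or k == last:
            out.append(4 * value)
        else:
            out.append(samples[k - 1] + 2 * value + samples[k + 1])
    return out


def loop_filter(block):
    """Filter rows, then columns, then divide by 16 with rounding."""
    rows = [filter_1d(row) for row in block]
    cols = [filter_1d(list(col)) for col in zip(*rows)]
    filtered = [list(row) for row in zip(*cols)]  # transpose back
    return [[(v + 8) // 16 for v in row] for row in filtered]
```

A flat block passes through unchanged, while an isolated spike is spread over its neighbours, which is how the filter damps coding noise in the prediction.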
H.263
Very Low Bit Rate Coding
• ITU-T Study Group (SG) 15/16: Very Low Bit-Rate Visual Telephony (LBC)
• History:
  – Sept. 1993: Started new work item.
  – Near-term: Improving H.261
    • Nov. 1995: H.263 decided
    • Jan. 1998: H.263+ (H.263 Ver. 2) decided
    • 2000: Finished H.263++ (H.263 Ver. 3)
  – Long-term: Draft H.26L, later H.264 (2003)
    • Different from H.261 (H.263)
    • Collaboration with MPEG-4 (JVT = AVC)
• Goal: Improved quality at lower rates
• Result: Significantly better quality at lower rates
  – Better video at 18-24 kbps than H.261 at 64 kbps
  – Enables video phone over regular phone lines (28.8 kbps) or wireless modem
Different From H.261
• A combination of H.261 and MPEG
• Various picture formats such as sub-QCIF, 4CIF, etc.
• Half-pel motion compensation (~MPEG)
• No loop filter
• No macroblock addressing (included in MB header)
• Quantizer step size: 5 bits in picture and GOB layers; differential MQUANT step size: 2 bits in MB layer
• 3-D VLC for transform coeffs.
• Four negotiable options
Video Format and Picture Partition in H.263
• Wider application range from sub-QCIF to 16CIF
H.263 Typical Encoder (Nonstandard)
• A general source coder model
DCT, Quantization and 3-D VLC
• DCT and zig-zag scan: same as H.261 (JPEG)
• Inverse quantization: same as that of H.261
• At the MB layer, the QUANT value can only be increased or decreased by 1 or 2
• 3-D VLC: An event (symbol) is made of (Last, Run, Level).
  – ‘Last’ = 1 indicates the last non-zero coeff. of the block
3-D VLC
Last  Run  Level  Bits  VLC Code
0     0    1      3     10s
0     0    2      5     1111s
0     0    3      7     0101 01s
…
1     0    1      5     0111s
1     0    2      10    0000 1100 1s
1     0    3      12    0000 0000 101s
…
(s = sign bit)
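How a scanned block turns into (Last, Run, Level) events can be sketched with the six table rows shown above. The dictionary holds only that excerpt, not the full H.263 table, and the sign-bit convention (0 for positive, 1 for negative) and the function names are assumptions of this sketch.

```python
# Excerpt only: (last, run, |level|) -> code prefix without the sign bit.
VLC_EXCERPT = {
    (0, 0, 1): "10",
    (0, 0, 2): "1111",
    (0, 0, 3): "010101",
    (1, 0, 1): "0111",
    (1, 0, 2): "000011001",
    (1, 0, 3): "00000000101",
}


def to_events(scanned):
    """Convert zigzag-ordered coefficients into (last, run, level) events."""
    nonzero = [i for i, c in enumerate(scanned) if c != 0]
    events, prev = [], -1
    for idx in nonzero:
        last = 1 if idx == nonzero[-1] else 0
        events.append((last, idx - prev - 1, scanned[idx]))
        prev = idx
    return events


def encode_event(last, run, level):
    """Look up the prefix and append the sign bit; raises KeyError for
    events that are not part of the excerpt above."""
    prefix = VLC_EXCERPT[(last, run, abs(level))]
    return prefix + ("1" if level < 0 else "0")
```

Because the Last flag is part of each event, no separate end-of-block code is needed, which is the main difference from the 2-D (run, level) scheme of H.261.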
Motion Estimation: Median Prediction for MV
1. Horizontal and vertical components are calculated separately
2. The difference between the MV and the predictor is VLC-coded
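The two steps above can be sketched as a component-wise median followed by a subtraction. The choice of the three candidate vectors (typically the left, above, and above-right neighbours) is an assumption here because the slide's figure is not reproduced; the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3):
    """Median predictor, computed separately for x and y components."""
    return (
        median3(mv1[0], mv2[0], mv3[0]),
        median3(mv1[1], mv2[1], mv3[1]),
    )


def mv_difference(mv, predictor):
    """Difference that is passed on to the VLC coder."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


predictor = predict_mv((1, 4), (3, -2), (2, 0))
mvd = mv_difference((2.5, 1), predictor)
```

Note that the predictor need not equal any single neighbour: its x component can come from one candidate and its y component from another.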
Motion Estimation: Half-Pel Precision
• Half-pixel prediction by bilinear interpolation
  – to reduce the prediction error
  – the default range of the MV components (u, v) is now [−16, 15.5]
  – half-pels are generated by bilinear interpolation
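Bilinear half-pel interpolation can be sketched as below. Coordinates are passed in half-pel units so that odd values denote half positions; the rounding offsets (+1 before dividing by 2, +2 before dividing by 4) and the function name are assumptions of this sketch.

```python
def half_pel_sample(ref, x2, y2):
    """Sample `ref` at (x2 / 2, y2 / 2), where x2 and y2 are non-negative
    coordinates in half-pel units."""
    x, y = x2 // 2, y2 // 2
    frac_x, frac_y = x2 % 2, y2 % 2
    a = ref[y][x]
    if not frac_x and not frac_y:
        return a                                   # integer position
    if frac_x and not frac_y:
        return (a + ref[y][x + 1] + 1) // 2        # horizontal half-pel
    if frac_y and not frac_x:
        return (a + ref[y + 1][x] + 1) // 2        # vertical half-pel
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4
```

For the 2x2 patch `[[10, 20], [30, 40]]` the horizontal, vertical, and diagonal half positions lie between the neighbouring integer pels, as expected from a bilinear average.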
H.263 Negotiable Options
Negotiable between encoder and decoder:
• Unrestricted motion vectors (UMV) mode: Motion vectors are allowed to point outside the picture
• Syntax-based arithmetic coding (SAC) mode: VLC is replaced by arithmetic coding
• Advanced prediction (AP) mode: One MV for each 8x8 block
• PB-frame (PB) mode: Introduces a ‘constrained version’ of the (MPEG) B-frame
Advanced Prediction Mode
• Four MVs can be used in a MB: the 1st (differential) MV is MVD and the rest are MVD2-4
• The MV predictor for each 8x8 block is formed by using 3 nearby MVs
[Figure: positions of the 3 candidate MVs for each 8x8 block]
AP Mode: Overlapped Motion Compensation
Each pel in the current 8x8 luminance block is predicted using the weighted sum of the pels of three previous-frame predictors: current, left (or right), top (or bottom). For example, the upper-left 4x4 corner uses the current, top and left predictors; the upper-right 4x4 corner uses the current, top and right predictors; etc.
The current predictor is the previous-frame pels displaced using the current MV, the left predictor is displaced using the left block MV, etc.
Four MVs enable a more accurate MV for each block. Overlapped compensation achieves a smooth transition between nearby blocks.
Overlapped Motion Compensation
$$p(i,j) = \bigl(q(i,j) \times H_0(i,j) + r(i,j) \times H_r(i,j) + s(i,j) \times H_s(i,j) + 4\bigr) / 8$$

where $q(i,j)$ is the pel displaced by the current MV ($MV^0$), $r(i,j)$ is the pel displaced by $MV^r$ (MV of the top or the bottom block), and $s(i,j)$ is the pel displaced by $MV^s$ (MV of the left or the right block).
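The weighted sum for a single pel can be sketched as follows. The actual weight matrices are not reproduced on the slide, so the weights are passed in by the caller; the only property assumed is that the three weights at each position add up to 8, which is what the division by 8 with the rounding offset of 4 suggests. The example weights below are hypothetical.

```python
def obmc_pel(q, r, s, h0, hr, hs):
    """Overlapped prediction of one pel.

    q, r, s    : pels displaced by the current, top/bottom, and left/right MVs
    h0, hr, hs : integer weights for this pel position (assumed to sum to 8)
    """
    if h0 + hr + hs != 8:
        raise ValueError("weights are assumed to sum to 8")
    return (q * h0 + r * hr + s * hs + 4) // 8


# Hypothetical weights for a pel near a block corner.
corner = obmc_pel(q=90, r=10, s=50, h0=5, hr=2, hs=1)
```

When all three predictors agree, the result equals their common value regardless of the weights, so OBMC only changes the prediction where neighbouring motion vectors disagree.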
Overlapped MC (cont.)
Motion Estimation: PB-Picture Mode
PB-picture mode codes two pictures as a group. The second picture (P) is coded first, then the first picture (B) is coded using both the P-picture and the previously coded picture. This avoids the reordering of pictures required in the normal B-mode, but it still incurs more coding delay than using P-frames only.
In a B-block, forward prediction (predicted from the previous frame) can be used for all pixels; backward prediction (from the future frame) is used only for those pels that the backward motion vector aligns with pels of the current MB. Pixels in the “white area” use only forward prediction.
Under large motions, PB-frames do not compress as well as B-frames. An improved PB-frame mode, defined in H.263+, removes this restriction.
Performance of H.261 and H.263
[Figure: performance curves for the Foreman sequence, QCIF, 12.5 Hz. Curves: integer MC, ±16; integer MC, ±16, with loop filter; integer MC, ±32; half-pel MC, ±32; OBMC, 4 MVs, etc.]
Advantages of Options
(Girod et al., Performance of the H.263 Video Compression Standard, VLSI Signal Proc., 1997)
• At 64 kbps, QCIF pictures, ~12.5 frames/s
• H.261 vs. H.263: (1) w/o options, ~2 dB PSNR improvement; (2) with all options, ~3 dB.
• Key factor: half-pel motion estimation.
• H.263 SAC option: 0.2 dB improvement (vs. w/o)
• H.263 AP option: 1.2 dB (vs. w/o)
• H.263 PB option: P-picture PSNR is higher but B-picture PSNR is lower; better subjective quality
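For readers unfamiliar with the PSNR figures quoted above, here is a minimal sketch of the usual definition, PSNR = 10 log10(peak^2 / MSE), for 8-bit samples. The function name and the flat-list input format are assumptions of this sketch.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak * peak / mse)
```

Since 10 log10(2) is about 3.01, a gain of roughly 3 dB corresponds to halving the mean squared error.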
H.263+ (H.263 v2)
Enhances H.263 with additional options (Draft 20, Sept. ’97)
Coding efficiency:
  – Advanced intra coding mode
  – Deblocking filter mode
  – Improved PB-frames mode
  – Reference picture resampling mode
  – Alternative inter VLC mode
  – Modified quantization mode
H.263+ (cont.)
Error robustness:
  – Slice structured mode
  – Reference picture selection mode
  – Independently segmented decoding mode
Enhanced communication:
  – Temporal, SNR, and spatial scalability mode
  – Reduced-resolution update mode