-
Video Compression:
Conferencing & Internet Video
Hamid R. Rabiee, PhD
Intel Corporation & Oregon Graduate Institute
-
Objectives
The student should be able to:
• Describe the basic components of the H.263 video codec and how it differs from H.261.
• Describe and understand the improvements of H.263+ over H.263.
• Understand enough about Internet and WWW protocols to see how they affect video.
• Understand the basics of streaming video over the Internet as well as error resiliency and concealment techniques.
-
Section 1: Conferencing Video
Section 2: Internet Review
Section 3: Internet Video
Outline
-
Section 1: Conferencing Video
• Video Compression Review
• Chronology of Video Standards
• The Input Video Format
• H.263 Overview
• H.263+ Overview
-
Video Compression Review
-
[Diagram: Frames of Digital Video → Motion Estimation & Compensation → Transform, Quantization, Zig-Zag Scan & Run-Length Encoding → Symbol Encoder → Bit Stream]
Video codecs have three main functional blocks.
Video Compression Review
Garden Variety Video Coder
-
The symbol encoder exploits the statistical properties of its input by using shorter code words for more common symbols. Examples: Huffman & arithmetic coding.
Video Compression Review
Symbol Encoding
-
This block is the basis for most lossless image coders (in conjunction with DPCM, etc.).
Video Compression Review
Symbol Encoding
-
A transform (usually the DCT) is applied to the input data for better energy compaction, which decreases the entropy and improves the performance of the symbol encoder.
Video Compression Review
Transform & Quantization
-
The DCT also decomposes the input into its frequency components so that perceptual properties can be exploited. For example, we can throw away high-frequency content first.
Video Compression Review
Transform & Quantization
-
Quantization lets us reduce the representation size of each symbol, improving compression at the expense of added errors. It’s the main tuning knob for controlling data rate.
Video Compression Review
Transform & Quantization
-
Zig-zag scanning and run-length encoding order the data into 1-D arrays and replace long runs of zeros with run-length symbols.
Video Compression Review
Transform & Quantization
-
These two components form the basis for many still-image compression algorithms such as JPEG, PhotoCD, M-JPEG, and DV.
Video Compression Review
Still Image Compression
-
Finally, because video is a sequence of pictures with high temporal correlation, we add motion estimation/compensation to try to predict as much of the current frame as possible from the previous frame.
Video Compression Review
Motion Estimation/Compensation
-
The most common method is to predict each block in the current frame by a (possibly translated) block of the previous frame.
Video Compression Review
Motion Estimation/Compensation
-
These three components form the basis for most of the standard video compression algorithms: MPEG-1, -2, & -4, H.261, H.263, and H.263+.
Video Compression Review
Garden Variety Video Coder
-
Section 1: Conferencing Video
• Video Compression Review
• Chronology of Video Standards (this section)
• The Input Video Format
• H.263 Overview
• H.263+ Overview
-
Chronology of Video Standards
[Timeline, 1990 to 2002:
ITU-T: H.261 (1990), H.263 (1996), H.263+ (1998), H.263++ (2000), H.263L (2002)
ISO: MPEG-1 (1992), MPEG-2 (1994), MPEG-4 (1998), MPEG-7 (2001)]
-
Chronology of Video Standards
• (1990) H.261, ITU-T
  – Designed to work at multiples of 64 kb/s (p×64).
  – Operates on standard frame sizes CIF, QCIF.
• (1992) MPEG-1, ISO “Storage & Retrieval of Audio & Video”
  – Evolution of H.261.
  – Main application is CD-ROM based video (~1.5 Mb/s).
-
Chronology continued
• (1994-5) MPEG-2, ISO “Digital Television”
  – Evolution of MPEG-1.
  – Main application is video broadcast (DirecTV, DVD, HDTV).
  – Typically operates at data rates of 2-3 Mb/s and above.
-
Chronology continued
• (1996) H.263, ITU-T
  – Evolution of all of the above.
  – Supports more standard frame sizes (SQCIF, QCIF, CIF, 4CIF, 16CIF).
  – Targeted at low bit rate video.
-
Chronology continued
• (1/99) MPEG-4, ISO “Multimedia Applications”
  – MPEG-4 video is based on H.263, similar to H.263+.
  – Adds more sophisticated binary and multi-bit transparency support.
  – Support for multi-layered, non-rectangular video display.
• (2H/’00) H.263++ (H.263V3), ITU-T
  – Tentative work item.
  – Addition of features to H.263.
  – Maintains backward compatibility with H.263 V.1.
-
Chronology continued
• (2001) MPEG-7, ISO “Content Representation for Info Search”
  – Specifies a standardized description of various types of multimedia information. This description shall be associated with the content itself, to allow fast and efficient searching for material of interest to the user.
• (2002) H.263L, ITU-T
  – Call for Proposals, early ’98.
  – Proposals reviewed through 11/98, decision to proceed.
  – Content to be determined in 2001.
-
Section 1: Conferencing Video
• Video Compression Review
• Chronology of Video Standards
• The Input Video Format (this section)
• H.263 Overview
• H.263+ Overview
-
Video Format for Conferencing
• The input color format is YCbCr (a.k.a. YUV). Y is the luminance component; U & V are chrominance (color difference) components.
• Chrominance is subsampled by two in each direction.
• The input frame size is based on the Common Intermediate Format (CIF), which is 352x288 pixels for luminance and 176x144 for each of the chrominance components.
Input Format
⎡ Y  ⎤   ⎡  0.299  0.587  0.114 ⎤ ⎡ R ⎤
⎢ Cb ⎥ = ⎢ -0.169 -0.331  0.500 ⎥ ⎢ G ⎥
⎣ Cr ⎦   ⎣  0.500 -0.419 -0.081 ⎦ ⎣ B ⎦

Y represents the luminance of a pixel.
Cb, Cr represent the color difference or chrominance of a pixel.
Input Format
• Defined as the input color space for H.263, H.263+, H.261, MPEG, etc.
• It’s a 3x3 transformation from RGB.
YCbCr (YUV) Color Space
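The 3x3 transform above can be checked with a few lines of Python. This is a sketch of the matrix as given on the slide; practical integer implementations also add an offset (e.g. +128 for Cb/Cr) and clamping, which are omitted here.

```python
# The 3x3 RGB -> YCbCr matrix from the slide, applied to one pixel.
M = [
    [ 0.299,  0.587,  0.114],   # Y  (luminance)
    [-0.169, -0.331,  0.500],   # Cb (blue color difference)
    [ 0.500, -0.419, -0.081],   # Cr (red color difference)
]

def rgb_to_ycbcr(r, g, b):
    """Multiply the matrix M by the RGB column vector."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M)

# Pure white carries only luminance: Y = 255, Cb = Cr = 0 (up to rounding),
# since each chrominance row of M sums to zero.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
```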
-
[Plot: sensitivity (log scale, 1 to 100) vs. frequency (0 to 10) for luminance (Y) and chrominance (C)]
Input Format
• The human eye is more sensitive to spatial detail in luminance than in chrominance.
• Hence, it doesn’t make sense to have as many pixels in the chrominance planes.
Subsampled Chrominance
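A minimal sketch of the subsampling step: averaging each 2x2 neighbourhood is one illustrative filter choice (the standard fixes the sampling grid, not the downsampling filter). For CIF this takes a full-resolution 352x288 chroma plane down to the 176x144 actually coded.

```python
def subsample_420(plane):
    """Halve a chroma plane in each direction by averaging 2x2 blocks.

    `plane` is a list of rows with even width and height; integer
    division keeps the result in the same sample range.
    """
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

cb = [[100, 104],
      [96, 100]]
print(subsample_420(cb))  # one 2x2 block becomes a single averaged sample
```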
-
[Diagram: luminance pels, chrominance pels, and block edges; note the chroma siting is different than MPEG-2 4:2:0]
Input Format
Spatial relation between luma and chroma pels for CIF 4:2:0
-
352 × 3/4 = 264 ≠ 288
Input Format
• The input video format is based on the Common Intermediate Format, or CIF.
• It is called the Common Intermediate Format because it is derivable from both 525-line/60 Hz (NTSC) and 625-line/50 Hz (PAL) video signals.
• CIF is defined as 352 pels per line and 288 lines per frame.
• The picture area for CIF is defined to have an aspect ratio of about 4:3. However, as the relation above shows, a 4:3 area 352 pels wide would be 264 lines, not 288.
Common Intermediate Format
-
[Diagram: a 352×288 picture with 4:3 picture aspect ratio, built from pixels with 12:11 pixel aspect ratio]
Pixels are not square in CIF.
Input Format
Picture & Pixel Aspect Ratios
-
Hence on a square-pixel display such as a computer screen, the video will look slightly compressed horizontally. The solution is to spatially resample the video frames to be
384 × 288 or 352 × 264.
This corresponds to a 4:3 aspect ratio for the picture area on a square-pixel display.
Input Format
Picture & Pixel Aspect Ratios
-
The luma and chroma planes are divided into 8x8-pixel blocks. Every four luma blocks are associated with a corresponding Cb and Cr block to create a macroblock.
8x8 pixel blocks
macroblock
Y
Cb Cr
Input Format
Blocks and Macroblocks
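The block/macroblock bookkeeping can be sketched in a few lines. The helper name `split_into_blocks` is made up for the example; it only illustrates that a 16x16 luma area yields the four 8x8 Y blocks of one macroblock.

```python
def split_into_blocks(plane, size=8):
    """Cut a plane into size x size blocks, listed in row-major order."""
    h, w = len(plane), len(plane[0])
    return [[[plane[y + dy][x + dx] for dx in range(size)]
             for dy in range(size)]
            for y in range(0, h, size)
            for x in range(0, w, size)]

# One macroblock covers a 16x16 luma area -> four 8x8 Y blocks; its 8x8
# Cb and Cr blocks come straight from the subsampled chroma planes.
y_area = [[0] * 16 for _ in range(16)]
y_blocks = split_into_blocks(y_area)
print(len(y_blocks))  # 4
```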
-
Section 1: Conferencing Video
• Video Compression Review
• Chronology of Video Standards
• The Input Video Format
• H.263 Overview (this section)
• H.263+ Overview
-
ITU-T Recommendation H.263
-
ITU-T Recommendation H.263
• H.263 targets low data rates (< 28 kb/s). For example, it can compress QCIF video to 10-15 fps at 20 kb/s.
• For the first time there is a standard video codec that can be used for video conferencing over normal phone lines (H.324).
• H.263 is also used in ISDN-based VC (H.320) and network/Internet VC (H.323).
-
Composed of a baseline codec plus four negotiable options:
• Unrestricted/Extended Motion Vector Mode
• Advanced Prediction Mode
• PB Frames Mode
• Syntax-based Arithmetic Coding Mode
ITU-T Recommendation H.263
-
Format   Y           U,V
SQCIF    128x96      64x48
QCIF     176x144     88x72
CIF      352x288     176x144
4CIF     704x576     352x288
16CIF    1408x1152   704x576

Always a 12:11 pixel aspect ratio.
H.263 Baseline
Frame Formats
-
Picture & Macroblock Types
• Two picture types:
  – INTRA (I-frame) implies no temporal prediction is performed.
  – INTER (P-frame) may employ temporal prediction.
• Macroblock (MB) types:
  – INTRA & INTER MB types (even in P-frames).
    • INTER MBs have shorter symbols in P-frames.
    • INTRA MBs have shorter symbols in I-frames.
  – Not coded: MB data is copied from the previous decoded frame.
H.263 Baseline
-
[Diagram: current block X with candidate predictors A (left), B (above), C (above-right)]

ΔMV_X = MV_X − median(MV_A, MV_B, MV_C)
H.263 Baseline
• Motion vectors have 1/2-pixel granularity. Reference frames must be interpolated by two.
• MVs are not coded directly; instead a median predictor is used.
• The predictor residual is then coded using a VLC table.
Motion Vectors
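The median predictor is simple to state in code. This sketch assumes half-pel vectors represented as plain floats, with A, B, C being the left, above, and above-right neighbours as in the diagram.

```python
def median(a, b, c):
    """Middle value of three numbers."""
    return sorted([a, b, c])[1]

def mv_delta(mv_x, mv_a, mv_b, mv_c):
    """Residual actually coded: the current MV minus the component-wise
    median of the three neighbour MVs."""
    px = median(mv_a[0], mv_b[0], mv_c[0])
    py = median(mv_a[1], mv_b[1], mv_c[1])
    return (mv_x[0] - px, mv_x[1] - py)

# Current MV (2.5, -1.0) against neighbours A=(2,0), B=(3,-1.5), C=(0,0):
print(mv_delta((2.5, -1.0), (2.0, 0.0), (3.0, -1.5), (0.0, 0.0)))
```

The decoder mirrors this: it forms the same median from already-decoded neighbour vectors and adds the transmitted delta back.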
-
Motion Vector Delta (MVD) Symbol Lengths
[Chart: code length in bits (0 to 14) vs. MVD absolute value (0 to 15.5); short codes for small deltas, longer codes for large ones]
H.263 Baseline
-
H.263 Baseline
Assign a variable length code according to three parameters (3-D VLC):
1 - Length of the run of zeros preceding the current nonzero coefficient.
2 - Amplitude of the current coefficient.
3 - Indication of whether the current coefficient is the last one in the block.
The most common symbols are variable length coded (3-13 bits); the rest are coded with escape sequences (22 bits).
Transform Coefficient Coding
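The run/level/last construction can be sketched in code. This is an illustrative Python sketch, not the normative coder: it performs the zig-zag scan and builds the (run, level, last) triples that the 3-D VLC tables would then map to code words.

```python
def zigzag_order(n=8):
    """Visit (row, col) pairs of an n x n block in zig-zag order: within
    each anti-diagonal (constant row+col), odd diagonals run top-to-bottom
    and even diagonals bottom-to-top."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_last(block):
    """Emit (run-of-zeros, amplitude, is-last) triples for the block."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    nz = [i for i, v in enumerate(scanned) if v != 0]
    symbols, prev = [], -1
    for k, i in enumerate(nz):
        symbols.append((i - prev - 1, scanned[i], k == len(nz) - 1))
        prev = i
    return symbols

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, -3, 7
print(run_level_last(block))
```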
-
[Plot: center-clipped scalar quantizer characteristic, output vs. input, with output levels at ±Q, ±2Q and a dead zone around zero]
H.263 Baseline
• H.263 uses a scalar quantizer with center clipping.
• The quantizer varies from 2 to 62, by 2’s.
• It can be varied ±1, ±2 at macroblock boundaries (2 bits), or set to 2-62 at row and picture boundaries (5 bits).
Quantization
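A center-clipped quantizer of the kind sketched in the figure can be written in a few lines. This is an illustrative dead-zone quantizer, not the exact H.263 formula (which differs between INTRA and INTER coefficients); it only shows the two properties the slide describes: small values clip to zero, and the step size q is the rate-control knob.

```python
def quantize(coef, q):
    """Map a coefficient to an integer level; small |coef| falls in the
    dead zone and becomes level 0."""
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) // (2 * q))

def dequantize(level, q):
    """Reconstruct inside the chosen bin (level 0 stays 0)."""
    if level == 0:
        return 0
    sign = -1 if level < 0 else 1
    return sign * (2 * q * abs(level) + q)

q = 8
for c in (3, -5, 40, -100):
    print(c, "->", quantize(c, q), "->", dequantize(quantize(c, q), q))
```

Raising q widens every bin, so fewer distinct levels survive and the symbol encoder emits fewer bits, at the price of larger reconstruction error.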
-
Hierarchy of three layers.
Picture Layer
GOB* Layer
MB Layer
*A GOB is usually a row of macroblocks, except for frame sizes greater than CIF.
Picture Hdr GOB Hdr MB MB ... GOB Hdr ...
H.263 Baseline
Bit Stream Syntax
-
[Picture header fields: Picture Start Code | Temporal Reference | Picture Type | Picture Quant]
H.263 Baseline
• PSC - a sequence of bits that cannot be emulated anywhere else in the bit stream.
• TR - a 29.97 Hz counter indicating the time reference for a picture.
• PType - denotes INTRA, INTER-coded, etc.
• PQuant - indicates which quantizer (2…62) is used initially for the picture.
Picture Layer Concepts
-
[GOB header fields: GOB Start Code | GOB Number | GOB Quant]
H.263 Baseline
• GSC - another unique start code (17 bits).
• GOB Number - indicates which GOB, counting vertically from the top (5 bits).
• GOB Quant - indicates which quantizer (2…62) is used for this GOB (5 bits).

GOB Layer Concepts
GOB headers are optional. A GOB can be decoded independently from the rest of the frame.
-
[Macroblock layer fields: Coded Flag | MB Type | Coded Block Pattern | DQuant | MV Deltas | Transform Coefficients]
H.263 Baseline
• COD - if set, indicates an empty INTER MB.
• MB Type - indicates INTER, INTRA, whether an MV is present, etc.
• CBP - indicates which blocks, if any, are empty.
• DQuant - indicates a quantizer change by ±2, ±4.
• MV Deltas - the MV prediction residuals.
• Transform coefficients - the 3-D VLCs for the coefficients.
Macroblock Layer Concepts
-
Unrestricted/Extended Motion Vector Mode
• Motion vectors are permitted to point outside the picture boundaries.
  – Non-existent pixels are created by replicating the edge pixels.
  – Improves compression when there is movement across the edge of a picture boundary or when there is camera panning.
• It is also possible to extend the range of the motion vectors from [-16, 15.5] to [-31.5, 31.5] with some restrictions. This better addresses high-motion scenes.
H.263 Options
-
Motion Vectors Over Picture Boundaries
[Diagram: reference frame N-1 and target frame N; a motion vector reaches partly outside the reference picture, and edge pixels are repeated to fill the missing area]
H.263 Options
-
Extended MV Range
[Diagram: base motion vector range [-16, 15.5] around zero; extended motion vector range [-16, 15.5] around the MV predictor, reaching as far as (31.5, 31.5) overall]
H.263 Options
-
H.263 Options
• Includes motion vectors across picture boundaries from the previous mode.
• Option of using four motion vectors for 8x8 blocks instead of one motion vector per 16x16 macroblock as in baseline.
• Overlapped motion compensation to reduce blocking artifacts.
Advanced Prediction Mode
-
Overlapped Motion Compensation
• In normal motion compensation, the current block is composed of
  – the predicted block from the previous frame (referenced by the motion vectors), plus
  – the residual data transmitted in the bit stream for the current block.
• In overlapped motion compensation, the prediction is a weighted sum of three predictions.
H.263 Options
-
Overlapped Motion Compensation
H.263 Options
• Let (m, n) be the column & row indices of an 8×8 pixel block in a frame.
• Let (i, j) be the column & row indices of a pixel within an 8×8 block.
• Let (x, y) be the column & row indices of a pixel within the entire frame, so that:
(x, y) = (m×8 + i, n×8 + j)
-
Overlapped Motion Comp.
• Let (MV0x, MV0y) denote the motion vector for the current block.
• Let (MV1x, MV1y) denote the motion vector for the block above (below) if the current pixel is in the top (bottom) half of the current block.
• Let (MV2x, MV2y) denote the motion vector for the block to the left (right) if the current pixel is in the left (right) half of the current block.
H.263 Options
[Diagram: current block weighted by MV0, with MV1 applying in the top/bottom halves and MV2 in the left/right halves]
-
Overlapped Motion Comp.
Then the summed, weighted prediction is:

P(x,y) = (p(q(x,y))·H0(i,j) + p(r(x,y))·H1(i,j) + p(s(x,y))·H2(i,j) + 4) / 8

where p(·) denotes a pixel of the previous decoded frame, and
q(x,y) = (x + MV0x, y + MV0y),
r(x,y) = (x + MV1x, y + MV1y),
s(x,y) = (x + MV2x, y + MV2y).
H.263 Options
-
Overlapped Motion Comp.
4 5 5 5 5 5 5 4
5 5 5 5 5 5 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 5 5 5 5 5 5
4 5 5 5 5 5 5 4
H0(i, j) =
H.263 Options
-
Overlapped Motion Comp.
1 2 2 2 2 2 2 1
1 1 2 2 2 2 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 2 2 2 2 1 1
1 2 2 2 2 2 2 1
H1(i, j) =
H.263 Options
H2(i, j) = ( H1(i, j) )T
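The weighted sum above can be written out for one 8x8 block. The weight matrices below follow H.263 Annex F (they differ in a few border entries from the matrices printed on these slides); a useful self-check, and the reason for the division by 8, is that H0 + H1 + H2 equals 8 at every pixel position. Fetching the three predictions from the reference frame with MV0, MV1 and MV2 is omitted.

```python
H0 = [  # weights for the block's own motion vector
    [4, 5, 5, 5, 5, 5, 5, 4],
    [5, 5, 5, 5, 5, 5, 5, 5],
    [5, 5, 6, 6, 6, 6, 5, 5],
    [5, 5, 6, 6, 6, 6, 5, 5],
    [5, 5, 6, 6, 6, 6, 5, 5],
    [5, 5, 6, 6, 6, 6, 5, 5],
    [5, 5, 5, 5, 5, 5, 5, 5],
    [4, 5, 5, 5, 5, 5, 5, 4],
]
H1 = [  # weights for the above/below neighbour's vector
    [2, 2, 2, 2, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 2, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 2, 2, 2, 2, 1, 1],
    [2, 2, 2, 2, 2, 2, 2, 2],
]
H2 = [  # weights for the left/right neighbour's vector
    [2, 1, 1, 1, 1, 1, 1, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 2, 1, 1, 1, 1, 2, 2],
    [2, 1, 1, 1, 1, 1, 1, 2],
]

def omc_block(p0, p1, p2):
    """Weighted 8x8 prediction from the three candidate predictions
    fetched with MV0 (own), MV1 (above/below) and MV2 (left/right)."""
    return [[(p0[j][i] * H0[j][i] + p1[j][i] * H1[j][i]
              + p2[j][i] * H2[j][i] + 4) // 8
             for i in range(8)] for j in range(8)]
```

Because the weights sum to 8, three identical predictions reproduce themselves exactly; the overlap only matters where the neighbour vectors disagree, which is precisely where blocking artifacts form.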
-
H.263 Options
• Permits two pictures to be coded as one unit: a P frame as in baseline, and a bi-directionally predicted frame, or B frame.
• B frames provide more efficient compression at times.
• Can increase the frame rate 2X with only about a 30% increase in bit rate.
• Restriction: the backward predictor cannot extend outside the current MB position of the future frame. See diagram.
PB Frames Mode
-
[Diagram: Picture 1 (P or I frame), Picture 2 (B frame), Picture 3 (P or I frame); the B frame is predicted with scaled vectors V/2 and -V/2]
2X frame rate for only 30% more bits.
H.263 Options
PB Frames
-
H.263 Options
• In this mode, all the variable length coding and decoding of baseline H.263 is replaced with arithmetic coding/decoding. This removes the restriction that each symbol must be represented by an integer number of bits, thus improving compression efficiency.
• Experiments indicate that compression can be improved by up to 10% over variable length coding/decoding.
• The complexity of arithmetic coding is higher than that of variable length coding, however.
Syntax-based Arithmetic Coding Mode
-
• H.261 only accepts the QCIF and CIF formats.
• No 1/2-pel motion estimation in H.261; instead it uses a spatial loop filter.
• H.261 does not use median predictors for motion vectors but simply uses the motion vector in the MB to the left as the predictor.
• H.261 does not use a 3-D VLC for transform coefficient coding.
• GOB headers are mandatory in H.261.
• Quantizer changes at MB granularity require 5 bits in H.261 and only 2 bits in H.263.
H.263 Improvements over H.261
-
H.261 H.263
Demo: QCIF, 8 fps @ 28 Kb/s
-
Video Conferencing Demonstration
-
Section 1: Conferencing Video
• Video Compression Review
• Chronology of Video Standards
• The Input Video Format
• H.263 Overview
• H.263+ Overview (this section)
-
ITU-T Recommendation H.263 Version 2
(H.263+)
-
H.263 Ver. 2 (H.263+)
• H.263+ was standardized in January, 1998.
• H.263+ is the working name for H.263 Version 2.
• Adds negotiable options and features while still retaining a backwards-compatibility mode.
H.263+
-
H.263 “plus” more negotiable options:
• Arbitrary frame size, pixel aspect ratio (including square), and picture clock frequency
• Advanced INTRA frame coding
• Loop de-blocking filter
• Slice structures
• Supplemental enhancement information
• Improved PB-frames
H.263+
-
H.263 “plus” more negotiable options (continued):
• Reference picture selection
• Temporal, SNR, and spatial scalability mode
• Reference picture resampling
• Reduced resolution update mode
• Independently segmented decoding
• Alternative INTER VLC
• Modified quantization
-
H.263+
• In addition to the multiples of CIF, H.263+ permits any frame size from 4x4 to 2048x1152 pixels in increments of 4.
• Besides the 12:11 pixel aspect ratio (PAR), H.263+ supports square (1:1), 525-line 4:3 picture (10:11), CIF for 16:9 picture (16:11), 525-line for 16:9 picture (40:33), and other arbitrary ratios.
• In addition to the picture clock frequency of 29.97 Hz (NTSC), H.263+ supports 25 Hz (PAL), 30 Hz, and other arbitrary frequencies.
Arbitrary Frame Size, Pixel Aspect Ratio, Clock Frequency
-
H.263+
• In this mode, either the DC coefficient, the 1st column, or the 1st row of coefficients is predicted from neighboring blocks.
• Prediction is determined on an MB-by-MB basis.
• Essentially DPCM of INTRA DCT coefficients.
• Can save up to 40% of the bits on INTRA frames.
Advanced INTRA Coding Mode
-
Advanced INTRA Mode
[Diagram: DCT blocks with row prediction from the block above and column prediction from the block to the left]
H.263+
-
[Diagram: pixels A, B, C, D straddling the boundary between block1 and block2]
block1
H.263+
• Filter pixels along blockboundaries while preservingedges in the image content.
• Filter is in the coding loop whichmeans it filters the decodedreference frame used for motioncompensation.
• Can be used in conjunction with apost-filter to further reducecoding artifacts.
Deblocking Filter Mode
-
Deblocking Filter Mode
[Diagram: pixels A, B, C, D across horizontal and vertical block boundaries]
H.263+
-
Deblocking Filter Mode
• A, B, C and D are replaced by new values A1, B1, C1, and D1 based on a set of non-linear equations.
• The strength of the filter is proportional to the quantization strength.
H.263+
-
Deblocking Filter Mode
A, B, C, D are replaced by A1, B1, C1, D1:

d1 = Filter((A - 4B + 4C - D)/8, Strength(QUANT))
d2 = clipd1((A - D)/4, d1/3)
A1 = A - d2
B1 = clip(B + d1)
C1 = clip(C - d1)
D1 = D + d2

Filter(x, Strength) = SIGN(x) × MAX(0, |x| - MAX(0, 2 × (|x| - Strength)))
H.263+
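The equations above transcribe almost directly into code. This is a floating-point sketch, not the bit-exact integer arithmetic of the standard, and the two helpers the slide leaves undefined are assumptions here: clip() clamps to the 8-bit pixel range and clipd1(v, lim) limits v to [-|lim|, |lim|].

```python
def sign(x):
    return (x > 0) - (x < 0)

def clip(v, lo=0.0, hi=255.0):
    """Assumed: clamp a filtered pixel back into the legal range."""
    return max(lo, min(hi, v))

def clipd1(v, lim):
    """Assumed: limit v to the symmetric interval [-|lim|, |lim|]."""
    lim = abs(lim)
    return max(-lim, min(lim, v))

def filt(x, strength):
    """The non-linear Filter(): pass small differences, attenuate large
    ones so genuine image edges are preserved."""
    return sign(x) * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))

def deblock(a, b, c, d, strength):
    """Update one quadruple of pixels straddling a block boundary."""
    d1 = filt((a - 4 * b + 4 * c - d) / 8, strength)
    d2 = clipd1((a - d) / 4, d1 / 3)
    return a - d2, clip(b + d1), clip(c - d1), d + d2

# A mild step across the boundary (B=100 vs C=120) is smoothed slightly.
print(deblock(100, 100, 120, 120, strength=4))
```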
-
Post-Filter
• Filter the decoded frame first horizontally, then vertically, using a 1-D filter.
• The post-filter strength is proportional to the quantization: Strength(QUANT).
D1 = D + Filter((A + B + C + E + F + G - 6D)/8, Strength)
H.263+
-
Deblocking Filter Demo
H.263+
No Filter | Deblocking Loop Filter
-
Deblocking Filter Demo
H.263+
No Filter | Loop & Post Filter
-
Filter Demo Videos
No Filter | Loop Filter | Loop & Post Filter
-
H.263+
• Allows insertion of resynchronization markers at macroblock boundaries to improve network packetization and reduce overhead. More on this later.
• Allows more flexible tiling of video frames into independently decodable areas to support “view ports,” a.k.a. “local decode.”
• Improves error resiliency by reducing intra-frame dependence.
• Permits out-of-order transmission to reduce latency.
Slice Structured Mode
-
Slice Structured Mode
[Diagram: frame divided by slice boundaries]
No INTRA or MV prediction across slice boundaries.
H.263+
Slices start and end on macroblock boundaries.
-
Slice Structured Mode: Independent Segments
[Diagram: frame divided by slice boundaries]
No INTRA or MV prediction across slice boundaries.
H.263+
Slice sizes remain fixed between INTRA frames.
-
H.263+
Backwards compatible with H.263, but permits indication of supplemental information for features such as:
• Partial and full picture freeze requests
• Partial and full picture snapshot tags
• Video segment start and end tags for off-line storage
• Progressive refinement segment start and end tags
• Chroma keying info for transparency
Supplemental Enhancement Information
-
H.263+
• Allows frame size changes of a compressed video sequence without inserting an INTRA frame.
• Permits the warping of the reference frame via affine transformations to address special effects such as zoom, rotation, and translation.
• Can be used for emergency rate control by dropping frame sizes adaptively when the bit rate gets too high.
Reference Picture Resampling
-
Reference Picture Resampling with Warping
Specify arbitrary warping parameters via displacement vectors from the corners.
H.263+
-
Reference Picture Resampling: Factor of 4 Size Change
P P P P P
No INTRA frame is required when changing video frame sizes.
H.263+
-
[Diagram: Base Layer, Enhancement Layer 1, Enhancement Layer 2]
H.263+
• A scalable bit stream consists of layers representing different levels of video quality.
• Everything can be discarded except for the base layer and still have reasonable video.
• If bandwidth permits, one or more enhancement layers can also be decoded, refining the base layer in one of three ways: temporal, SNR, or spatial.
Scalability Mode
-
Layered Video Bitstreams
[Diagram: an H.263+ encoder produces a 20 kb/s base layer plus enhancement layers at 40, 90, 200, and 320 kb/s]
H.263+
-
H.263+
• Scalability is typically used when one bit stream must support several different transmission bandwidths simultaneously, or when some process downstream needs to change the data rate unbeknownst to the encoder.
• Example: a conferencing Multipoint Control Unit (we’ll see another example in Internet Video).
Scalability Mode
-
Layered Video Bit Streams in Multipoint Conferencing
[Diagram: participants receive the layered stream at 384 kb/s, 384 kb/s, 128 kb/s, and 28.8 kb/s]
H.263+
-
Temporal Enhancement
Higher frame rate!
Base Layer + B Frames
H.263+
-
Temporal scalability means that two or more frame rates can be supported by the same bit stream. In other words, frames can be discarded (to lower the frame rate) and the bit stream remains usable.
H.263+
Temporal Scalability
[Diagram: frame sequence I or P, B, B, P, ...]
-
H.263+
• The discarded frames are never used as prediction.
• In the previous diagram the I and P frames form the base layer and the B frames form the temporal enhancement layer.
• This is usually achieved using bidirectionally predicted frames, or B-frames.
Temporal Scalability
-
[Diagram: Picture 1 (P or I frame), Picture 2 (B frame), Picture 3 (P or I frame); the B frame is predicted with scaled vectors V/2 and -V/2]
2X frame rate for only 30% more bits.
H.263+
B Frames
-
Temporal ScalabilityDemonstration
• layer 0, 3.25 fps, P-frames
• layer 1, 15 fps, B-frames
H.263+
-
SNR Enhancement
Better spatial quality!
Base Layer + SNR Layer
H.263+
-
H.263+
• Base layer frames are coded just as they would be in a normal coding process.
• The SNR enhancement layer then codes the difference between the decoded base layer frames and the originals.
• The SNR enhancement MBs may be predicted from the base layer, the previous frame in the enhancement layer, or both.
• The process may be repeated by adding another SNR enhancement layer, and so on.
SNR Scalability
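The base-plus-residual idea can be illustrated on a single sample. The two quantizer step sizes below are made up for the example; the point is only that the enhancement layer re-codes the base layer's quantization error with a finer quantizer, so decoding both layers lands closer to the original.

```python
def quantize_rec(x, step):
    """Quantize and immediately reconstruct: round x to the nearest
    multiple of `step` (what the decoder would see for this layer)."""
    return step * round(x / step)

original = 137
base = quantize_rec(original, 16)          # coarse base layer
enh = quantize_rec(original - base, 4)     # residual, coded finely

# Base-only reconstruction vs. base + SNR enhancement:
print(base, base + enh)
```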
-
SNR Scalability
[Diagram: base layer (15 kbit/s): I, P, P; enhancement layer (40 kbit/s): EI, EP, EP]
Legend:
I - Intracoded or key frame
P - Predicted frame
EI - Enhancement layer key frame
EP - Enhancement layer predicted frame
H.263+
-
SNR ScalabilityDemonstration
• layer 0, 10 fps, 40 kbps
• layer 1, 10 fps, 400 kbps
H.263+
-
Spatial Enhancement
More spatial resolution!
Base Layer + Spatial Layer
H.263+
-
H.263+
• For spatial scalability, the video is down-sampled by two horizontally and vertically prior to encoding as the base layer.
• The enhancement layer is 2X the size of the base layer in each dimension.
• The base layer is interpolated by 2X before predicting the spatial enhancement layer.
Spatial Scalability
-
H.263+
Spatial Scalability
[Diagram: base layer: I, P, P; enhancement layer: EI, EP, EP]
-
Spatial ScalabilityDemonstration
• layer 0, QCIF, 10 fps, 60 kbps
• layer 1, CIF, 10 fps, 300 kbps
H.263+
-
It is possible to combine temporal, SNR, and spatial scalability into a flexible layered framework with many levels of quality.
H.263+
Hybrid Scalability
-
H.263+
Hybrid Scalability
[Diagram: a base layer of I, P and B frames, with EI/EP frames stacked above it in Enhancement Layers 1 and 2]
-
Scalability Demonstration
• SNR/Spatial scalability, 10 fps
  – layer 0, 88x72, ~5 kbit/s
  – layer 1, 176x144, ~15 kbit/s
  – layer 2, 176x144, ~40 kbit/s
  – layer 3, 352x288, ~80 kbit/s
  – layer 4, 352x288, ~200 kbit/s
H.263+
-
H.263+
Other Miscellaneous Features
• Improved PB-frames
  – Improves upon the previous PB-frame mode by permitting forward prediction of the “B” frame with a new vector.
• Reference picture selection (discussed later)
  – A lower-latency method for dealing with error-prone environments by using some type of back-channel to indicate to an encoder when a frame has been received and can be used for motion estimation.
• Reduced resolution update mode
  – Used for bit rate control by reducing the size of the residual frame adaptively when the bit rate gets too high.
-
Other Miscellaneous Features
• Independently decodable segments
  – When signaled, restricts the use of data outside of the current Group-of-Blocks segment or slice segment. Useful for error resiliency.
• Alternate INTER VLC
  – Permits use of an alternative VLC table that is better suited for INTRA-coded blocks, or blocks with low quantization.
H.263+
-
Other Miscellaneous Features
• Modified Quantization
  – Allows more flexibility in adapting quantizers on a macroblock-by-macroblock basis by enabling large quantizer changes through the use of escape codes.
  – Reduces the quantizer step size for chrominance blocks, compared to luminance blocks.
  – Modifies the allowable DCT coefficient range to avoid clipping, yet disallows illegal coefficient/quantizer combinations.
H.263+
-
Outline
Section 1: Conferencing Video
Section 2: Internet Review (this section)
Section 3: Internet Video
-
The Internet
-
Phone lines are “circuit-switched”. A (virtual) circuit is established at call initiation and remains for the duration of the call.
[Diagram: Source connected to Dest. through a chain of switches]
Internet Review
Internet Basics
-
Computer networks are “packet-switched”. Data is fragmented into packets, and each packet finds its way to the destination, possibly using different routes. Lots of implications...
[Diagram: Source to Dest. through multiple switches; packets may take different routes]
Internet Review
Internet Basics
-
[Diagram: “The Internet is heterogeneous” (V. Cerf): corporate LANs with routers, dial-up IP hosts (“SLIP”, “PPP”), e-mail gateways (“SMTP”), AOL, TYMNET, MCI Mail, and LAN mail systems, interconnected via frame relay, X.25, SMDS, and ATM (HyperStream) to the global public Internet]
-
Internet Review
Layers in the Internet Protocol Architecture
1. Network Access Layer: consists of routines for accessing physical networks.
2. Internet Layer: defines the datagram and handles the routing of data.
3. Host-to-Host Transport Layer: provides end-to-end data delivery services.
4. Application Layer: consists of applications and processes that use the network.
-
Internet Review
Data Encapsulation
[Diagram: the Application Layer’s data gains a header at each layer down the stack; the Transport, Internet, and Network Access Layers each prepend their own header]
-
Internet Review
Internet Protocol Architecture
[Diagram of the protocol stack:
Network Access Layer: Ethernet, Token Ring, FDDI, HDLC, SMDS, X.25, ATM, FR, ...
Internet Layer: IP (MBone)
Host-Host Transport: TCP, UDP, RTP
Utility/Application: TELNET, FTP, SMTP, SNMP, DNS, MIME, VIC/VAT, ...]
-
Internet Review
Specific Protocols for Multimedia
[Diagram: RTP runs over UDP, which runs over IP on the physical network; a multimedia packet is nested accordingly: IP header | UDP header | RTP header | payload header | payload (data)]
-
Internet Review
• IP implements two basic functions: addressing & fragmentation.
• IP treats each packet as an independent entity.
• Internet routers choose the best path to send each packet based on its address. Each packet may take a different route.
• Routers may fragment and reassemble packets when necessary for transmission on smaller-packet networks.
The Internet Protocol (IP)
-
Internet Review
The Internet Protocol (IP)
• IP packets have a Time-to-Live, after which they are deleted by a router.
• IP does not ensure secure transmission.
• IP only error-checks headers, not payload.
• Summary: no guarantee a packet will reach its destination, and no guarantee of when it will get there.
-
Internet Review
Transmission Control Protocol (TCP)
• TCP is a connection-oriented, end-to-end reliable, in-order protocol.
• TCP does not make any reliability assumptions about the underlying networks.
• An acknowledgment is sent for each packet.
• A transmitter places a copy of each packet sent in a timed buffer. If no “ack” is received before the time is out, the packet is re-transmitted.
• TCP has inherently large latency; it is not well suited for streaming multimedia.
-
Internet Review
User Datagram Protocol (UDP)
• UDP is a simple protocol for transmitting packets over IP.
• Smaller header than TCP, hence lower overhead.
• Does not re-transmit packets. This is OK for multimedia since a late packet usually must be discarded anyway.
• Performs a checksum of the data.
-
Internet Review
Real-time Transport Protocol (RTP)
• RTP carries data that has real-time properties.
• Typically runs on UDP/IP.
• Does not ensure timely delivery or QoS.
• Does not prevent out-of-order delivery.
• Profiles and payload formats must be defined.
• Profiles define extensions to the RTP header for a particular class of applications, such as audio/video conferencing (IETF RFC 1890).
-
Internet Review
Real-time Transport Protocol (RTP)
• Payload formats define how a particular kind of payload, such as H.261 video, should be carried in RTP.
• Used by Netscape LiveMedia, Microsoft NetMeeting®, Intel VideoPhone, ProShare® Video Conferencing applications, and public-domain conferencing tools such as VIC and VAT.
-
Internet Review
Real-time Transport Control Protocol (RTCP)
• RTCP is a companion protocol to RTP which monitors the quality of service and conveys information about the participants in an on-going session.
• It allows participants to send transmission and reception statistics to other participants. It also sends information that allows participants to associate media types such as audio/video for lip-sync.
-
Internet Review
Real-time Transport Control Protocol (RTCP)
• Sender reports allow senders to derive round-trip propagation times.
• Receiver reports include a count of lost packets and inter-arrival jitter.
• Scales to a large number of users by reducing the rate of reports as the number of participants increases.
• Most products today don’t use the information to avoid congestion, but that will change in the next year or two.
-
Internet Review
Multicast Backbone (MBone)
• Most IP-based communication is unicast: a packet is intended for a single destination. For multi-participant applications, streaming multimedia to each destination individually can waste network resources, since the same data may be travelling along the same sub-networks.
• A multicast address is designed to enable the delivery of packets to a set of hosts that have been configured as members of a multicast group across various subnetworks.
-
Internet Review
Unicast Example: Streaming media to multiple participants
[Figure: sources S1, S2 and destinations D1, D2 across linked subnets. S1 sends duplicate packets because there are two participants, D1 and D2, so D2 sees excess traffic on its subnet.]
-
Internet Review
Multicast Example: Streaming media to multiple participants
[Figure: the same topology. S1 sends a single set of packets to a multicast group; both D1 receivers subscribe to the same multicast group, and D2 doesn’t see any excess traffic on its subnet.]
-
Multicast Backbone (MBone)
• Most routers sold in the last 2-3 years support multicast.
• Not turned on yet in the Internet backbone.
• Currently there is an MBone overlay which uses a combination of multicast (where supported) and tunneling.
• Multicast at your local ISP may be 1-2 years away.
Internet Review
-
Internet Review
Resource ReSerVation Protocol (RSVP)
Internet Draft
• Used by hosts to obtain a certain QoS from underlying networks for a multimedia stream.
• At each node, an RSVP daemon attempts to make a resource reservation for the stream.
• It communicates with two local modules: admission control and policy control.
• Admission control determines whether the node has sufficient resources available: “The Internet Busy Signal.”
• Policy control determines whether the user has administrative permission to make the reservation.
-
Internet Review
Real-time Streaming Protocol (RTSP)
Internet Draft
• A “network remote control” for multimedia servers.
• Establishes and controls either a single stream or several time-synchronized streams of continuous media such as audio and video.
• Supports the following operations:
– Request a presentation from a media server.
– Invite a media server to join a conference and play back or record.
– Notify clients that additional media is available for an existing presentation.
-
Internet Review
Hypertext Transfer Protocol (HTTP)
• HTTP generally runs on TCP/IP and is the protocol upon which World-Wide-Web data is transmitted.
• Defines a “stateless” connection between receiver and sender.
• Sends and receives MIME-like messages and handles caching, etc.
• No provisions for latency or QoS guarantees.
-
Outline
Section 1: Conferencing Video
Section 2: Internet Review
Section 3: Internet Video
-
Internet Video
-
Internet Video
How do we stream video over the Internet?
• How do we handle the special cases of unicasting? Multicasting?
• What about packet loss? Quality of service? Congestion?
We’ll look at some solutions...
-
Internet Video
HTTP Streaming
• HTTP was not designed for streaming multimedia; nevertheless, because of its widespread deployment via Web browsers, many applications stream via HTTP.
• It uses a custom browser plug-in which can start decoding video as it arrives, rather than waiting for the whole file to download.
• Operates on TCP, so it doesn’t have to deal with errors, but the side effect is high latency and large inter-arrival jitter.
-
Internet Video
HTTP Streaming
• Usually a receive buffer is employed which can hold enough data (usually several seconds) to compensate for latency and jitter.
• Not applicable to two-way communication!
• Firewalls are not a problem with HTTP.
-
Internet Video
RTP Streaming
• RTP was designed for streaming multimedia.
• Does not resend lost packets, since this would add latency, and a late packet might as well be lost in streaming video.
• Used by Intel VideoPhone, Microsoft NetMeeting, Netscape LiveMedia, RealNetworks, etc.
• Forms the basis for network video conferencing systems (ITU-T H.323).
-
Internet Video
RTP Streaming
• Subject to packet loss, and has no quality of service guarantees.
• Can deal with network congestion via RTCP reports under some conditions:
– Encoding should be real time so the video rate can be changed dynamically.
• Needs a payload defined for each media type it carries.
-
Internet Video
H.263 Payload for RTP
• Payloads must be defined in the IETF for all media carried by RTP.
• A payload has been defined for H.263 and is now an Internet RFC.
• A payload has been defined for H.263+ as an ad-hoc group activity in the ITU and is now an Internet Draft.
• An RTP packet typically consists of:
– RTP Header
– H.263 Payload Header
– H.263 Payload (bit stream)
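As a small illustration, the fixed 12-byte RTP header can be packed with Python’s struct module. The layout follows RFC 1889/3550 (version, marker, payload type, sequence number, timestamp, SSRC); payload type 34 is the static H.263 assignment from RFC 1890:

```python
import struct

def rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Pack the fixed 12-byte RTP header: version 2, no padding,
    no extension, zero CSRC entries."""
    byte0 = 2 << 6                                  # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)   # M bit + payload type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)

# Payload type 34 is the static H.263 assignment in RFC 1890.
hdr = rtp_header(payload_type=34, seq=1, timestamp=90000, ssrc=0x1234)
```

In a real H.263-over-RTP packet, the payload header and the H.263 bit stream fragment would be appended after these 12 bytes.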
-
Internet Video
H.263 Payload for RTP
• The H.263 payload header contains redundant information about the H.263 bit stream which can assist a payload handler and decoder in the event that related packets are lost.
• The slice mode of H.263+ aids RTP packetization by allowing fragmentation on MB boundaries (instead of MB rows) and restricting data dependencies between slices.
• But what do we do when packets are lost or arrive too late to use?
-
Error Resiliency: Redundancy & Concealment Techniques
Internet Video
-
Internet Packet Loss
• Depends on network topology.
• On the MBone:
– 2-5% packet loss
– single packet loss most common
• For end-to-end transmission, loss rates of 10% are not uncommon.
• For ISPs, loss rates may be even higher during periods of high congestion.
Internet Video
-
Packet Loss Burst Lengths
[Figure: distribution of the length of loss bursts observed at a receiver. X-axis: length of loss bursts, b (0-50); y-axis: probability of a burst of length b (log scale, 0.0001-1).]
Internet Video
-
Conditional Loss Probability
[Figure: probability of losing packet n+1 (0-0.8) versus the number of consecutive packets lost, n (0-12).]
Internet Video
-
First-Order Loss Model: 2-Stage Gilbert Model
[Figure: two-state chain with states No Loss and Loss; p is the transition probability from No Loss to Loss, q the probability of staying in Loss, and the remaining transition probabilities are 1 - p and 1 - q.]
Measured values: p = 0.083, q = 0.823
Internet Video
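As a rough illustration, the two-state model can be simulated directly. Here p is read as the No Loss to Loss transition probability and q as the probability of staying in Loss, which is consistent with the conditional-loss figure; the function name is illustrative:

```python
import random

def gilbert_trace(n, p, q, seed=0):
    """Simulate the 2-state Gilbert model: p = P(enter Loss | No Loss),
    q = P(stay in Loss | Loss). Returns a list of 0 (ok) / 1 (lost)."""
    rng = random.Random(seed)
    state, trace = 0, []
    for _ in range(n):
        threshold = p if state == 0 else q
        state = 1 if rng.random() < threshold else 0
        trace.append(state)
    return trace

trace = gilbert_trace(200_000, p=0.083, q=0.823)
loss_rate = sum(trace) / len(trace)
# The steady-state loss rate of the chain is p / (p + 1 - q).
```

The model captures the burstiness seen in the measurements: once a packet is lost, the next one is lost with probability q = 0.823 rather than the much smaller unconditional rate.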
-
Error Resiliency
[Figure: a redundancy axis trading off compression against resiliency: more redundancy means less compression but more resiliency.]
• Error resiliency and compression have conflicting requirements.
• Video compression attempts to remove as much redundancy out of a video sequence as possible.
• Error resiliency techniques at some point must reconstruct data that has been lost and must rely on extrapolations from redundant data.
Internet Video
-
Error Resiliency
Errors tend to propagate in video compression because of its predictive nature.
[Figure: an I or P frame followed by a P frame; one block is lost, and the error propagates to two blocks in the next frame.]
Internet Video
-
Error Resiliency
Internet Video
There are essentially two approaches to dealing with errors from packet loss:
– Error redundancy methods are preventative measures that add extra information at the encoder to make it easier to recover when data is lost. The extra overhead decreases compression efficiency but should improve overall quality in the presence of packet loss.
– Error concealment techniques are the methods used to hide errors that occur once packets are lost.
Usually both methods are employed.
-
Simple INTRA Coding & Skipped Blocks
Internet Video
• Increasing the number of INTRA coded blocks that the encoder produces will reduce error propagation, since INTRA blocks are not predicted.
• Blocks that are lost at the decoder are simply treated as empty INTER coded blocks: the block is simply copied from the previous frame.
• Very simple to implement.
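The copy-from-previous-frame treatment of lost blocks can be sketched as follows. Frames are nested lists of pixel values; the function name and block size are illustrative:

```python
def conceal_lost_blocks(prev_frame, cur_frame, lost, block=8):
    """Treat each lost block as an empty INTER block: copy the
    co-located pixels from the previous decoded frame.
    `lost` is a set of (block_row, block_col) indices."""
    out = [row[:] for row in cur_frame]
    for br, bc in lost:
        for y in range(br * block, (br + 1) * block):
            for x in range(bc * block, (bc + 1) * block):
                out[y][x] = prev_frame[y][x]
    return out
```

This works well for static regions but leaves visible artifacts wherever the scene moved between frames, which is why the more elaborate concealment methods later in this section exist.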
-
Intra Coding Resiliency
[Figure: average PSNR (20-45 dB) versus data rate (20-180 kbps) for intra-refresh settings of 0, 5, and 10, each under 0% and 10-20% packet loss.]
Internet Video
-
Reference Picture Selection Mode of H.263+
In RPS mode, a frame is not used for prediction in the encoder until it’s been acknowledged to be error free.
[Figure: a sequence of frames (I or P, P, P); prediction uses the last acknowledged error-free frame, while a frame for which no acknowledgment has been received yet is not used for prediction.]
Internet Video
-
Reference Picture Selection
Internet Video
• ACK-based: a picture is assumed to contain errors, and thus is not used for prediction unless an ACK is received, or...
• NACK-based: a picture will be used for prediction unless a NACK is received, in which case the previous picture that didn’t receive a NACK will be used.
-
Multi-threaded Video
[Figure: frames 1-10 interleaved into two independently decodable prediction threads: one thread carries the odd-numbered frames (1, 3, 5, 7, 9) and the other the even-numbered frames (2, 4, 6, 8, 10), each starting from an I frame.]
Internet Video
• Reference pictures are interleaved to create two or more independently decodable threads.
• If a frame is lost, the frame rate drops to 1/2 rate until a sync frame is reached.
• Same syntax as Reference Picture Selection, but without ACK/NACK.
• Adds some overhead since prediction is not based on the most recent frame.
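A sketch of how two-thread interleaving limits the damage from a loss, assuming frame k predicts from frame k-2 and frames 1 and 2 are the intra starters of their threads (these assumptions, and the function name, are illustrative):

```python
def decodable_frames(n_frames, lost):
    """Two-thread interleaving: frame k is predicted from frame k-2,
    so a loss only kills the rest of its own thread until resync.
    Frames 1 and 2 are assumed intra (thread starters)."""
    ok = set()
    for k in range(1, n_frames + 1):
        if k in lost:
            continue
        if k <= 2 or (k - 2) in ok:   # intra start, or reference survived
            ok.add(k)
    return ok
```

Losing frame 4, for example, kills only the even thread (6, 8, 10 become undecodable) while the odd thread plays on at half the frame rate, exactly the behavior described in the bullets above.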
-
Conditional Replenishment
[Figure: encoder block diagram with ME/MC, DCT, etc., and an embedded loop decoder mirroring the real decoder.]
Internet Video
• A video encoder contains a decoder (called the loop decoder) to create decoded previous frames which are then used for motion estimation and compensation.
• The loop decoder must stay in sync with the real decoder, otherwise errors propagate.
-
Conditional Replenishment
Internet Video
• One solution is to discard the loop decoder.
• Can do this if we restrict ourselves to just two macroblock types:
– INTRA coded, and
– empty (just copy the same block from the previous frame).
• The technique is to check if the current block has changed substantially since the previous frame and then code it as INTRA if it has changed. Otherwise mark it as empty.
• A periodic refresh of INTRA coded blocks ensures all errors eventually disappear.
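The INTRA/empty decision can be sketched as below. The SAD change measure and the staggered refresh schedule are illustrative choices, not mandated by any standard:

```python
def classify_blocks(prev, cur, threshold, frame_no,
                    refresh_period=30, block=8):
    """Conditional replenishment: mark a block INTRA if it changed
    enough (SAD over the block exceeds `threshold`), EMPTY otherwise.
    Every `refresh_period` frames each block is forced INTRA (staggered
    by position) so decoder errors eventually wash out."""
    rows, cols = len(cur) // block, len(cur[0]) // block
    types = {}
    for br in range(rows):
        for bc in range(cols):
            sad = sum(abs(cur[y][x] - prev[y][x])
                      for y in range(br * block, (br + 1) * block)
                      for x in range(bc * block, (bc + 1) * block))
            forced = (frame_no + br * cols + bc) % refresh_period == 0
            types[(br, bc)] = "INTRA" if sad > threshold or forced else "EMPTY"
    return types
```

Because only INTRA and empty blocks exist, the encoder never needs a loop decoder: nothing it sends depends on decoded pixel values.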
-
Error Tracking (Appendix II, H.263)
Internet Video
• Lost macroblocks are reported back to the encoder using a reliable back-channel.
• The encoder catalogs spatial propagation of each macroblock over the last M frames.
• When a macroblock is reported missing, the encoder calculates the accumulated error in each MB of the current frame.
• If an error threshold is exceeded, the block is coded as INTRA.
• Additionally, the erroneous macroblocks are not used as prediction for future frames in order to contain the error.
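A much-simplified sketch of the bookkeeping involved: errors propagate along prediction dependencies from frame to frame, and blocks whose accumulated error crosses a threshold get INTRA coded. The averaging rule and all names here are illustrative, not the exact Appendix II procedure:

```python
def update_error_map(err, refs, lost):
    """One error-tracking step: `refs[mb]` lists the previous-frame MBs
    that mb predicts from; a macroblock reported lost gets error 1.0,
    otherwise it inherits the average error of its references."""
    new_err = {}
    for mb, deps in refs.items():
        propagated = sum(err.get(d, 0.0) for d in deps) / max(len(deps), 1)
        new_err[mb] = 1.0 if mb in lost else propagated
    return new_err

def blocks_to_intra(err, threshold):
    """MBs whose accumulated error exceeds the threshold get INTRA coded."""
    return {mb for mb, e in err.items() if e > threshold}
```

Running this map forward over the last M frames is what lets the encoder answer, when a loss report arrives, "which blocks of the current frame are contaminated?"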
-
Prioritized Encoding
[Figure: bit stream layers ordered by increasing error protection, from least to most protected: AC coefficients, DC coefficients, MB information, motion vectors, picture header.]
Internet Video
• Some parts of a bit stream contribute more to image artifacts than others if lost.
• The bit stream can be prioritized, and more protection can be added for higher priority portions.
-
Prioritized Encoding Demo
Internet Video
Unprotected Encoding vs. Prioritized Encoding (23% overhead)
Videos used with permission of ICSI, UC Berkeley
-
Error Concealment by Interpolation
[Figure: a lost block, with distances d1 and d2 from a missing pixel to known neighboring pixels. Each missing pixel takes the weighted average of 4 neighboring pixels.]
Internet Video
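The weighted average can be sketched as follows, with weights inversely proportional to distance (an assumption consistent with the d1/d2 figure; the function name is illustrative, and the lost block must not touch the frame border):

```python
def conceal_block(frame, top, left, size):
    """Conceal a lost block by interpolation: each missing pixel is the
    distance-weighted average of the nearest known pixels above, below,
    left, and right of the block (weights proportional to 1/distance)."""
    out = [row[:] for row in frame]
    for y in range(top, top + size):
        for x in range(left, left + size):
            # (distance, value) of the four nearest known boundary pixels
            neighbors = [
                (y - (top - 1),    frame[top - 1][x]),     # above
                ((top + size) - y, frame[top + size][x]),  # below
                (x - (left - 1),   frame[y][left - 1]),    # left
                ((left + size) - x, frame[y][left + size]),# right
            ]
            wsum = sum(1.0 / d for d, _ in neighbors)
            out[y][x] = sum(v / d for d, v in neighbors) / wsum
    return out
```

Pixels near a block edge lean heavily on the adjacent known pixel, while pixels in the block's center blend all four directions, which smooths over the hole at the cost of blurring any texture that was there.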
-
Other Error Concealment Techniques
Internet Video
• Error Concealment with Least Square Constraints
• Error Concealment with Bayesian Estimators
• Error Concealment with Polynomial Interpolation
• Error Concealment with Edge-Based Interpolation
• Error Concealment with Multi-directional Recursive Nonlinear Filter (MRNF)
See references for more information...
-
Example: MRNF Filtering
[Image: [email protected] bpp, block loss: 10%; MRNF-GMLOS, PSNR = 34.94 dB]
Internet Video
-
Network Congestion
• Most multimedia applications place the burden of rate adaptivity on the source.
• For multicasting over heterogeneous networks and receivers, it’s impossible to meet the conflicting requirements, which forces the source to encode at a least-common-denominator level.
• The smallest network pipe dictates the quality for all the other participants of the multicast session.
• If congestion occurs, the quality of service degrades as more packets are lost.
Internet Video
-
Receiver-driven Layered Multicast
• If the responsibility of rate adaptation is moved to the receiver, heterogeneity is preserved.
• One method of receiver-based rate adaptivity is to combine a layered source with a layered transmission system.
• Each bit stream layer belongs to a different multicast group.
• In this way, a receiver can control the rate by subscribing to multicast groups and thus layers of the video bit stream.
Internet Video
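A receiver's join decision can be sketched as below, assuming cumulative layers (layer i is only useful on top of layers 0..i-1); the rates and function name are illustrative:

```python
def layers_to_join(layer_rates, budget):
    """Receiver-driven rate control: subscribe to multicast groups
    (layers) from the base layer up, while the cumulative rate still
    fits the receiver's bandwidth budget (same units as the rates)."""
    total, joined = 0, []
    for i, rate in enumerate(layer_rates):   # layer 0 = base layer
        if total + rate > budget:
            break                            # layers must be cumulative
        total += rate
        joined.append(i)
    return joined
```

Each receiver runs this independently against its own bottleneck, so a modem user and a LAN user can share one session at different qualities without the source doing anything.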
-
Receiver-driven Layered Multicast
[Figure: source S feeds routers R toward receivers D1, D2, and D3; each link carries only the layers (1, 1-2, or 1-3) that its downstream receivers have subscribed to.]
Internet Video
Multicast groups are not transmitted on networks that have no subscribers.