Video Compression
NPUST-MINAR
Professor: Sheau-Ru Tong
Student: Chih-Ming Chen
http://minarlab.mis.npust.edu.tw/ MINAR
Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)
Review of Image Compression
Coding an image (single frame):
1. RGB to YUV color-space conversion
2. Partition image into 8x8-pixel blocks
3. 2-D DCT of each block
4. Quantize each DCT coefficient
5. Runlength and Huffman code the nonzero quantized DCT coefficients
Basis for the JPEG Image Compression Standard
JPEG-2000 uses wavelet transform and arithmetic coding
[Figure: Original Signal → RGB to YUV → Block DCT → Quantization → Runlength & Huffman Coding → Compressed Bitstream]
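The transform and quantization steps of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the JPEG reference method: the function names are hypothetical, a single uniform step size is assumed in place of a JPEG quantization table, and the level shift is omitted.

```python
import math

N = 8  # block size used by JPEG-style coders

def rgb_to_y(r, g, b):
    """Luma from RGB using the BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block (list of lists), orthonormal scaling."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def quantize(coeffs, step):
    """Uniform scalar quantization with one step size (an assumption of this sketch)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

# A flat gray block: all of the energy lands in the DC coefficient.
flat = [[rgb_to_y(128, 128, 128)] * N for _ in range(N)]
q = quantize(dct_2d(flat), step=16)
```

For the flat block only q[0][0] is nonzero, which is why runlength coding of the remaining zeros is so effective on smooth image regions.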
Video Compression
Main addition over image compression: exploit the temporal redundancy
• Predict the current frame based on previously coded frames
Three types of coded frames:
• I-frame: Intra-coded frame, coded independently of all other frames
• P-frame: Predictively coded frame, coded based on a previously coded frame
• B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames
MC-Prediction and Bi-Directional MC-Prediction (P- and B-frames)
Motion compensated prediction: Predict the current frame based on reference frame(s) while compensating for the motion
Examples of block-based motion-compensated prediction (P-frame) and bi-directional prediction (B-frame):
[Figure: P-frame predicted from the Previous Frame; B-frame predicted from both the Previous Frame and the Future Frame]
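Block-based motion-compensated prediction for a P-frame can be illustrated with an exhaustive block-matching search. The sketch below assumes a sum-of-absolute-differences (SAD) cost and a small search window; the function names, block size, and toy frames are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, n=4, search=2):
    """Return (best motion vector, best SAD) over a +/-search pixel window."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Toy frames: the current frame is the reference shifted right by one pixel.
ref = [[(5 * x + 11 * y) % 31 for x in range(12)] for y in range(12)]
cur = [[ref[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]
mv, cost = full_search(cur, ref, bx=4, by=4)
```

Here the search finds the vector (-1, 0) with zero residual, so only the motion vector would need to be coded for this block.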
Example Use of I-, P-, B-frames: MPEG Group of Pictures (GOP)
Arrows show prediction dependencies between frames
Summary of Temporal Processing
Use MC-prediction (P and B frames) to reduce temporal redundancy
MC-prediction usually performs well; in compression there is a second chance to recover when it performs badly
MC-prediction yields:
• Motion vectors
• MC-prediction error (residual): code the error with a conventional image coder
Sometimes MC-prediction may perform badly
• Examples: complex motion, new imagery (occlusions)
• Approach:
1. Identify blocks where prediction fails
2. Code block without prediction
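The two-step approach above amounts to a per-block mode decision. The sketch below is one possible rule, assumed for illustration only: a block is coded without prediction when the residual energy exceeds the block's own deviation from its mean. Real encoders typically use more elaborate criteria, and all names here are hypothetical.

```python
def block_energy(block):
    """Sum of absolute deviations from the block mean (a rough intra cost)."""
    flat = [p for row in block for p in row]
    mean = sum(flat) / len(flat)
    return sum(abs(p - mean) for p in flat)

def residual_energy(block, prediction):
    """Sum of absolute MC-prediction errors (a rough inter cost)."""
    return sum(abs(a - b)
               for ra, rb in zip(block, prediction)
               for a, b in zip(ra, rb))

def choose_mode(block, prediction):
    """Return 'inter' when prediction helps, otherwise 'intra' (no prediction)."""
    if residual_energy(block, prediction) <= block_energy(block):
        return "inter"
    return "intra"

block = [[10, 12], [11, 13]]
good_pred = [[10, 12], [11, 12]]   # nearly identical: prediction works
bad_pred = [[200, 0], [0, 200]]    # unrelated content, e.g. after an occlusion
modes = (choose_mode(block, good_pred), choose_mode(block, bad_pred))
```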
Basic Video Compression Algorithm
Exploiting the redundancies:
• Temporal: MC-prediction (P and B frames)
• Spatial: Block DCT
• Color: Color space conversion
Scalar quantization of DCT coefficients
Zigzag scanning, runlength and Huffman coding of the nonzero quantized DCT coefficients
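Zigzag scanning followed by runlength coding can be sketched as below. The (zero_run, value) pairs and the "EOB" marker are a simplified symbol format assumed for illustration; Huffman coding of the symbols and any separate handling of the DC coefficient are left out.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def run_length(block):
    """Encode nonzero coefficients as (zero_run, value) pairs, ending with 'EOB'."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # end-of-block marker stands in for the trailing zeros
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, -3, 5
symbols = run_length(block)
```

The 64 coefficients of this block collapse to three pairs plus the end-of-block marker.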
Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)
Motivation for Scalable Coding
Basic situation:
1. Diverse receivers may request the same video
   • Different bandwidths, spatial resolutions, frame rates, computational capabilities
2. Heterogeneous networks and a priori unknown network conditions
   • Wired and wireless links, time-varying bandwidths
When you originally code the video you don’t know which client or network situation will exist in the future
Probably have multiple different situations, each requiring a different compressed bitstream
Need a different compressed video matched to each situation
Possible solutions:
1. Compress & store MANY different versions of the same video
2. Real-time transcoding (e.g. decode/re-encode)
3. Scalable coding
Scalable Video Coding
Scalable coding:
• Decompose video into multiple layers of prioritized importance
• Code layers into base and enhancement bitstreams
• Progressively combine one or more bitstreams to produce different levels of video quality
Example of scalable coding with base and two enhancement layers: can produce three different qualities (listed from lower to higher quality)
1. Base layer
2. Base + Enh1 layers
3. Base + Enh1 + Enh2 layers
Scalability with respect to: spatial or temporal resolution, bit rate, computation, memory
Example of Scalable Coding
Encode image/video into three layers:
Low-bandwidth receiver: Send only Base layer
Medium-bandwidth receiver: Send Base & Enh1 layers
High-bandwidth receiver: Send all three layers
Can adapt to different clients and network situations
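The receiver cases above reduce to sending layers in priority order until the bandwidth budget is used up. A minimal sketch, assuming hypothetical layer bit rates (the slides do not give any):

```python
# (layer name, bit rate in kb/s); the rates are made-up illustration values.
LAYERS = [("Base", 300), ("Enh1", 500), ("Enh2", 1200)]

def select_layers(bandwidth_kbps):
    """Pick layers in priority order while they fit within the bandwidth."""
    chosen, used = [], 0
    for name, rate in LAYERS:
        if used + rate > bandwidth_kbps:
            break  # an enhancement layer is useless without the layers below it
        chosen.append(name)
        used += rate
    return chosen

low = select_layers(400)
medium = select_layers(1000)
high = select_layers(2500)
```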
Scalable Video Coding (cont.)
Three basic types of scalability (refine video quality along three different dimensions):
• Temporal scalability → Temporal resolution
• Spatial scalability → Spatial resolution
• SNR (quality) scalability → Amplitude resolution
Each type of scalable coding provides scalability of one dimension of the video signal
• Can combine multiple types of scalability to provide scalability along multiple dimensions
Scalable Coding: Temporal Scalability
Temporal scalability: based on the use of B-frames to refine the temporal resolution
• B-frames are dependent on other frames
• However, no other frame depends on a B-frame
• Each B-frame may be discarded without affecting other frames
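Because no frame depends on a B-frame, a temporal base layer can be formed simply by dropping them. A small sketch, assuming frames are represented as hypothetical (index, type) pairs:

```python
def drop_b_frames(frames):
    """Keep only I- and P-frames; `frames` is a list of (index, type) pairs."""
    return [(i, t) for i, t in frames if t != "B"]

gop = list(enumerate("IBBPBBPBB"))   # display order of one example GOP
base_layer = drop_b_frames(gop)
```

With two B-frames between anchors, the base layer runs at one third of the original frame rate.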
Scalable Coding: Spatial Scalability
Spatial scalability: based on refining the spatial resolution
• Base layer is a low-resolution version of the video
• Enh1 contains the coded difference between the upsampled base layer and the original video
• Also called: pyramid coding
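A two-layer pyramid can be sketched as follows. The 2x2 averaging and pixel-replication filters are assumptions chosen for brevity, and both layers are left uncoded (no quantization), so the full reconstruction here is exact.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 neighborhood (integer average)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def upsample(img):
    """Double the resolution by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def encode(img):
    """Return (base layer, Enh1 = original minus upsampled base)."""
    base = downsample(img)
    up = upsample(base)
    enh1 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, up)]
    return base, enh1

def decode(base, enh1=None):
    """Base alone gives the low-resolution image; adding Enh1 restores full resolution."""
    if enh1 is None:
        return base
    up = upsample(base)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(up, enh1)]

img = [[10, 12, 50, 52], [14, 16, 54, 56], [90, 92, 20, 22], [94, 96, 24, 26]]
base, enh1 = encode(img)
```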
Scalable Coding: SNR (Quality) Scalability
SNR (quality) scalability: based on refining the amplitude resolution
• Base layer uses a coarse quantizer
• Enh1 applies a finer quantizer to the difference between the original DCT coefficients and the coarsely quantized base layer coefficients
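For a single DCT coefficient, the two quantizers can be sketched as below. The step sizes 16 and 4 are arbitrary illustration values, and the function names are hypothetical.

```python
def quantize(value, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return int(round(value / step))

def encode(coeff, coarse=16, fine=4):
    base = quantize(coeff, coarse)                 # coarse base-layer index
    enh1 = quantize(coeff - base * coarse, fine)   # finer index of what is left
    return base, enh1

def decode(base, enh1=None, coarse=16, fine=4):
    value = base * coarse
    if enh1 is not None:
        value += enh1 * fine                       # refine the amplitude
    return value

coeff = 45
base, enh1 = encode(coeff)
base_only = decode(base)          # reconstruction error of 3
refined = decode(base, enh1)      # reconstruction error of 1
```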
Summary of Scalable Video Coding
Three basic types of scalable coding:
• Temporal scalability
• Spatial scalability
• SNR (quality) scalability
Scalable coding produces different layers with prioritized importance
Prioritized importance is key for a variety of applications:
• Adapting to different bandwidths, or client resources such as spatial or temporal resolution or computational power
• Facilitates error-resilience by explicitly identifying most important and less important bits
Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)
Motivation for Standards
Goal of standards:
• Ensuring interoperability: enabling communication between devices made by different manufacturers
• Promoting a technology or industry
• Reducing costs
What do the Standards Specify?
• Not the encoder
• Not the decoder
• Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT)
Enables improved encoding & decoding strategies to be employed in a standard-compatible manner
[Figure: Encoder → Bitstream → Decoder; the scope of standardization covers the bitstream and the decoding process]
Current Image and Video Compression Standards
Standard | Application | Bit Rate
JPEG | Continuous-tone still-image compression | Variable
H.261 | Video telephony and teleconferencing over ISDN | p x 64 kb/s
MPEG-1 | Video on digital storage media (CD-ROM) | 1.5 Mb/s
MPEG-2 | Digital Television | 2-20 Mb/s
H.263 | Video telephony over PSTN | 33.6-? kb/s
MPEG-4 | Object-based coding, synthetic content, interactivity | Variable
JPEG-2000 | Improved still image compression | Variable
H.26L | Improved video compression | 10’s to 100’s kb/s
Comparing Current Video Compression Standards
Based on the same fundamental building blocks:
• Motion-compensated prediction (I, P, and B frames)
• 2-D Discrete Cosine Transform (DCT)
• Color space conversion
• Scalar quantization, runlengths, Huffman coding
Additional tools added for different applications:
• Progressive or interlaced video
• Improved compression, error resilience, scalability, etc.
MPEG-1/2/4, H.261/3/L: Frame-based coding
MPEG-4: Object-based coding and synthetic video
MPEG-1 and MPEG-2
MPEG-1 (1991)
• Goal: Compression for digital storage media (e.g. CD-ROM)
• Achieves VHS-quality video and audio at ~1.5 Mb/s
MPEG-2 (1993)
• Goal: Superset of MPEG-1 to support higher bit rates, higher resolutions, and interlaced pictures
• Original goal was to support interlaced video from conventional television; eventually extended to support HDTV
• Provides: field-based coding and scalability tools
MPEG Group of Pictures (GOP) Structure
• Composed of I, P, and B frames
• Arrows show prediction dependencies
• Periodic I-frames enable random access into the coded bitstream
• Parameters: (1) spacing between I frames, (2) number of B frames between I and P frames
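The two GOP parameters determine the frame-type pattern, and the prediction dependencies determine the order in which frames must be coded. The sketch below uses hypothetical parameter names (i_spacing, num_b). As a simplification, B-frames at the end of the GOP, which would depend on the next GOP's I-frame, are simply emitted last.

```python
def gop_display_order(i_spacing, num_b):
    """Frame types of one GOP in display order."""
    types = []
    for k in range(i_spacing):
        if k == 0:
            types.append("I")
        elif k % (num_b + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

def coding_order(types):
    """Reorder frame indices so each anchor is coded before the B-frames
    that precede it in display order."""
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)
        else:
            order.append(i)          # anchor (I or P) goes first
            order.extend(pending_b)  # then the B-frames waiting for it
            pending_b = []
    return order + pending_b

pattern = gop_display_order(i_spacing=9, num_b=2)
order = coding_order(pattern)
```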
MPEG Structure
MPEG codes video in a hierarchy of layers. The sequence layer is not shown.
[Figure: hierarchy of layers: GOP Layer → Picture Layer → Slice Layer → Macroblock Layer → Block Layer]
MPEG-2 Profiles and Levels
Goal: To enable more efficient implementations for different applications (interoperability points)
• Profile: Subset of the tools applicable for a family of applications
• Level: Bounds on the complexity for any profile
[Figure: grid of profiles (Simple, Main, High) against levels (Low, Main, High)]
• HDTV: Main Profile at High Level (MP@HL)
• DVD & SD Digital TV: Main Profile at Main Level (MP@ML)
Goals of MPEG-4
Primary goals: new functionalities (not just better compression)
• Object-based or content-based representation
• Separate coding of individual visual objects
• Content-based access and manipulation
• Integration of natural and synthetic objects
• Interactivity
• Communication over error-prone environments
Includes frame-based coding techniques from earlier standards
Comparing MPEG-1/2 and H.261/3 with MPEG-4
MPEG-1/2 and H.261/H.263: Algorithms for compression
• Basically describe a pipe for storage or transmission
• Frame-based
• Emphasis on hardware implementation
MPEG-4: Set of tools for a variety of applications
• Define tools and glue to put them together
• Object-based and frame-based
• Emphasis on software
• Downloadable algorithms (not encoders or decoders)
Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)
Comments on Object-based Processing
Basic goal: Separate encoding/decoding of separate objects in a scene
Separate processing of each object enables:
• Identification and selective decoding and/or processing of the object of interest
• Facilitates interactivity and manipulation of content
• Processing of content in the compressed domain
• Possible w/o decoding or segmentation at decoder
Used for many years in authoring/production
• Video: bluescreening, e.g. weather-news
• Audio: individual processing of each voice
MPEG-4 also enables end-user to have object-based processing
Different Parts of MPEG-4
Video
• Coding and expression of natural and synthetic video objects
Audio
• Coding and expression of natural and synthetic speech and audio objects
Systems
• Scene Description: Composition of different audio and video objects in the scene
• BIFS: Binary Format for Scene Description
• Buffering, multiplexing, timing
• Interaction
Delivery (Delivery Multimedia Integration Framework, DMIF)
• Setup of connection (broadcast, interactive)
• Network is transparent to application
Scene Description
Scene description:
• Describes the spatio-temporal positioning of the individual audio & video (AV) objects to compose the scene
• AV objects: audio, video, natural, synthetic, 2-D, 3-D
Transmitted separately from object bitstreams
• Scene description info is a property of the scene’s structure rather than of individual objects
Enables scene modification without decoding objects
Can be dynamically altered
Scene Description (cont.)
Hierarchical, tree structure:
• Leaf nodes: individual AV objects
• Other nodes: meaningful grouping
[Figure: example scene tree. Source: MPEG Committee]
Object-based Processing in the Compressed Domain
Each video or audio object is coded into a separate bitstream
Scene description contains all non-coded information
Possible operations:
• Add/delete an object: add/discard bitstream, e.g. individual instruments in an orchestra
• Manipulate (e.g. move) object: alter visual/audio scene composition
Many object-based operations can be performed without requiring decoding
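The idea that objects can be added, deleted, or moved without decoding can be illustrated with a toy scene tree. This is not BIFS: the dictionary layout, object names, and placeholder bytes are all hypothetical, and the coded bitstreams are treated as opaque data that is never parsed.

```python
# Each object is an opaque coded bitstream; the bytes are placeholders.
bitstreams = {"violin": b"\x01\x02", "piano": b"\x03\x04", "singer": b"\x05"}

# Scene description: grouping nodes with children, leaves referencing objects.
scene = {
    "name": "stage",
    "children": [
        {"name": "orchestra", "children": [
            {"object": "violin", "pos": (0, 0)},
            {"object": "piano", "pos": (5, 0)},
        ]},
        {"object": "singer", "pos": (2, 3)},
    ],
}

def leaves(node):
    """Yield every leaf (individual AV object) of the scene tree."""
    if "object" in node:
        yield node
    for child in node.get("children", []):
        yield from leaves(child)

def delete_object(node, object_id):
    """Prune the leaf that references `object_id` from the tree."""
    node["children"] = [c for c in node.get("children", [])
                        if c.get("object") != object_id]
    for child in node["children"]:
        if "children" in child:
            delete_object(child, object_id)

def move_object(node, object_id, new_pos):
    """Change an object's position by editing only the scene description."""
    for leaf in leaves(node):
        if leaf["object"] == object_id:
            leaf["pos"] = new_pos

delete_object(scene, "piano")
bitstreams.pop("piano")                 # discard the bitstream, never decode it
move_object(scene, "singer", (4, 3))
remaining = sorted(leaf["object"] for leaf in leaves(scene))
```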
MPEG-4 Natural Video
MPEG-4 has two primary goals for natural video coding:
1. High compression efficiency coding
   • Rectangular frames
   • High coding efficiency (64-384 kb/s), low latency, low complexity
   • Error resilience against packet loss, burst errors on wireless links
   • Applications include: video streaming over the Internet, video over 3G cellular systems
2. Object-based coding
   • Content-based functionalities
   • Arbitrarily shaped visual objects
   • Separate encoding & decoding of each object
   • Greatly improved content creation capabilities, as well as interactivity with different objects at the client
MPEG-4 Coding of Natural Video
Classes of video to represent:
1. Rectangular images (frame-based coding)
   • Shape (rectangle) does not change with time
   • Code motion and amplitude information
   • Use conventional coding methods, e.g. MPEG-1/2
2. Arbitrarily shaped (non-rectangular) image regions (object-based coding)
   • Shape usually changes with time
   • Must code motion, amplitude (texture) and shape
   • Arbitrary & time-varying shape complicates coding
   • Also describe how objects are composed to form the scene (scene description)
   • Separate encoding and decoding of each object
MPEG-4 Natural Video Coding
Extension of MPEG-1/2-type algorithms to code arbitrarily shaped objects
[Figure: Frame-based Coding compared with Object-based Coding. Source: MPEG Committee]
Basic Idea: Extend Block-DCT and Block-ME/MC-prediction to code arbitrarily shaped objects
Coding of Arbitrarily Shaped Video Objects
Following slides briefly discuss different aspects of coding arbitrarily shaped video objects:
• Coding of texture (amplitude) information
• MC-prediction
• I, P, B coding of objects
• Coding of shape information
Goal: To give brief, conceptual overview (not covered on problem sets or quiz)
Key points to take away:
1. Different attributes to code for arbitrarily shaped video objects: texture, motion, & shape information
2. MPEG-4 extends block-based coding to code arbitrarily shaped objects (not an elegant solution, but it works)
Example of Arbitrarily Shaped Object
Arbitrarily shaped 2-D object (image region): Video object plane (VOP) in MPEG-4
[MPEG Committee]
Comments on Segmentation
Segmentation of video into objects is not standardized (part of encoder)
Different segmentation scenarios:
• Sometimes segmentation is available, e.g. synthetically generated content
• Sometimes it is relatively easy, e.g. bluescreening or video-conferencing
• Usually it is very difficult
Coding the Texture of an Arbitrarily Shaped Object
Texture (amplitude) coded by Block-DCT adapted for arbitrarily shaped support
1. Embed VOP in rectangle
2. Separate processing of each 8x8 block
a) Interior → Conventional Block-DCT
b) Exterior → Discard
c) Boundary → Extrapolate then Block-DCT
[MPEG Committee]
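The "extrapolate" step for a boundary block can be pictured as filling the pixels outside the object before the Block-DCT is applied. The sketch below assumes the simplest possible rule, filling with the mean of the pixels inside the object; this is an illustration only, not the padding procedure the standard defines, and a 4x4 block is used to keep the example short.

```python
def pad_boundary_block(texture, alpha):
    """Fill pixels outside the object (alpha == 0) with the mean of those inside."""
    inside = [t for trow, arow in zip(texture, alpha)
              for t, a in zip(trow, arow) if a]
    if not inside:
        raise ValueError("exterior block: nothing to code")
    mean = round(sum(inside) / len(inside))
    return [[t if a else mean for t, a in zip(trow, arow)]
            for trow, arow in zip(texture, alpha)]

texture = [[100, 104, 7, 9], [96, 100, 3, 1], [98, 102, 5, 2], [100, 100, 8, 6]]
alpha = [[1, 1, 0, 0]] * 4        # the left half belongs to the object
padded = pad_boundary_block(texture, alpha)
```

Filling the outside with a value close to the object's texture avoids an artificial edge inside the block, which would otherwise spread energy into many DCT coefficients.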
MC-Prediction for Texture Coding of Arbitrarily Shaped Object
Block-based ME/MC-P adapted for arbitrarily shaped support:
1. Extrapolate arbitrarily shaped object to fill rectangle
2. Perform conventional block-based ME/MC-P
• Error metric computed only over object’s support in current frame
Also: Parametric motion models (e.g. affine, perspective)
[MPEG Committee]
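Restricting the error metric to the object's support amounts to masking the block-matching cost with the alpha map. A minimal sketch, assuming a SAD cost and hypothetical names:

```python
def masked_sad(cur_block, ref_block, alpha):
    """SAD over the pixels where alpha == 1 (the object's support in the current frame)."""
    return sum(abs(c - r)
               for crow, rrow, arow in zip(cur_block, ref_block, alpha)
               for c, r, a in zip(crow, rrow, arow) if a)

cur = [[50, 52, 0, 0], [51, 53, 0, 0]]
ref = [[50, 51, 90, 90], [51, 53, 90, 90]]
alpha = [[1, 1, 0, 0], [1, 1, 0, 0]]   # only the left half is inside the object
cost = masked_sad(cur, ref, alpha)
```

The large differences outside the object are ignored, so they cannot pull the motion search away from the object's true motion.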
MC-Prediction for Video Object Planes: I, P, and B VOP’s
MC-Prediction for VOP’s:
• I-VOP: Intra-coded VOP (no prediction)
• P-VOP: Predicted VOP
• B-VOP: Bi-directionally predicted VOP
Binary Shape Coding
Opaque objects:
• Each pixel is either inside or outside the support
• Shape given by binary alpha map (bitmap or binary mask)
Many possible approaches for lossless and lossy shape coding
• e.g. describe shape by chain code, polynomials, splines, bitmap
MPEG-4: Block-based Context-based Arithmetic Coding (CAE)
1. Embed support in rectangle
2. Separate processing of 16x16 blocks
a) Interior (opaque) blocks (completely within object)
b) Exterior (transparent) blocks (completely outside object)
c) Boundary blocks → CAE
Also motion compensated CAE
Binary Shape Coding: Block-based Shape Coding
Different 16x16 blocks: Interior, boundary, and exterior
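Sorting blocks into the three classes only requires counting the opaque pixels of the binary alpha map in each block. The sketch below uses a hypothetical function name and, to stay readable, a toy alpha map with 2x2 blocks instead of 16x16.

```python
def classify_blocks(alpha, n=16):
    """Label each n x n block of a binary alpha map."""
    labels = []
    for by in range(0, len(alpha), n):
        row = []
        for bx in range(0, len(alpha[0]), n):
            count = sum(alpha[y][x]
                        for y in range(by, by + n)
                        for x in range(bx, bx + n))
            if count == 0:
                row.append("exterior")   # completely outside the object
            elif count == n * n:
                row.append("interior")   # completely within the object
            else:
                row.append("boundary")   # only these blocks need shape coding
        labels.append(row)
    return labels

alpha = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
]
labels = classify_blocks(alpha, n=2)
```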
Binary Shape Coding: Block-based CAE (cont.)
Coding of boundary blocks using CAE:
• Intra-shape coding
  • Context defined by 10-pixel template
• Inter-shape coding
  • MC-shape using shape motion vector
  • Context defined by 9-pixel template from current and previous frames
[Figure: context templates in the Previous Frame and the Current Frame]
Sprite Coding (Background Prediction)
Sprite: Large background image
• Hypothesis: Same background exists for many frames, changes resulting from camera motion and occlusions
One possible coding strategy:
1. Code & transmit entire sprite once
2. Only transmit camera motion parameters for each subsequent frame
Significant coding gain for some scenes
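The coding strategy above can be sketched for the simplest camera model. Here the camera motion parameters are assumed to be a pure horizontal pan (one offset per frame), which is a deliberate simplification, and a foreground object is pasted on top using a binary mask. All names and values are hypothetical.

```python
def frame_from_sprite(sprite, offset_x, offset_y, width, height):
    """Cut the visible window out of the sprite for a translation-only camera."""
    return [row[offset_x:offset_x + width]
            for row in sprite[offset_y:offset_y + height]]

def overlay(frame, obj, mask, top, left):
    """Paste a foreground object onto the background where mask == 1."""
    out = [list(row) for row in frame]
    for y, (orow, mrow) in enumerate(zip(obj, mask)):
        for x, (o, m) in enumerate(zip(orow, mrow)):
            if m:
                out[top + y][left + x] = o
    return out

sprite = [[10 * y + x for x in range(8)] for y in range(3)]   # transmitted once
pan = [0, 2, 4]                                               # per-frame camera offsets
frames = [frame_from_sprite(sprite, dx, 0, 4, 3) for dx in pan]
with_object = overlay(frames[1], obj=[[99]], mask=[[1]], top=1, left=0)
```

After the sprite has been sent, each background frame costs only one offset value in this toy model, which is where the coding gain comes from.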
Sprite Coding Example
[Figure: Sprite (background) + Foreground Object → Reconstructed Frame. Source: MPEG Committee]
Related MPEG Standards (non-compression)
MPEG-7 “Multimedia Content Description Interface”
• Goal: A method for describing multimedia content to enable efficient searching and management of multimedia.
MPEG-21 “Multimedia Framework”
• Goal: To enable the electronic commerce of digital media content.