chuong 1&2.pdf

Pham Van Tien, Dr. rer. nat., Embedded Networking Research Group. Email: [email protected] of Elec. and Telecom, Hanoi University of Science and Technology, C9-411, Dai Co Viet str. 1, Hanoi

Video Coding

Tien Pham Van, Dr. rer. nat.

Hanoi University of Technology


Agenda

• Video coding process

• Video coding standards

• Future development


Introduction (1/2)

• Why is video compression important?

• One movie without compression

– 720 x 480 pixels per frame

– 30 frames per second

– 90 minutes in total

– Full color (3 bytes per pixel)

– Total data volume = 167.96 GB
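The figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "full color" means 3 bytes (24 bits) per pixel and 1 GB = 10^9 bytes; the function name `raw_video_bytes` is only illustrative.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size in bytes of uncompressed video (assumes 3 bytes per pixel)."""
    return width * height * bytes_per_pixel * fps * seconds


# 720 x 480 pixels, 30 frames per second, 90 minutes
size = raw_video_bytes(720, 480, 30, 90 * 60)
print(f"{size / 1e9:.2f} GB")  # prints "167.96 GB"
```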



Introduction (2/2)

• What is the difference between video compression and image compression?

– Temporal Redundancy

• Coding method to remove redundancy

– Intraframe Coding

• Remove spatial redundancy

– Interframe Coding

• Remove temporal redundancy



Desired Features

• Better compression

• Improved quality

• Interactivity and Manipulation of Content

• Error Resilience

• Processing of content in the compressed domain

• Identification and selective coding/decoding of the object of interest

• Facilitate Search / Indexing (MPEG-7)


Timeline

[Figure: standards placed on a year axis running from 1986 to 2010: JPEG, H.261, MPEG-1, MPEG-2/H.262, H.263, MPEG-4, H.26L/H.264, VC-1/VC-2, H.265]


Evolution of Video Compression Standards

• ITU-T

– H.261: video telephony

– H.263: video conferencing

• MPEG

– MPEG-1: Video-CD

– MPEG-4 Visual: object-based coding

• Joint ITU-T/MPEG

– H.262/MPEG-2: digital TV, DVD

– H.264/MPEG-4 AVC


Where used?

– MPEG-1:

• Video-CD

• .mpg or .mpeg files are usually MPEG-1

• DAB digital radio uses MP2 (MPEG-1 Layer 2)

• MP3 files (MPEG-1 Layer 3)

– MPEG-2:

• .vob and .m2v files, rarely .mpg files

• Anything to do with DVD: camcorders, DVD players, DVD recorders

• Digital TV (DVB)

– MPEG-4:

• High-quality AVI files

• Video phones

• DivX

• Some advanced audio players support MPEG-4 Advanced Audio Coding (AAC)


Where used?

– H.263/+/++:

• NetMeeting and similar video-chat applications

• Network streaming applications, video phones, …

– H.264:

• Video conferencing over different networks

• Multimedia streaming: live and on-demand

• Multimedia Messaging Services (MMS)

• Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD

– VC-1, VC-2:

• Video on the Internet

• HDTV broadcast, UHDTV


R-D Performance of MPEG Codecs

[Figure: rate-distortion curves for MPEG-1, MPEG-2, MPEG-4 and H.264. Horizontal axis: bit rate (kbps), 350 to 1050. Vertical axis: PSNR (Y), 32 to 50.]
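The quality measure on rate-distortion plots of this kind is the PSNR of the luma (Y) component. A minimal sketch of how it is commonly computed, assuming 8-bit samples (peak value 255) and frames given as lists of rows; the function name `psnr` and the toy data are illustrative.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized 2D lists of luma samples."""
    count = 0
    squared_error = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]  # two samples are off by one
print(round(psnr(original, decoded), 2))
```

A higher PSNR at the same bit rate means a more efficient codec.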


Questions

• What are video/audio codecs? Name some popular codecs that your media players support. What are the disadvantages of using specific codecs?

• What is a container format? Name some examples.

• Codecs and Formats


Compression...

[Figure: two consecutive pictures of the “Horse ride” sequence (movie picture 1, movie picture 2), the pixel-wise difference without motion compensation, the motion estimation result, and the residue after motion compensation]


Motion Prediction

• Motion vector: a two-dimensional pointer that tells the decoder how far left/right and up/down the prediction macroblock lies from the position of the current macroblock

• Motion estimation: the process, performed by the coder, of finding the motion vector that points to the best prediction macroblock in a reference frame or field

• Motion compensation: the prediction obtained by applying the motion vectors to the reference frame
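Motion estimation for one block can be sketched as an exhaustive (full) search. This is only an illustration: the sum of absolute differences (SAD) as matching cost, the 4x4 block size, the ±2 pixel search range and the toy frames are all assumptions, not something the standards prescribe.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Full search: try every displacement in [-search, search]^2 and
    return (dx, dy, cost) of the candidate with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx <= width - n) and (0 <= by + dy <= height - n)
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames: the content of `ref` moves one pixel to the right in `cur`.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
print(best_motion_vector(cur, ref, 2, 2))  # (-1, 0, 0)
```

The vector (-1, 0) says that the best prediction for the block lies one pixel to the left in the reference frame, with zero matching error.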


Motion Estimation

• Helps to understand the content of an image sequence

– For surveillance

• Helps to reduce the temporal redundancy of video

– For compression

• Stabilizes video by detecting and removing small, noisy global motions

– For building the stabilizer in a camcorder


Motion Compensation

• It aims to reduce the data transmitted by exploiting the motion of objects

– Uses the previous frame as reference

– In steps:

• Split the current frame into blocks. For each one:

• Find the best-matching block in the reference frame

• Code and transmit the motion vector to the best-matching block together with the prediction residual

– The next frame can be used as a reference too
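The benefit of motion compensation can be sketched by comparing the residual energy with and without it. In this illustration the motion vectors are given by hand (in a real coder they come from motion estimation), the frames are toy data, and the helper names are hypothetical.

```python
def motion_compensate(ref, vectors, n):
    """Build a prediction frame: each n x n block at (bx, by) is copied
    from `ref` at the position displaced by that block's vector (dx, dy)."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(n):
            for x in range(n):
                pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Sample-wise difference between the current frame and a prediction."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def energy(frame):
    """Sum of squared samples: a rough measure of what is left to code."""
    return sum(v * v for row in frame for v in row)


# Toy frames: the content of `ref` moves one pixel to the right in `cur`.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

# One vector per 4x4 block; the left blocks keep (0, 0) because a
# displacement of -1 would point outside the reference frame.
vectors = {(0, 0): (0, 0), (4, 0): (-1, 0), (0, 4): (0, 0), (4, 4): (-1, 0)}
pred = motion_compensate(ref, vectors, 4)

print(energy(residual(cur, ref)))   # plain frame difference
print(energy(residual(cur, pred)))  # residue after motion compensation
```

The second number is much smaller than the first, which is why the residue is cheaper to code than the plain frame difference.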


Picture type

• Slice

– One or more "contiguous" macroblocks. The order of the macroblocks within a slice is from left to right and top to bottom.

• Macroblock

– A 16-pixel by 16-line section of luminance components and the corresponding 8-pixel by 8-line section of the two chrominance components.

• Block

– A block is an 8-pixel by 8-line set of values of a luminance or a chrominance component.
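With the definitions above, a macroblock holds four 8x8 luminance blocks plus one 8x8 block for each of the two chrominance components. A small sketch of the resulting counts per frame, assuming frame dimensions are padded up to a multiple of 16; the function name is illustrative.

```python
def macroblock_layout(width, height):
    """Return (mb_cols, mb_rows, macroblocks, blocks) for one frame."""
    mb_cols = (width + 15) // 16   # round up to whole macroblocks
    mb_rows = (height + 15) // 16
    macroblocks = mb_cols * mb_rows
    # 16x16 luma = four 8x8 blocks, plus one 8x8 block per chroma component
    blocks = macroblocks * (4 + 2)
    return mb_cols, mb_rows, macroblocks, blocks


print(macroblock_layout(720, 480))  # (45, 30, 1350, 8100)
```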


CODEC Design


Coding functions

• Achieve high compression performance while keeping good picture quality

• Principle: remove each kind of redundancy

– Spatial redundancy – DCT, DFT, subband, wavelet

– Temporal redundancy – MC/ME

– Statistical redundancy – VLC, entropy coding

– Perceptual redundancy – VQ


Tradeoffs in lossy compression


DCT

• Uses the same technique as JPEG

– DCT-based coding scheme

• 2D DCT

• 3D DCT?


Discrete cosine transform

• Uses the same technique as JPEG

– Discrete cosine transform



DCT Transformation


Steps

• Image

• 8 x 8 DCT: spatial-to-DCT domain transformation

• Quantization: discard unimportant DCT domain samples

• Entropy coding: lossless coding of DCT domain samples
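The first step, the 8 x 8 DCT, can be sketched in plain Python. This is an illustration using the orthonormal DCT-II written as a matrix product; real codecs use fast, fixed-point implementations, and the helper names here are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """2D DCT: transform the columns, then the rows (C * X * C^T)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT (C^T * Y * C), valid because C is orthonormal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


flat = [[128] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))  # 1024: all energy ends up in the DC coefficient
```

For smooth image regions most of the energy is concentrated in a few low-frequency coefficients, which is what makes the later quantization step effective.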


Quantization

• Quantization

– The eye is insensitive to high-frequency components

– A larger quantizer step size means greater loss

– Low-frequency components get a smaller quantizer step size, high-frequency components a larger one

– The quantization tables in the encoder and decoder are the same
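The effect of a frequency-dependent quantizer can be sketched as follows. The table built by `make_table` (step size growing linearly with frequency) is a made-up example, not a table from JPEG or MPEG, and the input coefficients are toy values.

```python
def make_table(n=8, base=8, slope=4):
    """Hypothetical quantization table: the step grows with frequency u + v."""
    return [[base + slope * (u + v) for u in range(n)] for v in range(n)]


def quantize(coeffs, table):
    """Encoder side: divide by the step size and round to an integer level."""
    return [[round(c / q) for c, q in zip(rc, rq)]
            for rc, rq in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply back with the same table."""
    return [[l * q for l, q in zip(rl, rq)] for rl, rq in zip(levels, table)]


table = make_table()
coeffs = [[30] * 8 for _ in range(8)]  # toy input: every coefficient is 30
levels = quantize(coeffs, table)
recon = dequantize(levels, table)

print(table[0][0], levels[0][0], recon[0][0])  # low frequency: small step, small error
print(table[7][7], levels[7][7], recon[7][7])  # high frequency: large step, coefficient dropped
```

The same coefficient value survives almost unchanged at the lowest frequency and is discarded entirely at the highest one, and the decoder can only undo the scaling if it uses the same table as the encoder.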


Picture type

• Video bit stream


Picture type

• Intra picture

– Coded using only information present in the picture itself

– I-pictures provide potential random access points into the compressed video data.


Picture type

• Predicted picture

– Coded with respect to the nearest previous I- or P-picture

– P-pictures use motion compensation

– Unlike I-pictures, P-pictures can propagate coding errors


Picture type

• Bidirectional picture

– Coded using both a past and a future picture as references

– B-pictures provide the most compression and do not propagate errors, because in MPEG-1/2 they are never used as a reference


Picture type

• Typical display order of picture types

• Video stream composition

– The MPEG encoder reorders pictures in the video stream to present the pictures to the decoder in the most efficient sequence
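The reordering can be sketched as follows: a B-picture needs its future reference (the next I- or P-picture) before it can be decoded, so each I/P picture is moved in front of the B-pictures that precede it in display order. The function name and the example sequence are illustrative.

```python
def coding_order(display):
    """Reorder a string of picture types from display order to coding order.
    Pictures are labelled with their display index, e.g. 'B1'."""
    out = []
    pending_b = []
    for index, ptype in enumerate(display):
        label = f"{ptype}{index}"
        if ptype == "B":
            pending_b.append(label)  # must wait for its future reference
        else:
            out.append(label)        # I or P: send the reference first
            out.extend(pending_b)    # then the B-pictures that depend on it
            pending_b = []
    out.extend(pending_b)  # trailing B-pictures (their reference is not in this list)
    return out


print(coding_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder restores the display order after decoding, using the display index carried with each picture.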


Hybrid MC-DCT Video Encoder

• Intra-frame: encoded without prediction

• Inter-frame: predictively encoded; the reconstructed (quantized) frames serve as the reference for the residue, so that encoder and decoder predict from the same data


MPEG-1 = JPEG + Motion Prediction + Rate Control

• Early motivation: to encode motion video at 1.5 Mbit/s for transport over T1 data circuits and for replay from CD-ROM

• Defines the decoder but not the encoder

• Frames (pictures)

– Intra-coded using JPEG

– Inter-coded using (interpolated) ME & MC, with JPEG for the residuals

• Macroblocks (MBs)

– 16×16-pixel blocks

• Rate control

– Buffer at each end

– Test Model 5 (TM5)

Note A22: Intra-coding of MBs in MPEG is the same as described for JPEG, except that 1) unless otherwise specified in the sequence header, MPEG defines two quantization tables: one is used for intra-coding, the other for coding any residuals from prediction by motion estimation; 2) the quantization scale factor, or MQuant, is different. (Author, 6/17/2004)

Note A21: MPEG does not define the encoder. A valid encoder produces a syntactically correct bit stream that yields the desired output when fed to a compliant decoder. An MPEG-1 compliant decoder, however, is required to decode all valid MPEG-1 bit streams. (Author, 6/17/2004)


MPEG-2 = MPEG-1 +

• Improvements

– Color space: can support 4:2:2 and 4:4:4 coding

– Quantization: can have 9- or 10-bit precision for DC coefficients

– Concealment motion vectors: used when an intra-MB is lost

– Pan and scan: supports display of different aspect ratios, e.g., 16:9

• Profiles and levels

– Profiles: define the tools or syntactical elements

– Levels: define the permissible ranges of parameters


MPEG-2 = MPEG-1 +

• Interlace tools

• Scalable coding profiles

• System layer: defines two bit-stream constructs

– Program stream (PS): modeled on MPEG-1 (backward compatibility)

– Transport stream (TS): more robust, does not need a common time base, designed for use in error-prone environments


MPEG-4 = MPEG-2 + Objects + Other Enhancements

• Object-oriented

– Video (texture + shape), image, audio, speech, text, etc.

– Encoded using different techniques

– Transmitted independently

– Composited at the decoder using BInary Format for Scenes (BIFS)

• Improvements in MPEG-4 version 2

– Global motion compensation (GMC)

– Quarter-pixel motion compensation

– Shape-adaptive DCT

• Why is MPEG-4 not as successful as MPEG-2?

– Not substantially better than MPEG-2

– Suffers from its sheer size and flexibility

– Licensing issues


MPEG-4 – Error Resilience Tools

• Video packet resynchronization

– Previous coding standards: resynchronization markers are fixed at the beginning of each row of MBs

– MPEG-4: resynchronization markers are inserted at every K bits

• Data partitioning

– Partitions the data in a video packet into a motion part and a texture part separated by a motion boundary marker (MBM)

[Figure: a video packet consists of Resync. marker | MB No. | QP | HEC | Repeated header info. | Motion data | MBM | DCT data. With data partitioning, an I-VOP packet holds VP header | DC DCT data | AC DCT data, and a P-VOP packet holds VP header | Motion data | Texture data. Labels under the packet: use, discard, use.]


MPEG-4 – Error Resilience Tools

• Reversible variable length codes (RVLC)

– After an error, the decoder finds the next resynchronization marker and decodes backwards from it

• Header extension code (HEC)

– The header information is repeated after the 1-bit HEC

• Unequal error protection (UEP) technique



New Features of H.264

• Multi-mode, multi-reference MC

• Motion vectors can point outside the image border

• 1/4-, 1/8-pixel motion vector precision

• B-frame prediction weighting

• 4×4 integer transform

• Multi-mode intra-prediction

• In-loop de-blocking filter

• UVLC (Universal Variable Length Coding)

• NAL (Network Abstraction Layer)

• SP-slices


Profiles and Levels

• Profiles: Baseline, Main, and X

– Baseline: progressive video, videoconferencing and wireless

– Main: esp. broadcast

– X: mobile networks

• Baseline profile is the minimum implementation

– Without CABAC, 1/8 MC, B-frames, SP-slices

• 11 levels

– Resolution, capability, bit rate, buffer, number of reference frames

– Built to match popular international production and emission formats

– From QCIF to D-Cinema


Basic Macroblock Coding Structure

[Figure: encoder block diagram. The input video signal is split into macroblocks of 16x16 pixels. Blocks: coder control, transform/scaling/quantization, embedded decoder (scaling & inverse transform, de-blocking filter, intra-frame prediction, motion compensation), motion estimation, intra/inter switch, entropy coding. Signals: control data, quantized transform coefficients, motion data, output video signal.]


Variable block size

• A fixed block size may not be suitable for all moving objects

– Variable sizes improve the flexibility of block matching

– They reduce the matching error

• 7 types of blocks for selection

– 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4


Multiple Reference Frames

• The neighboring frames are not the most similar ones in some cases

• A B-frame can be used as a reference frame

– The B-frame is close to the target frame in many situations


Spatial Prediction for Intra-Coded MBs

• luma

- 4x4: 9 modes

- 16x16: 4 modes

• chroma

- 8x8: 4 modes

- The same prediction mode is always applied to both chroma blocks

[Figure: 4x4 luma prediction from the neighbouring samples A–H (above and above right), I–L (left) and M (corner), including a mode that predicts the mean of A–D and I–L; 16x16 luma prediction from the neighbouring samples H and V, including a Mean(H, V) mode]
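Three of the nine 4x4 luma modes (vertical, horizontal and DC) are simple enough to sketch. The mode selection by smallest SAD is an assumption for illustration (encoders are free to choose differently), and the sample values are toy data. `above` stands for the samples A–D and `left` for I–L.

```python
def predict_4x4(above, left, mode):
    """Predict a 4x4 block from the row above (A-D) and the column to the left (I-L)."""
    if mode == "vertical":    # every row repeats the samples above
        return [list(above) for _ in range(4)]
    if mode == "horizontal":  # every column repeats the samples to the left
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":          # rounded mean of the eight neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(block, pred):
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_mode(block, above, left):
    """Pick the mode whose prediction is closest to the block (smallest SAD)."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_4x4(above, left, m)))


# A block with vertical stripes is predicted perfectly by the vertical mode.
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(block, above, left))  # vertical
```

Only the difference between the block and its prediction is then transformed and coded.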


Deblocking filter

• The picture is filtered using an adaptive deblocking filter.

• The filter removes visible block structures on the edges of the 4×4 blocks caused by block-based transform coding and motion estimation


Deblocking Filters

• A boundary-strength (BS) parameter is assigned to every 4×4 block

– BS = 0: no filtering

– BS = 1-3: slight filtering

– BS = 4: strong filtering

• Filters only when

– |P0-Q0| < α

– |P1-P0| < β

– |Q1-Q0| < β

• Thresholds α and β depend on the average quantization parameter (QP)

• Deblocking filtering accounts for about 1/3 of the computational complexity of a decoder.

Block modes and conditions, with the resulting BS:

• One of the blocks is intra-coded and the edge is a MB edge: BS = 4

• One of the blocks is intra-coded: BS = 3

• One of the blocks has coded residuals: BS = 2

• Difference of block motion ≥ one luma sample distance: BS = 1

• Motion compensation from different reference frames: BS = 1

• Else: BS = 0

[Figure: samples P3 P2 P1 P0 on one side of the block edge and Q0 Q1 Q2 Q3 on the other]
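The boundary-strength rules and the filtering condition can be sketched as two small functions. The argument names are illustrative, and the α and β values in the example are made up; in the standard they are looked up from tables indexed by the QP.

```python
def boundary_strength(p_intra, q_intra, mb_edge, coded_residual,
                      motion_diff_ge_one_sample, different_refs):
    """Boundary strength (BS) for the edge between blocks P and Q."""
    if (p_intra or q_intra) and mb_edge:
        return 4
    if p_intra or q_intra:
        return 3
    if coded_residual:
        return 2
    if motion_diff_ge_one_sample or different_refs:
        return 1
    return 0


def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """Filter only small steps: a large step is likely a real image edge."""
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


bs = boundary_strength(False, False, False, True, False, False)
print(bs)                                                    # 2
print(should_filter(bs, 100, 101, 105, 106, alpha=10, beta=4))  # True: blocking artifact
print(should_filter(bs, 100, 101, 161, 162, alpha=10, beta=4))  # False: real edge
```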


SP and SI-Frame Design

• SP and SI-frames

– Allow identical reconstruction when coded using different references

– Subtract the reference in the coder and add it back in the decoder

• Bitstream switching

– In previous coding standards, perfect (mismatch-free) switching only happens at intra-frames

• Other applications

– Bitstream splicing

– Error recovery/resilience

– Video redundancy coding

[Figure: switching between two streams. Stream 1: P1,n-2 P1,n-1 P1,n P1,n+1 P1,n+2. Stream 2: P2,n-2 P2,n-1 SP2,n P2,n+1 P2,n+2. The switching picture SP12,n links stream 1 to stream 2 at time n.]


Transformation

• H.264 employs a 4×4 integer transform

• The transform is an approximation of the DCT

– It has a coding gain similar to that of the DCT

– Since the integer transform has an exact inverse operation, there is no mismatch between the encoder and the decoder, which was a problem in earlier DCT-based codecs
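The exactness of the integer transform can be sketched with the 4×4 forward core matrix of H.264. The inverse below is the plain matrix inverse written in integer arithmetic, for illustration only; the standard itself specifies a differently scaled inverse and folds the scaling factors into quantization.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
D = [4, 10, 4, 10]  # squared norms of the (mutually orthogonal) rows of CF


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward(x):
    """Y = CF * X * CF^T, using integer additions and multiplications only."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse(y):
    """X = CF^T * (Y[i][j] / (D[i] * D[j])) * CF, scaled by 400 to stay in integers."""
    weighted = [[y[i][j] * (400 // (D[i] * D[j])) for j in range(4)]
                for i in range(4)]
    z = matmul(matmul(transpose(CF), weighted), CF)
    return [[v // 400 for v in row] for row in z]


x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(inverse(forward(x)) == x)  # True: no rounding error at all
```

Because every step works on integers, encoder and decoder obtain bit-identical results, unlike a floating-point DCT whose rounding may differ between implementations.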


Network friendliness

• H.264 structure

– Video coding layer (VCL)

– Network abstraction layer (NAL)

Scope of H.264 standard


H.264 Over IP

• Network Abstraction Layer Unit (NALU)

– A byte string of variable length

– 1-byte header

• NALU type (T)

• NALU importance (R)

• Error indication (F)

• RTP packetization

– Simple packetization

• One NALU in one RTP packet

• The NALU header also serves as the RTP payload header

– NALU fragmentation

– NALU aggregation
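The 1-byte NALU header can be unpacked with a few bit operations: F is the most significant bit, R the next two bits and T the remaining five bits. A minimal sketch; the function name and the dictionary keys are illustrative.

```python
def parse_nal_header(first_byte):
    """Split the first byte of a NALU into its three header fields."""
    return {
        "F": (first_byte >> 7) & 0x01,  # error indication, 1 bit
        "R": (first_byte >> 5) & 0x03,  # importance, 2 bits
        "T": first_byte & 0x1F,         # NALU type, 5 bits
    }


print(parse_nal_header(0x65))  # {'F': 0, 'R': 3, 'T': 5}
```

A network element can use R to decide which packets to drop first without parsing the video payload.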

OSI/RM layers, with protocols and specifications for H.264:

• Application, presentation and session layers

– RTP (Real-Time Transport Protocol). Header size: IP/UDP/RTP = 20 + 8 + 12 = 40 bytes

– Media-unaware RTP payload specifications to reduce the loss rates observed by the decoder: packet duplication, packet-based FEC, audio redundancy coding

– Control protocols: H.245, SIP (Session Initiation Protocol), SDP (Session Description Protocol), RTSP (Real-Time Streaming Protocol)

• Transport layer

– UDP (User Datagram Protocol)

• Network layer

– IP: best-effort service

[Figure: NALU header with the fields T, R and F]

Note A1: The IP header is 20 bytes in size and protected by a checksum. No protection of the payload is performed. (Author, 8/24/2011)


Comparison


H265 outlook

• About half the bit rate of H.264

• Tree-structured prediction and residual difference block segmentation

• Extended prediction block sizes (up to 64x64)

• Tile and slice picture segmentations for loss resilience and parallelism

• Wavefront processing structure for decoder parallelism

• Mode-dependent sine/cosine transform type switching

• Adaptive motion vector predictor selection

• Temporal motion vector prediction


3D video coding


• Left and right eye view

• Depth sensation

• Resolving 2D viewing ambiguity

• Additional features:

• Free view points

• Depth-controlled object insertion


Multiview Frame Structure

[Figure: grid of frames numbered 1 to 7 and onwards, arranged along a time axis and a view axis]


Predictions based on H.264/AVC JM95


Homework 1

• Download the open-source tool x264 from the VideoLAN website

• Capture a video sequence via webcam or from the Internet

• Experiment with FFmpeg to encode and transcode the video sequence with different standards (MPEG-2, MPEG-4, H.263, H.264, etc.) and different parameters

• Play back the encoded video and comment on the result

• Store the encoded video sequence in the MP4 container format
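One possible starting point for the experiments is to generate the FFmpeg command lines from a small script. This is a sketch under assumptions: the file names are hypothetical, the bit rate is arbitrary, the `libx264` encoder is only available if FFmpeg was built with x264 support, and H.263 is left out here because its encoder accepts only a few fixed picture sizes.

```python
# FFmpeg encoder names for some of the standards in the homework
ENCODERS = {"mpeg2": "mpeg2video", "mpeg4": "mpeg4", "h264": "libx264"}


def ffmpeg_command(src, dst, codec, bitrate="1M"):
    """Return the argument list for one FFmpeg encode (video only)."""
    return [
        "ffmpeg", "-y",   # overwrite the output file without asking
        "-i", src,        # input file
        "-c:v", codec,    # video encoder
        "-b:v", bitrate,  # target video bit rate
        "-an",            # drop the audio track
        dst,              # the container is chosen from the file extension
    ]


for name, codec in ENCODERS.items():
    cmd = ffmpeg_command("capture.avi", f"out_{name}.mp4", codec)
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Encoding the same sequence at several bit rates with each encoder gives the data points needed to compare the standards.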


Homework 2

• Draw decoding diagrams for MPEG-1, MPEG-2, MPEG-4, H.264 and 3D


Future development

• Future coding/presentation standards:

– H265, VC-1, VC-2

– MPEG-21, MHEG

• Computer vision

– Game

– Graphics

• Multimedia retrieval

– Segmentation

– Search (Google)

• Multi-camera system

– 3D cinema

– Realistic broadcasting
