The H.264/AVC Video Coding Standard

(ITU-T Rec. H.264 | ISO/IEC 14496-10)

Gary J. Sullivan, Ph.D.

ITU-T VCEG Rapporteur / Chair

ISO/IEC MPEG Video Rapporteur / Co-Chair

ITU/ISO/IEC JVT Rapporteur / Co-Chair

Microsoft Corporation Video Architect

November 2007

Univ. Wash. Data Compression Class Guest Lecture, Nov 2007 Gary J. Sullivan 1

Video Coding Standardization Organizations

§ Two organizations have historically dominated general-purpose video compression standardization:

• ITU-T Video Coding Experts Group (VCEG)

International Telecommunication Union – Telecommunication Standardization Sector (ITU-T, a United Nations organization, formerly CCITT), Study Group 16, Question 6

• ISO/IEC Moving Picture Experts Group (MPEG)

International Organization for Standardization and International Electrotechnical Commission, Joint Technical Committee Number 1, Subcommittee 29, Working Group 11

§ Recently, the Society of Motion Picture and Television Engineers (SMPTE) has also entered with “VC-1”, based on Microsoft’s WMV 9, but this talk covers only the ITU and ISO/IEC work.

Chronology of International Video Coding Standards

[Figure: timeline spanning 1990 to 2004, one row per standardization body]

• ITU-T: H.120 (1984-1988), H.261 (1990+), H.263 (1995-2000+)

• Joint ITU-T and ISO/IEC: H.262 / MPEG-2 (1994/95-1998+), H.264 / MPEG-4 AVC (2003-2007+)

• ISO/IEC: MPEG-1 (1993), MPEG-4 Visual (1998-2001+)

The Scope of Picture and Video Coding Standardization

§ Only the Syntax and Decoder are standardized:

• Permits optimization beyond the obvious

• Permits complexity reduction for implementability

• Provides no guarantees of Quality

[Figure: processing chain Source, Pre-Processing, Encoding, Decoding, Post-Processing & Error Recovery, Destination, with a box labeled "Scope of Standard" around the Decoding stage]

The Advanced Video Coding Project: AVC / ITU-T H.264 / MPEG-4 part 10

§ History: ITU-T Q.6/SG16 (VCEG - Video Coding Experts Group) “H.26L” standardization activity (where the “L” stood for “long-term”)

§ Aug 1999: 1st test model (TML-1)

§ July 2001: MPEG open call for technology: H.26L demo’ed

§ Dec 2001: Formation of the Joint Video Team (JVT) between VCEG and MPEG to finalize H.26L as a new joint project (similar to MPEG-2/H.262)

§ July 2002: Final Committee Draft status in MPEG

§ Dec ’02: Technical freeze, FCD ballot approved

§ May ’03: Completed in both orgs

§ July ’04: Fidelity Range Extensions (FRExt) completed

§ Jan ’07: Professional Profiles completed

H.264/AVC Objectives

§ Primary technical objectives:

• Significant improvement in coding efficiency

• High loss/error robustness

• “Network Friendliness” (carry it well on MPEG-2 or RTP or H.32x or in MPEG-4 file format or MPEG-4 systems or …)

• Low latency capability (better quality for higher latency)

• Exact match decoding

§ Initial extension objectives (in FRExt and Prof Profiles):

• Professional applications (more than 8 bits per sample, 4:4:4 color sampling, etc.)

• Higher-quality high-resolution video

• Alpha plane support (a degree of “object” functionality)

• Extended color gamut support

A Comparison of Performance

§ Test of different standards (ICIP 2002 study)

§ Using same rate-distortion optimization techniques for all codecs

§ Streaming test: High-latency (included B frames)

• Four QCIF sequences coded at 10 Hz and 15 Hz (Foreman, Container, News, Tempete) and

• Four CIF sequences coded at 15 Hz and 30 Hz (Bus, Flower Garden, Mobile and Calendar, and Tempete)

§ Real-time conversation test: No B frames

• Four QCIF sequences encoded at 10 Hz and 15 Hz (Akiyo, Foreman, Mother and Daughter, and Silent Voice)

• Four CIF sequences encoded at 15 Hz and 30 Hz (Carphone, Foreman, Paris, and Sean)

§ Compare four codecs using PSNR measure:

• MPEG-2 (in high-latency/streaming test only)

• H.263 (high-latency profile, conversational high-compression profile, baseline profile)

• MPEG-4 Visual (simple and advanced simple profiles with & without B pictures)

• H.264/AVC version 1 (with & without B pictures)

§ Note: These test results are from a private study and are not an endorsed report of the JVT, VCEG or MPEG organizations.
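The comparison relies on PSNR as its quality measure. As a reminder of what that number means, here is a minimal sketch of PSNR for 8-bit samples. The function name `psnr_8bit`, the flat-list frame representation, and the sample values are assumptions made for illustration; they are not taken from the study.

```python
import math

def psnr_8bit(reference, decoded):
    """Return PSNR in dB between two equal-length sequences of 8-bit samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    peak = 255  # largest value an 8-bit sample can take
    return 10.0 * math.log10(peak * peak / mse)

# Hypothetical 2x2 luma block and its reconstruction.
original = [52, 55, 61, 59]
reconstructed = [54, 55, 60, 57]
print(round(psnr_8bit(original, reconstructed), 2))
```

In a codec comparison this would typically be computed on the luma plane (Y-PSNR) of each frame and averaged over the sequence, then plotted against bit-rate.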

Comparison to MPEG-2, H.263, MPEG-4p2

[Figure: rate-distortion curves for Foreman QCIF 10 Hz. Horizontal axis: Bit-rate [kbit/s], 0 to 250. Vertical axis: Quality, Y-PSNR [dB], 27 to 39. Curves: MPEG-2 or MPEG-1, H.263 (+), MPEG-4 Visual ASP, MPEG-4 AVC/H.264.]

MPEG-4 AVC/H.264 Structure

[Figure: encoder block diagram. The Input Video Signal is split into macroblocks (16x16 luma samples + chroma). Blocks: Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, Deblocking Filter, Intra-picture Prediction, Motion Compensation, Motion Estimation, Intra/Inter selection, Entropy Coding. The embedded Decoder produces the Output Video Signal. Data passed to Entropy Coding: Control Data, Quant. Transf. coeffs, Motion Data.]

Motion Compensation Accuracy

Motion vector accuracy: 1/4 sample for luma (6-tap filter)

[Figure: macroblock partitions for motion compensation. MB types: 16x16 (partition 0), 16x8 (0, 1), 8x16 (0, 1), 8x8 (0 to 3). 8x8 types: 8x8 (0), 8x4 (0, 1), 4x8 (0, 1), 4x4 (0 to 3).]
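The slide names a 6-tap filter for quarter-sample luma motion compensation but does not list the taps. The sketch below assumes the taps (1, -5, 20, 20, -5, 1) with a divide by 32, which are the values usually quoted for H.264 luma half-sample interpolation, and it works on a single row only. The function names and the edge-repeat padding are illustrative choices, not text from the standard.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (1-D sketch)."""
    taps = (1, -5, 20, 20, -5, 1)
    # Use neighbors i-2 .. i+3; positions outside the row repeat the edge sample.
    window = [row[max(0, min(len(row) - 1, i + k))] for k in range(-2, 4)]
    acc = sum(t * s for t, s in zip(taps, window))
    # The taps sum to 32, so add 16 for rounding and shift right by 5.
    return clip8((acc + 16) >> 5)

def quarter_sample(row, i):
    """Quarter-sample value: rounded average of row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1

step_edge = [0, 0, 0, 255, 255, 255]
print(half_sample(step_edge, 2), quarter_sample(step_edge, 2))
```

The two-dimensional case, where a half-sample position sits between four integer samples, needs intermediate values kept at higher precision and is not covered by this sketch.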

Multiple Reference Frames

§ Multiple Reference Pictures

§ Generalized Referencing Relationships

§ Generalized Bi-Prediction

Intra Prediction

§ Directional spatial prediction (9 types for 4x4 luma pred, 4 types for 16x16 luma pred, 4 types for 8x8 chroma pred)

e.g., Mode 4: diagonal down/right prediction

a, f, k, p are predicted by (A + 2M + I + 2) >> 2

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction direction arrows labeled with mode numbers 0, 1, 3, 4, 5, 6, 7, 8]
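The Mode 4 formula above covers the main diagonal (a, f, k, p). A minimal sketch of the whole 4x4 block follows, under the assumption that every other position uses the same three-tap (1, 2, 1) pattern shifted along the border, which goes beyond what the slide states. The function name and argument layout are hypothetical.

```python
def predict_diag_down_right(top, left, corner):
    """4x4 diagonal down/right prediction (sketch).

    top    = [A, B, C, D]  samples above the block
    left   = [I, J, K, L]  samples to the left of the block
    corner = M             sample above and to the left
    """
    # Walk the border from bottom-left to top-right: L K J I M A B C D.
    edge = list(reversed(left)) + [corner] + list(top)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            c = 4 + (x - y)  # border sample reached by following the diagonal
            pred[y][x] = (edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2
    return pred

# Hypothetical neighbor values: A..D, I..L, M.
block = predict_diag_down_right([20, 40, 60, 80], [30, 50, 70, 90], 10)
for row in block:
    print(row)
```

On the main diagonal `c` is 4, so the expression reduces to (I + 2M + A + 2) >> 2, matching the formula on the slide.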

DCT-Like Transform Coding

§ 4x4 Block Integer Transform

§ Hierarchical transform of DC coeffs for 8x8 chroma and 16x16 Intra luma blocks

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$
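To make the integer transform concrete, the sketch below applies the 4x4 matrix H as Y = H X H^T using integer arithmetic only. It uses the matrix commonly given for the H.264 4x4 core transform, and also computes H H^T to show that the rows are orthogonal but not of equal norm. The helper names are illustrative, and the scaling that normally accompanies quantization is left out.

```python
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Plain integer matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_4x4(block):
    """Y = H * X * H^T, with no scaling applied."""
    return matmul(matmul(H, block), transpose(H))

# Off-diagonal zeros mean the rows are orthogonal; the diagonal shows unequal norms.
gram = matmul(H, transpose(H))
print(gram)

# A flat block puts all of its energy into the DC coefficient.
flat_block = [[1] * 4 for _ in range(4)]
print(forward_4x4(flat_block))
```

Because the row norms differ, an exact inverse needs per-coefficient scaling. In the block diagram this is consistent with the combined "Transform/Scal./Quant." stage, though the exact scaling tables are not reproduced here.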

Deblocking Filter

[Figure: the encoder block diagram from the MPEG-4 AVC/H.264 Structure slide, shown again for the Deblocking Filter stage]

Entropy Coding

• UVLC/Exp-Golomb +

• CABAC or

• CAVLC
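Of the three entropy coding tools listed, Exp-Golomb codes are simple enough to show in a few lines. The sketch below builds the unsigned code as a bit string (a run of zeros, then the binary form of the value plus one) and includes the usual signed-to-unsigned mapping. The function names are illustrative, and bit strings are used for readability rather than speed.

```python
def ue_encode(code_num):
    """Unsigned Exp-Golomb code word for a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]            # binary form of code_num + 1
    return "0" * (len(bits) - 1) + bits     # prefix of zeros, one fewer than the bit count

def se_to_code_num(value):
    """Map a signed value to a code number: 1, -1, 2, -2, ... become 1, 2, 3, 4, ..."""
    return 2 * value - 1 if value > 0 else -2 * value

def ue_decode(bitstring, pos=0):
    """Decode one code word starting at pos; return (code_num, next position)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end

stream = "".join(ue_encode(n) for n in (0, 1, 3))
print(stream)
```

Small values get short code words, which suits syntax elements whose small values are the most frequent.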

H.264/AVC Version 1 Profiles

§ Three profiles in version 1: Baseline, Main, and Extended

§ Baseline (esp. Videoconferencing & Wireless)

• I and P progressive-scan picture coding (not B)

• In-loop deblocking filter

• 1/4-sample motion compensation

• Tree-structured motion segmentation down to 4x4 block size

• VLC-based entropy coding

• Some enhanced error resilience features

– Flexible macroblock ordering/arbitrary slice ordering

– Redundant slices

Non-Baseline H.264/AVC Version 1 Profiles

§ Main Profile (esp. Broadcast)

• All Baseline features except enhanced error resilience features

• Interlaced video handling

• Generalized B pictures

• Adaptive weighting for B and P picture prediction

• CABAC (arithmetic entropy coding)

§ Extended Profile (esp. Streaming)

• All Baseline features

• Interlaced video handling

• Generalized B pictures

• Adaptive weighting for B and P picture prediction

• More error resilience: Data partitioning

• SP/SI switching pictures

Fidelity-Range and Professional Extensions

§ AVC standard finished May 2003, published as “twin text”

• ITU-T Recommendation H.264

• ISO/IEC 14496-10 MPEG-4 AVC

§ Fidelity-Range Extensions (FRExt)

• Work item initiated in July 2003

• More than 8 bits, color other than 4:2:0

• Alpha coding

• More coding efficiency capability

• Also new supplemental information

§ Professional Profiles

• Work item initiated in October 2005

• Focus initially on 4:4:4 (replacing prior FRExt 4:4:4 profile)

• Later work on all-intra and new supplemental information


FRExt Technical Features – Part 1

§ Larger transforms

• 8x8 transform (as was in older standards)

• Drop 4x8, 8x4, or larger, 16-point…

§ Filtered intra prediction modes for 8x8 block size

§ Quantization matrix

• 4x4, 8x8, intra, inter trans. coefficients weighted differently

• Old idea, dating to JPEG and before (circa 1986?)

• Full capabilities not yet explored (visual weighting)

§ Coding in various color spaces

• 4:2:2, 4:2:0, Monochrome, with/without Alpha

• New integer color transform (a VUI-message item)


FRExt Technical Features – Part 2

§ Efficient lossless interframe coding

§ Film grain characterization for analysis/synthesis representation

§ Stereo-view video support

§ Deblocking filter display preference

8x8 16-Bit (Bossen) Transform

$$
\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}
$$

8x8 Transform Advantage (JVT-K028, IBBP coding, prog. scan)

Sequence    % BD bit-rate reduction
Movie 1     11.59
Movie 2     12.71
Movie 3     12.01
Movie 4     11.06
Movie 5     13.46
Crawford    10.93
Riverbed    15.65
Average     12.48

Quantization Matrix

§ Similar concept to MPEG-2 design

§ Vary step size based on frequency

§ Adapted to modified transform structure

§ More efficient representation of weights

§ Eight downloadable matrices (at least 4:2:0)

• Intra 4x4 Y, Cb, Cr

• Intra 8x8 Y

• Inter 4x4 Y, Cb, Cr

• Inter 8x8 Y
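The idea of varying the step size by frequency can be sketched in a few lines. The example below is a simplified floating-point illustration, not the integer scaling process of the standard: it assumes each weight scales a base step size, with 16 standing for "no change" as in a flat matrix. The function name, the 2x2 block size, and all numeric values are hypothetical.

```python
FLAT_WEIGHT = 16  # assumed neutral weight: leaves the base step size unchanged

def quantize(coeffs, weights, base_step):
    """Quantize a 2-D block; each coefficient's step is base_step * weight / 16."""
    levels = []
    for coeff_row, weight_row in zip(coeffs, weights):
        row = []
        for c, w in zip(coeff_row, weight_row):
            step = base_step * w / FLAT_WEIGHT
            magnitude = int(abs(c) / step + 0.5)  # round half away from zero
            row.append(magnitude if c >= 0 else -magnitude)
        levels.append(row)
    return levels

coeffs = [[100, 40], [40, 20]]      # hypothetical transform coefficients
flat = [[16, 16], [16, 16]]         # same step at every frequency
weighted = [[16, 32], [32, 64]]     # coarser steps toward higher frequencies

print(quantize(coeffs, flat, 10))
print(quantize(coeffs, weighted, 10))
```

With the weighted matrix the high-frequency coefficients quantize to smaller levels and cost fewer bits, at the price of more error where it is presumed to be less visible.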


New Profiles Created by FRExt

§ 4:2:0, 8-bit: “High” (HP)

§ 4:2:0, 10-bit: “High 10” (Hi10)

§ 4:2:2, 10-bit: “High 4:2:2” (Hi422)

§ Effectively the same tools, but acting on different input data

§ The High Profile has been a major force in recent industry developments (HD DVD, Blu-ray Disc, DBS, Terrestrial Broadcast, IPTV, etc.)

§ The others are emerging in professional applications (e.g., content acquisition, editing, studios, recording)


A Performance Test for High Profile (from JVT-L033 - Panasonic)

§ Subjective tests of FRExt HP by the Blu-ray Disc Founders

• 4:2:0/8 (HP) 1920x1080x24p (1080p), 3 clips.

• Nominal 3:1 advantage over MPEG-2

– 8 Mbps HP scored better than 24 Mbps MPEG-2

• Apparent transparency at 16 Mbps

[Figure 1: Results of subjective test with studio participants (Blu-ray Disc Founders). Vertical axis: Mean Opinion Score, 2 to 5. Bars: H.264/AVC FRExt at 8, 12, 16, and 20 Mbps; Original; MPEG-2 at 24 Mbps (DVHS emulation). Bar values in extraction order: 3.65, 3.71, 4.00, 4.03, 3.90, 3.59.]

Score scale: 5: Perfect; 4: Good; 3: Fair (OK for DVD); 2: Poor; 1: Very Poor

Some Notes on Quality Testing

§ Use recent reference software (if using ref software)

§ Use rate-distortion optimization in encoder

§ Use large-range good-quality motion search

§ Use appropriate “High” profile (incl. adaptive transform)

§ If testing for PSNR, use “flat” quant matrices

§ Otherwise, use “non-flat” quant matrices

§ Use CABAC entropy coding

§ Use more than 1 or 2 reference pictures

§ Use hierarchical B reference frames coding structure

§ Use bi-predictive search optimization (see JVT-N014)

§ If testing high-quality PSNR, use adaptive thresholding*

* = See G. Sullivan & S. Sun, “On Dead-Zone…”, VCIP 2005/JVT-N011

AVC Profile Overview

[Figure: nested-set diagram of coding tools by profile. The tool labels recovered from the figure are listed below next to the profiles that add them.]

• Baseline: I and P slice, motion-compensated prediction, in-loop deblocking, intra prediction, CAVLC, ASO, FMO, redundant pictures

• Extended: SI and SP slice, data partitioning, B slice, field coding, MBAFF, weighted prediction

• Main: B slice, field coding, MBAFF, weighted prediction, CABAC

• High: 8×8 spatial prediction, 8×8 transform, monochrome format, scaling matrices

• High 10: 8-10b sample bit depth

• High 4:2:2: 4:2:2 chroma format

• High 4:4:4 Predictive: 4:4:4 chroma format, 8-14b sample bit depth, predictive lossless

New Profiles for Professional Apps (2007)

[Figure: profile hierarchy diagram. Profiles shown:]

• High 4:4:4 Intra (14b)

• High 4:4:4 Predictive (14b)

• High 4:2:2 Intra (10b)

• High 10 Intra (4:2:0 10b)

• High 4:2:2 (Predictive 10b)

• High 10 (4:2:0 Predictive 10b)

• CAVLC 4:4:4 Intra (14b)

Legend entries: Existing; CABAC + CAVLC; CAVLC only

Notes:

Arrows denote capability subset hierarchy.

Four profiles not shown: Baseline, Extended, Main, High.

New Scalable Video Coding Profiles

[Figure: profile diagram. Existing profiles: Baseline, High. New profiles: Scalable Baseline, Scalable High, Scalable High Intra. Annotations, in extraction order: spatial scalability (dyadic, 3/2), coarse-grain scalability; spatial scalability (arbitrary up to 2), coarse-grain scalability.]

Work-in-progress: Multi-view Video Coding

§ N camera views, synchronized in time

§ Example prediction structure:

[Figure: example multi-view prediction structure]

For Further Information

§ JVT, MPEG, and VCEG management team members:

• Gary J. Sullivan

• Jens-Rainer Ohm

• Ajay Luthra

• Thomas Wiegand

§ H.264/AVC literature references:

• IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on H.264/AVC (July 2003) (Luthra, Sullivan, Wiegand, Eds.); includes several highly-referenced papers

• Paper in Proceedings of the IEEE, Jan 2005 (Sullivan & Wiegand)

• Overview incl. FRExt: SPIE, Aug 2004 (Sullivan, Topiwala, & Luthra)

• Paper at SPIE VCIP 2005: Meta-overview and deployment (Sullivan)

• Paper in IEEE Communications Magazine, Aug 2006 (Marpe, Wiegand, Sullivan)

• Paper on Professional Extensions, IEEE ICIP, Sept 2007 (Sullivan et al.)

• Wikipedia H.264/MPEG-4 AVC page

• IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Scalable Video Coding – Standardization and Beyond (Sept 2007) (Wiegand, Sullivan, Ohm, Luthra, Eds.)