January 22, 2014 Sam Siewert
Computer and Machine Vision
Deeper Dive into MPEG
Digital Video Encoding
Reminders
CV and MV Use UNCOMPRESSED FRAMES
Remote Cameras (E.g. Security) May Need to Transport Captured Frames Over a Network to a CV/MV Processor
We NEED to Understand Both!
BEWARE of LOSSY COMPRESSION
I-Frame ONLY or MJPEG is a Decent Compromise Between the Two
MPEG: Order Of Operators
#1: POINT (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures
[Figure: processing stages labeled #1, #2A, #2B, #2C, #3]
Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)
For every Y sample in a scan-line, there is also one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
Typically a Post Production, CEDIA or DCI format
[Figure: sample grid for a 320x240 frame, pixels 0 … 319 on the first scan-line through 76,480 … 76,799 on the last. Legend: Y, Cr, and Cb sample; Y sample only]
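The per-pixel conversion in Step #1 can be sketched as below. The slide does not give the conversion matrix, so the full-range BT.601 coefficients used here (the ones JPEG/JFIF uses) are an assumption, and the function names are illustrative only. No samples are discarded in 4:4:4, although rounding to 8 bits can still shift a value by a level or so on the round trip.

```python
def _clamp8(v):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return min(255, max(0, round(v)))

def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb).
    Assumption: full-range BT.601 coefficients, not taken from the slides."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return _clamp8(y), _clamp8(cr), _clamp8(cb)

def ycrcb_to_rgb(y, cr, cb):
    """Inverse of rgb_to_ycrcb under the same assumed coefficients."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp8(r), _clamp8(g), _clamp8(b)

for pixel in [(255, 255, 255), (0, 0, 0), (255, 0, 0), (12, 200, 99)]:
    ycc = rgb_to_ycrcb(*pixel)
    print(pixel, "->", ycc, "->", ycrcb_to_rgb(*ycc))
```

Both representations use three 8-bit samples per pixel, which is why 4:4:4 gives no size reduction.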
Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels is 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)
[Figure: sample grid for a 320x240 frame, pixels 0 … 319 on the first scan-line through 76,480 … 76,799 on the last. Legend: Y, Cr, and Cb sample; Y sample only]
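A minimal sketch of 4:2:2 subsampling on one scan-line follows. The slide does not say whether the shared CrCb sample is taken from one pixel or averaged over the pair; averaging is assumed here, and `subsample_422` is a hypothetical helper name.

```python
def subsample_422(scanline):
    """scanline: list of (Y, Cr, Cb) tuples with an even number of pixels.
    Keeps every Y sample and one CrCb pair per two pixels.
    Assumption: the shared chroma is the rounded average of the pair."""
    ys = [p[0] for p in scanline]
    crcb = []
    for i in range(0, len(scanline), 2):
        (_, cr0, cb0), (_, cr1, cb1) = scanline[i], scanline[i + 1]
        crcb.append(((cr0 + cr1 + 1) // 2, (cb0 + cb1 + 1) // 2))
    return ys, crcb

# Four made-up pixels, for illustration only
line = [(100, 120, 130), (102, 124, 134), (50, 200, 60), (52, 202, 62)]
ys, crcb = subsample_422(line)

# 8 bits per Y sample plus 8 bits each for Cr and Cb in every shared pair
bits = 8 * (len(ys) + 2 * len(crcb))
bits_per_pixel = bits / len(line)
print(ys, crcb, bits_per_pixel)
```

Four pixels end up as 4 Y samples and 2 CrCb pairs, 64 bits in total, which is the 16 bits per pixel quoted on the slide.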
Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Four RGB Pixels = 96 bits, Whereas Four YCrCb Pixels is 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
[Figure: sample grid for a 320x240 frame, pixels 0 … 319 on the first scan-line through 76,480 … 76,799 on the last. Legend: Cr, Cb sample; Y sample only]
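The frame sizes for the three sampling formats can be checked with a few lines of arithmetic. This is a sketch for the 320x240 frame shown in the figures, assuming 8-bit samples and planar storage; `CHROMA_FACTORS` and `frame_bytes` are illustrative names.

```python
# (horizontal, vertical) chroma subsampling factor for each format
CHROMA_FACTORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def frame_bytes(width, height, fmt):
    """Bytes per frame with 8-bit samples: one Y plane plus Cr and Cb planes."""
    h, v = CHROMA_FACTORS[fmt]
    luma = width * height
    chroma = 2 * (width // h) * (height // v)  # Cr plane + Cb plane
    return luma + chroma

for fmt in CHROMA_FACTORS:
    size = frame_bytes(320, 240, fmt)
    print(fmt, size, "bytes,", 8 * size / (320 * 240), "bits/pixel")
```

In 4:2:0 the chroma planes are halved both horizontally and vertically, so each CrCb pair is shared by a 2x2 block of Y samples and the average cost drops to 12 bits per pixel.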
Step #2 – Convert to 8x8 Macroblocks and Transform
Aspect Ratios Designed to Fit 8x8 Macroblocks
E.g. 640 x 480 => 80 x 60 Macroblocks
Discrete Cosine Transform Applied to Each 8x8
– Spatial Intensity to Frequency Transform
– Applied on X Axis (Row)
– Applied on Y Axis (Column)
Set up for Intra-frame (I-frame) Compression
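Splitting a frame into 8x8 blocks can be sketched as below, using the 640 x 480 example from the slide. The frame is a plain list of rows filled with made-up sample values, and `split_into_blocks` is a hypothetical helper.

```python
def split_into_blocks(frame, n=8):
    """frame: list of rows (lists of samples); both dimensions must be
    multiples of n. Returns a 2D grid (rows of blocks) of n x n blocks."""
    height, width = len(frame), len(frame[0])
    if height % n or width % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    return [
        [[row[bx:bx + n] for row in frame[by:by + n]] for bx in range(0, width, n)]
        for by in range(0, height, n)
    ]

# Synthetic 640 x 480 frame, for illustration only
frame = [[(x + y) % 256 for x in range(640)] for y in range(480)]
blocks = split_into_blocks(frame)
print(len(blocks[0]), "x", len(blocks))  # blocks across x blocks down: 80 x 60
```

Each entry `blocks[by][bx]` is an 8x8 list of samples ready for the row and column transforms.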
Convolution Concepts
Math operation on 2 functions that produces a 3rd
Point Spread Function “Sharpen” meets this Definition
So do Many Mask Operations applied to Pixel Neighborhoods
[Figure: 2 impulses, f(t) and g(X – t); the area inside their intersection is f convolved with g over t]
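Both ideas on this slide, convolving two functions and applying a mask to a pixel neighborhood, can be sketched in a few lines. The 3x3 sharpen mask below is a common textbook choice and is assumed here rather than taken from the slides; leaving border pixels unchanged is also just one possible convention.

```python
def convolve_1d(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out

# Assumed sharpen mask (not from the slides)
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

def apply_mask(image, mask):
    """Convolve a 3x3 mask over the interior pixels of a grayscale image.
    Border pixels are copied unchanged; results are clamped to 0..255."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # True convolution flips the mask, hence the negated offsets
                    acc += mask[1 - dy][1 - dx] * image[y + dy][x + dx]
            out[y][x] = min(255, max(0, acc))
    return out

print(convolve_1d([1, 1], [1, 1]))  # two short pulses overlap into a triangle
image = [[10, 10, 10], [10, 20, 10], [10, 10, 10]]
print(apply_mask(image, SHARPEN)[1][1])
```

The center pixel stands out from its neighbors by 10 levels before the mask and by 50 after it, which is the sharpening effect.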
DCT – Discrete Cosine Transform
Convolution of Image with Discrete Cosine
See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
De-convolved to restore image from Convolved Image
[Figure panels: DCT, Inverse DCT]
DCT Concepts
F(x) is a sum of sinusoids (with frequency, amplitude)
DCT operates on a discrete number of samples
Can derive DC sum at any x, even where F(x) not known
N x N Macro-block has Zero Frequency DC at 0,0
Increasing Horizontal Frequency
Increasing Vertical Frequency
Can De-convolve (inverse DCT, or iDCT)
Can Eliminate High Frequency Horizontal and Vertical Terms
– Minimal Losses from Truncation (otherwise lossless)
– Loss of High Frequency Image Features (What are These?)
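The trade-off in the last two bullets can be tried on a single 8-sample scan-line. The sketch below uses the orthonormal DCT-II and its inverse computed directly from the cosine sums; the sample values are made up, and this direct form is only one of several equivalent normalizations.

```python
import math

def _scale(k, n):
    """Orthonormal DCT-II scale factor for coefficient k of n."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Direct DCT-II: project the samples onto n cosine basis functions."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse DCT: rebuild the samples as a weighted sum of cosines."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]

scanline = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up samples
coeffs = dct(scanline)
exact = idct(coeffs)                           # all terms kept
truncated = idct(coeffs[:4] + [0.0] * 4)       # four highest-frequency terms dropped

print([round(v, 3) for v in exact])
print([round(v, 1) for v in truncated])
```

With every term kept, the inverse differs from the input only by floating-point error; dropping the high-frequency terms smooths the scan-line and loses the fine variation.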
Basic Concept of Waveforms
Complex Waveform is Sum of Simple Fundamentals
Simple Fundamentals Can Be Derived from Complex
Scanline DCT Example
Small Losses Due to DCT, iDCT Numerical Truncation
Larger Losses Due to Quantization and Truncation of Higher-Order Terms (H.O.T.)
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx
What Is Lost with DCT Quantization?
Noise More Than Anything Else
Complex XY Variable Patterns (Real Science Data?)
Complex Tiling
– Higher Frequency X and Higher Frequency Y Terms Can Still be Ignored
Complex Wood Texture
– Most Detail in X, Far Less in Y
Randomized Texture Image
– High X Detail, High Y Detail
– Most Loss of Detail, But Noisy
Step #2A: Macro-block Discrete Cosine Transform
8x8 Pixel Block – Called a Macro-block Here (MPEG-2 Itself Defines a Macroblock as 16x16 Luma Samples Made Up of 8x8 DCT Blocks)
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
Step #2B: Macro-block Quantization (Lossy)
Apply an 8x8 Weighting and Scaling Matrix to the DCT Coefficients
Produces Lots of Repeated Values (and Zeros) Compared to Original
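The effect of weighting and scaling can be seen with a toy quantizer. Both the weighting matrix and the coefficient values below are hypothetical and chosen only for illustration; they are not the default MPEG-2 matrices.

```python
def make_weights(scale=2):
    """Hypothetical 8x8 weighting matrix: the step size grows with
    frequency (u + v). Not the MPEG-2 default intra matrix."""
    return [[1 + (1 + u + v) * scale for v in range(8)] for u in range(8)]

def quantize(coeffs, weights):
    """Divide each DCT coefficient by its weight and round (the lossy step)."""
    return [[round(c / w) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]

def dequantize(levels, weights):
    """Decoder side: multiply the stored levels back by the weights."""
    return [[q * w for q, w in zip(qrow, wrow)]
            for qrow, wrow in zip(levels, weights)]

# Made-up coefficients: large DC term, magnitudes falling off with frequency
coeffs = [[800.0 / (1 + u + v) ** 3 for v in range(8)] for u in range(8)]
weights = make_weights()
levels = quantize(coeffs, weights)
zeros = sum(row.count(0) for row in levels)

for row in levels:
    print(row)
print("zeros:", zeros, "of 64")
```

Most of the high-frequency positions collapse to zero, and the decoder recovers only approximations (the DC term comes back as 801 rather than 800), which is where the loss occurs.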
Decode Process for #2A-B
How Lossy is the Decoded Macro-Block?
OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks
[Figure panels: DCT, iDCT]
Same Cactus 320x240 Again with 8x8 DCT Macroblocks
[Figure panels: DCT, iDCT]
Mathematics for 2D DCT
Frequency Variation on X and Y axes from top left to bottom right
Straight-forward Algorithm Based on 2D Equation is O(n^2) per dimension
Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c
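For reference, one common way to write the 2D DCT of an N x N block f(x, y), and its inverse, is given below. This is the usual DCT-II form with the normalization used in JPEG and MPEG texts; the linked dct2.c may use a different but equivalent scaling, so treat the constants as an assumption.

```latex
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right]

f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right]

C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```

For N = 8 the leading factor is 1/4, and F(0,0) is the zero-frequency (DC) term. Because the two cosine factors separate, the transform can be applied to each row and then to each column.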
http://en.wikipedia.org/wiki/File:Dctjpeg.png
Step #2C: Macro-block Run-Length and Huffman Encoding
Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in the Higher-Order Terms (H.O.T.) of the Quantized DCT
– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0 , 0, 0, 0, -1, 0, 0, …
Becomes: [Figure: run-length encoded sequence]
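One way to experiment with this step is sketched below. The zig-zag order is generated by walking the anti-diagonals of the block, and the run-length scheme stores (zeros skipped, non-zero value) pairs followed by an end-of-block marker. That pairing is one common scheme and is assumed here; it is not necessarily the exact encoding pictured on the slide. The coefficient list is the slide's sequence, with the trailing "…" assumed to be all zeros up to 64 values.

```python
def zigzag_order(n=8):
    """(row, col) visiting order for an n x n block, low to high frequency."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def rle_encode(seq):
    """Encode as (zeros_skipped, nonzero_value) pairs; the trailing zeros
    are replaced by a single 'EOB' (end of block) marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs

def rle_decode(pairs, length=64):
    """Rebuild the coefficient list, padding with zeros after 'EOB'."""
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, v = p
        out.extend([0] * run + [v])
    return out + [0] * (length - len(out))

# Slide sequence; zeros after the last -1 are an assumption
head = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1]
scanned = head + [0] * (64 - len(head))
encoded = rle_encode(scanned)
print(zigzag_order()[:6])
print(encoded)
```

Under these assumptions the 64 coefficients shrink to 11 pairs plus the end-of-block marker, and `rle_decode` restores the original list exactly, so this step is lossless.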
Huffman Applied to RLE Data
Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2 (Lossless)
Compression Based on Probability of Occurrence
Shannon’s Source Coding Theorem: a Symbol with Probability of Occurrence P Needs About -log2(P) Bits in a Binary Encoding of Symbols
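The link between probability of occurrence and code length can be illustrated by building a Huffman code from symbol counts. MPEG-2 uses fixed tables from the standard rather than building a tree per stream, so this is only a sketch of the principle; the symbol list is made up and `huffman_code` is a hypothetical helper.

```python
import heapq
import math
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code from symbol counts; returns {symbol: bit string}."""
    freq = Counter(symbols)
    # Heap entries: (count, unique tie-breaker, {symbol: code built so far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)   # two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

# Made-up symbols with probabilities 1/2, 1/4, 1/8, 1/16, 1/16
data = [0] * 8 + [1] * 4 + [-1] * 2 + [2, 7]
codes = huffman_code(data)
total_bits = sum(len(codes[s]) for s in data)

for sym, count in Counter(data).items():
    p = count / len(data)
    print(sym, codes[sym], "ideal bits:", -math.log2(p))
print("total bits:", total_bits)
```

Because every probability here is a power of two, each code length equals -log2(P) exactly and the 16 symbols take 30 bits; with other distributions Huffman codes come close to, but do not always reach, that bound.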
Step #3: Group of Pictures Concept – Transmit Change-Only Data
I-Frame Compressed Only Intra-Frame By Methods #2A-2C Applied to Macro-Blocks
I-Frame Can Be Decoded Alone
P-Frame is Differences Only, Predicted From the Preceding I-Frame or P-Frame in the GoP
B-Frame is Differences Only, Predicted From Both the Preceding and the Following Reference Frame (I-Frame or P-Frame)
Difference Data Can be Further Encoded with Lossless Methods
Without Steps 2A-C, Specifically Quantization, and With High Motion Video, the Difference Data Could Blow Up in Size
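The change-only idea can be shown with a deliberately simplified model. In the sketch below each frame is a flat list of pixel values, the first frame is stored whole, and every later frame stores only per-pixel differences from the frame before it. Motion estimation, B-frames, and macro-block structure are all left out, so this illustrates the principle and is not an MPEG encoder.

```python
def encode_gop(frames):
    """First frame is kept whole (I-frame); each later frame stores only
    per-pixel differences from the previous frame (a simplified P-frame)."""
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(("P", [c - p for c, p in zip(cur, prev)]))
    return encoded

def decode_gop(encoded):
    """Rebuild every frame by adding differences to the previous result."""
    frames = []
    for kind, data in encoded:
        if kind == "I":
            frames.append(list(data))
        else:
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames

# Three tiny made-up frames in which a single pixel changes each time
frames = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 12, 10, 10, 10],
    [10, 10, 12, 13, 10, 10],
]
gop = encode_gop(frames)
for kind, data in gop:
    print(kind, data)
```

The difference frames are almost entirely zeros, which is what lets run-length and Huffman coding compress them well; when most pixels change between frames, the differences stop being sparse.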
Group of Pictures: High Level View
Overall MPEG YCrCb Compression Performance
Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires 20MB/sec (≈166 Mbps at 8 bits per byte) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, ≈ 44x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48Mbps, 6MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6MHz/Ch)
HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires 53MB/sec (≈442 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, ≈ 22x Compression
– Typical MPEG-4 @ 10 Mbps, ≈ 44x Compression
HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires 120MB/sec (≈995 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, ≈ 22x Compression
– Typical MPEG-4 @ 20 Mbps, ≈ 50x Compression
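The uncompressed figures can be recomputed directly from the frame dimensions. The sketch below assumes 2 bytes per pixel (4:2:2), 30 frames per second, and 8 bits per byte; under those assumptions the SD rate comes to about 166 Mbps. The MPEG-2 bit rates are the slide's typical values, and `uncompressed_rate` is an illustrative name.

```python
def uncompressed_rate(width, height, bytes_per_pixel=2, fps=30):
    """Return (bytes per frame, bits per second) for uncompressed video."""
    frame_bytes = width * height * bytes_per_pixel
    bits_per_second = frame_bytes * fps * 8
    return frame_bytes, bits_per_second

cases = [
    ("SD", (720, 480), 3.75),       # name, dimensions, typical MPEG-2 Mbps
    ("720p", (1280, 720), 20),
    ("1080p", (1920, 1080), 45),
]
for name, (w, h), mpeg2_mbps in cases:
    frame_bytes, bps = uncompressed_rate(w, h)
    ratio = bps / (mpeg2_mbps * 1e6)
    print(f"{name}: {frame_bytes / 1024:.0f} KB/frame, "
          f"{bps / 1e6:.0f} Mbps uncompressed, "
          f"{ratio:.0f}x at {mpeg2_mbps} Mbps")
```

Changing `bytes_per_pixel` to 1.5 gives the corresponding 4:2:0 rates.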
Parsing an Elementary Video Stream
Many 188-Byte Packet Types and Header
Allows for Multiplexing of Many Video and Audio Streams on a Carrier
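Reading the fixed 4-byte header of a 188-byte transport packet can be sketched as below. The field layout follows the MPEG-2 transport stream packet header (sync byte 0x47, 13-bit PID, 4-bit continuity counter); the packet is hand-built for illustration rather than captured data, and `parse_ts_header` is a hypothetical helper.

```python
def parse_ts_header(packet):
    """Parse the 4-byte header of one 188-byte MPEG-2 transport packet."""
    if len(packet) != 188:
        raise ValueError("transport stream packets are 188 bytes")
    if packet[0] != 0x47:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet ID
        "scrambling": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# Hand-built packet: PID 0x0100, payload only, continuity counter 7
packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(packet)
print(header)
```

The PID is what lets a demultiplexer pick the packets of one video or audio stream out of the shared carrier.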
MPEG-4 vs. MPEG-2
MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
ATSC – Advanced Television Systems Committee (HDTV Broadcast)
– 8VSB Modulation – 8 level Vestigial Sideband Modulation, 6MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
– Up to 1080p (1920x1080) Video Resolution
– AC-3 (Dolby) Audio
DVB – Digital Video Broadcast (Europe, Satellite)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)
MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P,B frames), MPEG-4 Part-10 (H.264), e.g. Blu-Ray
– Extensions for Digital Rights Management
– Advanced Audio Coding (AAC)
– Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams