A Nonlinear Loop Filter for Quantization Noise Removal in
Hybrid Video Compression
Onur G. Guleryuz
DoCoMo USA
Overview
• A loop filter based on denoising with an over-complete bank of transforms.
• Combats all types of quantization noise (blocking, ringing, ...).
• Applicable to block or lapped compression transforms.
• Consistently better than the h.264 loop filter in rate-distortion.
• Good visual quality, especially around edges and high frequency regions.
• Advanced signal processing ingredients in a tight package.
• Hardware friendly complexity.
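In the loop, nonlinear filtering of video: Encoder Side
[Block diagram, encoder: the MC prediction formed from the previously decoded frame is subtracted from the current frame to give the coded differential; the differential is added back to the prediction, passed through the nonlinear denoising filter, and delayed by one frame (z^-1) to become the previously decoded frame.]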
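Decoder side
[Block diagram, decoder: the decoded differential is added to the MC prediction formed from the previously decoded frame; the sum is passed through the nonlinear denoising filter, sent to the display, and delayed by one frame (z^-1) to become the previously decoded frame.]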
Objective of the Loop Filter
Nonlinear Denoising Filter: tries to make the decoded frame as close to the original as possible (in a rate-distortion sense), i.e., quantization noise removal.
Properties of the Proposed Filter
The nonlinear denoising filter adapts to nonstationary image statistics using localized linear transforms and hard-thresholding.
(The filter automatically becomes high-pass, or low-pass, or band-pass, etc., depending on the region it is operating on.)
Signal processing ingredients:
• Statistically sound (cross-correlation robust) denoising.
• Overcomplete, translation invariant transforms.
• Thresholding strategy based on conditional expectations.
• Weighted denoising.
• Compression mode based decisions.
What are translation invariant transforms?
E.g., translation invariant 4x4 DCTs:
[Figure: 4x4 DCT block grids at different shifts, labeled Transform 1, Transform 2, Transform 3, Transform 4, Transform 5, ..., Transform 13, ..., Transform 16.]
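The 16 transforms can be read as one 4x4 block DCT applied on each of the 16 possible shifts of the block grid. The following is a minimal sketch of that reading in plain Python. The function names (`dct_matrix`, `dct2`, `shifted_block_origins`, `overcomplete_dct`) are hypothetical, the floating-point orthonormal DCT stands in for whatever integer transform is actually used, and skipping partial blocks at the frame border is an assumption.

```python
import math

N = 4  # block size of the denoising transform

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                     for x in range(n)])
    return rows

def dct2(block, C=None):
    """Separable 2-D DCT of an n x n block, computed as C * block * C^T."""
    n = len(block)
    C = C or dct_matrix(n)
    tmp = [[sum(C[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]

def shifted_block_origins(height, width, dy, dx, n=N):
    """Top-left corners of all full n x n blocks of the grid shifted by (dy, dx).
    Assumption: partial blocks at the frame border are skipped."""
    return [(r, c)
            for r in range(dy, height - n + 1, n)
            for c in range(dx, width - n + 1, n)]

def overcomplete_dct(frame):
    """Return {(dy, dx): {(r, c): coefficient block}} for all 16 grid shifts."""
    h, w = len(frame), len(frame[0])
    C = dct_matrix()
    out = {}
    for dy in range(N):
        for dx in range(N):
            out[(dy, dx)] = {
                (r, c): dct2([row[c:c + N] for row in frame[r:r + N]], C)
                for (r, c) in shifted_block_origins(h, w, dy, dx)
            }
    return out
```

Each interior pixel is covered by one block from every shift, so it receives 16 different block decompositions.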
Main Idea
Suppose all images and transform coefficients are arranged into vectors.
$x$: Original frame. $y$: Decoded frame.
• Get $y$, and evaluate translation invariant transforms ($T_1, \ldots, T_{16}$).
• Denoise the resulting coefficients using per-coefficient thresholds and a denoising rule.
• Compute a weighted inverse for a denoised estimate.
• Use the computed estimate to modify the denoising rule, and re-do everything with the new rule.
• Put everything in a tight package so that complexity is manageable.
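Algorithm Flow
[Flow diagram: $y$ → Transform $T_1, \ldots, T_{16}$ → $d_1, \ldots, d_{16}$ → Coefficient denoising → $\hat{c}_{1,k}, \ldots, \hat{c}_{16,k}$ → Weighted inverse → $\hat{u}_k$ → Masking → $\hat{x}_k$. Coefficient denoising also takes $f_{1,k-1}, \ldots, f_{16,k-1}$ and the corresponding per-coefficient thresholds as inputs.]
(a) Executed for k=0: $y$ → Transform $T_1, \ldots, T_{16}$ → $d_1, \ldots, d_{16}$, with $(f_{1,0}, \ldots, f_{16,0}) = (d_1, \ldots, d_{16})$.
(b) Executed for k=1,2: the estimate $\hat{x}_k$ is passed through Transform $T_1, \ldots, T_{16}$ to update the inputs of the denoising rule for the next pass.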
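The flow can be sketched end to end on a 1-D signal. This is only an illustration under stated assumptions: 4-point DCT blocks at every shift stand in for the 16 shifted 4x4 DCTs, a single scalar `threshold` replaces the per-coefficient thresholds, the comparison is taken to be strict, and three passes are used on the reading that the first pass is guided by the decoded frame's own coefficients and the later passes by the previous estimate. All function names are hypothetical.

```python
import math

N = 4  # 1-D stand-in for the 4x4 denoising blocks

def _scale(u, n):
    return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

def dct(v):
    """Orthonormal DCT-II."""
    n = len(v)
    return [_scale(u, n) * sum(v[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                               for x in range(n))
            for u in range(n)]

def idct(c):
    """Inverse of dct()."""
    n = len(c)
    return [sum(_scale(u, n) * c[u] * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for u in range(n))
            for x in range(n)]

def denoise_pass(y, guide, threshold):
    """Threshold the coefficients of y, deciding with the coefficients of guide,
    then combine the overlapping block estimates with normalized weights."""
    acc = [0.0] * len(y)
    wsum = [0.0] * len(y)
    for start in range(len(y) - N + 1):          # every shift of the block grid
        d = dct(y[start:start + N])
        f = dct(guide[start:start + N])
        c_hat = [dj if abs(fj) > threshold else 0.0 for dj, fj in zip(d, f)]
        nonzeros = max(sum(1 for c in c_hat if c != 0.0), 1)  # assumption: clamp at 1
        w = 1.0 / nonzeros                       # sparser blocks count more
        block = idct(c_hat)
        for i in range(N):
            acc[start + i] += w * block[i]
            wsum[start + i] += w
    return [a / s for a, s in zip(acc, wsum)]    # weights sum to one per sample

def loop_filter(y, threshold, mask=None, passes=3):
    """Repeat denoising, using the previous estimate to steer the next pass."""
    if mask is None:
        mask = [1.0] * len(y)
    x_hat = y                                    # first pass: guide is y itself
    for _ in range(passes):
        u_hat = denoise_pass(y, x_hat, threshold)
        x_hat = [m * u + (1.0 - m) * v for m, u, v in zip(mask, u_hat, y)]
    return x_hat
```

Note that every pass thresholds the coefficients of the decoded signal itself; only the keep-or-kill decision changes from pass to pass.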
Ingredient: Translation Invariant Transforms
We assume $T_1, \ldots, T_{16}$ provide sparse decompositions for $x$.
For most pixels we will have at least one of the transforms providing sparsity.
E.g., piecewise smooth image with an edge
[Figure: the image overlaid with blocks of Transform 1 and Transform 7.]
Legend:
Sparse DCT block (many small coefficients inside the block): high performance denoising.
Non-sparse DCT block (many large coefficients inside the block): low performance denoising.
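A small 1-D illustration of why shifts matter near an edge: among the shifted blocks that cover a sample next to a step, the ones straddling the step need several coefficients, while the one lying entirely on one side needs only the DC term. The signal values, the 4-point block size, and the helper names are assumptions made for this sketch.

```python
import math

def dct4(v):
    """Orthonormal 4-point DCT-II of a length-4 sequence."""
    n = 4
    out = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                               for x in range(n)))
    return out

def count_significant(coeffs, eps=1e-9):
    """Number of coefficients that are not numerically zero."""
    return sum(1 for c in coeffs if abs(c) > eps)

# Piecewise constant signal with an edge between samples 7 and 8.
SIGNAL = [10.0] * 8 + [50.0] * 8

def nonzeros_covering(n, signal=SIGNAL):
    """Map block start -> number of significant DCT coefficients,
    for every shifted 4-sample block that covers sample n."""
    result = {}
    for start in range(n - 3, n + 1):
        if start >= 0 and start + 4 <= len(signal):
            result[start] = count_significant(dct4(signal[start:start + 4]))
    return result

counts = nonzeros_covering(8)
best_start = min(counts, key=counts.get)  # the sparsest block covering sample 8
```

For sample 8, the block starting at 8 sits entirely on the bright side of the edge and is the sparse one; the blocks starting at 5, 6, and 7 straddle the edge.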
Ingredient: Cross-Correlation Robust Coefficient Denoising Rule
• In removing quantization noise, the additive “noise” is correlated with data. (Cannot pretend that noise is independent.)
• These cross correlations are unknown.
• Must use cross-correlation robust techniques. (We use techniques that are optimal for the worst-case cross correlation.)
[Diagram: $y$ → Transform $T_1, \ldots, T_{16}$ → $d_1, \ldots, d_{16}$; $x$ → Transform $T_1, \ldots, T_{16}$ → $c_1, \ldots, c_{16}$.]
Onur G. Guleryuz, "Linear, Worst-Case Estimators for Denoising Quantization Noise in Transform Coded Images," accepted, IEEE Transactions on Image Processing.
Coefficient Denoising Rule
Estimate $c_i(j)$ using $d_i(j)$.
Thresholding rule (min-max optimal estimate):
$$\hat{c}_{i,k}(j) = \begin{cases} d_i(j), & |f_{i,k-1}(j)| > \tau_{i,k-1}(j) \quad \text{(condition1)} \\ 0, & \text{otherwise} \end{cases}$$
where $\tau_{i,k-1}(j)$ denotes the per-coefficient thresholds.
(Please ask the presenter or check the paper for what “condition1” is in general.)
(Since denoising transforms are block transforms, thresholds are actually per-block. Note that denoising transform blocks are different from coded-blocks.)
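As a concrete reading of the thresholding rule, the sketch below keeps a decoded coefficient when the magnitude of its guiding coefficient exceeds the threshold and zeroes it otherwise. The function name `hard_threshold`, the strict comparison, and the example values are assumptions; the general form of the condition is in the paper.

```python
def hard_threshold(d, f_prev, thresholds):
    """Keep d[j] where |f_prev[j]| exceeds thresholds[j]; zero it otherwise.

    d          -- transform coefficients of the decoded frame (one block)
    f_prev     -- guiding coefficients from the previous pass
    thresholds -- per-coefficient thresholds (equal within a block in practice)
    """
    return [dj if abs(fj) > tj else 0.0
            for dj, fj, tj in zip(d, f_prev, thresholds)]

# First pass: the guiding coefficients are the decoded coefficients themselves.
d = [12.0, -0.8, 3.1, 0.2]
c_hat = hard_threshold(d, d, [1.0] * len(d))   # -> [12.0, 0.0, 3.1, 0.0]
```

Surviving coefficients are passed through unchanged, so the rule is hard thresholding rather than shrinkage.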
Ingredient: Weighted Inverse for High Performance Around Edges
[Figure: pixel n covered by a block of Transform 1 and a block of Transform 7.]
$$\hat{u}_k(n) = w_1(n)\,[T_1^{-1}\hat{c}_{1,k}](n) + \ldots + w_7(n)\,[T_7^{-1}\hat{c}_{7,k}](n) + \ldots$$
$$w_i(n) \ge 0, \qquad \sum_{i=1}^{16} w_i(n) = 1$$
The denoised block using transform 7 should contribute more at pixel n, i.e., $w_7(n) > w_1(n)$.
(Weighted denoising becomes cross-correlation robust when the weights are constrained to sum to one.)
(Weight of each denoising block ~ 1/(16-number_of_zero_coeffs).)
Onur G. Guleryuz, "Weighted Overcomplete Denoising," Proc. Asilomar Conference on Signals and Systems, Pacific Grove, CA, Nov. 2003.
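The weighting can be sketched for a single pixel as follows. Each denoising block covering the pixel contributes its estimate with a raw weight of 1/(16 - number_of_zero_coeffs), and the raw weights are then normalized to sum to one. The function name `combine_estimates` and the clamp for an all-zero block (which would otherwise divide by zero) are assumptions.

```python
def combine_estimates(candidates, coeffs_per_block=16):
    """Weighted combination of per-block estimates at one pixel.

    candidates -- list of (pixel_estimate, number_of_zero_coeffs) pairs,
                  one per denoising block that covers the pixel
    Returns (estimate, weights).
    """
    raw = []
    for _, zeros in candidates:
        nonzeros = max(coeffs_per_block - zeros, 1)  # assumption: avoid 1/0
        raw.append(1.0 / nonzeros)
    total = sum(raw)
    weights = [r / total for r in raw]               # nonnegative, sum to one
    estimate = sum(w * est for w, (est, _) in zip(weights, candidates))
    return estimate, weights

# A sparse block (15 zeroed coefficients) against a non-sparse one (none zeroed).
estimate, weights = combine_estimates([(100.0, 15), (120.0, 0)])
```

In this example the sparse block receives weight 16/17 and dominates the result, which matches the intent that blocks not straddling an edge should contribute more.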
Ingredient: Compression Mode Based Decisions
[Figure: video frame containing coded-blocks of different modes:]
• Intra coded-block
• P coded-block with quantized data
• P coded-block with no quantized data, but with a motion difference.
• P coded-block with no quantized data, and no motion difference.
...
• Similar to the h.264 loop filter, different types of coded-block boundaries undergo denoising with different “strengths”.
• Modes determine per-coefficient (per denoising-block) thresholds.
(Rationale: Different types of coded-blocks have different quantization error statistics. We also do not want to filter what we have filtered before. ...)
Masking
[Figure: vertical coded-block boundary; example set of pixels influenced by the vertical boundary, forming a one pixel thick shell.]
(Masking is useful at coded-block boundaries where there are no non-zero coefficients transmitted, i.e., cases of coded-block boundaries between motion-only coded-blocks.)
• Adjust thickness of shells based on coded-block modes. mask(n)=1 inside the shell, mask(n)=0 outside.
• Modes determine shell thickness.
$$\hat{x}_k(n) = \mathrm{mask}(n)\,\hat{u}_k(n) + (1 - \mathrm{mask}(n))\,y(n)$$
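The masking step amounts to a per-pixel selection between the denoised estimate and the decoded frame. The sketch below builds a shell mask around one vertical coded-block boundary and applies it. The function names are hypothetical, and taking a shell of thickness t to mean t pixels on each side of the boundary is an assumption.

```python
def vertical_shell_mask(height, width, boundary_col, thickness=1):
    """mask[r][c] = 1 within `thickness` pixels on either side of the vertical
    boundary lying between columns boundary_col - 1 and boundary_col."""
    lo = max(boundary_col - thickness, 0)
    hi = min(boundary_col + thickness, width)
    mask = [[0] * width for _ in range(height)]
    for row in mask:
        for c in range(lo, hi):
            row[c] = 1
    return mask

def apply_mask(u_hat, y, mask):
    """x_hat(n) = mask(n) * u_hat(n) + (1 - mask(n)) * y(n), pixel by pixel."""
    return [[m * u + (1 - m) * v for m, u, v in zip(mrow, urow, yrow)]
            for mrow, urow, yrow in zip(mask, u_hat, y)]

mask = vertical_shell_mask(height=2, width=8, boundary_col=4, thickness=1)
y = [[9.0] * 8 for _ in range(2)]        # decoded frame
u_hat = [[5.0] * 8 for _ in range(2)]    # denoised estimate
x_hat = apply_mask(u_hat, y, mask)       # filtered only inside the shell
```

Pixels outside the shell are copied from the decoded frame unchanged, so regions that were already filtered in the reference frame are not filtered again.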
Results
• We incorporated this work into the h.264 reference software (JM9.5) to generate rate-distortion results on video sequences. We also provide INTRA-only results for comparison.
h.264*: JM9.5 with loop filtering disabled.
h.264: JM9.5 with h.264 loop filter.
Proposed: JM9.5 with denoising loop filter.
Default encoder.cfg, except adaptive rounding is on. All sequences are QCIF.
(Adaptive rounding gives the best R-D performance for all three codecs. The proposed filter is consistently better than h.264. Q: Why all QCIF? A: DoCoMo. Results are typical for CIF as well. Proposed provides ~10% improvements at typical bit-rates. How the encoder quantizes matters; please ask the presenter why. Please check the paper for a comparison against a Windows Media type loop filter.)
Rate-Distortion Performance: Foreman
[Plot: Distortion (PSNR, axis from 30 to 43) versus Rate (kbits/sec). foreman (300 frames, IPP..., JM9.5, adaptive rounding on, default encoder.cfg). Curves: New Loop, h264 Loop, h264*. A 10% improvement mark is shown.]
Rate-Distortion Performance: Silence
[Plot: Distortion (PSNR, axis from 30 to 42) versus Rate (kbits/sec). silence (300 frames, IPP..., JM9.5, adaptive rounding on, default encoder.cfg). Curves: New Loop, h264 Loop, h264*. A 10% improvement mark is shown.]
Rate-Distortion Performance: Sleepy
[Plot: Distortion (PSNR, axis from 34 to 46) versus Rate (kbits/sec). sleepy (450 frames, IPP..., JM9.5, adaptive rounding on, default encoder.cfg). Curves: New Loop, h264 Loop, h264*. A 10% improvement mark is shown.]
Rate-Distortion Performance: Car
[Plot: Distortion (PSNR, axis from 30 to 43) versus Rate (kbits/sec). car (450 frames, IPP..., JM9.5, adaptive rounding on, default encoder.cfg). Curves: New Loop, h264 Loop, h264*. A 10% improvement mark is shown.]
INTRA-only
(h264 and Proposed rows are dB improvements over h264*)

Car - QP       20      24      28      32      36
h264* (dB)     44.08   40.64   37.40   34.00   30.92
h264           -.01    +.09    +.14    +.21    +.29
Proposed       +.30    +.44    +.53    +.54    +.55

Foreman - QP   20      24      28      32      36
h264* (dB)     43.72   40.21   37.13   33.92   30.98
h264           -.06    +.04    +.07    +.16    +.28
Proposed       +.26    +.39    +.53    +.57    +.57

Sleepy - QP    20      24      28      32      36
h264* (dB)     46.68   43.51   40.64   37.57   34.69
h264           +.02    +.16    +.21    +.32    +.40
Proposed       +.30    +.52    +.62    +.67    +.68

Silence - QP   20      24      28      32      36
h264* (dB)     43.57   39.89   36.72   33.53   30.77
h264           -.02    +.05    +.01    +.11    +.27
Proposed       +.22    +.34    +.41    +.49    +.58
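Since both filter rows are reported relative to h264*, the gain of the proposed filter over the h.264 loop filter is the difference of the two rows. The short sketch below does that arithmetic on the table values; the names `QPS`, `GAINS_DB`, `extra_gain`, and `mean` are hypothetical helpers introduced for illustration.

```python
QPS = [20, 24, 28, 32, 36]

# dB improvements over h264* (loop filtering disabled), copied from the tables.
GAINS_DB = {
    "Car":     {"h264": [-0.01, 0.09, 0.14, 0.21, 0.29],
                "Proposed": [0.30, 0.44, 0.53, 0.54, 0.55]},
    "Foreman": {"h264": [-0.06, 0.04, 0.07, 0.16, 0.28],
                "Proposed": [0.26, 0.39, 0.53, 0.57, 0.57]},
    "Sleepy":  {"h264": [0.02, 0.16, 0.21, 0.32, 0.40],
                "Proposed": [0.30, 0.52, 0.62, 0.67, 0.68]},
    "Silence": {"h264": [-0.02, 0.05, 0.01, 0.11, 0.27],
                "Proposed": [0.22, 0.34, 0.41, 0.49, 0.58]},
}

def extra_gain(sequence):
    """Per-QP dB gain of the proposed filter over the h.264 loop filter."""
    rows = GAINS_DB[sequence]
    return [round(p - h, 2) for p, h in zip(rows["Proposed"], rows["h264"])]

def mean(values):
    return sum(values) / len(values)

for name in GAINS_DB:
    per_qp = extra_gain(name)
    print(name, dict(zip(QPS, per_qp)), "mean: %.3f dB" % mean(per_qp))
```

On these INTRA-only numbers the proposed filter stays ahead of the h.264 loop filter at every listed QP for all four sequences.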
Visual Quality
• Very good visual quality, especially around edges/singularities.
(Visual quality results around singularities are hard to see on printouts. Please ask the presenter to show visual quality results on a notebook screen.)
Complexity and Implementation Issues
• All-integer (32-bit) algorithm with fast transforms (4x4 DCT) and fast overcomplete transforms. Bit-precise results (all devices that implement this algorithm generate exactly the same results).
• A faster, multiplication-free algorithm is possible (Hadamard + shift-weights).
• Time complexity is about 20% of our proprietary decoder (this decoder is similar in complexity to Windows Media 9).
• Hardware friendly algorithm. Conditional code execution is not required; worst-case filtering complexity is for INTRA frames (this is less than 10% of h.264* INTRA encoding).
• Please ask the presenter for operation count.
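To illustrate why a Hadamard-based variant needs no multiplications, here is a 4-point Hadamard butterfly that uses only integer additions and subtractions, with the inverse normalization done by a right shift. This is a generic sketch, not the presenter's implementation; the function names are hypothetical, and it does not attempt to model the "shift-weights" part.

```python
def hadamard4(v):
    """4-point Hadamard transform using only additions and subtractions."""
    a, b, c, d = v
    s0, s1 = a + b, c + d
    d0, d1 = a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]

def inverse_hadamard4(t):
    """The transform matrix H satisfies H * H = 4 * I, so applying the
    butterfly again and shifting right by 2 recovers the input exactly."""
    return [x >> 2 for x in hadamard4(t)]

coeffs = hadamard4([3, 1, 4, 1])        # -> [9, 5, -1, -1]
restored = inverse_hadamard4(coeffs)    # -> [3, 1, 4, 1]
```

Because every operation is an integer add, subtract, or shift, the result is bit-exact across devices, which fits the bit-precision requirement stated above.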
Conclusion
• Combats all types of quantization noise (blocking, ringing, ...).
• ~10% improvements in bit-rate at typical bit-rates/sequences.
• Applicable to all (I,P,B,*) frame types.
• ~0.5 dB improvements on I frames (lap or block transform compressed) at typical bit-rates/sequences.
• Good visual quality, especially around edges and high frequency regions.
• Improvements are most significant when the video contains non-smooth regions, rapid scene changes, ... (~ whenever there are a significant number of non-zero quantized transform coefficients).
• Hardware friendly.