video coding using spatially varying transform cixun zhang, kermal ugur, jani lainema, antti...

Video Coding Using Spatially Varying Transform

Cixun Zhang, Kemal Ugur, Jani Lainema, Antti Hallapuro and Moncef

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 2, FEBRUARY 2011

Outline

• Introduction
• SVT (Spatially varying transform)
– Selection of SVT block-size
– Selection and coding of candidate LP
– Filtering of SVT block boundaries
• Implementing SVT in H.264/AVC
• FSVT
• Experimental Result
• Conclusion

Introduction

• Why SVT?
• Some drawbacks of H.264/AVC
– Most standards do not align the underlying transform with possible edge locations.
– Directional DCTs were proposed in [4] to improve efficiency for directional edges, but they are not efficient for vertical, horizontal, and nondirectional edges.
– Coding the entire prediction error signal may not be the best choice in terms of the RD tradeoff, as the SKIP mode illustrates.

[4] B. Zeng and J. Fu, “Directional discrete cosine transforms: A new framework for image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 305–313, Mar. 2008.

• Rate-Distortion
– The classical method of making encoding decisions is for the video encoder to choose the result that yields the highest-quality output image. However, the chosen option might require more bits while giving comparatively little quality benefit. A common example of this problem is motion estimation [1], in particular quarter-pixel-precision motion estimation: adding extra precision to the motion of a block might increase quality, but in some cases that extra quality is not worth the extra bits needed to encode the motion vector at higher precision.

http://en.wikipedia.org/wiki/Motion_estimation

http://en.wikipedia.org/wiki/Rate%E2%80%93distortion_optimization#cite_note-1

http://en.wikipedia.org/wiki/Qpel

http://en.wikipedia.org/wiki/Macroblock
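A minimal Python sketch of the difference between the two decision rules: the classical rule picks the lowest distortion, while RD optimization picks the lowest Lagrangian cost J = D + λ·R (as in RDO [15]). The candidate names, distortion and rate numbers, and λ are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    distortion: float  # e.g. SSD between original and reconstruction
    rate: float        # bits needed to code this choice


def pick_highest_quality(cands):
    """Classical decision: lowest distortion, ignoring the bit cost."""
    return min(cands, key=lambda c: c.distortion)


def pick_rd_optimal(cands, lam):
    """RD decision: minimize the Lagrangian cost J = D + lam * R."""
    return min(cands, key=lambda c: c.distortion + lam * c.rate)


# Hypothetical numbers: the quarter-pel MV gives slightly lower distortion
# but costs noticeably more bits than the half-pel MV.
candidates = [
    Candidate("quarter-pel MV", distortion=100.0, rate=40.0),
    Candidate("half-pel MV", distortion=110.0, rate=20.0),
]
print(pick_highest_quality(candidates).name)      # quarter-pel MV
print(pick_rd_optimal(candidates, lam=1.0).name)  # half-pel MV (J = 130 vs 140)
```

With λ = 0 the two rules coincide; larger λ puts more weight on the bit cost.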

Introduction (cont.)

• Basic idea of SVT:
– Do not restrict transform coding to regular block boundaries.
– That is, select and code only the best portion of the prediction error to improve coding efficiency in terms of the RD tradeoff.

– SVT can be considered a special SKIP mode in which part of the macroblock is skipped (its residual is not coded into the bitstream).


• Shifted transforms have been used for denoising [6]–[9], typically in post-processing or in-loop filtering.

• Shifted-transform methods perform poorly at boundaries and on small areas (e.g., a macroblock).

[6] A. Nosratinia, “Denoising JPEG images by re-application of JPEG,” in Proc. IEEE Workshop MMSP, Dec. 1998, pp. 611–615.

[7] R. Samadani, A. Sundararajan, and A. Said, “Deringing and deblocking DCT compression artifacts with efficient shifted transforms,” in Proc. IEEE ICIP, Oct. 2004, pp. 1799–1802.

[8] J. Katto, J. Suzuki, S. Itagaki, S. Sakaida, and K. Iguchi, “Denoising intra-coded moving pictures using motion estimation and pixel shift,” in Proc. IEEE ICASSP, Mar. 2008, pp. 1393–1396.

[9] O. G. Guleryuz, “Weighted averaging for denoising with overcomplete dictionaries,” IEEE Trans. Image Process., vol. 16, no. 12, pp. 3020–3034, Dec. 2007.


• The proposed method does not have the drawbacks mentioned above.

• The location parameter (LP) is coded in the bitstream so that the decoder can reconstruct the MB.

• Drawback
– High encoding complexity due to the brute-force search for the best LP.
– Solution: FSVT


SVT

• Transform coding is widely used to decorrelate the prediction error and achieve high compression rates.

• Drawbacks of traditional transform coding
– If the prediction error at fixed locations has a structure that does not suit the underlying transform, many high-frequency transform coefficients are generated, which require more bits to code.
– Noticeable visual artifacts (e.g., ringing) may appear when these coefficients are quantized.

SVT (cont.)

• What is new in SVT:
– Transform coding is not restricted to regular block boundaries; it can be applied to any portion of the prediction error.
– The coded portion is chosen from a restricted set of candidates to reduce complexity.
• This means that the position and shape of the transform block are variable, and this information (shape and position) is signaled to the decoder.

SVT (cont.)

• Three issues of SVT
– Selection of SVT block-size
– Selection and coding of candidate LP
– Filtering of SVT block boundaries

Selection of SVT Block-Size

• An M*N SVT is applied to a selected M*N block inside a 16*16 macroblock, and only this block is transform coded.

• There are (17-M)*(17-N) possible LPs.
• Factors in choosing M and N
– Larger M and N result in fewer possible LPs.
– Larger M and N result in lower distortion but need more bits to code the transform coefficients.
– Larger block-size transforms are more suitable for flat areas, and smaller ones for sharp edges.
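The (17-M)*(17-N) count can be checked with a short Python sketch, assuming M is the block width, N its height, and an LP is the (x, y) offset of the block's top-left corner inside the 16*16 macroblock. These are all geometrically possible positions; the candidate set actually signalled is smaller (58 LPs, as reported later in the deck).

```python
def num_lps(m, n, mb=16):
    """Count the positions of an m-wide, n-tall block fully inside an mb x mb macroblock."""
    # Horizontal offsets run 0..mb-m and vertical offsets 0..mb-n, inclusive.
    return (mb + 1 - m) * (mb + 1 - n)


def enumerate_lps(m, n, mb=16):
    """List every (x, y) top-left offset of an m-wide, n-tall SVT block."""
    return [(x, y) for y in range(mb - n + 1) for x in range(mb - m + 1)]


for m, n in [(8, 8), (4, 16), (16, 4)]:
    print(f"{m}*{n}: {num_lps(m, n)} possible LPs")
# 8*8: 81 possible LPs
# 4*16: 13 possible LPs
# 16*4: 13 possible LPs
```

The output illustrates the first factor above: the larger the block along a dimension, the fewer positions it can take along that dimension.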

Selection of SVT Block-Size (cont.)

• To facilitate the transform design, M = 2^m and N = 2^n.

• Four SVT block sizes are used in this paper: 8*8, 4*16, 16*4, and 0*0 (i.e., SKIP mode).

• The block sizes can be adapted to each sequence for better performance.

Selection of SVT Block Size (cont.)

• Following the well-established variable block-size transform (VBT), variable block-size SVT (VBSVT) performs better than fixed block-size SVT.

• Drawback of using multiple block sizes:
– As the number of SVT block sizes grows, more bits are needed to code the LPs.

Selection of SVT Block Size (cont.)

• As mentioned before, the existing VBT kernels can be reused for SVT.

• For 8*8 SVT, the 8*8 transform kernel of H.264 can be used.

• For 4*16 and 16*4 SVT, the 4*4 transform kernel of H.264 and the 16*16 transform kernel in [14] can be used; the latter can be implemented with a butterfly structure like that of the 8*8 transform.

[14] S. Ma and C.-C. Kuo, “High-definition video coding with supermacroblocks,” in Proc. SPIE Vis. Commun. Image Process., vol. 6508, 650816, Jan. 2007, pp. 1–12.
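To illustrate how existing 1-D kernels combine into a 4*16 or 16*4 transform, here is a minimal Python sketch of a separable 2-D transform Y = C·X·R^T. The 4-point kernel is the H.264/AVC 4*4 forward core transform matrix; the 16-point kernel of [14] is not reproduced here, so an orthonormal floating-point DCT-II is swapped in as a stand-in. Which dimension is 4 and which is 16 is an assumed convention. The H.264 core matrix is not orthonormal; in the standard its scaling is folded into quantization, which this sketch omits.

```python
import math

# H.264/AVC 4x4 forward core transform matrix (integer approximation of the DCT).
CORE4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def dct_matrix(n):
    """Orthonormal DCT-II matrix; a stand-in for the 16-point kernel of [14]."""
    return [
        [math.sqrt((1 if k == 0 else 2) / n) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def separable_transform(block, col_kernel, row_kernel):
    """Y = C * X * R^T for a block with len(col_kernel) rows and len(row_kernel) columns."""
    return matmul(matmul(col_kernel, block), transpose(row_kernel))


# A flat 4-row x 16-column residual block (one orientation of the 4*16 / 16*4 SVT).
block = [[1.0] * 16 for _ in range(4)]
coeffs = separable_transform(block, CORE4, dct_matrix(16))
print(round(coeffs[0][0], 6))  # 16.0: all energy lands in the DC coefficient
```

The same `separable_transform` works for the 8*8 SVT if both kernels are 8-point matrices.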

Selection and Coding of Candidate LPs

• When the SVT block has nonzero transform coefficients, its location must be coded and transmitted.

• The best LP is selected using rate-distortion optimization (RDO) [15].

[15] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 688–703, Jul. 2003.
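The brute-force LP selection can be sketched in Python as an exhaustive search minimizing J = D + λ·R over all positions. Coding of the selected block (transform, quantization, entropy coding) is abstracted into a hypothetical callback `code_block` returning (distortion, bits); the toy coder and test residual below are invented. The part outside the SVT block is not coded, so its residual is reconstructed as zero and its whole energy counts as distortion. The 6-bit LP cost is an assumption matching the fixed-length code mentioned in the deck.

```python
def select_best_lp(residual, m, n, lam, code_block, lp_bits=6):
    """Exhaustive RDO search for the LP of an m-wide, n-tall SVT block.

    residual   -- 16x16 prediction error of the macroblock (list of lists)
    code_block -- hypothetical callback: sub-block -> (distortion, bits),
                  standing in for transform, quantization and entropy coding
    lp_bits    -- bits spent on the LP index (assumed fixed-length)
    Returns (best_cost, (x, y)).
    """
    size = len(residual)
    best = None
    for y in range(size - n + 1):
        for x in range(size - m + 1):
            # Outside the SVT block the residual is not coded: it is
            # reconstructed as zero, so its whole energy counts as distortion.
            d_out = sum(
                residual[r][c] ** 2
                for r in range(size)
                for c in range(size)
                if not (y <= r < y + n and x <= c < x + m)
            )
            sub_block = [row[x:x + m] for row in residual[y:y + n]]
            d_in, coeff_bits = code_block(sub_block)
            cost = d_out + d_in + lam * (coeff_bits + lp_bits)  # J = D + lambda * R
            if best is None or cost < best[0]:
                best = (cost, (x, y))
    return best


def perfect_coder(sub_block):
    """Toy stand-in: codes any block losslessly with 10 bits."""
    return 0.0, 10


# Residual energy concentrated in an 8x8 patch whose top-left corner is (x=3, y=5).
residual = [[0] * 16 for _ in range(16)]
for r in range(5, 13):
    for c in range(3, 11):
        residual[r][c] = 1
print(select_best_lp(residual, 8, 8, lam=1.0, code_block=perfect_coder))
# (16.0, (3, 5))
```

Every LP needs a full `code_block` call, which is exactly the encoder cost that FSVT later tries to reduce.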

Selection and Coding of Candidate LPs (cont.)

• J = D + λ·R, where J is the RD cost, D is the distortion, R is the bit rate, and λ is the Lagrangian multiplier.

• Note that the reconstructed residual for the remaining part of the macroblock is set to zero (although other information, such as luminance changes, might still be beneficial to code).

• In total there are 58 candidate LPs; experimentally, the entropy of the LP index is 5.73 bits.
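The arithmetic behind the LP index cost can be checked with a small Python sketch. A fixed-length code for 58 values needs ⌈log2 58⌉ = 6 bits, and a uniform distribution over 58 LPs would have entropy log2 58 ≈ 5.86 bits. The measured 5.73 bits is therefore slightly below uniform, so a 6-bit code spends about 0.27 bits per LP more than the entropy bound. `empirical_entropy` is a helper for measuring such a figure from encoder statistics; no real statistics are included here.

```python
import math
from collections import Counter


def fixed_length_bits(num_symbols):
    """Bits per symbol of a fixed-length code for num_symbols distinct values."""
    return math.ceil(math.log2(num_symbols))


def empirical_entropy(symbols):
    """Shannon entropy (bits/symbol) of an observed sequence of LP indices."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(fixed_length_bits(58))      # 6
print(round(math.log2(58), 2))    # 5.86 (entropy if all 58 LPs were equally likely)
```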

Selection and Coding of Candidate LPs (cont.)

• Since this entropy is close to log2(58) ≈ 5.86 bits, a 6-bit fixed-length code is used to represent the LP index.

• For the chroma case:

Filtering of SVT Block Boundaries

• When SVT is used, the deblocking process needs to be adjusted because the selected SVT block may not be aligned with the regular block boundaries.

• Both the edges of the selected SVT block and the macroblock may be filtered.

Filtering of SVT Block Boundaries (cont.)


Implementing SVT in H.264/AVC

Implementing SVT in H.264/AVC (cont.)

• Several key parts of H.264/AVC need to be adjusted:
– Macroblock types
– Coded block pattern
– Entropy coding
– Deblocking

Macroblock Type

Coded Block Pattern

• Experiments show that the luma CBP is often equal to 1 in high-fidelity video coding.

• Based on this observation, the new macroblock modes are defined to have luma CBP equal to 1.

Entropy Coding

• In H.264, CAVLC selects among several coding tables based on the numbers of nonzero coefficients in the neighboring (left and upper) blocks.

• For SVT, a fixed coding table is used.
• To derive the number of nonzero coefficients of each 4*4 luma block, the following two steps are used:

Entropy Coding (cont.)

• Step 1: If a 4*4 luma block overlaps a coded block with nonzero coefficients inside the selected SVT block, it is marked as having nonzero coefficients (this marking is used for deblocking).

• Step 2: The number of nonzero transform coefficients of each 4*4 block is set empirically by

• Finally, the total number of nonzero transform coefficients is distributed among the blocks marked as having nonzero coefficients.
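Step 1 can be sketched in Python as a rectangle-overlap test on the 4*4 grid of the macroblock. The input is assumed to be a list of (x, y, w, h) rectangles, in macroblock luma coordinates, of the coded blocks inside the SVT block that have nonzero coefficients (for an 8*8 SVT with nonzero coefficients this is just the SVT block itself). Step 2's empirical formula is not reproduced, so the sketch stops at the marking.

```python
def mark_nonzero_4x4(coded_blocks):
    """Return the set of (col, row) indices of 4x4 luma blocks to mark as nonzero.

    coded_blocks -- iterable of (x, y, w, h) rectangles of coded blocks with
                    nonzero coefficients inside the selected SVT block.
    """
    marked = set()
    for x, y, w, h in coded_blocks:
        # Every 4x4 block between the rectangle's first and last pixel overlaps it.
        for r in range(y // 4, (y + h - 1) // 4 + 1):
            for c in range(x // 4, (x + w - 1) // 4 + 1):
                marked.add((c, r))
    return marked


# An 8x8 SVT block aligned to the 4x4 grid marks four blocks ...
print(sorted(mark_nonzero_4x4([(4, 4, 8, 8)])))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
# ... while the same block at an unaligned LP overlaps nine.
print(len(mark_nonzero_4x4([(3, 5, 8, 8)])))     # 9
```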

Deblocking

• As described above in the SVT section (Filtering of SVT Block Boundaries).


FSVT (Fast Algorithms for SVT)

• The encoding complexity of SVT is high due to the brute-force search in RDO.

• For each candidate, RDO typically requires transform, quantization, entropy coding, etc.

• The basic idea to reduce the encoding complexity is to reduce the number of LPs.

FSVT (cont.)

• There are two levels:
– 1. Macroblock level: skip testing SVT for macroblocks for which SVT is unlikely to be useful (by examining the RD cost).
– 2. Block level: select candidate LPs based on the motion difference and use a hierarchical search algorithm to select the best LP.

Macroblock Level Fast Algorithm

• SVT is applied to a macroblock mode only if

• where J is the minimum RD cost obtained without SVT coding.

• Jmode is the RD cost of the current macroblock mode to be tested with SVT.

Macroblock Level Fast Algorithm (cont.)

• The threshold represents an empirical upper limit on the bitrate reduction achievable with SVT.

Block-Level Fast Algorithm

• 1. Selection of Available Candidate LPs Based on Motion Difference

• 2. Hierarchical Search Algorithm

Selection of Available Candidate LPs

• Skip testing a candidate LP if either of the following conditions is true:
– 1. The SVT block at that position overlaps at least two neighboring motion compensation blocks, and the difference between the motion vectors of these blocks is larger than or equal to a predefined threshold.

– 2. The reference frames of these neighboring blocks are different.
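The two skip conditions can be sketched in Python as follows. The partition layout, the motion-vector units (assumed quarter-pel), the threshold value, and the way the motion difference is measured (largest absolute component difference over all overlapped pairs) are all assumptions for illustration, not the paper's exact definitions.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def should_skip_lp(svt_rect, partitions, mv_threshold):
    """Decide whether a candidate LP can be skipped without RD testing.

    svt_rect   -- (x, y, w, h) of the SVT block at this LP
    partitions -- list of (rect, mv, ref_idx) motion-compensation blocks of the MB
    """
    hit = [(mv, ref) for rect, mv, ref in partitions if overlaps(svt_rect, rect)]
    if len(hit) < 2:
        return False  # inside a single MC block: always test
    # Condition 2: the overlapped blocks use different reference frames.
    if len({ref for _, ref in hit}) > 1:
        return True
    # Condition 1: largest pairwise motion difference reaches the threshold
    # (measured here as the max absolute component difference, an assumption).
    max_diff = max(
        max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        for i, (a, _) in enumerate(hit)
        for b, _ in hit[i + 1:]
    )
    return max_diff >= mv_threshold


# A 16x8-partitioned MB: top and bottom halves move differently, same reference.
parts = [((0, 0, 16, 8), (0, 0), 0), ((0, 8, 16, 8), (12, 0), 0)]
print(should_skip_lp((4, 4, 8, 8), parts, mv_threshold=4))  # True: straddles both halves
print(should_skip_lp((4, 0, 8, 8), parts, mv_threshold=4))  # False: top half only
```

The intuition is that an SVT block straddling a motion boundary would have to code a residual with a discontinuity inside it, which is unlikely to win the RD comparison.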

Hierarchical Search Algorithm

• Idea: find the best LP at a relatively coarse resolution and refine the result at a finer resolution.

Step 1: Find the LP with the lowest RD cost (set 1) and its two neighbors (set 2).

Step 2: Find the best zone. A zone is available if and only if all three of its candidate LPs are available.

Step 3: Select the best LP from set 1, set 2, and the best zone.
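The coarse-to-fine idea can be illustrated with a generic two-pass search in Python. This is not the paper's exact set 1 / set 2 / zone procedure (whose zone layout was given in a figure); it only shows how evaluating a coarse grid first and then refining around the winner cuts the number of RD cost evaluations. The cost function and grid step are invented.

```python
def hierarchical_lp_search(candidates, cost, available, step=2):
    """Generic coarse-to-fine LP search.

    candidates -- list of (x, y) LPs
    cost       -- function (x, y) -> RD cost (the expensive part)
    available  -- function (x, y) -> bool, e.g. False if skipped by motion difference
    Returns (best_lp, number_of_cost_evaluations).
    """
    usable = [lp for lp in candidates if available(*lp)]
    if not usable:
        return None, 0
    # Coarse pass: only LPs on a grid with spacing `step`.
    coarse = [lp for lp in usable if lp[0] % step == 0 and lp[1] % step == 0] or usable
    evaluated = {lp: cost(*lp) for lp in coarse}
    best = min(evaluated, key=evaluated.get)
    # Fine pass: remaining usable LPs closer than one grid step to the coarse winner.
    for lp in usable:
        near = abs(lp[0] - best[0]) < step and abs(lp[1] - best[1]) < step
        if near and lp not in evaluated:
            evaluated[lp] = cost(*lp)
    return min(evaluated, key=evaluated.get), len(evaluated)


# 8*8 SVT: 9 x 9 = 81 LPs; a smooth hypothetical cost with its minimum at (5, 3).
lps = [(x, y) for y in range(9) for x in range(9)]
best, n_evals = hierarchical_lp_search(
    lps, cost=lambda x, y: (x - 5) ** 2 + (y - 3) ** 2, available=lambda x, y: True)
print(best, n_evals)  # (5, 3) 33
```

Here 33 cost evaluations replace 81. With an irregular cost surface, a coarse-to-fine search can miss the global minimum, which is the price paid for the speed-up.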


Experimental Environment

• VBSVT and FVBSVT are evaluated for both HD and lower-resolution video coding.

• Some of the coding parameters used:
– High Profile
– QPI = 22, 27, 32, 37; QPP = QPI + 1
– CAVLC/CABAC
– Frame structure IPPP
– MV search range 64/32 pixels for 720p/CIF
– RDO in the high-complexity mode

Experimental Environment (cont.)

• Intel® Core™2 Quad CPU Q6600 @ 2.40 GHz, 2G
• The average bitrate reduction compared to H.264/AVC is measured using the Bjontegaard tool [20].

• Two configurations are tested:
– Low-complexity configuration: the 4*4 transform is not used.
– High-complexity configuration: the codec uses all tools provided in H.264.

Experimental Result

Experimental Result (cont.)


Conclusion

• By varying the position of the transform block and its size, the prediction error is better localized, and coding efficiency is improved.

• The encoding complexity of SVT is relatively high because of the brute-force search in RDO.

• To address this, FSVT is proposed to skip testing SVT for most of the macroblocks that are not suitable for SVT.
