Video Coding Using Spatially Varying Transform

Cixun Zhang, Kemal Ugur, Jani Lainema, Antti Hallapuro, and Moncef Gabbouj

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 2, FEBRUARY 2011

Outline

• Introduction
• SVT (Spatially varying transform)
  – Selection of SVT block-size
  – Selection and coding of candidate LP
  – Filtering of SVT block boundaries
• Implementing SVT in H.264/AVC
• FSVT
• Experimental Result
• Conclusion

Introduction

• Why SVT? Some drawbacks of H.264/AVC:
  – Most standards do not align the underlying transform with possible edge locations.
  – Directional DCTs [4] were proposed to improve efficiency for directional edges, but they are not efficient for vertical, horizontal, and nondirectional edges.
  – Coding the entire prediction error signal may not be the best choice in terms of the RD tradeoff (e.g., SKIP mode codes none of it).

[4] B. Zeng and J. Fu, “Directional discrete cosine transforms: A new framework for image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 305–313, Mar. 2008.

• Rate-Distortion
  – The classical way of making encoding decisions is for the video encoder to choose the result that yields the highest-quality output image. The disadvantage is that this choice may require more bits while giving comparatively little quality benefit.
  – A common example is motion estimation, in particular quarter-pixel-precision motion estimation. Adding the extra precision to the motion of a block may increase quality, but in some cases the extra quality is not worth the extra bits needed to encode the motion vector at higher precision.

Introduction (cont.)

• Basic idea of SVT:
  – Do not restrict transform coding to regular block boundaries.
  – That is, select and code the best portion of the prediction error to improve coding efficiency in terms of the RD tradeoff.
  – SVT can be considered a special SKIP mode in which part of the macroblock is skipped (not coded into the bitstream).

Introduction (cont.)

• Shifting the transform has been used in denoising [6]–[9], often in post-processing (e.g., in-loop filtering).
• It has a bad effect when applied at boundaries and to small areas (e.g., a macroblock).

[6] A. Nosratinia, “Denoising JPEG images by re-application of JPEG,” in Proc. IEEE Workshop MMSP, Dec. 1998, pp. 611–615.
[7] R. Samadani, A. Sundararajan, and A. Said, “Deringing and deblocking DCT compression artifacts with efficient shifted transforms,” in Proc. IEEE ICIP, Oct. 2004, pp. 1799–1802.
[8] J. Katto, J. Suzuki, S. Itagaki, S. Sakaida, and K. Iguchi, “Denoising intra-coded moving pictures using motion estimation and pixel shift,” in Proc. IEEE ICASSP, Mar. 2008, pp. 1393–1396.
[9] O. G. Guleryuz, “Weighted averaging for denoising with overcomplete dictionaries,” IEEE Trans. Image Process., vol. 16, no. 12, pp. 3020–3034, Dec. 2007.

Introduction (cont.)

• The proposed method does not have the drawbacks mentioned above.
• The location parameter (LP) is coded in the bitstream so that the decoder can reconstruct the macroblock (MB).
• Drawback:
  – High encoding complexity due to the brute-force search for the best LP.
  – Solution: FSVT

SVT

• Transform coding is widely used to decorrelate the prediction error and achieve high compression.
• Drawbacks of traditional transform coding:
  – If the prediction error at a fixed location has a structure that does not suit the underlying transform, many high-frequency transform coefficients are generated, which need more bits to code.
  – Noticeable visual artifacts (e.g., ringing) may appear when these coefficients are quantized.

SVT (cont.)

• What is new in SVT:
  – Transform coding is not restricted to regular block boundaries; it can be applied to any portion of the prediction error.
  – The selection is limited in order to reduce complexity.
• This means that the position and shape of the transform block are variable, and this information (shape and position) is signaled to the decoder.
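A minimal sketch of this idea, assuming a floating-point orthonormal DCT and a uniform quantizer instead of the integer transforms of H.264/AVC: only the block at the chosen position is transformed and quantized, and the rest of the macroblock residual is reconstructed as zero. The function names and the qstep parameter are hypothetical and are not taken from the paper.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n-by-n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def svt_reconstruct(residual, x, y, width, height, qstep):
    """Code only the width-by-height block whose top-left corner is (x, y).

    Returns (reconstructed residual, quantized coefficients). Samples
    outside the selected block are reconstructed as zero, i.e. skipped.
    """
    block = [row[x:x + width] for row in residual[y:y + height]]
    c_v = dct_matrix(height)  # vertical (column) transform
    c_h = dct_matrix(width)   # horizontal (row) transform
    coeffs = matmul(matmul(c_v, block), transpose(c_h))
    # Uniform quantizer: an assumption made only for this illustration.
    quantized = [[round(c / qstep) for c in row] for row in coeffs]
    dequantized = [[q * qstep for q in row] for row in quantized]
    rec_block = matmul(matmul(transpose(c_v), dequantized), c_h)
    recon = [[0.0] * len(residual[0]) for _ in residual]
    for j in range(height):
        for i in range(width):
            recon[y + j][x + i] = rec_block[j][i]
    return recon, quantized

# Toy 16*16 residual whose energy sits in an 8*8 patch that is not
# aligned with the regular 8*8 block grid.
residual = [[0.0] * 16 for _ in range(16)]
for j in range(4, 12):
    for i in range(5, 13):
        residual[j][i] = 8.0
recon, quantized = svt_reconstruct(residual, x=5, y=4, width=8, height=8, qstep=1.0)
print(sum(1 for row in quantized for q in row if q != 0), "nonzero coefficient(s)")
```

In this toy case the shifted 8*8 block captures the whole patch, whereas a transform aligned to the regular block grid would spread the same energy over several blocks.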

SVT (cont.)

• Three issues of SVT
  – Selection of SVT block-size
  – Selection and coding of candidate LP
  – Filtering of SVT block boundaries

Selection of SVT Block-Size

• An M*N SVT is applied to a selected M*N block inside a macroblock (size 16*16), and ONLY THIS BLOCK IS TRANSFORM CODED.
• There are (17-M)*(17-N) possible LPs.
• Factors in choosing M and N:
  – Larger M and N result in fewer possible LPs.
  – Larger M and N result in lower distortion but need more bits to code the transform coefficients.
  – A larger transform block size is more suitable for flat areas, and a smaller one is more suitable for sharp edges.
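The count of possible LPs can be checked with a few lines of Python. This is only an illustration of the (17-M)*(17-N) formula; the function name is hypothetical.

```python
def count_location_parameters(m: int, n: int, mb_size: int = 16) -> int:
    """Number of positions an m*n block can take inside an mb_size*mb_size macroblock."""
    if not (0 < m <= mb_size and 0 < n <= mb_size):
        raise ValueError("the block must fit inside the macroblock")
    return (mb_size + 1 - m) * (mb_size + 1 - n)

for m, n in [(8, 8), (4, 16), (16, 4)]:
    print(f"{m}*{n}: {count_location_parameters(m, n)} possible LPs")
```

An 8*8 block can be placed at 9 offsets in each direction, while a 4*16 or 16*4 block can only slide along one axis, which is why larger blocks leave fewer LPs to signal.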

Selection of SVT Block-Size (cont.)

• To facilitate the transform design, M = 2^m and N = 2^n.
• Four SVT block sizes are used in this paper: 8*8, 4*16, 16*4, and 0*0 (which means SKIP mode).
• The block sizes can be changed for different sequences to obtain better performance.

Selection of SVT Block-Size (cont.)

• Following the well-established variable block-size transform (VBT), variable block-size SVT is preferred over fixed block-size SVT.
• Drawback of using different block sizes:
  – As the number of SVT block sizes grows, more bits are needed to code the LPs.

Selection of SVT Block-Size (cont.)

• As mentioned before, VBT can be used for SVT.
• For 8*8 SVT, the 8*8 transform kernel in H.264 can be used.
• For 4*16 and 16*4 SVT, the 4*4 transform kernel in H.264 and the 16*16 transform kernel in [14] can be used, with the butterfly structure of the 8*8 transform.

[14] S. Ma and C.-C. Kuo, “High-definition video coding with supermacroblocks,” in Proc. SPIE Vis. Commun. Image Process., vol. 6508, 650816, Jan. 2007, pp. 1–12.

Selection and Coding of Candidate LPs

• When the SVT has nonzero transform coefficients, its location needs to be coded and transmitted.
• The best LP is selected according to rate-distortion optimization (RDO) [15].

[15] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 688–703, Jul. 2003.

Selection and Coding of Candidate LPs (cont.)

• The RD cost is $J = D + \lambda R$, where J is the RD cost, D is the distortion, R is the bitrate, and $\lambda$ is the Lagrangian multiplier.
• Note that, in computing the RD cost, the reconstruction residual for the remaining part of the macroblock is set to zero, although other information (e.g., luminance change) might be beneficial.
• Experimentally, there are 58 candidate LPs in total, and the entropy of the LP index is 5.73 bits.
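The selection step can be sketched as follows, assuming the usual Lagrangian cost J = D + λR and that distortion and rate have already been measured for each candidate LP. The function names and the toy numbers are hypothetical; a real encoder obtains D and R by actually transforming, quantizing, and entropy coding each candidate.

```python
from typing import Iterable, Tuple

def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def select_best_lp(candidates: Iterable[Tuple[int, float, float]], lam: float) -> int:
    """Return the index of the candidate LP with the lowest RD cost.

    Each candidate is a tuple (lp_index, distortion, rate_bits).
    """
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
    return best[0]

def fixed_length_bits(num_candidates: int) -> int:
    """Bits needed by a fixed-length code that indexes num_candidates values."""
    return (num_candidates - 1).bit_length()

# Toy measurements: (lp_index, distortion, rate_bits).
toy_candidates = [(0, 100.0, 10.0), (1, 80.0, 30.0), (2, 90.0, 12.0)]
print(select_best_lp(toy_candidates, lam=2.0))
print(fixed_length_bits(58))
```

A larger λ penalizes rate more heavily, so the winner shifts toward candidates that cost fewer bits even if their distortion is higher.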

Selection and Coding of Candidate LPs (cont.)

• Given the 58 candidate LPs, a 6-bit fixed-length code is needed to represent the LP index.
• And in the chroma case:

Filtering of SVT Block Boundaries

• With SVT, the deblocking process needs to be adjusted because the selected SVT block may not align with the regular block boundaries.
• Both the edges of the selected SVT block and those of the macroblock may be filtered.

Implementing SVT in H.264/AVC

• Several key parts of H.264/AVC need to be adjusted:
  – Macroblock types
  – Coded block pattern
  – Entropy coding
  – Deblocking

Macroblock Type

Coded Block Pattern

• In experiments, the luma CBP is often equal to 1 in high-fidelity video coding.
• Based on this observation, the new macroblock modes are set to have luma CBP equal to 1.

Entropy Coding

• In H.264, CAVLC uses a different coding table depending on the total number of nonzero coefficients.
• For SVT, a fixed coding table is used.
• To derive some information about the number of nonzero coefficients in each 4*4 luma block, the following two steps are used:

Entropy Coding (cont.)

• Step 1: If a luma block overlaps with a coded block that has nonzero coefficients in the selected SVT block, mark it as having nonzero coefficients. (This marking is used for deblocking.)
• Step 2: The number of nonzero transform coefficients for each 4*4 block is empirically set by
• Finally, the total number of nonzero transform coefficients is distributed among the blocks marked as having nonzero coefficients.

Deblocking

• As described above in the SVT section (Filtering of SVT Block Boundaries).

FSVT (Fast Algorithms for SVT)

• The encoding complexity of SVT is high because of the brute-force search in RDO.
• Typically, RDO requires transform, quantization, entropy coding, etc., for every candidate LP.
• The basic idea for reducing the encoding complexity is to reduce the number of LPs tested.

FSVT (cont.)

• There are two cases:
  – 1. Macroblock level: skip testing SVT for macroblocks for which SVT is unlikely to be useful (by examining the RD cost).
  – 2. Block level: select candidate LPs based on the motion difference and use a hierarchical search algorithm to find the best LP.

Macroblock Level Fast Algorithm

• SVT is applied for macroblock modes only if

• where J is the minimum RD cost without SVT coding.

• Jmode refers to the RD cost of the current macroblock mode to be tested with SVT.

Macroblock Level Fast Algorithm (cont.)

• The threshold represents an empirical upper limit on the bitrate reduction.

Block-Level Fast Algorithm

• 1. Selection of Available Candidate LPs Based on Motion Difference

• 2. Hierarchical Search Algorithm

Selection of Available Candidate LPs

• Skip testing a candidate LP if either of the following conditions is true:
  – 1. The SVT block at that position overlaps with at least two neighboring motion compensation blocks, and the motion difference between these blocks is larger than or equal to a predefined threshold.
  – 2. The reference frames of these neighboring blocks are different.
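The two skip conditions can be sketched as a simple predicate. This is an illustrative sketch only: the MotionBlock type, the function names, the choice of the largest absolute component difference as the motion difference, and the comparison of every pair of overlapped blocks are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

@dataclass(frozen=True)
class MotionBlock:
    """A motion compensation block inside the macroblock (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int
    mv: Tuple[int, int]  # motion vector, in arbitrary units
    ref_idx: int         # index of the reference frame

def overlaps(block: MotionBlock, x: int, y: int, width: int, height: int) -> bool:
    """True if the block shares at least one sample with the given rectangle."""
    return (block.x < x + width and x < block.x + block.width
            and block.y < y + height and y < block.y + block.height)

def skip_candidate_lp(partitions: List[MotionBlock], x: int, y: int,
                      width: int, height: int, mv_threshold: int) -> bool:
    """Return True if the SVT block at (x, y) should not be tested."""
    covered = [b for b in partitions if overlaps(b, x, y, width, height)]
    for a, b in combinations(covered, 2):
        if a.ref_idx != b.ref_idx:
            return True  # condition 2: different reference frames
        mv_diff = max(abs(a.mv[0] - b.mv[0]), abs(a.mv[1] - b.mv[1]))
        if mv_diff >= mv_threshold:
            return True  # condition 1: large motion difference
    return False

# A macroblock split into two 16*8 partitions with different motion.
parts = [MotionBlock(0, 0, 16, 8, (0, 0), 0), MotionBlock(0, 8, 16, 8, (8, 0), 0)]
print(skip_candidate_lp(parts, x=4, y=4, width=8, height=8, mv_threshold=4))
print(skip_candidate_lp(parts, x=4, y=0, width=8, height=8, mv_threshold=4))
```

A candidate that lies entirely inside one motion compensation block is never skipped by this test, because there is no second block to compare against.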

Hierarchical Search Algorithm

• Idea: find the best LP at a relatively coarse resolution, then refine the result at a finer resolution.

Step 1: Find the LP with the lowest RD cost (set 1) and its two neighbors (set 2).

Step 2: Find the best zone. A zone is available if and only if all three of its candidate LPs are available.

Step 3: Select the best LP from set 1, set 2, and the best zone.
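The coarse-to-fine idea can be illustrated with a generic two-stage search. This sketch does not reproduce the sets and zones of the steps above, whose exact layout is not preserved in this transcript; the regular coarse grid, the refinement window, and all names are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Position = Tuple[int, int]

def coarse_to_fine_search(max_x: int, max_y: int,
                          cost: Callable[[Position], float],
                          coarse_step: int = 2) -> Tuple[Position, int]:
    """Return (best position found, number of positions evaluated)."""
    # Stage 1: evaluate the cost on a coarse grid of candidate positions.
    coarse: List[Position] = [(x, y)
                              for y in range(0, max_y + 1, coarse_step)
                              for x in range(0, max_x + 1, coarse_step)]
    best = min(coarse, key=cost)
    # Stage 2: evaluate the untested positions around the coarse winner.
    radius = coarse_step - 1
    refine = [(x, y)
              for y in range(max(0, best[1] - radius), min(max_y, best[1] + radius) + 1)
              for x in range(max(0, best[0] - radius), min(max_x, best[0] + radius) + 1)
              if (x, y) != best]
    evaluated = len(coarse) + len(refine)
    return min([best] + refine, key=cost), evaluated

def toy_cost(p: Position) -> float:
    """Stand-in for an RD cost, smallest at position (5, 3)."""
    return (p[0] - 5) ** 2 + (p[1] - 3) ** 2

# An 8*8 block inside a 16*16 macroblock has offsets 0..8 in each direction.
best, evaluated = coarse_to_fine_search(8, 8, toy_cost)
print(best, evaluated, "of", 9 * 9)
```

Such a search is a heuristic: it evaluates far fewer positions than an exhaustive search but can miss the true minimum when the cost is not smooth across neighboring positions.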

Experimental Environment

• VBSVT and FVBSVT are tested in both HD and lower-resolution video coding.
• Coding parameters used:
  – High Profile
  – QPI = 22, 27, 32, 37; QPP = QPI + 1
  – CAVLC/CABAC
  – Frame structure: IPPP
  – MV search range: 64/32 pixels for 720p/CIF
  – RDO in the high-complexity mode

Experimental Environment (cont.)

• Intel® Core™2 Quad CPU Q6600 @ 2.40 GHz, 2G
• The average bitrate reduction compared to H.264/AVC is measured using the Bjontegaard tool [20].
• Two configurations are tested:
  – Low-complexity configuration: the 4*4 transform is not used.
  – High-complexity configuration: the codec makes full use of the tools provided in H.264.

Experimental Result

Conclusion

• By varying the position of the transform block and its size, the prediction error is better localized, and coding efficiency is improved.

• The encoding complexity of SVT is relatively high because of the brute-force search in RDO.

• To address this, FSVT is proposed, which skips testing most macroblocks that are not suitable for SVT.