
Nishit Samirbhai Shah Electrical Engineering Graduate Student The University of Texas at Arlington, 2015

Supervising Professor

Dr. K. R. Rao, EE Dept, UTA

Committee Members

Dr. Russell Howard, EE Dept, UTA
Dr. William Dillon, EE Dept, UTA

REDUCING ENCODER COMPLEXITY OF INTRA-MODE DECISION USING CU EARLY TERMINATION ALGORITHM

OUTLINE

• Need of video compression
• Evolution of video codecs
• Why is complexity reduction important in HEVC/H.265?
• Introduction to HEVC
• HEVC encoder
• Overview of HEVC intra coding
• Coding structure in HEVC
• Proposed algorithm
• Experimental conditions and results
• Conclusions
• Future research
• Acronyms
• References

NEED OF VIDEO COMPRESSION

• Increasing use of high-resolution video streaming over the Internet, e.g. YouTube.

• Eliminate large storage requirements for multimedia data.

• Reduce the time needed to transmit a compressed video or image file over limited bandwidth.

EVOLUTION OF VIDEO CODECS

WHY IS COMPLEXITY REDUCTION IMPORTANT IN HEVC/H.265?

• HEVC/H.265 achieves highly efficient compression, but the computational cost associated with it is also very high.

• For this reason, the increased compression efficiency cannot be exploited across all application domains. Resource-constrained devices such as cell phones and other embedded systems use simple encoders or simpler profiles of the codec, trading compression efficiency and quality for reduced complexity [3].

• The standard does not specify the encoding process, i.e. the process of producing a standard-compliant bitstream. This leaves room for innovation in encoding algorithm development.

INTRODUCTION TO HEVC

• The Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T and ISO/IEC started its work on a new standard for High Efficiency Video Coding (HEVC).

• HEVC [1][11][16][17] implements the same hybrid approach as H.264 [27] which includes both temporal and spatial predictions.

• HEVC divides the image into varying block sizes up to 64x64 pixels as compared to 16x16 pixels in H.264.

• It aims at a 50% compression gain over H.264 while maintaining similar video quality.

HEVC ENCODER

OVERVIEW OF HEVC INTRA CODING

• HEVC contains several elements for improving the efficiency of intra prediction over earlier approaches.

• The introduced methods can accurately model different structures as well as smooth regions with gradually changing sample values.

• HEVC introduces 33 angular prediction modes along with planar and DC prediction modes.

• In HEVC there are four effective intra prediction block sizes ranging from 4 by 4 to 32 by 32 samples, each of which supports 33 distinct prediction directions.
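The mode count above (33 angular modes plus planar and DC, 35 in total) can be sketched as a small enumeration. The numbering used here (0 = planar, 1 = DC, 2 to 34 = angular) follows the HEVC standard rather than anything stated on the slide, and the helper name `intra_mode_name` is hypothetical.

```python
# Sketch of the HEVC intra mode set: planar + DC + 33 angular = 35 modes.
# Assumption: mode indices follow the HEVC standard (0 planar, 1 DC, 2..34 angular).
PLANAR = 0
DC = 1
ANGULAR_MODES = range(2, 35)  # 33 angular prediction modes


def intra_mode_name(mode: int) -> str:
    """Return a readable name for an intra prediction mode index (hypothetical helper)."""
    if mode == PLANAR:
        return "planar"
    if mode == DC:
        return "DC"
    if mode in ANGULAR_MODES:
        return f"angular_{mode}"
    raise ValueError(f"invalid intra mode index: {mode}")


# Full candidate list an exhaustive intra search would visit for one block.
ALL_MODES = [intra_mode_name(m) for m in range(35)]
```

An exhaustive search over all 35 candidates for every block size is what makes intra mode decision expensive, which motivates the early termination steps described later.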

CODING STRUCTURE IN HEVC

To follow the encoding process, one needs to understand the quad-tree in detail, starting with the coding tree unit (CTU), which is the root of the quad-tree.

A CTU consists of a luma coding tree block (CTB), two chroma CTBs and the corresponding quad-tree syntax. The luma CTB is a block of size NxN and the chroma CTBs are of size (N/2)x(N/2) for the 4:2:0 format. N is signaled in the bitstream and can be 16, 32 or 64.

The size of the CTB is the largest supported size of a coding block (CB). A CTU may contain one or more coding units (CUs).

A CU has an associated partitioning into prediction units (PUs) and transform units (TUs). The coding mode, intra or inter prediction, is selected at the CU level [7], [11], [12].
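As a rough illustration of the CTB sizes described above, the following sketch computes the luma and chroma CTB dimensions for 4:2:0 video and the number of CTUs needed to cover a frame. The function names are hypothetical, and the rounding-up of partial CTUs at the frame border is an assumption about how frames that are not a multiple of N are covered.

```python
import math


def ctb_sizes(n: int) -> dict:
    """Luma and chroma CTB dimensions for 4:2:0, given luma CTB size N."""
    if n not in (16, 32, 64):
        raise ValueError("N must be 16, 32 or 64")
    return {"luma": (n, n), "chroma": (n // 2, n // 2)}


def ctus_per_frame(width: int, height: int, n: int) -> int:
    """Number of CTUs covering a frame; partial CTUs at the border count as one."""
    return math.ceil(width / n) * math.ceil(height / n)
```

For example, a 1920x1080 frame with N = 64 needs 30 CTU columns and 17 CTU rows, so `ctus_per_frame(1920, 1080, 64)` gives 510.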

CODING STRUCTURE IN HEVC (CONT.)

Prediction Unit and Transform Unit

A CU has an associated partitioning into prediction units (PUs) and transform units (TUs).

SLICES AND TILES [1]

Slices and tiles are used in coding of predicted frames.

A slice is a group of CTUs contained in one independent slice segment and all of its dependent slice segments. A slice segment is a group of CTUs ordered consecutively in tile scan and contained in a single NAL unit.

A tile is a rectangular region containing a group of CTUs in CTB raster scan.

CODING STRUCTURE IN HEVC [34]

Pictorial representations of various block divisions for HEVC in a Frame


ANGULAR INTRA PREDICTION [1]

Figure 12: 33 intra prediction directions [12].

FIGURE 14: INTRA PREDICTION ANGULAR ORIENTATION AND EXAMPLE OF DIRECTION MODE 29 [1].

TABLE 1: PU SIZES AND CORRESPONDING NUMBER OF INTRA PREDICTIONS [12]

Prediction size    Number of intra predictions
64x64              4
32x32              35
16x16              35
8x8                35
4x4                18

Proposed Algorithm

PROPOSED ALGORITHM

Step 1: CU early termination

When the CU texture is complex, the CU is split into smaller sub-units to find the best size; when the CU texture is flat, the CU is not divided further into sub-units. This has been shown in [12].

The current CU is down-sampled with a 2:1 down-sampling filter that uses a simple average operator; the other CUs are processed in the same way.

• After down-sampling, the texture complexity of the original LCU, Ecom, is calculated, where N is the size of the current CU and p(i, j) is the pixel at coordinate (i, j) in the CU.

• Depending on the texture calculation, two thresholds, Thres1 and Thres2 (Thres1 > Thres2, so that the cases below do not overlap), are set as a tradeoff between coding quality and complexity reduction. The CU is split when the complexity is greater than Thres1, and it is not split further when the complexity is less than Thres2. If the complexity lies between Thres2 and Thres1, the decision is left to the HEVC reference software [4].
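The early termination rule in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the 2:1 down-sampling is taken to be a 2x2 average, the complexity measure is assumed to be the mean absolute deviation from the block mean (the exact Ecom formula is not reproduced here), and the names `upper_thres` and `lower_thres` are hypothetical stand-ins for the threshold that forces a split and the threshold that stops splitting.

```python
def downsample_2to1(block):
    """2:1 down-sampling in each direction by averaging 2x2 neighbourhoods (assumed filter)."""
    n = len(block)
    return [
        [
            (block[i][j] + block[i][j + 1] + block[i + 1][j] + block[i + 1][j + 1]) / 4.0
            for j in range(0, n, 2)
        ]
        for i in range(0, n, 2)
    ]


def texture_complexity(block):
    """ASSUMED measure: mean absolute deviation of the pixels from the block mean."""
    n = len(block)
    mean = sum(sum(row) for row in block) / (n * n)
    return sum(abs(p - mean) for row in block for p in row) / (n * n)


def cu_split_decision(block, upper_thres, lower_thres):
    """Return 'split', 'no_split', or 'full_rd_search' (fall back to the reference encoder)."""
    e_com = texture_complexity(downsample_2to1(block))
    if e_com > upper_thres:
        return "split"           # complex texture: go straight to smaller CUs
    if e_com < lower_thres:
        return "no_split"        # flat texture: stop the quad-tree here
    return "full_rd_search"      # undecided: use the normal HM decision


# Usage: a flat 8x8 block is not split; a block with a strong vertical edge is split.
flat = [[128] * 8 for _ in range(8)]
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
decisions = (cu_split_decision(flat, 10, 2), cu_split_decision(edge, 10, 2))
```

The encoding time saving comes from the first two branches, which skip the rate-distortion evaluation of the CU sizes that were ruled out.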

Step 2 : Priority of TU Size in Mode Decision

The HM 16.0 depth-first RQT decision process adopts the top-down search order. It always starts from the maximum admissible size TU in a CU leaf and evaluates the possibility of further partitions. But from the earlier discussions, we conclude that the 32×32 TU does not offer as much rate-distortion efficiency as the smaller TUs.

Thus, we force the 32x32 TU to be the last candidate in the mode search procedure if it appears. By doing this, we are able to save computing power by skipping unnecessary TU partition evaluations.

• Depending on the size of the leaf CU, one of two paths is chosen. When the leaf CU size is smaller than 32x32, the left path in the figure is chosen, which recursively evaluates whether to further partition the TU into sub-TUs.

• This procedure is denoted as the “TU Split” decision process.

• On the other hand, when the leaf CU size is 64x64 or 32x32, the 16x16 TU partition is evaluated first, as shown by the right path.

• This path evaluates the 32x32 TU mode last. This final step is denoted as “TU Merge”, since it determines whether the smaller sub-TUs should be merged into a larger 32x32 TU or not. Furthermore, two early termination schemes are proposed, one for the TU Merge step and one for the TU Split step.
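The two search paths of Step 2 can be sketched as an ordering of candidate TU sizes. This is a simplified illustration, not the HM implementation: the function name `tu_search_order` is hypothetical, the minimum TU size of 4 is an assumption, and the early termination schemes are left out. It also assumes that the 8x8 and 4x4 candidates on the TU Merge path are visited between the 16x16 and 32x32 evaluations.

```python
def tu_search_order(cu_size, min_tu=4):
    """Order in which TU sizes are evaluated for a leaf CU of the given size."""
    if cu_size not in (8, 16, 32, 64):
        raise ValueError("leaf CU size must be 8, 16, 32 or 64")

    def descending(start):
        # Top-down list of TU sizes: start, start/2, ..., min_tu.
        sizes = []
        size = start
        while size >= min_tu:
            sizes.append(size)
            size //= 2
        return sizes

    if cu_size < 32:
        # "TU Split" path: start from the largest admissible TU and split downwards.
        return descending(cu_size)
    # "TU Merge" path: evaluate 16x16 (and smaller) first, 32x32 as the last candidate.
    return descending(16) + [32]
```

With this ordering, `tu_search_order(16)` gives `[16, 8, 4]`, while `tu_search_order(64)` gives `[16, 8, 4, 32]`, so the 32x32 TU is only reached if the smaller candidates have not already terminated the search.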

Experimental results

EXPERIMENTAL CONDITIONS AND RESULTS

Intra main profile is used for all-intra coding, with the frame rate set at 30 fps.

The proposed algorithm is evaluated at four QPs (22, 27, 32 and 37) using the following test sequences recommended by JCT-VC [35].

No.  Sequence Name        Resolution  Type   No. of frames
1.   RaceHorses           416x240     WQVGA  30
2.   BasketballDrillText  832x480     WVGA   30
3.   KristenAndSara       1280x720    SD     30
4.   BasketBallDrive      1920x1080   HD     30
5.   PeopleOnStreet       2560x1600   WQHD   30

EXPERIMENTAL CONDITIONS AND RESULTS: Encoding time gain 2

Peopleonstreet-WQHD-30Frames: encoding time (sec), original vs proposed

QP    Original    Proposed
22    14257.85    10865.39
27    12434.75     9590.14
32    11125.88     8742.52
37    10390.19     7993.58
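The percentage decrease in encoding time follows directly from the PeopleOnStreet encoding times reported above. A minimal worked example, assuming the first four values in the chart belong to the original encoder and the last four to the proposed one (the helper name `percent_reduction` is hypothetical):

```python
# Encoding times in seconds for PeopleOnStreet (WQHD, 30 frames), taken from the chart.
QPS = (22, 27, 32, 37)
ORIGINAL_S = (14257.85, 12434.75, 11125.88, 10390.19)
PROPOSED_S = (10865.39, 9590.14, 8742.52, 7993.58)


def percent_reduction(original, proposed):
    """Relative encoding time saving of the proposed encoder, in percent."""
    return 100.0 * (original - proposed) / original


reductions = {
    qp: round(percent_reduction(o, p), 2)
    for qp, o, p in zip(QPS, ORIGINAL_S, PROPOSED_S)
}
```

The rounded results are 23.79, 22.88, 21.42 and 23.07 percent for QP 22, 27, 32 and 37, which agree with the percentage figures plotted under "Percentage Decrease in Encoding Time 2".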

EXPERIMENTAL CONDITIONS AND RESULTS: BD-PSNR Loss 2

BD-PSNR (dB), original vs proposed

QP    BD-PSNR (dB)
22    -0.29
27    -0.37
32    -0.41
37    -0.32

EXPERIMENTAL CONDITIONS AND RESULTS: BD-Bitrate Increase 1

BD-Bitrate (kbps)

Sequence                             QP 22   QP 27   QP 32   QP 37
RaceHorses-WQVGA-30Frames             9.1     7.4     8.6     7.9
BasketBallDrillText-WVGA-30Frames     8.6     9.7    11.6     9.3
KristenAndSara-SD-30Frames           12.98    9.04    8.91    8.01
BasketBallDrive-HD-30Frames          10.52    8.43    9.11    9.43

EXPERIMENTAL CONDITIONS AND RESULTS: BD-Bitrate Increase 2

QP    BD-Bitrate (kbps)
22     7.7
27     9.6
32    11.8
37     9.9

EXPERIMENTAL CONDITIONS AND RESULTS: PSNR vs. Bitrate 2

EXPERIMENTAL CONDITIONS AND RESULTS: Encoded Bitstream size 2

Encoded bitstream size (KB), original vs proposed

QP    Original    Proposed
22    12702.75    12821.67
27     7362.6      7482.83
32     4181.77     4379.43
37     2432.9      2444.74

EXPERIMENTAL CONDITIONS AND RESULTS: Percentage Decrease in Encoding Time 1

% improvement in encoding time, original vs proposed

Sequence                             QP 22    QP 27    QP 32    QP 37
RaceHorses-30frames-WQVGA            -15.25   -12.03   -16.50   -12.60
BasketBallDrillText-WVGA-30Frames    -19.17   -16.43   -13.03   -13.42
KristenAndSara-SD-30Frames           -21.30   -21.58   -17.08   -22.63
BasketBallDrive-HD-30Frames          -22.23   -20.23   -22.48   -20.69

EXPERIMENTAL CONDITIONS AND RESULTS: Percentage Decrease in Encoding Time 2

% improvement in encoding time

QP    % improvement
22    -23.79
27    -22.88
32    -21.42
37    -23.07

CONCLUSION

A fast intra mode decision algorithm is proposed to reduce the computational complexity of the HEVC intra encoder.

Experimental results on different video sequences and comparison with the original HM 16.0 [4] indicate that the proposed algorithm achieves faster encoding with a negligible loss in video quality, as shown below:

• Encoding time: 12-24% reduction
• PSNR loss: only 0.29 dB to 0.51 dB
• Bitrate increase: only 8 kbps to 13 kbps
• Bitstream size gain: only 1% to 5%

FUTURE WORK

There are many other ways to explore the CU splitting algorithm and the TU mode decision in intra prediction, as suggested in [25] [33]. Many of these methods can be combined with this method or, if needed, one method may be replaced by a new one, and the resulting encoding time gains can be explored.

Similar algorithms can be developed for fast inter prediction, in which the RD costs of the different inter-prediction modes are explored and, depending upon an adaptive threshold [34], the mode decision can be terminated early. Combined with the proposed algorithm, this would result in less encoding time and reduced complexity.

Tan et al. [37] proposed a fast RQT algorithm for both intra and inter mode coding in order to reduce the encoder complexity. In [37], for the all-intra case, 13% of the encoding time can be saved while the BD-rate increases by only 0.1%. For the random access and low delay configurations, it reduces encoding time by up to 9% with a 0.3% BD-rate performance degradation. This method can be integrated with the proposed algorithm to further decrease the encoding time.

FUTURE WORK (CONT.)

Another facet of encoding is the CU size decision; the CUs are the leaf nodes of the quadtree in the encoding process. A Bayesian decision rule can be applied to determine the CU size, and this information can then be combined with the proposed method to achieve further encoding time gains [24].

Complexity reduction can also be achieved through hardware implementation of specific computation-intensive algorithms. An FPGA implementation can be useful to evaluate the performance of the system on hardware in terms of power consumption and encoding time.

REFERENCES

[1] G. J. Sullivan et al., “Overview of the high efficiency video coding (HEVC) standard”, IEEE Trans. CSVT, vol. 22, pp. 1649-1668, Dec. 2012.
[2] C. C. Chi et al., “Parallel scalability and efficiency of HEVC parallelization approaches”, IEEE Trans. CSVT, vol. 22, pp. 1827-1838, Dec. 2012.
[3] J. Lainema et al., “Intra coding of the HEVC standard”, IEEE Trans. CSVT, vol. 22, pp. 1792-1801, Dec. 2012.
[4] F. Bossen et al., “HEVC complexity and implementation analysis”, IEEE Trans. CSVT, vol. 22, pp. 1685-1696, Dec. 2012.
[5] P. Hanhart et al., “Subjective quality evaluation of the upcoming HEVC video compression standard”, SPIE Applications of Digital Image Processing XXXV, vol. 8499, pp. 8499-30, Aug. 2012.
[6] J.-R. Ohm et al., “Comparison of the coding efficiency of video coding standards - including high efficiency video coding (HEVC)”, IEEE Trans. CSVT, vol. 22, pp. 1669-1684, Dec. 2012.
[7] X. Zhang, S. Liu and S. Lei, “Intra mode coding in HEVC standard”, Visual Communications and Image Processing, VCIP 2012, pp. 1-6, San Diego, CA, Nov. 2012.
[8] Y. Duan, “An optimized real time multi-thread HEVC decoder”, Visual Communications and Image Processing, VCIP 2012, San Diego, CA, Nov. 2012.
[9] G. Correa et al., “Performance and computational complexity assessment of high efficiency video encoders”, IEEE Trans. CSVT, vol. 22, pp. 1899-1909, Dec. 2012.

[10] A. Saxena, F. Fernandes and Y. Reznik, “Fast transforms for intra-prediction-based image and video coding”, in Proc. IEEE Data Compression Conference (DCC’13), pp. 13-22, Snowbird, UT, March 2013.
[11] T. L. Silva et al., “HEVC intra coding acceleration based on tree inter-level mode correlation”, SPA 2013, Poznan, Poland, Sept. 2013.
[12] A. Saxena and F. Fernandes, “Mode dependent DCT/DST for intra prediction in block based image/video coding”, IEEE ICIP, pp. 1685-1688, Sept. 2011.
[13] H. Zhang and Z. Ma, “Fast intra prediction for high efficiency video coding”, 13th Pacific Rim Conf. on Multimedia, PCM 2012, Singapore, vol. 7674 LNCS, pp. 568-577, 4-6 Dec. 2012.
[14] M. Zhang, C. Zhao and J. Xu, “An adaptive fast intra mode decision in HEVC”, IEEE ICIP 2012, pp. 221-224, Orlando, FL, Sept.-Oct. 2012.
[15] K. Chen et al., “Efficient SIMD optimization of HEVC encoder over X86 processors”, APSIPA, pp. 1732-1745, Los Angeles, CA, Dec. 2012.
[16] Y. Kim et al., “A fast intra-prediction method in HEVC using rate-distortion estimation based on Hadamard transform”, ETRI Journal, vol. 35, no. 2, pp. 270-280, Apr. 2013.
[17] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. CSVT, vol. 13, no. 7, pp. 560-576, July 2003.
[18] M. Khan et al., “An adaptive complexity reduction scheme with fast prediction unit decision for HEVC intra encoding”, IEEE ICIP, pp. 1578-1582, Sept. 2013.
[19] S.-W. Teng, H.-M. Hang and Y.-F. Chen, “Fast mode decision algorithm for residual quadtree coding in HEVC”, Visual Communications and Image Processing (VCIP), 2011 IEEE, pp. 1-4, 2011.

[20] S. Vasudevan and K. R. Rao, “Combination method of fast HEVC encoding”, IEEE ECTICON 2014, Korat, Thailand, May 2014.
[21] H. Li, B. Li and J. Xu, “Rate distortion optimized reference picture management for high efficiency video coding”, IEEE Trans. CSVT, vol. 22, pp. 1844-1857, Dec. 2012.
[22] G. Bjontegaard, “Calculation of average PSNR differences between RD-Curves”, ITU-T SG16, Doc. VCEG-M33, 13th VCEG meeting, Austin, TX, April 2001.
[23] G. Bjontegaard, “Improvements of the BD-PSNR model”, ITU-T SG16 Q.6, Doc. VCEG-AI11, Berlin, Germany, July 2008.
[24] Y. Yuan et al., “Quadtree based non-square block structure for interframe coding in high efficiency video coding”, IEEE Trans. CSVT, vol. 22, pp. 1707-1719, Dec. 2012.
[25] P. Helle et al., “Block merging for quadtree-based partitioning in HEVC”, IEEE Trans. CSVT, vol. 22, pp. 1720-1731, Dec. 2012.
[26] Y. Qin, X. Zhang, S. Wang and S. Ma, “Early termination of coding unit splitting for HEVC”, Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific, pp. 1-4, IEEE, 2012.
[27] H. Zhang, Q. Liu and Z. Ma, “Priority classification based intra mode decision for high efficiency video coding”, IEEE PCS 2013, pp. 285-288, San Jose, CA, Dec. 2013.
[28] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Wiley, 2010.
[29] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.

[30] HEVC open source software (encoder/decoder): https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.0
[31] Information about quad tree structure of HEVC: http://codesequoia.wordpress.com/2012/10/28/hevc-ctu-cu-ctb-cb-pb-and-tb/
[32] Website for downloading test sequences for research purposes: http://media.xiph.org/video/derf/
[33] Information on developments in HEVC NGVC - Next generation video coding: http://bisqwit.iki.fi/story/howto/openmp/
[34] F. Bossen, D. Flynn and K. Suhring (July 2012), “HEVC reference software manual”, online available: http://phenix.int-evry.fr/jct/doc_end_user/documents/6_Torino/wg11/JCTVC-F634-v2.zip
[35] JCT-VC documents are publicly available at http://ftp3.itu.ch/av-arch/jctvc-site and http://phenix.it-sudparis.eu/jct/
[36] Detailed overview of HEVC/H.265 by Shevach Riabtsev: https://app.box.com/s/rxxxzr1a1lnh7709yvih
[37] Access the website http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/
[38] Access to HM 16.0 software manual: http://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.0/doc/software-manual.pdf
[39] Video test sequences: http://forum.doom9.org/archive/index.php/t-135034.html or http://media.xiph.org/video/derf/


[40] G. J. Sullivan et al., “Standardized extensions of high efficiency video coding (HEVC)”, IEEE Journal of Selected Topics in Signal Processing, vol. 7, pp. 1001-1016, Dec. 2013.
[41] M. Wien, “High efficiency video coding: Tools and specification”, Springer, 2015.
[42] I. E. Richardson, “Coding video: A practical guide to HEVC and beyond”, Wiley, April 2016.
[43] V. Sze, M. Budagavi and G. J. Sullivan, “High efficiency video coding (HEVC) - Algorithms and architectures”, Springer, 2014.
[44] HEVC tutorial by I. E. G. Richardson: http://www.vcodex.com/h265.html
[45] H.264 tutorial by I. E. G. Richardson: https://www.vcodex.com/h264.html
[46] F. Bossen, “Excel template for BD-rate calculation based on piece-wise cubic interpolation”, JCT-VC Reflector.
[47] HEVC tutorial by I. E. G. Richardson: http://www.vcodex.com/h265.html
[48] V. Sze and M. Budagavi, “Design and implementation of next generation video coding systems”, Sunday 1 June 2014 (half day tutorial), IEEE ISCAS 2014, Melbourne, Australia, 1-5 June 2014.
[49] J. Chen et al., “Coding tools investigation for next generation video coding based on HEVC”, [9599-47], SPIE Optics + Photonics, San Diego, California, USA, 9-13 Aug. 2015.
[50] J. Chen et al., “Coding tools investigation for next generation video coding based on HEVC”, [9599-47], SPIE Optics + Photonics, San Diego, California, USA, 9-13 Aug. 2015.
[51] A. Alshin et al., “Coding efficiency improvements beyond HEVC with known tools”, [9599-48], SPIE Optics + Photonics, San Diego, California, USA, 9-13 Aug. 2015.




ACRONYMS

API: Application Programming Interface
AVC: Advanced Video Coding
CABAC: Context Adaptive Binary Arithmetic Coding
CB: Coding Block
CPU: Central Processing Unit
CTB: Coding Tree Block
CTU: Coding Tree Unit
CU: Coding Unit
CUDA: Compute Unified Device Architecture
DCC: Data Compression Conference
DCT: Discrete Cosine Transform
DST: Discrete Sine Transform
FDIS: Final Draft International Standard
HEVC: High Efficiency Video Coding
ISO: International Organization for Standardization
ICIP: International Conference on Image Processing
ITU-T: International Telecommunication Union - Telecommunication Standardization Sector
JCT-VC: Joint Collaborative Team on Video Coding
MC: Motion Compensation
MCP: Motion Compensated Prediction
OPENMP: Open Multiprocessing
PB: Prediction Block
PCM: Pulse Code Modulation
PU: Prediction Unit
SAO: Sample Adaptive Offset
SIMD: Single Instruction Multiple Data
SPIE: Society of Photo-Optical and Instrumentation Engineers
TB: Transform Block
VCIP: Visual Communication and Image Processing

THANK YOU

EXTRA SLIDES

A. Zero-Block Inheritance

In many cases, all DCT coefficients of a TU are quantized to zero in residual blocks containing little energy. This happens especially often in the still regions of a frame, where strong temporal similarity increases the inter-frame prediction accuracy. Given a TU, the RQT mode decision process needs to evaluate two options: 1) encoding residuals, or 2) skipping residual coding. At low bitrates in particular, TU skip modes are often chosen after RD optimization to save bits. We denote these TUs as “zero-blocks”.

In general, a zero-block judged at 32x32 block size does not imply that all its 16x16 sub-blocks are zero-blocks. Very often, however, all the subsequent partitions of a zero-block are zero-blocks (of smaller sizes). We call this hypothesis the Zero-Block Inheritance (ZBI). To be precise, we assume that all the quadtree partitions of a zero-block have a strong tendency to be zero-blocks, and vice versa. Based on this hypothesis, the behavior of the RD cost of the hierarchical zero-blocks is analyzed. For a zero-block, the RD cost of depth-k TU is
