Parallel Scalability and Efficiency of HEVC Parallelization Approaches
Chi Ching Chi, Mauricio Alvarez-Mesa, Ben Juurlink, Gordon Clare, Félix Henry, Stéphane Pateux and Thomas Schierl
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Outline
• Introduction
• Video codec parallelization approaches
• Coding efficiency analysis
• Experimental evaluation
• Conclusions
Introduction
• While a single-core processor can decode 1080p H.264/AVC video in real time, it is very unlikely that single-core performance alone will be enough to decode 2160p50 HEVC video in real time.
• To obtain real-time HEVC decoding performance, parallelism is no longer an option but a necessity.
Introduction
• H.264/AVC supports slice-level parallelization.
• A decoder that relies on slices may not achieve real-time performance if it receives a video with only one or a few slices per frame.
• The main parallelization approaches currently included in the HEVC draft are Tiles and Wavefront Parallel Processing (WPP).
• This paper presents an approach called Overlapped Wavefront (OWF).
Previous parallelization strategies
• Frame-level parallelism
• Slice-level parallelism
• Macroblock-level parallelism
Frame-level parallelism
• Frame-level parallelism consists of processing multiple frames at the same time.
• Frame-level parallelism is sufficient for multicore systems with just a few cores.
• If motion vectors are long, for example due to fast motion, a frame must wait for a large part of its reference frame, so little parallelism is available.
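The effect of long motion vectors can be sketched with a small calculation. The function below is a hypothetical helper, not part of any decoder: it estimates how many block rows of the reference frame must be finished before a given block row of the dependent frame can start. It assumes 16-pixel block rows, 68 rows per frame (roughly 1080p), and ignores the extra margin that sub-pixel interpolation would add.

```python
def reference_rows_needed(row, max_down_mv_px, block_size=16, rows_in_frame=68):
    """Reference-frame block rows that must be decoded before block row
    `row` of the dependent frame can start (illustrative model only)."""
    # Lowest pixel line the current row can reference when the longest
    # downward motion vector is applied to its bottom pixel line.
    lowest_pixel = (row + 1) * block_size - 1 + max_down_mv_px
    needed = lowest_pixel // block_size + 1
    # A vector cannot point past the bottom of the reference frame.
    return min(needed, rows_in_frame)


if __name__ == "__main__":
    for mv in (16, 512, 2000):
        print(mv, reference_rows_needed(0, mv))
```

With short vectors the dependent frame can start almost immediately. Once the vector reach approaches the frame height, the whole reference frame is needed first and the two frames are effectively decoded one after the other.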
Slice-level Parallelism
• Each frame can be partitioned into one or more slices.
• Slices in a frame are completely independent from each other and therefore they can also be used for parallel processing.
• It helps when a frame contains several slices, but it offers no parallelism when there is only one slice per frame.
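A minimal sketch of slice-level parallelism, assuming slices arrive as independent byte sequences. `decode_slice` is a placeholder that only sums bytes, and `decode_frame` and `max_slice_parallelism` are hypothetical names chosen for illustration. Python threads are used here only to show the structure. They would not speed up real CPU-bound decoding.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_slice(slice_data):
    # Placeholder for real slice decoding: each slice is processed
    # without looking at any other slice of the same frame.
    return sum(slice_data)


def decode_frame(slices, num_threads):
    # Independent slices can be handed to worker threads in any order;
    # pool.map returns the results in the original slice order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(decode_slice, slices))


def max_slice_parallelism(num_slices, num_cores):
    # The number of slices caps how many cores can be kept busy.
    return min(num_slices, num_cores)


if __name__ == "__main__":
    print(decode_frame([[1, 2], [3, 4], [5]], num_threads=4))
    print(max_slice_parallelism(num_slices=1, num_cores=12))
```

The second function captures the limitation stated above: with one slice per frame, only one core has work regardless of how many are available.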
Macroblock-level Parallelism
Parallelization Strategies in HEVC
• Tiles
• Wavefront Parallel Processing (WPP)
• Overlapped Wavefront (OWF)
Tiles
• The number of tiles and the location of their boundaries can be defined for the entire sequence or changed from picture to picture.
• Compared to slices, tiles have better coding efficiency.
• The rate-distortion loss increases with the number of tiles.
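As an illustration of how a picture might be divided into tiles, the sketch below splits the picture into a grid of roughly equal tiles measured in CTBs. The even integer split, the 64-pixel CTB size, and the function names are assumptions made for this example. As noted above, tile boundaries can also be placed differently and changed from picture to picture.

```python
def uniform_tile_sizes(pic_size_in_ctbs, num_tiles):
    """Split one picture dimension (in CTBs) into `num_tiles` nearly equal parts."""
    return [
        ((i + 1) * pic_size_in_ctbs) // num_tiles - (i * pic_size_in_ctbs) // num_tiles
        for i in range(num_tiles)
    ]


def tile_grid(width_px, height_px, ctb_size, num_cols, num_rows):
    """Return (column widths, row heights) of the tile grid, in CTBs."""
    # Round up: a partial CTB at the picture edge still counts as one CTB.
    width_ctbs = -(-width_px // ctb_size)
    height_ctbs = -(-height_px // ctb_size)
    return (
        uniform_tile_sizes(width_ctbs, num_cols),
        uniform_tile_sizes(height_ctbs, num_rows),
    )


if __name__ == "__main__":
    cols, rows = tile_grid(1920, 1080, ctb_size=64, num_cols=3, num_rows=2)
    print(cols, rows)
```

Each tile in the grid can be assigned to a different core. More tiles mean more boundaries, which is consistent with the rate-distortion loss growing with the number of tiles.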
Wavefront Parallel Processing (WPP)
Overlapped Wavefront (OWF)
• When a thread has finished a CTB row in the current picture and no more rows are available, it can start processing the next picture instead of waiting for the current picture to finish.
• To support this approach, motion vectors are constrained to ¼ of the picture height.
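The benefit of overlapping pictures can be sketched with a toy scheduler. This is a simplified model, not the decoder evaluated in the paper, and it rests on several assumptions. Every CTB takes one time step. A CTB can start once its left neighbour and the CTB above and to the right of it are done (the usual two-CTB wavefront lag). Each picture references only the previous picture. The ¼ picture height constraint is rounded up to whole CTB rows. The name `simulate` and its parameters are hypothetical.

```python
import math


def simulate(width, height, pictures, threads, overlap):
    """Time steps needed to decode `pictures` pictures of width x height CTBs."""
    mv_rows = math.ceil(height / 4)      # assumed motion vector reach in CTB rows
    done = [[0] * height for _ in range(pictures)]  # CTBs finished per row
    remaining = width * height * pictures
    steps = 0
    while remaining > 0:
        ready = []
        for p in range(pictures):
            for r in range(height):
                c = done[p][r]           # next CTB column to decode in this row
                if c == width:
                    continue             # row already finished
                # Wavefront dependency: the row above must be two CTBs ahead.
                if r > 0 and done[p][r - 1] < min(c + 2, width):
                    continue
                if p > 0:
                    # Overlapped: only the reachable reference rows must be done.
                    # Plain wavefront: the whole previous picture must be done.
                    ref_row = min(r + mv_rows, height - 1) if overlap else height - 1
                    if done[p - 1][ref_row] < width:
                        continue
                ready.append((p, r))
        # Decode at most one CTB per available thread in this time step.
        for p, r in ready[:threads]:
            done[p][r] += 1
            remaining -= 1
        steps += 1
    return steps


if __name__ == "__main__":
    print("WPP:", simulate(4, 3, pictures=2, threads=8, overlap=False))
    print("OWF:", simulate(4, 3, pictures=2, threads=8, overlap=True))
```

In this model the plain wavefront leaves threads idle while the last rows of a picture finish. The overlapped variant uses that time to begin the first rows of the next picture, so it should complete in fewer steps.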
Coding efficiency analysis
Experimental evaluation
• Environment
Conclusions
• We present a detailed performance comparison of the main approaches, namely WPP, Tiles and OWF.
• Tiles performance is 7% higher than WPP on average at 12 cores.
• The proposed OWF performs 28% higher on average than Tiles.
• The decoder achieves real-time performance for 1080p50 videos, but “only” 25.4 fps for 2160p.
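As a rough reading of these numbers, the sketch below works through two pieces of arithmetic. It assumes that the real-time target for 2160p is 50 fps, as in the 2160p50 case mentioned in the introduction, and that the two average gains can be multiplied. Averages do not compose exactly, so the combined figure is only an estimate. The function names are hypothetical.

```python
def required_speedup(target_fps, achieved_fps):
    # Factor by which decoding must get faster to reach the target frame rate.
    return target_fps / achieved_fps


def combined_gain(*relative_gains):
    # Chain relative gains, e.g. 0.07 (Tiles over WPP) and 0.28 (OWF over Tiles).
    factor = 1.0
    for gain in relative_gains:
        factor *= 1.0 + gain
    return factor - 1.0


if __name__ == "__main__":
    print(f"Speedup still needed for 2160p50: {required_speedup(50, 25.4):.2f}x")
    print(f"Estimated OWF gain over WPP: {combined_gain(0.07, 0.28):.1%}")
```

Under these assumptions, 2160p decoding would need to be roughly twice as fast to reach 50 fps, and OWF would be somewhat more than a third faster than WPP.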