

J. Vis. Commun. Image R. 23 (2012) 522–534

Contents lists available at SciVerse ScienceDirect

J. Vis. Commun. Image R.

journal homepage: www.elsevier.com/locate/jvci

Iterative search strategy with selective bi-directional prediction for low complexity multiview video coding

Zhi-Pin Deng a,b, Yui-Lam Chan a,*, Ke-Bin Jia b, Chang-Hong Fu a, Wan-Chi Siu a

a Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
b Department of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China

Article history: Received 14 July 2011; Accepted 16 January 2012; Available online 3 February 2012

Keywords: 3DTV; Multiview video coding; Disparity estimation; Motion estimation; Bi-directional prediction; Hierarchical B picture; Adaptive search range; JMVC

1047-3203/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2012.01.016

* Corresponding author. Fax: +852 23628439.
E-mail addresses: [email protected] (Z.-P. Deng), [email protected] (Y.-L. Chan), [email protected] (K.-B. Jia), [email protected] (C.-H. Fu), [email protected] (W.-C. Siu).

The multiview video coding (MVC) extension of H.264/AVC is the emerging standard for compression of impressive 3D and free-viewpoint video. The coding structure in MVC adopts motion and disparity estimation to exploit temporal and inter-view dependencies, which results in a considerable increase in encoding complexity. Most of the computational burden comes from uni-directional and bi-directional prediction. In this paper, an iterative search strategy is designed to speed up the uni-directional prediction in MVC. It works with an adaptive search range adjustment through a confidence measure of a loop constraint to obtain motion and disparity vectors jointly. Furthermore, a selective bi-directional prediction algorithm is proposed to enhance the coding performance by analyzing the statistical characteristics of bi-directional prediction in MVC. Experimental results demonstrate that, with the proposed fast search, the temporal and inter-view redundancies of multiview video can be eliminated sufficiently at low complexity.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

With recent advances in stereo and three-dimensional (3D) display technologies, 3D video has become an emerging medium that can offer a richer visual experience than traditional video [1]. Potential applications include free-viewpoint video (FVV), free-viewpoint television (FTV), 3D television (3DTV), IMAX theater, immersive teleconference, surveillance, etc. [2]. To support these applications, video systems require capturing a scene from different viewpoints, which generates several video sequences from different cameras simultaneously.

Multiview video coding (MVC) [3] has been studied for a long time in the Joint Video Team (JVT) formed by ISO/IEC MPEG and ITU-T VCEG. An international standard on MVC was developed in July 2008 as the H.264/AVC Multiview High Profile [4]. It comes along with the reference software referred to as the Joint Multiview Video Coding Model (JMVC) [5]. The JMVC supports multiple reference frame selection from either the same view or neighboring views. Take the three-view coding structure with a group-of-picture (GOP) length of 12 shown in Fig. 1 as an example. In this example, S_i denotes the ith view. There are two types of frames in this MVC



prediction structure. Anchor frames (enclosed by dotted lines in Fig. 1) are placed at the beginning of a GOP, while non-anchor frames lie in between two anchor frames. The JMVC uses block-based motion estimation (ME) and disparity estimation (DE) to exploit both temporal and view correlation. Within the same view, the JMVC adopts hierarchical B picture coding (HBP) [6] as the basic temporal prediction structure, such that the non-anchor frames of a GOP are classified into different temporal layers, denoted by TL1, TL2, TL3, and TL4. The non-anchor frames in the current temporal level are usually hierarchically predicted by referring to the frames in the previous temporal level. From Fig. 1, S_i is also coded by referring to the two neighboring views S_{i-1} and S_{i+1}, and is called a B-view. This prediction structure offers higher coding efficiency at the expense of dramatically increased computation. With this arrangement, the prediction process in each macroblock (MB) of a non-anchor frame includes forward and backward motion estimation, forward and backward disparity estimation [7], and bi-directional prediction [8]. The one with the minimum rate-distortion (R-D) cost is selected as the final prediction type of the current MB. The hybrid uni-directional and bi-directional prediction schemes make the prediction process the most computationally intensive part [9].

Many fast search algorithms have been proposed to reduce the computation time of the prediction process in MVC. Since MBs with slow/homogeneous motion prevail in a sequence and these MBs always prefer ME to DE, some early termination algorithms were


Fig. 1. MVC prediction structure using hierarchical B pictures.

Fig. 2. Loop constraint.


proposed to selectively skip DE according to the rate-distortion (R-D) cost of ME or the characteristics of MBs/frames [10–13]. However, the determination of the stopping criterion is still a knotty problem. Based on the camera geometry and coding information of the corresponding MB in the neighboring view, a reduced search range for ME/DE instead of the exhaustive full search was suggested in [14]. However, additional information about the multiview video, such as the camera geometry, is required, and the performance may be good only for certain video sequences. Some predictor-based fast ME/DE algorithms were then proposed [15–18]. With the aid of the already-known disparity vector, several predictors in ME can be obtained by tracking through the corresponding MB in the pre-coded inter-view reference frame. Then a fast ME algorithm with just a small search window is adopted to ensure the coding performance. Since the overall performance of ME is easily influenced by the accuracy of the previously estimated disparity vector, the full search is usually performed in DE prior to ME in order to get an accurate disparity vector. Although these algorithms can speed up ME, the complexity of DE cannot be reduced simultaneously. Thus some algorithms for reducing the complexity of both ME and DE were proposed in [19–22]. These algorithms exploit the loop constraint among neighboring motion and disparity vectors to expedite the search process and remove some useless search regions. The complexity can be greatly reduced with good coding performance. But in most of these methods, the ME and DE are relatively independent; the motion/disparity vector field obtained in the last step cannot be fully used to refine the disparity/motion vector field in the next step. Moreover, all the above algorithms do not consider the bi-directional prediction in hierarchical B picture coding of MVC. In the JMVC codec, the bi-directional prediction is performed in addition to the forward and backward predictions on all block sizes in order to search for a better result and eventually identify the final prediction type. Speeding up the bi-directional prediction thus makes a great contribution to the whole prediction part.

In this paper, we present a fast and efficient algorithm to reduce the computational burden of both uni-directional and bi-directional prediction. In our previous work [23], a fast prediction algorithm based on a loop constraint was proposed to expedite the uni-directional prediction by iteratively estimating motion and disparity vectors. A confidence measure of the loop constraint was then designed to adaptively adjust the search range for each iteration. Our previous algorithm targeted only the uni-directional prediction. As a result, it could not achieve the optimal performance of the JMVC codec, in which both uni-directional and bi-directional prediction are included. In this paper, we further propose a new bi-directional prediction technique that works with our previous algorithm in order to further speed up the JMVC codec. The proposed selective bi-directional prediction is performed in hierarchical B picture coding of MVC based on an analysis of the prediction type selection process. Experimental results show that the proposed algorithm can reduce the computational complexity significantly compared with the conventional search while keeping the coding quality.

The remainder of this paper is organized as follows. Section 2 presents the reliable uni-directional prediction through the confidence measure using the loop constraint. The selective bi-directional prediction technique is then presented in Section 3. Section 4 introduces the flowchart of the proposed algorithm. Simulation results and discussions are presented in Section 5. Finally, Section 6 gives a summary of the contribution of this paper.

2. Joint uni-directional prediction scheme

As shown in Fig. 2, a non-anchor frame in the B-view (S_i) has at most 4 reference frames from both temporal and view directions. Suppose the non-anchor frame at t in S_i is the current frame, denoted by f_{i,t}. In the JMVC codec, the forward reference frames f_{i,t-T} and f_{i-1,t} belong to list0. Similarly, the backward reference frames f_{i,t+T} and f_{i+1,t} belong to list1. In the course of the search, first, the motion and disparity vectors of uni-directional prediction are estimated independently. Four predictive vectors of the current MB (MB_{i,t}), including the forward motion and disparity vectors (MV_{FW,i} and DV_{FW,t}, respectively) and the backward motion and disparity vectors (MV_{BW,i} and DV_{BW,t}, respectively), are predicted independently under exhaustive motion or disparity estimation in each reference frame. Second, the bi-directional prediction is employed by iteratively searching from one past reference frame from list0 and one future reference frame from list1. At last, the best prediction type (forward, backward, or bi-directional) is determined by evaluating the one with minimum R-D cost. Although exhaustive independent


Fig. 3. Illustrative example of using the loop constraint in forward search.


search in the uni-directional prediction could achieve an optimal trade-off between rate and distortion, it does not utilize the correlation between neighboring motion and disparity vectors. The exhaustive full search thus incurs an unbearable computational cost. In this section, a fast estimation technique for the uni-directional prediction is designed to obtain the forward/backward motion and disparity vectors jointly and simultaneously by making use of the relationship between neighboring motion and disparity vectors.

2.1. The iterative search using the loop constraint

The concept of a loop constraint has been extensively studied in stereoscopic video coding to expedite the motion and disparity estimation [22]. This constraint describes the high correlation between two successive frame pairs in neighboring views. As depicted in Fig. 2, the loop constraint in stereoscopic video coding can be extended to both forward (FW) search and backward (BW) search in MVC [23], and the two cases can be written as

MV_{FW,i} + DV_{FW,t-T} ≈ DV_{FW,t} + MV_{FW,i-1}   (FW search)
MV_{BW,i} + DV_{BW,t+T} ≈ DV_{BW,t} + MV_{BW,i+1}   (BW search)   (1)

where MV_{FW,i-1} is the forward motion vector of the disparity-compensated MB (DCMB) in the forward inter-view reference frame f_{i-1,t}. DV_{FW,t-T} is the forward disparity vector of the motion-compensated MB (MCMB) in the forward temporal reference frame f_{i,t-T}. MV_{BW,i+1} is the backward motion vector of the disparity-compensated MB in the backward inter-view reference frame f_{i+1,t}. DV_{BW,t+T} is the backward disparity vector of the motion-compensated MB in the backward temporal reference frame f_{i,t+T}.
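To make the role of (1) concrete, the forward relation can be rearranged to predict the current motion vector from the other three vectors on the loop. The following sketch is illustrative only: vectors are plain (x, y) tuples and the function name is an assumption, not part of the JMVC code.

```python
# Illustrative rearrangement of the forward loop constraint in (1):
# MV_FW,i ≈ DV_FW,t + MV_FW,i-1 - DV_FW,t-T.
# Vectors are (x, y) tuples; the name predict_mv_fw is hypothetical.

def predict_mv_fw(dv_fw_t, mv_fw_prev_view, dv_fw_prev_time):
    """Predict MV_FW,i of the current MB from the other three loop vectors."""
    return (dv_fw_t[0] + mv_fw_prev_view[0] - dv_fw_prev_time[0],
            dv_fw_t[1] + mv_fw_prev_view[1] - dv_fw_prev_time[1])
```

For example, predict_mv_fw((4, 1), (2, 0), (3, 1)) gives (3, 0), the motion vector that closes the forward loop exactly.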

Given that DV_{FW,t-T}, MV_{FW,i-1}, DV_{BW,t+T}, and MV_{BW,i+1} have been obtained in the previous step, our aim is to determine DV_{FW,t}, MV_{FW,i}, DV_{BW,t}, and MV_{BW,i} of MB_{i,t}. A direct way is to estimate DV_{FW,t} and DV_{BW,t} through the full search algorithm. According to the loop constraint of the forward search in (1), MV_{FW,i} can then be obtained by the combination of DV_{FW,t-T}, MV_{FW,i-1} and DV_{FW,t}. Similarly, in the backward search, MV_{BW,i} can be computed by DV_{BW,t+T}, MV_{BW,i+1} and DV_{BW,t}. The direct use of the loop constraint can only mitigate the computational requirement of motion estimation, but it cannot benefit disparity estimation. In the proposed algorithm, we aim at estimating DV_{FW,t}, MV_{FW,i}, DV_{BW,t}, and MV_{BW,i} iteratively so as to expedite both motion and disparity estimation.

Besides, the loop constraint is only approximate for block-based motion and disparity estimation since the motion-compensated and disparity-compensated MBs in the reference frames are generally not aligned on MB boundaries. For the sake of simplicity, we take the forward search as an example to illustrate this problem. In the example shown in Fig. 3, MCMB_{i,t-T} and DCMB_{i-1,t} denote the best motion-compensated MB and the best disparity-compensated MB to MB_{i,t}, respectively. In f_{i-1,t-T}, MCMB_{i-1,t-T} is the best motion-compensated MB to DCMB_{i-1,t}, and DCMB_{i-1,t-T} is the best disparity-compensated MB to MCMB_{i,t-T}. Nevertheless, MCMB_{i-1,t-T} and DCMB_{i-1,t-T} are not coincident in f_{i-1,t-T}. This means that the loop constraint does not hold exactly in practice due to the block nature of H.264. It can only be used to generate the base vector at each iteration, and the base vectors then need to be refined iteratively. The joint estimation proposed in this paper is therefore designed in an iterative way. For the forward/backward search, the kth iteration is composed of two steps:

Step 1: Assume MV^{k-1}_{FW,i}/MV^{k-1}_{BW,i} is fixed and the base disparity vector (BDV^k_{FW,t} or BDV^k_{BW,t}) is calculated based on the loop constraint, where k indicates the iteration number. The base disparity vector is then refined by exhaustively computing all checking points in a new small search window to obtain the refined disparity vector (DV^k_{FW,t} or DV^k_{BW,t}).

Step 2: Assume that the refined disparity vector (DV^k_{FW,t} or DV^k_{BW,t}) is fixed and the base motion vector (BMV^k_{FW,i} or BMV^k_{BW,i}) is then computed based on the loop constraint. A refinement with the limited search range is used to get the updated motion vector (MV^k_{FW,i} or MV^k_{BW,i}).

At each iteration, two base vectors (motion and disparity) can be computed through (1). Each base vector is used as a search center, and then a refinement process around the center with a limited refinement search range is carried out to get the updated vector. After several iterations, the proposed algorithm can obtain the optimal vectors with low complexity in comparison with the independent search.
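The two-step iteration above can be sketched as follows. The base-vector and refinement helpers are placeholders (in the paper they come from the loop constraint (1) and a windowed exhaustive search), so this is a structural sketch under those assumptions, not the authors' implementation.

```python
# Structural sketch of the iterative joint search (Section 2.1).
# base_dv/base_mv stand in for the loop-constraint base-vector computations
# and refine for the small-window exhaustive refinement; all three callables
# are assumptions supplied by the caller.

def iterative_joint_search(mv0, base_dv, base_mv, refine, n_iter=3):
    """Alternately fix MV to refine DV (Step 1), then fix DV to refine MV
    (Step 2), for n_iter iterations. Returns the final (mv, dv) pair."""
    mv = mv0
    dv = None
    for _ in range(n_iter):
        dv = refine(base_dv(mv))   # Step 1: base disparity vector + refinement
        mv = refine(base_mv(dv))   # Step 2: base motion vector + refinement
    return mv, dv
```

The design choice mirrors the text: each step holds one vector field fixed so that the other can be predicted cheaply from the loop constraint and then corrected by a small local search.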

From Fig. 3, it can be seen that the motion vector of DCMB_{i-1,t}, MV_{FW,i-1}(DCMB_{i-1,t}), and the disparity vector of MCMB_{i,t-T}, DV_{FW,t-T}(MCMB_{i,t-T}), are not available. To approximate MV_{FW,i-1}(DCMB_{i-1,t}) and DV_{FW,t-T}(MCMB_{i,t-T}), the disparity and motion vectors of the four MBs overlapping with DCMB_{i-1,t} and MCMB_{i,t-T} (marked as A, B, C, and D in Fig. 3) can be used. However, the approximations of DV_{FW,t-T}(MCMB_{i,t-T}) and MV_{FW,i-1}(DCMB_{i-1,t}) cannot exactly be the true vectors of MCMB_{i,t-T} and DCMB_{i-1,t}, and therefore DCMB_{i-1,t-T} and MCMB_{i-1,t-T}, which correspond to the same content in the 3D space, cannot locate at the same point. As a result, DCMB_{i-1,t-T} and MCMB_{i-1,t-T} are no longer coincident in f_{i-1,t-T}. This positional uncertainty in f_{i-1,t-T} of the forward loop induces a loop difference attached to the loop constraint, denoted as d_{Forward},

d_{Forward} = ||DV_{FW,t}(MB_{i,t}) + MV_{FW,i-1}(DCMB_{i-1,t}) - DV_{FW,t-T}(MCMB_{i,t-T}) - MV_{FW,i}(MB_{i,t})||   (2)

where ||v|| is the norm of a vector v. Similarly, the loop difference of the backward loop, d_{Backward}, can be written as

d_{Backward} = ||DV_{BW,t}(MB_{i,t}) + MV_{BW,i+1}(DCMB_{i+1,t}) - DV_{BW,t+T}(MCMB_{i,t+T}) - MV_{BW,i}(MB_{i,t})||   (3)

where DCMB_{i+1,t} and MCMB_{i,t+T} are the best disparity-compensated MB and the best motion-compensated MB to MB_{i,t} in the backward search. d_{Forward} and d_{Backward} affect the accuracy of the iterative search. As mentioned above, it is possible to use the motion and disparity vectors of the four MBs overlapping with DCMB_{i-1,t} and MCMB_{i,t-T} to form the base motion vector (BMV^k_{FW,i}(MB_{i,t})) and the base disparity vector (BDV^k_{FW,t}(MB_{i,t})) at the kth iteration.

Suppose that MV^{k-1}_{FW,i}(MB_{i,t}) obtained in the (k-1)th iteration is fixed. The disparity vectors of the four overlapping MBs with



MCMB_{i,t-T} are the possible candidates to compose BDV^k_{FW,t}(MB_{i,t}), and they are denoted by DV^{k-1,u}_{FW,t-T}(MCMB_{i,t-T}), where 1 ≤ u ≤ 4, as shown in Fig. 4(a). Meanwhile, the motion vector of DCMB_{i-1,t} retrieved by DV^{k-1}_{FW,t}(MB_{i,t}) of the (k-1)th iteration, denoted by MV^{k-1}_{FW,i-1}(DCMB_{i-1,t}), is also available. Therefore, according to (1), there are four possible combinations of BDV^{k,u}_{FW,t}(MB_{i,t}), denoted by

BDV^{k,u}_{FW,t}(MB_{i,t}) = MV^{k-1}_{FW,i}(MB_{i,t}) + DV^{k-1,u}_{FW,t-T}(MCMB_{i,t-T}) - MV^{k-1}_{FW,i-1}(DCMB_{i-1,t})   (4)

Therefore, the resultant BDV^k_{FW,t}(MB_{i,t}) of the kth iteration can then be derived from

BDV^k_{FW,t}(MB_{i,t}) = Min_{1≤u≤4} (RDCost(BDV^{k,u}_{FW,t}(MB_{i,t})))   (5)

where RDCost(·) is the cost function that takes both the distortion and the number of bits consumed for coding the MB into account. Notice that the corresponding DV^{k-1,u}_{FW,t-T}(MCMB_{i,t-T}) in (4) which contributes to attain BDV^k_{FW,t}(MB_{i,t}) in (5) is updated as DV^k_{FW,t-T}(MCMB_{i,t-T}) for the next step to get BMV^k_{FW,i}(MB_{i,t}).
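Equations (4) and (5) amount to forming four candidate base vectors and keeping the cheapest one. A sketch with a pluggable cost function; the real RDCost in JMVC involves actual encoding, so the caller-supplied rd_cost below is a stand-in assumption:

```python
# Sketch of (4)-(5): build the four candidate base disparity vectors from
# the four overlapping-MB disparity vectors and keep the one with minimum
# R-D cost. rd_cost is a caller-supplied stand-in for RDCost().

def best_base_dv(mv_prev, dv_candidates, mv_prev_view, rd_cost):
    """mv_prev: MV^{k-1}_FW,i(MB_i,t); dv_candidates: the four
    DV^{k-1,u}_FW,t-T(MCMB); mv_prev_view: MV^{k-1}_FW,i-1(DCMB)."""
    candidates = [
        (mv_prev[0] + dv[0] - mv_prev_view[0],   # Eq. (4), u = 1..4
         mv_prev[1] + dv[1] - mv_prev_view[1])
        for dv in dv_candidates
    ]
    return min(candidates, key=rd_cost)          # Eq. (5)
```

With a toy cost rd_cost(v) = |x| + |y|, best_base_dv((1, 1), [(0, 0), (5, 5), (2, -1), (1, 0)], (1, 1), rd_cost) returns the zero-cost candidate (0, 0).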

Similarly, when the refined disparity vector DV^k_{FW,t}(MB_{i,t}) is fixed, the motion vectors of the four overlapping MBs with DCMB_{i-1,t}, represented by MV^{k,v}_{FW,i-1}(DCMB_{i-1,t}), where 1 ≤ v ≤ 4, are the possible candidates to compose BMV^k_{FW,i}(MB_{i,t}), as depicted in Fig. 4(b). The resultant base motion vector BMV^k_{FW,i}(MB_{i,t}) for the kth iteration will be

BMV^k_{FW,i}(MB_{i,t}) = Min_{1≤v≤4} (RDCost(BMV^{k,v}_{FW,i}(MB_{i,t})))   (6)

where

BMV^{k,v}_{FW,i}(MB_{i,t}) = DV^k_{FW,t}(MB_{i,t}) + MV^{k,v}_{FW,i-1}(DCMB_{i-1,t}) - DV^k_{FW,t-T}(MCMB_{i,t-T})   (7)

Fig. 4. Processes of obtaining (a) the base disparity vector and (b) the base motion vector.

where DV^k_{FW,t-T}(MCMB_{i,t-T}) is obtained in the last step of the kth iteration.

A similar iterative process can also be performed in the backward loop, using MV_{BW,i+1} and DV_{BW,t+T}, to get the base motion and disparity vectors of the backward search.

2.2. Search range adjustment

The above section described a novel search strategy in which both motion and disparity vectors are iteratively updated to minimize the cost function under the loop constraint. The refined vector obtained in the previous step from motion or disparity estimation can be used to get a new base disparity or motion vector in the current step. However, using the loop constraint directly may introduce the positional uncertainty described by d_{Forward} and d_{Backward} in (2) and (3). This uncertainty is more serious in some occlusion and ambiguous areas. The use of multiple candidates in Section 2.1 can alleviate this uncertainty in the iterative search. In this section, another technique for dealing with the uncertainty is proposed, which assigns a confidence measure to the search strategy. For each iteration, this measure is used to adaptively adjust the search range of the refinement process, and can further improve the reliability of the final motion and disparity vectors.

To obtain a more accurate vector field for the next step in the aforementioned search strategy, a refinement around each base vector is necessary. The search range of the refinement is of great importance for the performance of the iterative search. It is well known that a large refinement search range can compensate for an unreliable base vector, but it needs a heavy computational load. On the other hand, a small refinement search range can reduce the number of search points, but it is easily trapped into local minima and leads to an incorrect prediction result, which further affects the accuracy of the subsequent iterations of the iterative search. To avoid being trapped in a local minimum with a reasonable search range, the size of the search range at each iteration is adaptively adjusted based on the reliability of the base vector. In general, a smaller refinement search range is used for the case of a reliable loop constraint, while a larger refinement search range is needed for the case of an unreliable constraint. The d_{Forward} and d_{Backward} in (2) and (3) can be used as the confidence measure for the base vector to determine the search range of the refinement in the forward and backward search, respectively. If d_{Forward}/d_{Backward} is small, it means that the base vector obtained by the loop constraint is reliable, and the size of the search range can be reduced to a proper size without affecting the estimation accuracy. On the other hand, if d_{Forward}/d_{Backward} is large, it implies that the base vector may be inaccurate, and thus the size of the search range should be larger in order to maximize the possibility of finding the global minimum. This process also benefits the subsequent iterations if more reliable motion and disparity vectors can be obtained in the previous iterations.

As depicted in Fig. 5, the size of the refinement search range for a current MB, RSR, can be adjusted with the aid of d_{Forward}/d_{Backward},

RSR = { RSR_{MIN},                                                  d < Th1
      { RSR_{MIN} + ((d - Th1)/(Th2 - Th1)) × (RSR_{MAX} - RSR_{MIN}),  Th1 ≤ d ≤ Th2
      { RSR_{MAX},                                                  d > Th2   (8)

where


Fig. 5. Adaptive RSR based on d.


d = d_{Forward} for the FW search, or d_{Backward} for the BW search.

RSR_{MIN} and RSR_{MAX} denote the sizes of the minimum and maximum search ranges, respectively, with RSR_{MAX} = ⌈a × SR⌉, a ≤ 1, where a is a theoretical factor to control the search range and SR is the initial search range. We can adjust a to find a balance between prediction accuracy and computational complexity. From (8), two confidence thresholds, Th1 and Th2, must be given to discriminate RSR for the three regions of the search range, as shown in Fig. 5. If d is smaller than Th1, the base vector should be reliable and a very small RSR, i.e. RSR_{MIN}, can be used with confidence. In most cases, it is sufficient to set RSR_{MIN} equal to 2, which is good enough to ensure the prediction accuracy with limited search points. If d is larger than Th2, it is highly probable that the loop constraint does not work well for the current MB, so RSR is assigned to RSR_{MAX}. When Th1 ≤ d ≤ Th2, RSR varies in direct proportion to d. The confidence thresholds Th1 and Th2 have been selected experimentally by considering the tradeoff between the computational requirement and prediction accuracy for most sequences.
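The piecewise rule in (8) can be written directly. The default RSR_MIN = 2 follows the text, but the threshold values and SR/a defaults below are illustrative placeholders, not the paper's tuned settings:

```python
import math

# Adaptive refinement search range of Eq. (8). RSR grows linearly with the
# loop difference d between the two confidence thresholds. RSR_MIN = 2
# follows the text; the sr and a defaults are illustrative assumptions.

def refinement_search_range(d, th1, th2, rsr_min=2, a=1.0, sr=32):
    rsr_max = math.ceil(a * sr)        # RSR_MAX = ceil(a * SR), a <= 1
    if d < th1:                        # reliable base vector
        return rsr_min
    if d > th2:                        # unreliable base vector
        return rsr_max
    # linear interpolation between RSR_MIN and RSR_MAX for Th1 <= d <= Th2
    return rsr_min + (d - th1) / (th2 - th1) * (rsr_max - rsr_min)
```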

3. Selective bi-directional prediction scheme

Bi-directional prediction uses the forward reference frame from list0 and the backward reference frame from list1 as two references for prediction. It can decrease noise by averaging the forward and backward predictions. An additional benefit of bi-directional prediction is the ability to match a background area that was occluded in the previous reference frame of list0 but can be found in the reference frame of list1 using backward prediction.

In JMVC, a Lagrangian rate-distortion optimization process is employed to select the best prediction type among forward, backward, and bi-directional prediction for each MB by checking the RDCost. The RDCost is used to optimize the amount of distortion (D) against the number of coded bits (R) required to encode the current MB, and it can be defined as

RDCost = D + λR   (9)

where λ is the Lagrange multiplier associated with the quantization parameter (dQP) through the relationship

λ = 0.85 × 2^{min{52, dQP}/3 - 4}   (10)

The computation of the RDCost for each prediction type requires the availability of the reconstructed image and the actual bit count. Both requirements necessitate completion of the encoding-decoding cycle. Furthermore, the RDCost has to be computed for all prediction types. As a consequence, the exhaustive computation of forward, backward, and bi-directional prediction in the JMVC can obtain the best R-D performance. Unfortunately, the computational burden introduced is enormous.

In this section, we develop a selective bi-directional prediction scheme, which provides different levels of early termination of bi-directional prediction, determined by the loop constraint and the hierarchical prediction structure. Fig. 6 gives the statistical analysis of different prediction types in MVC. Six sequences were tested. In our experiment, it can be seen that most MBs (over 80%) select uni-directional prediction as the final prediction type. Thus the bi-directional prediction is not a necessary step for all MBs. If the bi-directional prediction can be performed conditionally, much coding time can be saved with an insignificant quality drop.

As discussed in Section 2, the loop differences d_{Forward} and d_{Backward} in (2) and (3) can be used to represent the reliability of the loop constraint. Only if the corresponding MBs in the neighboring temporal and inter-view reference frames are projected onto the same object in the real world is d_{Forward} or d_{Backward} of the forward or backward loop equal to zero. Uni-directional prediction in either one of the loops can then achieve accurate estimation. The evidence is shown in Table 1, where the percentages of MBs selecting bi-directional prediction as the final prediction type for various sequences under the full search when either d_{Forward} or d_{Backward} is zero are tabulated. In this table, it can be observed that the bi-directional prediction is not always used when d_{Forward} or d_{Backward} is equal to zero, especially in the case of a high QP value. Further discussion about how the QP affects the usage of bi-directional prediction will be given later. Consequently, the unnecessary bi-directional prediction can be skipped to reduce computational complexity, and the search can be terminated early. On the other hand, if both d_{Forward} and d_{Backward} are non-zero, there is a high probability that the current MB is in an occlusion area, and it is noted that bi-directional prediction can provide occlusion benefits. To sum up, when d_{Forward} and d_{Backward} are non-zero, it is most probable that the current MB is in an occlusion area, and it is therefore necessary to carry out bi-directional prediction in order to maximize the prediction accuracy.
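The skip rule described above can be summarized as: run bi-directional prediction only when neither loop closes exactly. This is a deliberately simplified sketch of the decision; the full scheme also conditions on the temporal layer and QP, as the following analysis of the hierarchical structure explains.

```python
# Simplified sketch of the selective bi-directional prediction test:
# if either loop difference is zero, the uni-directional loop is reliable
# and bi-directional prediction is skipped; otherwise (likely occlusion)
# it is carried out. The TL/QP conditions of the full scheme are omitted.

def run_bidirectional(d_forward, d_backward):
    return d_forward != 0 and d_backward != 0
```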

The early termination of bi-directional prediction can be further enhanced by exploiting the characteristics of the hierarchical prediction structure in MVC. In the hierarchical prediction structure shown in Fig. 1, the anchor frames form the temporal base layer TL0. The non-anchor B-frames in temporal layer TL1 are coded with two reference frames in TL0, and each in turn becomes a reference frame for the next temporal layer, TL2, and so on. Since all frames in TL0 serve as reference frames for the other layers, the temporal base layer should be encoded with the highest fidelity. To achieve high-quality encoded video, the quantization parameters used in the lower temporal layers should therefore be smaller than those used in the higher temporal layers. In [24], the cascaded quantization parameters (dQPs) were suggested as

dQP = QP + ΔQP    (11)

where QP represents the basic quantization parameter and ΔQP changes with the temporal layer according to

ΔQP = 0 for TL0, 3 for TL1, 4 for TL2, 5 for TL3, 6 for TL4    (12)

From (10)–(12), it is clear that either a high QP value or a high TL induces a high dQP, which in turn increases the Lagrange multiplier λ. In this case, R in (9) becomes more important than D in the calculation of RDCost, so R should be kept small to minimize RDCost. As a consequence, bi-directional prediction becomes dispensable at a high TL/QP, since the motion rate R of bi-directional prediction is large for encoding both the motion


Fig. 6. Percentage of different prediction types under the full search.

Table 1
Statistical analysis of selecting the bi-directional prediction type under the full search when either d_Forward = 0 or d_Backward = 0.

Sequence     QP22 (%)   QP27 (%)   QP32 (%)   QP37 (%)
Vassar       12.73      4.91       2.44       1.14
Flamenco2     7.42      5.72       3.54       1.95
Race1         6.95      2.70       0.77       0.24
Rena         12.23      6.02       2.25       1.12
Uli           8.68      5.26       3.44       2.32
Jungle        7.43      4.52       2.86       1.58

Z.-P. Deng et al. / J. Vis. Commun. Image R. 23 (2012) 522–534 527

and disparity vectors from the forward and backward reference frames. Fig. 7 shows the coding results of "Rena" at different QPs under the full search algorithm. From Fig. 7(a), it can be observed that a high QP (QP = 37) favors a smooth and uniform motion/disparity vector field that keeps R small. The distortion of the reconstructed frame is consequently larger, and fewer MBs select bi-directional prediction as the resulting prediction type, as depicted in Fig. 7(b). Conversely, coding with a low QP (QP = 22) results in a chaotic vector field and more bi-directionally predicted MBs, as shown in Fig. 7(c) and (d).
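The cascaded-QP rule of (11)–(12) is easy to state in code. The sketch below is illustrative only: the DELTA_QP table comes from (12), while the lambda formula is the common H.264/AVC mode-decision setting, 0.85 · 2^((QP−12)/3), which we assume here merely to show why the rate term grows in importance at a high TL/QP.

```python
# Cascaded QP per temporal layer, following (11)-(12).
DELTA_QP = {0: 0, 1: 3, 2: 4, 3: 5, 4: 6}  # TL0 .. TL4, eq. (12)

def cascaded_qp(qp_base, tl):
    """dQP = QP + DeltaQP(TL), eq. (11)."""
    return qp_base + DELTA_QP[tl]

def lagrange_lambda(qp):
    """Assumed H.264/AVC-style mode-decision lambda.

    A larger QP (hence a larger TL, via the cascade) gives a larger
    lambda, so the rate term R weighs more in RDCost = D + lambda * R.
    """
    return 0.85 * 2 ** ((qp - 12) / 3.0)
```

For example, a basic QP of 22 becomes a cascaded QP of 26 in TL2 and 28 in TL4, with a correspondingly larger λ at each step.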

To further explore the relationship among the usage of bi-directional prediction, TL, and QP, Table 2 reports the proportion of bi-directionally predicted MBs, denoted as K_BI(QP, TL). The statistics of K_BI(QP, TL) are computed over "Vassar", "Flamenco2", "Race1", "Rena", "Uli", and "Jungle". From this table, it can be seen that K_BI(QP, TL) becomes smaller as QP/TL gets larger, which again demonstrates that bi-directional prediction is less desirable at a high QP/TL. According to this observation, a simple thresholding method can be designed to skip unnecessary searches [25–26]: if K_BI(QP, TL) > T_BI, bi-directional prediction is activated for better search results. However, this method relies on K_BI(QP, TL) being known in advance, so an estimate K′_BI(QP, TL) must be generated to approximate K_BI(QP, TL) in practice. From Table 2, we found that K_BI(QP, TL) is approximately linear in TL and QP. K′_BI(QP, TL) in the proposed selective bi-directional prediction is then modeled by

K′_BI(QP, TL) = a × TL + b × QP + c    (13)

where a, b and c are the parameters of the approximately linear function. In this paper, a, b and c are experimentally set to −0.04, −0.03 and 1.30, respectively.

From the above analysis, the condition of the selective bi-directional prediction scheme should take into account both K′_BI(QP, TL) and d_Forward/d_Backward. When either d_Forward or d_Backward is zero and K′_BI(QP, TL) ≤ T_BI, bi-directional prediction can be skipped. Otherwise, bi-directional prediction is performed.
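Putting the two conditions together, the skip test of the selective bi-directional prediction can be sketched as follows. This is a sketch under the paper's reported settings, using a = −0.04, b = −0.03 and c = 1.30 from (13) and the threshold value T_BI = 0.30 given later in the experiments; the function names are ours.

```python
A, B, C = -0.04, -0.03, 1.30   # parameters of the linear model (13)
T_BI = 0.30                    # experimentally chosen threshold

def k_bi_estimate(tl, qp):
    """Estimated fraction of bi-directionally predicted MBs, eq. (13)."""
    return A * tl + B * qp + C

def skip_bidirectional(d_forward, d_backward, tl, qp):
    """Skip bi-directional prediction when at least one loop closes
    exactly (so uni-directional prediction is reliable) and the
    estimated bi-prediction usage at this (QP, TL) is low."""
    one_loop_closed = d_forward == 0 or d_backward == 0
    return one_loop_closed and k_bi_estimate(tl, qp) <= T_BI
```

With these numbers, an MB at QP = 37 in TL4 with a closed loop is skipped (K′_BI = 0.03), whereas one at QP = 22 in TL1 is not (K′_BI = 0.60), matching the trend of Table 2.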

4. The flowchart of the proposed algorithm

Based on the above considerations, the proposed algorithm is divided into two parts: fast uni-directional prediction and selective bi-directional prediction. Its flowchart is shown in Fig. 8. The uni-directional prediction consists of four steps: initialization, refinement search range adjustment, iterative search, and a stopping process. For simplicity, a summary of the forward search is provided in the following, where k is the iteration index:

Step 1. Initialization:
(i) Set k = 0 and d = 0.
(ii) Select an initial base motion vector BMV^0_FW,i(MB_i,t) and an initial base disparity vector BDV^0_FW,t(MB_i,t) from the following vector sets:

BMV^0_FW,i(MB_i,t): {MV_FW,i−1, MV_med, MV_a, MV_b, MV_c, 0}

and

BDV^0_FW,t(MB_i,t): {DV_FW,t−T, DV_med, DV_a, DV_b, DV_c, 0}

where MV_FW,i−1 is the motion vector of the co-located MB in the forward inter-view reference frame f_i−1,t, and DV_FW,t−T is the disparity vector of the co-located MB in the forward temporal reference frame f_i,t−T. MV_a/DV_a, MV_b/DV_b and MV_c/DV_c are motion/disparity vectors from the neighboring left, upper, and upper-right blocks of the current block. MV_med is the median of MV_a, MV_b and MV_c, and DV_med is the median of DV_a, DV_b and DV_c. Note that only either the motion or the disparity vector is available [27–28] for each spatially neighboring block. Thus, if block b is predicted from the inter-view reference frame, MV_b is not available; in this case, (0,0) replaces MV_b when deriving MV_med. Among the candidate base vectors, the one with the minimum RDCost is selected.

Fig. 7. Illustration of the encoded B-view of "Rena" at different QPs under the full search.

Table 2
The statistics of K_BI(QP, TL) under the full search.

K_BI(QP, TL)   TL1 (%)   TL2 (%)   TL3 (%)   TL4 (%)
QP22           46.20     33.13     25.77     26.27
QP27           32.51     20.54     14.36     15.63
QP32           20.21     11.71      8.34      9.16
QP37           10.72      6.93      5.27      5.20
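The median predictors MV_med and DV_med used in Step 1(ii) are the usual componentwise medians. A minimal sketch, assuming vectors are (x, y) integer pairs and using None to mark an unavailable neighbour (replaced by (0, 0) as described above); the function names are ours:

```python
def median_vector(va, vb, vc):
    """Componentwise median of three motion/disparity vectors."""
    mid = lambda a, b, c: sorted((a, b, c))[1]
    return (mid(va[0], vb[0], vc[0]), mid(va[1], vb[1], vc[1]))

def candidate_set(v_colocated, va, vb, vc):
    """Initial base-vector candidates of Step 1(ii): the co-located
    vector, the median, the three spatial neighbours, and zero."""
    va, vb, vc = [(0, 0) if v is None else v for v in (va, vb, vc)]
    return [v_colocated, median_vector(va, vb, vc), va, vb, vc, (0, 0)]
```

This mirrors the componentwise median used by H.264/AVC for spatial motion vector prediction; the encoder then evaluates RDCost for each candidate and keeps the best.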

(iii) Carry out refinement on both BMV^0_FW,i(MB_i,t) and BDV^0_FW,t(MB_i,t) with RSR equal to RSR_MIN to obtain MV^0_FW,i(MB_i,t) and DV^0_FW,t(MB_i,t). Save RDCost(MV^0_FW,i(MB_i,t)) and RDCost(DV^0_FW,t(MB_i,t)), and set k = k + 1.

Step 2. Refinement search range adjustment: set the refinement search range RSR through (8).

Step 3. Iterative search:
(i) Fix MV^(k−1)_FW,i(MB_i,t) and compute the base disparity vector BDV^k_FW,t(MB_i,t) via (4) and (5). Perform disparity vector refinement with RSR on BDV^k_FW,t(MB_i,t) to get DV^k_FW,t(MB_i,t). Save RDCost(DV^k_FW,t(MB_i,t)).
(ii) Fix DV^k_FW,t(MB_i,t) and calculate the base motion vector BMV^k_FW,i(MB_i,t) using (6) and (7). Determine MV^k_FW,i(MB_i,t) by carrying out motion vector refinement with RSR on BMV^k_FW,i(MB_i,t). Save RDCost(MV^k_FW,i(MB_i,t)).

Step 4. Stopping process: if RDCost(MV^k_FW,i(MB_i,t)) ≥ RDCost(MV^(k−1)_FW,i(MB_i,t)) and RDCost(DV^k_FW,t(MB_i,t)) ≥ RDCost(DV^(k−1)_FW,t(MB_i,t)), set MV^(k−1)_FW,i(MB_i,t) and DV^(k−1)_FW,t(MB_i,t) as the final motion and disparity vectors and stop the iteration. Otherwise, set k = k + 1, calculate and update d_Forward from (2), and go to Step 2.

Fig. 8. The flowchart of the proposed fast algorithm.

Once the final motion and disparity vectors are calculated, the selection between the motion and disparity vectors of MB_i,t is based on RDCost(MV^(k−1)_FW,i(MB_i,t)) and RDCost(DV^(k−1)_FW,t(MB_i,t)). If RDCost(MV^(k−1)_FW,i(MB_i,t)) is smaller, MV^(k−1)_FW,i(MB_i,t) is used to compute the prediction error of MB_i,t for the forward search; otherwise, DV^(k−1)_FW,t(MB_i,t) is selected. The corresponding RDCost is chosen as RDCost_FW.
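The forward search of Steps 1–4 alternates between the two refinements until RDCost stops improving. The toy sketch below captures that control flow only: the `cost` functions stand in for RDCost, `refine` is an exhaustive local search within ±RSR, and the mappings `mv_to_base_dv` / `dv_to_base_mv` stand in for the loop-constraint derivations (4)–(7). None of these names come from the JMVC code; this is a structural sketch, not the actual encoder.

```python
def refine(base, rsr, cost):
    """Local search: best integer vector within +/-rsr of base
    (the refinement of Steps 1(iii) and 3)."""
    bx, by = base
    cands = [(bx + dx, by + dy)
             for dx in range(-rsr, rsr + 1) for dy in range(-rsr, rsr + 1)]
    return min(cands, key=cost)

def iterative_search(mv0, dv0, mv_cost, dv_cost,
                     mv_to_base_dv, dv_to_base_mv, rsr, max_iter=16):
    # Step 1: refine the initial base vectors.
    mv, dv = refine(mv0, rsr, mv_cost), refine(dv0, rsr, dv_cost)
    best_mv_c, best_dv_c = mv_cost(mv), dv_cost(dv)
    for _ in range(max_iter):
        # Step 3(i): fix MV, derive a base DV from it, refine.
        dv_k = refine(mv_to_base_dv(mv), rsr, dv_cost)
        # Step 3(ii): fix DV, derive a base MV from it, refine.
        mv_k = refine(dv_to_base_mv(dv_k), rsr, mv_cost)
        c_mv, c_dv = mv_cost(mv_k), dv_cost(dv_k)
        if c_mv >= best_mv_c and c_dv >= best_dv_c:   # Step 4: no gain
            break
        mv, dv, best_mv_c, best_dv_c = mv_k, dv_k, c_mv, c_dv
    # Final selection: the cheaper of motion and disparity prediction.
    return (mv, best_mv_c) if best_mv_c <= best_dv_c else (dv, best_dv_c)
```

With identity mappings and simple quadratic costs, the loop converges to the respective minima in a couple of iterations, illustrating why the measured average iteration counts stay small.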

The same process can then be applied to the backward search. After the uni-directional prediction, as depicted in Fig. 8, the selective bi-directional prediction is skipped if either d_Forward = 0 or d_Backward = 0 and K′_BI(QP, TL) ≤ T_BI, in order to save unnecessary complexity. When bi-directional prediction is selected, the forward and backward motion/disparity vectors (MV_BI−FW,i/DV_BI−FW,t and MV_BI−BW,i/DV_BI−BW,t) and RDCost_BI are saved as its results. Once the forward, backward, and bi-directional predictions have been performed, RDCost_FW, RDCost_BW, and RDCost_BI are compared, and the prediction with the minimum RDCost is selected as the final search result of the current MB.

5. Experimental results

This section presents the performance and the coding complexity of the proposed algorithms. We used six public multiview sequences [29], "Ballroom", "Exit", "Vassar", "Flamenco2", "Race1", and "Rena", with an image size of 640 × 480 for performance comparison. For simplicity, but without loss of generality, view 1 was encoded as the B-view, and views 0 and 2 were used as the left and right reference views, respectively. The proposed algorithm was implemented on the Joint Multiview Video Coding reference software (JMVC version 7.2) [5]. In each sequence, 100 frames were encoded by the proposed algorithm, the full search algorithm (FSA), and four conventional fast methods: the well-known TZ search algorithm in JMVC [5] and the fast algorithms in REF_SPIE2006 [16], REF_CE2007 [20], and REF_CE2010 [15]. The bitstreams were encoded according to the test conditions in [30]: four QPs (22, 27, 32, and 37) with the classical AS_IBP structure [6], quarter-pel motion and disparity estimation with a ±96-pel search range, and CABAC; the GOP length was set to 12. We incorporated the proposed techniques into JMVC 7.2 [5] as two variants, IS and IS + SBP. IS uses the iterative search in the uni-directional prediction; IS + SBP further extends IS by adopting the selective bi-directional prediction. The thresholds T1 and T2 were selected by considering the tradeoff between computational requirement and prediction accuracy for most sequences and were experimentally set to 5 and 20, respectively. Besides, the factor a was set to 0.33, and T_BI to 0.30. All thresholds were determined from the statistics of "Vassar", "Flamenco2", "Race1", "Rena", "Uli", and "Jungle". It is worth noting that this training set differs from the aforementioned testing sequences; for instance, "Ballroom" and "Exit" are not in the training set. This arrangement verifies the effectiveness of the proposed techniques on sequences outside the training set. All simulations were performed on an Intel(R) Xeon(R) X5550 2.67 GHz CPU with 12 GB RAM.

5.1. Analysis of computational complexity

To make the results independent of different platforms, the computational efficiency of the tested algorithms has been measured in terms of the average number of search points per MB required for the whole prediction process, as shown in Table 3. Table 3 shows that our algorithms successfully reduce the computational complexity of FSA, and that the number of search points is smaller than those of REF_SPIE2006 [16], REF_CE2007 [20], REF_CE2010 [15], and the TZ search for all sequences; for all QPs, our algorithms require the fewest search points. The speed-up ratios relative to FSA are given in brackets in Table 3, where the speed-up ratio is defined as the number of search points of FSA divided by that of each algorithm. From Table 3, the proposed IS saves many search points and achieves a high speed-up ratio compared with the other algorithms. This significant complexity reduction comes from the fact that IS predicts the motion and disparity vectors jointly, exploiting the correlation among neighboring successive view frames, instead of estimating motion and disparity vectors separately. With the aid of the adaptive refinement search range adjustment and the motion vector obtained in the previous iteration, the disparity vector in the current iteration can be estimated rapidly and accurately. The search points saved by IS, however, come only from the uni-directional prediction part of the JMVC; the bi-directional prediction still consumes considerable complexity. As expected, the proposed IS + SBP further reduces the number of search points and achieves the largest speed-up ratio among all algorithms, as shown in the last column of Table 3; the speed-up reaches 349 times over FSA for the "Vassar" sequence.

Table 3
Number of search points per MB (with the speed-up ratio over FSA in brackets) for different sequences.

Sequence / QP     FSA           TZ             REF_SPIE2006 [16]  REF_CE2007 [20]  REF_CE2010 [15]  IS             IS + SBP
Ballroom QP22     132132 (–)    9233 (14.31)   67289 (1.96)       21275 (6.21)     4633 (28.52)     4112 (32.13)   4112 (32.13)
Ballroom QP27     132097 (–)    9125 (14.48)   67234 (1.96)       28762 (4.59)     4322 (30.56)     3795 (34.81)   3795 (34.81)
Ballroom QP32     132140 (–)    8908 (14.83)   67223 (1.97)       21188 (6.24)     3893 (33.94)     3589 (36.82)   2741 (48.21)
Ballroom QP37     132146 (–)    8486 (15.57)   67182 (1.97)       15730 (8.40)     3682 (35.89)     3330 (39.69)   2416 (54.70)
Exit QP22         132220 (–)    8914 (14.83)   67258 (1.97)       6651 (19.88)     3621 (36.51)     2540 (52.06)   2540 (52.06)
Exit QP27         132190 (–)    8324 (15.88)   67178 (1.97)       5039 (26.23)     3072 (43.03)     2197 (60.17)   2197 (60.17)
Exit QP32         132152 (–)    7875 (16.78)   67123 (1.97)       5150 (25.66)     3251 (40.65)     2069 (63.89)   1150 (114.92)
Exit QP37         132089 (–)    7436 (17.76)   67064 (1.97)       4925 (26.82)     3539 (37.32)     1944 (67.96)   927 (142.49)
Vassar QP22       132156 (–)    7686 (17.19)   67196 (1.97)       4444 (29.74)     2594 (50.95)     1754 (75.35)   1754 (75.35)
Vassar QP27       132064 (–)    7151 (18.47)   67103 (1.97)       4792 (27.56)     2818 (46.86)     1652 (79.94)   1652 (79.94)
Vassar QP32       132039 (–)    6156 (21.45)   67072 (1.97)       4967 (26.58)     2433 (54.27)     1564 (84.44)   515 (256.39)
Vassar QP37       132063 (–)    5217 (25.31)   67092 (1.97)       4261 (30.99)     2211 (59.73)     1549 (85.24)   378 (349.37)
Flamenco2 QP22    132549 (–)    8701 (15.23)   67467 (1.96)       4929 (26.89)     9179 (14.44)     2751 (48.18)   2751 (48.18)
Flamenco2 QP27    132547 (–)    8745 (15.16)   67433 (1.97)       4789 (27.68)     8860 (14.96)     2636 (50.28)   2636 (50.28)
Flamenco2 QP32    132539 (–)    8805 (15.05)   67392 (1.97)       4707 (28.16)     8534 (15.53)     2526 (52.48)   1514 (87.54)
Flamenco2 QP37    132483 (–)    8794 (15.07)   67323 (1.97)       4476 (29.60)     8151 (16.25)     2362 (56.09)   1327 (99.84)
Race1 QP22        132048 (–)    9434 (14.00)   67272 (1.96)       6855 (19.26)     10812 (12.21)    4403 (29.99)   4403 (29.99)
Race1 QP27        132081 (–)    9687 (13.63)   67246 (1.96)       6184 (21.36)     13563 (9.74)     4265 (30.97)   4265 (30.97)
Race1 QP32        132098 (–)    9683 (13.64)   67235 (1.96)       5918 (22.32)     17135 (7.71)     4047 (32.64)   3424 (38.58)
Race1 QP37        132113 (–)    9386 (14.08)   67249 (1.96)       5177 (25.52)     16241 (8.13)     3760 (35.14)   3103 (42.58)
Rena QP22         132214 (–)    6006 (22.01)   67279 (1.97)       8541 (15.48)     3925 (33.69)     2521 (52.45)   2521 (52.45)
Rena QP27         132183 (–)    5828 (22.68)   67241 (1.97)       8569 (15.43)     3807 (34.72)     2414 (54.76)   2414 (54.76)
Rena QP32         132127 (–)    5369 (24.61)   67169 (1.97)       7845 (16.84)     3553 (37.19)     2204 (59.96)   1164 (113.51)
Rena QP37         132105 (–)    4484 (29.46)   67138 (1.97)       6327 (20.88)     3315 (39.85)     1943 (67.98)   820 (161.10)

Table 4
The statistics of the average number of iterations for different sequences.

Sequence / QP     Equal to 1 (%)   Smaller than or equal to 5 (%)   Average number of iterations
Ballroom QP22     73.58            96.94                            1.62
Ballroom QP27     74.02            96.63                            1.63
Ballroom QP32     74.49            96.09                            1.66
Ballroom QP37     74.36            95.82                            1.68
Exit QP22         70.33            98.77                            1.53
Exit QP27         75.14            98.37                            1.49
Exit QP32         78.94            98.33                            1.43
Exit QP37         82.22            98.42                            1.37
Vassar QP22       89.60            99.79                            1.15
Vassar QP27       90.13            99.80                            1.15
Vassar QP32       92.94            99.81                            1.12
Vassar QP37       95.88            99.83                            1.07
Flamenco2 QP22    72.05            98.93                            1.48
Flamenco2 QP27    72.67            98.66                            1.49
Flamenco2 QP32    73.02            98.71                            1.48
Flamenco2 QP37    73.03            98.50                            1.50
Race1 QP22        53.09            96.48                            1.98
Race1 QP27        51.79            95.99                            2.03
Race1 QP32        52.06            94.88                            2.08
Race1 QP37        54.08            94.75                            2.06
Rena QP22         79.44            98.11                            1.43
Rena QP27         80.61            98.16                            1.41
Rena QP32         82.60            98.21                            1.37
Rena QP37         86.38            98.50                            1.30

Table 5
Average values of d and RSR at different QPs. Each entry gives Avg(d) / Avg(RSR).

QP    Ballroom      Exit          Vassar        Flamenco2     Race1         Rena
22    8.32 / 5.41   3.35 / 2.63   1.59 / 2.05   3.22 / 2.94   9.16 / 6.30   2.68 / 2.45
27    7.19 / 4.83   2.61 / 2.35   1.88 / 2.03   2.84 / 2.79   8.65 / 5.81   2.74 / 2.33
32    6.73 / 4.41   2.19 / 2.25   2.02 / 1.99   2.56 / 2.66   8.10 / 5.08   3.32 / 2.31
37    5.92 / 4.00   1.96 / 2.24   1.41 / 1.98   2.33 / 2.46   7.48 / 4.42   2.86 / 2.06
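As a quick check of the bracketed figures, the speed-up ratio is simply the FSA search-point count divided by the algorithm's; for example, the "Vassar" QP37 entries of Table 3 give:

```python
# Speed-up ratio = search points of FSA / search points of the algorithm.
fsa = 132063      # FSA, "Vassar", QP37 (Table 3)
is_sbp = 378      # proposed IS + SBP, same setting
speed_up = fsa / is_sbp
print(round(speed_up, 2))  # 349.37, matching the table
```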

For further analysis of the proposed IS + SBP, the statistics of the average number of iterations required by IS + SBP are shown in Table 4. Over 95% of the MBs require no more than 5 iterations, and the average number of iterations stays small; this explains the overwhelming speed-up achieved by IS + SBP. It is also clear from Table 3 that the speed-up ratio of the proposed algorithms varies across sequences, which can be explained by the average values of d_Forward/d_Backward tabulated in Table 5. For "Race1" and "Ballroom", large values of d_Forward/d_Backward occur due to fast motion activity or occlusion areas. Consequently, an increased refinement search range for uni-directional prediction is preferred, allowing more search points for large d_Forward/d_Backward in order to maximize the chance of finding a better vector, as evidenced in Table 3. As a result, more search points are needed. A large d_Forward/d_Backward also increases the chance of using bi-directional prediction, which demands further computation. Therefore, the speed-up ratios for "Race1" and "Ballroom" are lower than those for sequences such as "Vassar". Besides, it is interesting to note that the speed-up ratio of IS + SBP increases with QP. As explained in Section 3, a high QP value induces a high dQP, which lowers the probability of using bi-directional prediction, so the unnecessary bi-directional prediction can be skipped to reduce computational complexity. On the other hand, more accurate motion vectors tend to be generated at a low QP value in order to keep the distortion small, and bi-directional prediction is performed on nearly all MBs to



maximize the prediction accuracy and achieve higher precision of motion vectors at a low QP value. This explains why the numbers of search points of IS and IS + SBP are almost the same at a low QP value.

Fig. 9. Rate-distortion performances of the tested algorithms for (a) Ballroom, (b) Exit, (c) Vassar, (d) Flamenco2, (e) Race1, and (f) Rena sequences.




5.2. Analysis of R-D performance

The rate-distortion (R-D) curves of all algorithms for the different sequences are shown in Fig. 9. There is nearly no difference between the proposed algorithms and FSA, while the proposed algorithms achieve higher coding efficiency than REF_CE2010, REF_CE2007 and REF_SPIE2006. Table 6 also summarizes the Bjontegaard delta bitrate (BDBR) and Bjontegaard delta PSNR (BDPSNR) [31] results relative to FSA. It can be seen that the proposed IS and IS + SBP outperform the algorithms


Table 6
R-D performance as compared to FSA using the BD measure [31]. Each entry gives BD-PSNR (dB) / BD-BR (%).

Sequence     TZ             REF_SPIE2006 [16]  REF_CE2007 [20]  REF_CE2010 [15]  IS              IS + SBP
Ballroom     0.02 / −0.55   −0.18 / 4.67       −0.16 / 4.33     −0.32 / 8.67     −0.01 / 0.28    −0.01 / 0.27
Exit         0.00 / 0.08    −0.05 / 2.06       −0.20 / 8.12     −0.30 / 12.52    −0.01 / 0.38     0.00 / 0.04
Vassar       0.01 / −0.32   −0.01 / 0.36       −0.01 / 0.74     −0.02 / 1.01      0.00 / 0.13     0.00 / −0.35
Flamenco2    0.04 / −0.67   −0.37 / 7.26       −0.35 / 6.69     −0.14 / 2.71      0.02 / −0.45    0.06 / −1.20
Race1        0.05 / −1.13   −0.13 / 3.19       −0.48 / 11.62    −0.35 / 8.71      0.05 / −1.26    0.03 / −0.53
Rena         0.09 / −1.62    0.01 / −0.20       0.04 / −0.83     0.07 / −1.25     0.06 / −1.18    0.11 / −2.05


in REF_CE2010, REF_CE2007 and REF_SPIE2006. Meanwhile, there is nearly no quality drop among the proposed IS, IS + SBP, the TZ search, and FSA.
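The BDBR/BDPSNR values in Table 6 follow Bjontegaard's method [31]. Below is a minimal pure-Python sketch of the BD-PSNR half (fit PSNR as a cubic in log10(bitrate) for each curve and average the gap over the overlapping rate range), assuming the usual four R-D points per curve; the function names are ours, not from [31].

```python
import math

def _cubic_coeffs(xs, ys):
    """Exact cubic through four points: solve the Vandermonde system
    a0 + a1*x + a2*x^2 + a3*x^3 = y by Gauss-Jordan elimination."""
    n = 4
    m = [[xs[i] ** j for j in range(n)] + [ys[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def _avg_over(coeffs, lo, hi):
    """Average value of the cubic over [lo, hi] via its antiderivative."""
    F = lambda x: sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return (F(hi) - F(lo)) / (hi - lo)

def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta-PSNR: average vertical gap between two R-D
    curves, each fitted as a cubic in log10(bitrate)."""
    lr = lambda rs: [math.log10(r) for r in rs]
    x1, x2 = lr(rates_ref), lr(rates_test)
    c1, c2 = _cubic_coeffs(x1, psnr_ref), _cubic_coeffs(x2, psnr_test)
    lo = max(min(x1), min(x2))   # overlapping log-rate interval
    hi = min(max(x1), max(x2))
    return _avg_over(c2, lo, hi) - _avg_over(c1, lo, hi)
```

A sanity check: a test curve that is uniformly 0.5 dB above the reference at the same bitrates yields a BD-PSNR of 0.5 dB.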

6. Conclusion

Based on the high correlation between neighboring view frames, a fast algorithm has been proposed for both uni-directional and bi-directional prediction in MVC to reduce the complexity of the JMVC codec. First, an iterative search with an adaptive search range adjustment is performed in the temporal and inter-view reference frames to obtain the refined motion and disparity vectors simultaneously; the refined vector obtained in the last motion (disparity) estimation is used to derive a new base vector for the next disparity (motion) estimation through the loop constraint with an efficient search range. Second, a selective bi-directional prediction scheme is designed by analyzing the characteristics of MVC sequences. Simulation results show that the proposed IS + SBP algorithm is on average 88 times faster than the full search algorithm in JMVC, while the R-D performance is almost maintained.

Acknowledgments

The work described in this paper is partially supported by the Centre for Signal Processing, Department of EIE, PolyU, a grant from the Internal Competitive Research Grant, PolyU, Hong Kong, China (PolyU G-YJ27), and a National Natural Science Foundation of China grant (No. 30970780).

References

[1] MPEG, Call for proposals on multi-view video coding, MPEG N7327, Poznan, Poland, July 2005.
[2] A. Smolic, K. Mueller, N. Stefanoski, J. Ostermann, A. Gotchev, G.B. Akar, G. Triantafyllidis, A. Koz, Coding algorithms for 3DTV – a survey, IEEE Trans. Circuits Syst. Video Technol. 17 (2007) 1606–1621.
[3] MPEG, Survey of algorithms used for multi-view video coding (MVC), MPEG N6909, Hong Kong, China, January 2005.
[4] G.J. Sullivan, T. Wiegand, H. Schwarz, Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding – in preparation for ITU-T SG 16 AAP Consent (in integrated form), JVT-AD007, Geneva, CH, 2009.
[5] P. Pandit, A. Vetro, Y. Chen, WD 1 reference software for MVC, JVT-AA212, Geneva, CH, April 2008.
[6] P. Merkle, A. Smolic, K. Muller, T. Wiegand, Efficient prediction structures for multiview video coding, IEEE Trans. Circuits Syst. Video Technol. 17 (2007) 1461–1473.
[7] M. Flierl, A. Mavlankar, B. Girod, Motion and disparity compensated coding for multiview video, IEEE Trans. Circuits Syst. Video Technol. 17 (2007) 1474–1484.
[8] S.-W. Wu, A. Gersho, Joint estimation of forward and backward motion vectors for interpolative prediction of video, IEEE Trans. Image Process. 3 (1994) 684–687.
[9] R.S. Wang, Y. Wang, Multiview video sequence analysis, compression, and virtual viewpoint synthesis, IEEE Trans. Circuits Syst. Video Technol. 10 (2000) 397–410.
[10] L. Shen, Z. Liu, T. Yan, Z. Zhang, P. An, View-adaptive motion estimation and disparity estimation for low complexity multiview video coding, IEEE Trans. Circuits Syst. Video Technol. 20 (2010) 925–930.
[11] L. Shen, Z. Liu, S. Liu, Z. Zhang, P. An, Selective disparity estimation and variable size motion estimation based on motion homogeneity for multi-view coding, IEEE Trans. Broadcast. 55 (2009) 761–766.
[12] J. Huo, Y. Chang, M. Li, Y. Ma, Scalable prediction structure for multiview video coding, in: IEEE International Symposium on Circuits and Systems (ISCAS), Taipei, Taiwan, 2009, pp. 2593–2596.
[13] J.-P. Lin, A.C.-W. Tang, A fast direction predictor of inter frame prediction for multi-view video coding, in: IEEE International Symposium on Circuits and Systems (ISCAS), Taipei, Taiwan, 2009, pp. 2589–2592.
[14] X. Xu, Y. He, Fast disparity motion estimation in MVC based on range prediction, in: 15th IEEE International Conference on Image Processing (ICIP), San Diego, USA, 2008, pp. 2000–2003.
[15] W. Zhu, X. Tian, F. Zhou, Y. Chen, Fast disparity estimation using spatio-temporal correlation of disparity field for multiview video coding, IEEE Trans. Consum. Electron. 56 (2010) 957–964.
[16] P. Lai, A. Ortega, Predictive fast motion/disparity search for multiview video coding, in: Proceedings of SPIE, Visual Communications and Image Processing (VCIP), San Jose, USA, 2006, pp. 6077091–60770912.
[17] L.F. Ding, P.K. Tsung, S.Y. Chien, W.Y. Chen, L.G. Chen, Content-aware prediction algorithm with inter-view mode decision for multiview video coding, IEEE Trans. Multimedia 10 (2008) 1553–1564.
[18] J.B. Lu, H. Cai, J.G. Lou, J. Li, An epipolar geometry-based fast disparity estimation algorithm for multiview image and video coding, IEEE Trans. Circuits Syst. Video Technol. 17 (2007) 737–750.
[19] X. Li, D. Zhao, S. Ma, W. Gao, Fast disparity and motion estimation based on correlations for multiview video coding, IEEE Trans. Consum. Electron. 54 (2008) 2037–2044.
[20] Y. Kim, J. Kim, K. Sohn, Fast disparity and motion estimation for multi-view video coding, IEEE Trans. Consum. Electron. 53 (2007) 712–719.
[21] I. Daribo, W. Miled, B. Pesquet-Popescu, Joint depth-motion dense estimation for multiview video coding, J. Vis. Commun. Image Represent. 21 (2010) 487–497.
[22] Y. Kim, J. Lee, C. Park, K. Sohn, MPEG-4 compatible stereoscopic sequence codec for stereo broadcasting, IEEE Trans. Consum. Electron. 51 (2005) 1227–1236.
[23] Z.-P. Deng, Y.-L. Chan, K.-B. Jia, C.-H. Fu, W.-C. Siu, Fast iterative motion and disparity estimation algorithm for multiview video coding, in: International Conference on the True Vision – Capture, Transmission and Display of 3D Video (3DTV), Tampere, Finland, 2010.
[24] Y.S. Ho, K.J. Oh, Overview of multi-view video coding, in: 14th International Workshop on Systems, Signals and Image Processing (IWSSIP 2007), Maribor, Slovenia, 2007, pp. 479–486.
[25] H.-C. Lin, H.-M. Hang, Fast algorithm on selecting bi-directional prediction type in H.264/AVC scalable video coding, in: IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, 2010, pp. 113–116.
[26] Y. Chan, W. Siu, An efficient search strategy for block motion estimation using image features, IEEE Trans. Image Process. 10 (2001) 1223–1238.
[27] S.-H. Lee, S.H. Lee, N.I. Cho, J.-H. Yang, Disparity vector prediction methods in MVC, JVT-U040, Hangzhou, China, October 2006.
[28] H. Yang, J. Huo, Y. Chang, S. Lin, P. Zeng, L. Xiong, Regional disparity based motion and disparity prediction for MVC, JVT-V071, Marrakech, Morocco, January 2007.
[29] A. Vetro, M. McGuire, W. Matusik, A. Behrens, J. Lee, H. Pfister, Multiview video test sequences from MERL, MPEG M12077, Busan, Korea, April 2005.
[30] Y. Su, A. Vetro, A. Smolic, Common test conditions for multiview video coding, JVT-U211, Hangzhou, China, July 2006.
[31] G. Bjontegaard, Calculation of average PSNR differences between RD-curves, VCEG-M33, Austin, USA, April 2001.