ieee transactions on broadcasting, vol. 58, no. 1, … › wp-content › uploads › 2019 › 05...

10
IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012 47 A VBR Video Encoding for Locally Consistent Picture Quality With Small Buffering Delay Under Limited Bandwidth Hoonjae Lee and Sanghoon Sull, Member, IEEE Abstract—During consecutive large picture information changes under limited bandwidth, the use of constant bit rate (CBR) en- coding for high definition television (HDTV) broadcasting systems often causes serious picture quality degradation due to the tight constraint on small buffering delay, whereas a variable allocation of bits depending on the picture information changes results in large buffering delay. In this paper, we propose a frame-layer vari- able bit rate (VBR) video encoding method for maintaining locally consistent picture quality of an encoded video with small buffering delay, which can be used for a video on demand (VOD) system which streams an encoded video file to a client device under limited bandwidth or a real-time streaming application allowing a small delay. An objective function defined as a sum of the local distortion variation of the encoded pictures and the underflow levels at the de- coder buffer is minimized with respect to the discrete frame-layer quantization step sizes subject to a constraint on the target number of bits within a temporal sliding window. Then, the constrained dis- crete minimization problem is converted to an unconstrained con- tinuous minimization problem by introducing a Lagrange multi- plier and is solved by applying the first-order necessary condition for local minima with respect to the quantization step sizes. Exper- imental results demonstrate that the proposed method provides a lower local standard deviation of PSNR comparing with the recent VBR method by 29.2% on average with small buffering delay. Index Terms—Buffering delay, consistent picture quality, rate control, variable bit rate (VBR), video encoding. I. INTRODUCTION D IGITAL television broadcasting standards impose a tight constraint on small buffering delay under limited band- width. For example, the ATSC (Advanced Television systems Committee) HDTV standard [1] limits the buffering delay to 0.5 seconds under 19.39 Mbps. For this reason, HDTV broadcasting systems use a CBR encoding, but the problem is that serious pic- ture quality degradations often appear during consecutive large picture information changes due to such as abrupt brightness change, fast motion and scene change. To maintain consistent picture quality of an encoded video without such degradation, bits can be allocated variably during encoding, depending on Manuscript received February 01, 2011; revised May 19, 2011; accepted July 27, 2011. Date of publication September 22, 2011; date of current version Feb- ruary 23, 2012. This research was supported by Basic Science Research Pro- gram through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2010-0028117). The authors are with the School of Electrical Engineering, Korea Univer- sity, Sungbuk-ku, Seoul 136-701, Korea (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBC.2011.2164308 the picture information changes. For example, an encoder used in a VOD system that streams a pre-encoded video under lim- ited bandwidth can variably allocate bits for consistent picture quality, but a long buffering delay could occur. A typical VBR encoding method which variably allocates bits depending on the picture information changes is basically inter- ested in maintaining consistent picture quality subject to a target number of bits for the digital video storage applications such as digital versatile disk (DVD) or blu-ray disk. Jagmohan et al. [2] proposed a VBR encoding method for MPEG-4 [3] under the as- sumption that adjusting the resolution of a frame cause less dis- tortion than adjusting the quantization parameter, and Kamaci et al. [4] proposed a VBR encoding method which estimates the distortion for an encoded picture by analyzing the distribu- tion of the coefficients obtained by applying the discrete cosine transform. Bagni et al. [5] proposed a VBR encoding approach which adjusts the reference frame-layer quantization parameter based on a current frame, and Wang et al. [6] proposed a VBR encoding which limits the distortion of an encoded frame in a given range. These typical VBR encoding methods can maintain consistent picture quality, but they allocate bits without much considering the buffering delay or limited bandwidth and thus a long buffering delay could occur when the encoded video is delivered under limited bandwidth. On the other hands, the use of the CBR encoding and the buffer-constrained VBR yields smaller or negligible buffering delay when an encoded video is delivered under limited band- width. Xie et al. [7] proposed a CBR encoding method which adjusts the quantization parameter of a current frame to pre- vent the underflow and overflow of the decoder buffer when the buffer level is close to zero and the maximum buffer level, re- spectively. Rezaei et al. [8] proposed a VBR encoding which determines the quantization parameter of a current frame by adding or subtracting several control values from the quanti- zation parameter of the previous frame to maintain consistent picture quality of the encoded video while preventing the un- derflow and overflow of the decoder buffer given a buffer size and bandwidth. The buffer-constrained VBR encoding methods proposed in [9]–[11] determine the quantization parameter of a current frame by estimating the distortion of the encoded cur- rent frame and the number of bits to encode the current frame while preventing the underflow and overflow. Although these methods attempt to maintain consistent picture quality, there is an inherent limitation on the reduction of the abrupt pic- ture quality fluctuation during consecutive large picture infor- mation changes since they determine the quantization param- 0018-9316/$26.00 © 2011 IEEE

Upload: others

Post on 04-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012 47

A VBR Video Encoding for Locally ConsistentPicture Quality With Small Buffering Delay

Under Limited BandwidthHoonjae Lee and Sanghoon Sull, Member, IEEE

Abstract—During consecutive large picture information changesunder limited bandwidth, the use of constant bit rate (CBR) en-coding for high definition television (HDTV) broadcasting systemsoften causes serious picture quality degradation due to the tightconstraint on small buffering delay, whereas a variable allocationof bits depending on the picture information changes results inlarge buffering delay. In this paper, we propose a frame-layer vari-able bit rate (VBR) video encoding method for maintaining locallyconsistent picture quality of an encoded video with small bufferingdelay, which can be used for a video on demand (VOD) systemwhich streams an encoded video file to a client device under limitedbandwidth or a real-time streaming application allowing a smalldelay. An objective function defined as a sum of the local distortionvariation of the encoded pictures and the underflow levels at the de-coder buffer is minimized with respect to the discrete frame-layerquantization step sizes subject to a constraint on the target numberof bits within a temporal sliding window. Then, the constrained dis-crete minimization problem is converted to an unconstrained con-tinuous minimization problem by introducing a Lagrange multi-plier and is solved by applying the first-order necessary conditionfor local minima with respect to the quantization step sizes. Exper-imental results demonstrate that the proposed method provides alower local standard deviation of PSNR comparing with the recentVBR method by 29.2% on average with small buffering delay.

Index Terms—Buffering delay, consistent picture quality, ratecontrol, variable bit rate (VBR), video encoding.

I. INTRODUCTION

D IGITAL television broadcasting standards impose a tightconstraint on small buffering delay under limited band-

width. For example, the ATSC (Advanced Television systemsCommittee) HDTV standard [1] limits the buffering delay to 0.5seconds under 19.39 Mbps. For this reason, HDTV broadcastingsystems use a CBR encoding, but the problem is that serious pic-ture quality degradations often appear during consecutive largepicture information changes due to such as abrupt brightnesschange, fast motion and scene change. To maintain consistentpicture quality of an encoded video without such degradation,bits can be allocated variably during encoding, depending on

Manuscript received February 01, 2011; revised May 19, 2011; accepted July27, 2011. Date of publication September 22, 2011; date of current version Feb-ruary 23, 2012. This research was supported by Basic Science Research Pro-gram through the National Research Foundation of Korea (NRF) funded by theMinistry of Education, Science and Technology (No. 2010-0028117).

The authors are with the School of Electrical Engineering, Korea Univer-sity, Sungbuk-ku, Seoul 136-701, Korea (e-mail: [email protected];[email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TBC.2011.2164308

the picture information changes. For example, an encoder usedin a VOD system that streams a pre-encoded video under lim-ited bandwidth can variably allocate bits for consistent picturequality, but a long buffering delay could occur.

A typical VBR encoding method which variably allocates bitsdepending on the picture information changes is basically inter-ested in maintaining consistent picture quality subject to a targetnumber of bits for the digital video storage applications such asdigital versatile disk (DVD) or blu-ray disk. Jagmohan et al. [2]proposed a VBR encoding method for MPEG-4 [3] under the as-sumption that adjusting the resolution of a frame cause less dis-tortion than adjusting the quantization parameter, and Kamaciet al. [4] proposed a VBR encoding method which estimatesthe distortion for an encoded picture by analyzing the distribu-tion of the coefficients obtained by applying the discrete cosinetransform. Bagni et al. [5] proposed a VBR encoding approachwhich adjusts the reference frame-layer quantization parameterbased on a current frame, and Wang et al. [6] proposed a VBRencoding which limits the distortion of an encoded frame in agiven range. These typical VBR encoding methods can maintainconsistent picture quality, but they allocate bits without muchconsidering the buffering delay or limited bandwidth and thusa long buffering delay could occur when the encoded video isdelivered under limited bandwidth.

On the other hands, the use of the CBR encoding and thebuffer-constrained VBR yields smaller or negligible bufferingdelay when an encoded video is delivered under limited band-width. Xie et al. [7] proposed a CBR encoding method whichadjusts the quantization parameter of a current frame to pre-vent the underflow and overflow of the decoder buffer when thebuffer level is close to zero and the maximum buffer level, re-spectively. Rezaei et al. [8] proposed a VBR encoding whichdetermines the quantization parameter of a current frame byadding or subtracting several control values from the quanti-zation parameter of the previous frame to maintain consistentpicture quality of the encoded video while preventing the un-derflow and overflow of the decoder buffer given a buffer sizeand bandwidth. The buffer-constrained VBR encoding methodsproposed in [9]–[11] determine the quantization parameter of acurrent frame by estimating the distortion of the encoded cur-rent frame and the number of bits to encode the current framewhile preventing the underflow and overflow. Although thesemethods attempt to maintain consistent picture quality, thereis an inherent limitation on the reduction of the abrupt pic-ture quality fluctuation during consecutive large picture infor-mation changes since they determine the quantization param-

0018-9316/$26.00 © 2011 IEEE

Page 2: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

48 IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012

eter for a single current frame without considering its neigh-boring frames. Such quality fluctuation in short duration be-comes more annoying to viewers especially when a high-res-olution video is presented on a HDTV. The buffer-constrainedVBR encoding methods proposed in [12] and [13] reduce thedistortion variation of the encoded frames for consistent pic-ture quality of the encoded video. However, the performanceof those methods could be limited, since the encoding methodin [12] minimizes only the distortion difference between the en-coded current frame and the encoded past frame, and the en-coding method in [13] reduces the distortion variation by esti-mating the rate-distortion for the inter-frames without consid-ering the picture information changes. Several two-pass VBRencoding methods [14]–[18] analyzed the entire frames of avideo to maintain consistent picture quality before encoding, butdue to its two-pass encodings, these methods cannot be used forreal-time streaming applications even though a small delay isallowed.

In this paper, we propose a VBR video encoding method thatmaintains locally consistent picture quality of an encoded videowith small buffering delay under limited bandwidth. The pro-posed VBR encoding method can be used to encode videos fora VOD system which streams an encoded video file to a clientdevice under limited bandwidth or a real-time streaming ap-plication allowing a small delay. Our idea is to allocate morebits to the frames with the consecutive large picture informa-tion changes while assigning less bits to other frames with lesspicture information changes within a temporal sliding window.Thus, the quantization step size for a current frame is determinedby considering the current frame and its neighboring frameswhile reducing the decoder buffer underflows. We formulate ourencoding method as a constrained minimization problem andminimize an objective function which is defined as a sum of thelocal distortion variation of the encoded pictures and the bufferunderflow levels subject to a constraint on the target numberof bits within the temporal sliding window centered at a cur-rent frame. Since our formulation considers the temporal slidingwindow consisting of a given number of past, current and fu-ture frames, the encoding delay corresponding to the number ofthe future frames to be considered to encode the current frameoccurs although the encoding delay does not appear during de-coding.

In the constrained minimization formulation, the distortionof the encoded pictures and the number of bits needed to en-code pictures are modeled as simple functions of the frame-layer quantization step sizes for the current and future frameswithin the temporal sliding window, and we search for a set ofthe optimal quantization step sizes for the current and futureframes that minimize the objective function while satisfying theconstraint on the target number of bits where the actual valuesof the quantization step sizes, the distortion and the numberof bits to encode the past frames obtained from the previoustemporal sliding windows are used as the values for the pastframes during the minimization. To accomplish this, we con-vert the constrained minimization problem to an unconstrainedminimization problem by introducing a Lagrange multiplier,and solve it by applying the first-order necessary condition forlocal minima with respect to the frame-layer quantization step

sizes. After solving the unconstrained minimization, the part ofthe minimizing solution corresponding to the current frame isselected as a quantization step size for the current frame andthe temporal sliding window moves forward by one frame todetermine the quantization step size for the next frame. Notethat, even though we minimize the objective function which in-cludes only the local distortion variation and the decoder bufferunderflow levels without considering the distortion itself, theoverall picture quality degradation due to allocating insufficientnumber of bits does not occur because a set of quantization stepsizes is selected satisfying the constraint on the target numberof bits within the temporal sliding window. Experimental re-sults demonstrate that the proposed method provides a lowerlocal standard deviation of PSNR comparing to the buffer-con-strained VBR method [8] by 29.2% on average at the expenseof a small increase of buffering delay.

The paper is organized as follows. In Section II, we describea VBR encoding method for maintaining locally consistentpicture quality of the encoded video with small buffering delayunder limited bandwidth. In Section III, we demonstrate theperformance of the proposed method through experiments.Section IV concludes the paper.

II. PROPOSED VBR ENCODING METHOD

In this section, we describe the proposed VBR encodingmethod. We first describe our mathematical formulation tomaintain locally consistent picture quality with small bufferingdelay based on a frame-layer rate control scheme. Then, wepresent a rate-quantization (R-Q) model and a distortion- quan-tization (D-Q) model to estimate the rate and distortion of anencoded frame for a given quantization step size. Finally, wedescribe how to determine the quantization step size for eachframe based on our mathematical formulation, R-Q and D-Qmodels.

A. Mathematical Formulation

The goal of the proposed VBR encoding is to encode framesin a video so that the encoded frames have locally consistentpicture quality with small buffering delay under limited band-width, assuming that the encoded bits are delivered to a decoderfrom a VOD server or a real-time streaming server in a constantbit-rate. An objective function to be minimized is defined as asum of the local distortion variation of the encoded pictures andthe underflow levels at the decoder buffer subject to a constrainton the target number of bits within a temporal sliding window.

For a temporal sliding window centered at a current framewith the length of frames, the proposed encoding is formu-lated as a constrained minimization problem with respect to the

variables, which denote quantizationstep sizes corresponding to the current and future frames,respectively:

Page 3: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

LEE AND SULL: VBR VIDEO ENCODING FOR LOCALLY CONSISTENT PICTURE QUALITY 49

subject to (1)

where that are the actual values of the quanti-zation step sizes used to encode the past frames are consideredas constants, represents the distortion resulting from en-coding with , is the number of bits to encodewith , is the average distortion for theencoded frames, and is a weight value. The constantdenotes the local target number of bits to encode the frames inthe temporal sliding window and calculated as follows:

(2)

where and denote a given target bit-rate and frame-rate, respectively.

The number of bits residing in the decoder buffer, ,after the encoded is decoded is computed by adding

(the number of bits of the encoded video datathat the decoder receives during the time interval of )to and subtracting the number of bits needed toencode from it as follows:

(3)

Defining the decoder buffer level as the value ofnormalized by , we have

(4)

The value of can be interpreted as the time required toreceive the amount of the encoded video data residing in the de-coder buffer. Thus, the negative value of implies that thedecoder should wait for seconds to receive the amount ofthe encoded video data. In our experiments, the typical valuesof range from 2 to 2 seconds. Since only the negativebuffer level corresponding to the decoder buffer underflow is tobe minimized, the sigmoid function is used to make theobjective function smooth during the minimization.

To find a solution of (1), a discrete space which includesall combinations of the discrete frame quantization step sizeswithin the temporal sliding window could be searched, but thesearch space is too huge since there exist possible setsof quantization step sizes if denotes the number of theavailable quantization step sizes defined in the video codingstandard such as MPEG-2/4 [3], [19] and H.264/AVC [20]. Inthis paper, the discrete minimization problem (1) is convertedto a continuous minimization problem by assuming the quan-tization step size for each frame is continuous. Then, the con-strained minimization problem is converted to an unconstrainedminimization problem by introducing a Lagrange multiplier forthe constraint on the target number of bits within the temporalsliding window, and is solved by applying the first-order nec-essary conditions for local minima with respect to the set ofquantization step sizes. Then, only the quantization step sizecorresponding to the current frame from the resulting solutionset is selected and used to actually encode the current frame .The temporal sliding window moves forward by one frame to

Fig. 1. Sigmoid function. (a) � � ��. (b) � � �.

determine the quantization step size for the next current frame.It is noted that our formulation maintains the overall picturequality although only the sum of the local distortion variationand the buffer underflow levels is minimized without includingthe term for the average distortion since a set of quantizationstep sizes should satisfy the constrainton the target number of bits through s during the mini-mization.

The underflow at the decoder buffer means that no video dataremains in the decoder buffer, causing a buffering delay in orderto receive more video data. Even though the buffer level cannothave a negative value physically, we can interpret the negativevalue of buffer level as the buffering delay as described previ-ously. Thus, the buffering delay for is obtained as follows:

(5)

Since only the levels of the underflows corresponding tothe negative values of is to be minimized to reduce thebuffering delay, the sigmoid function of is minimizedwith respect to the quantization step sizes to make the objectivefunction smooth. The well-known sigmoid function andthe first derivative of the sigmoid function are expressed asfollows:

(6)

where is a constant that determines the steepness of the sig-moid function. The sigmoid function is plotted in Fig. 1. Thesigmoid function acts like a unit-step function, but the sigmoidfunction is differentiable whereas the unit-step function is not.When the buffer level is input to the sigmoid functionwith a negative value of , the value of is close tozero if is positive, and the value of is close toone if is negative. Therefore, if we minimize the sum of

Page 4: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

50 IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012

Fig. 2. Relation between a temporal sliding window and the encoding delay.

the sigmoid functions of s with a negative , the levels ofthe underflows can be minimized.

Note that the decoder buffer level can rise to a higher levelif the numbers of bits smaller than are usedto encode consecutive frames in (3) and (4). However, theaverage number of bits allocated to encode a frame becomes

with the proposed method since the target numberof bits within the temporal sliding window corresponding to

frames is constrained to in (1) and(2). Therefore, the decoder buffer level does not rise to a higherlevel even though we minimize the objective function whichincludes only the sum of the local distortion variation andthe decoder buffer underflow levels without considering theoverflow. In our experiments, we observed that the decoderbuffer level stayed under which is acceptable for clientdevices.

The proposed method requires future frames within atemporal sliding window to encode a current frame, resultingin the encoding delay in real-time encoding applications. Fig. 2illustrates the encoding delay related to a temporal slidingwindow. To determine the quantization step size for the currentframe , future frames within the temporal slidingwindow need to be considered and thus is encoded at

. Note that the encoding delay does not need to be takeninto account in our application because the encoding delay doesnot appear during decoding.

B. Models for Rate-Quantization and Distortion-Quantization

The actual values of and in (1) can be ob-tained only after encoding with , but the actual encodingof a frame to obtain and for a large number ofvalues of during the minimization is impractical becauseof high computational complexity. Therefore, we need to es-timate the values of and using the appropriaterate-quantization (R-Q) model and the distortion-quantization(D-Q) model, respectively. In this paper, we use a simple R-Qmodel and a D-Q model which are experimentally found out toyield satisfactory results to estimate the values of and

for current and future frames in a temporal window.1) Rate-Quantization Model: The R-Q models proposed in

[21], [22] use defined as the mean absolute differenceof the original image and its reconstructed image based onthe observation that more bits are needed to encode more com-

plex frames. Since is available only after is en-coded, it is estimated from in the conventional en-coding schemes under the restrictive assumption that consec-utive frames are similar [21]–[27]. However, the assumptionoften does not hold when there are large picture informationchanges.

In this paper, we use a modified linear R-Q model by using thesquare root of the sum of absolute difference (SAD) between theframe and its reference frame as the measure for the pictureinformation change during inter-frame coding of . Assumingone reference frame for simplicity where is the distancebetween and , the picture information change at issimply estimated as

(7)

where is the pixel value in the original image , and isthe number of pixels in . It is experimentally found out that

well reflects the large picture information changes whenis equal to one for the IPPP structure. Thus, the modified linearR-Q model for inter-frame using is given as follows:

(8)

where is the model parameter to be determined byusing the actual values of , whichare the picture information changes for recentinter-frames , and the actual values of

andthat were used to encode . Denoting

andby

and , respectively, and applying the least square method to (8)for recent inter-frames in a similar way in [22], we arrive at

(9)

Letting and be the -th element of and , respec-tively, is calculated with the following equation:

(10)

Then, is estimated from (8).If the frame is an intra-frame, is calculated based on

and which arethe actual values used to encode recent intra-frames. How-ever, the picture information changes corre-sponding to recent intra-frames and the current frame doesnot have to be taken into account because the number of bits toencode an intra-frame does not depend on other frames. Thus,the modified linear R-Q model for the intra-frame is given asfollows:

(11)

Denoting andby and ,

Page 5: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

LEE AND SULL: VBR VIDEO ENCODING FOR LOCALLY CONSISTENT PICTURE QUALITY 51

respectively, is calculated by applying the least squaremethod to (11) for recent intra-frames and using (9) and(10). Then, is estimated from (11).

Note that the numerous macroblocks in the inter-frame couldbe encoded with the intra coding modes if there are large pictureinformation changes such as abrupt scene changes. However,it is common to allocate the smaller number of bits to encodethe inter-frames than the intra-frames. Therefore, we can usetwo simple R-Q models in (8) and (11) for the inter-frames andintra-frames, respectively, even in such special cases, to allocatethe adequate number of bits according to the frame type.

2) Distortion-Quantization Model: The relation betweenand is modeled as a quadratic function in [28] as

. By assuming vary within a narrowrange over the consecutive frames in the temporal slidingwindow to maintain locally consistent picture quality of theencoded video, the quadratic function can be locally approxi-mated as the linear function as follows:

(12)

where is a model parameter which is calculated by usingthe distortions of recent frames,

, and the quantization step sizes used to encoderecent frames, , as follows:

(13)

Note that since the temporal sliding window contains the pastframes which were already encoded, the estimated values ofthe distortion and rate are used only for the current and futureframes based on (8), (11) and (12) while the actual values of thedistortion and rate are used for the past frames.

To check the validity of the R-Q and D-Q models we used, theestimated and actual values of and when the 1800frames from three HD video sequences are encoded with varioustarget bit-rate by using the proposed R-Q model are plotted inFig. 3(a) and (b), respectively, and the mismatch (computed bysubtracting the actual value from the estimated value) for thenumber of bits and the distortion are plotted in Fig. 3(c) and (d),respectively. In Fig. 3(c), the mismatch becomes larger than

bits for the 13 inter-frames noted in the dashed box whenthe picture information change is large and the quantizationstep size is small in (8) under the target bit-rate of ,resulting in the larger estimated number of bits compared withthe actual number of bits. In this case, the actual numbers ofbits to encode all frames within the temporal sliding windowbecome smaller than the local target number of bits in(2). Therefore, the overall visual quality of the encoded framescould be slightly reduced, and the decoder buffer level couldincrease, implying that such relatively large mismatch does notresult in the underflow at the decoder. Note that the mismatch forthe number of bits to encode the intra-frames estimated by using(11) is smaller compared with that of the inter-frames where(8) used. In most cases, we can see that the values of rate anddistortion are well estimated by using our R-Q model and D-Qmodel so that the models could be used to estimated the rate

Fig. 3. Performance of our R-Q and D-Q models for 1800 frames from threeHD video sequences: (a) Actual and estimated numbers of bits. (b) Actual andestimated distortions. (c) Mismatches between the actual and estimated numbersof bits. (d) Mismatches between the actual and estimated distortions.

and distortion of current and future frames in a temporal slidingwindow.

C. Determination of Quantization Step Size

Rewriting (1) by using (2), (8) and (12), we obtain thefollowing:

(14)

To solve (14), the constrained minimization is converted toan unconstrained minimization by using a Lagrange multiplieras follows:

with

(15)

Page 6: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

52 IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012

where is the cost function, and is theLagrange multiplier. We solve (15) by applying the first-ordernecessary conditions for local minima:

...

(16)

Then, to obtain the solution of (14), Newton’s method is ap-plied to find the roots of (16): We first rewrite (16) as follows:

where

(17)

To find the root of (17), we update iteratively so thatconverges to zero by obtaining the updating value from thefollowing equation for each step:

(18)

Equation (18) can be expanded by using Taylor series asfollows:

(19)

where denotes the high-order remainder, and is the Jacobianmatrix where the element of is defined as

(20)

From (18) and (19), we have the following linear equation byneglecting the high-order remainder :

(21)

Then, we can obtain by solving (21) and is updatedto . If we update iteratively until converges tozero, we can find the solution. By using the average valuesof the previous quantization step sizes as the initial values of

s, Newton’s iteration method generally converged well in ourexperiments.

The part of the solution of (14) corresponding to the currentframe is selected and converted to the nearest quantizationparameter, and used to encode the current frame. Then, the tem-poral sliding window moves forward by one frame and the quan-tization step size for the next frame is calculated in a similarway.

III. EXPERIMENTAL RESULTS

We implemented the proposed method to evaluate its perfor-mance by using the H.264/AVC reference software (JM13.2)on a PC with an Intel Core 2 Duo 2.4GHz CPU and 2GB RAM.For comparison, we also implemented the buffer-constrainedVBR method [8]. Experiments are performed on several HD(1920 1080) video sequences. Five video sequences (HDMusic Show 1, , HD Music Show 5) were generated froma terrestrial HDTV music show program containing the latestmost popular dance music and songs which have consecutivelarge picture information changes such as abrupt brightnesschange, fast motion and scene change. In addition, three HDsequences (Crow run, Duck Take Off, and Park Joy) are alsoused in the experiments. The original raw video sequences areencoded with the proposed method, the existing VBR method[8], and the H.264/AVC reference software with a constantquantization parameter, respectively. The video sequences areencoded with the frame rate of 30 fps and IPPP structure withGOP length of 15 frames. The target bit-rate for each video se-quence is set to the average bit-rate of the constant quantizationparameter (QP) encoding. For example, the target bit-rate forHD Music Show 1 is set to 6,394,395 bits/s.

In our experiments, the length of the temporal sliding windowis set to 60 frames corresponding to four GOPs, and in

(1) is set to by considering the range of the distortionand buffer level in (14). Newton’s method is implemented basedon the routine in [29] where the maximum iteration number isset to 30. The decoder buffer size of the existing VBR method[8] is set to the . When we simulate the situation thata pre-encoded video file is delivered from its beginning underlimited bandwidth, the initial buffer level is assumed to zero inour experiments.

Fig. 4 shows the comparison of the PSNRs and the decoderbuffer levels when HD Music Show 1 and HD Music Show 4 areencoded with the CBR encoding method [7], constant QP, ex-isting VBR method [8] and the proposed method, respectively,where the buffer size for the CBR encoding method is set tothe half of . Severe quality fluctuations in the encodedpicture appear with the CBR encoding method and the severeunderflow and large buffering delay occur with the constant QP

Page 7: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

LEE AND SULL: VBR VIDEO ENCODING FOR LOCALLY CONSISTENT PICTURE QUALITY 53

Fig. 4. Comparison of the PSNRs and the buffer levels: (a) HD Music Show 1.(b) HD Music Show 4.

encoding especially when there are consecutive large picture in-formation changes. The existing VBR method [8] prevents theunderflow by abruptly increasing the quantization parameter fora current frame when the decoder buffer level is low and thus wecan see the large picture quality fluctuations both at the consec-utive large picture information changes and at the beginning ofthe video delivery where the buffer level is relatively low. In thecase of the proposed method, the picture quality of the encodedvideo is more locally consistent because the distortion variationis minimized within the temporal sliding window centered ata current frame. Moreover, the underflow levels at the decoderbuffer are also minimized, yielding the small buffering delay.Comparison of the encoding results for the 180st frame of theHD Music Show 1 is shown in Fig. 5. The CBR encoding method[7] results in the severe picture quality degradation and the ex-isting VBR method [8] shows some picture quality degradationwhereas the proposed method maintains the picture quality ofthe encoded video.

To compare the subjective visual quality of the encoded videofor the different rate control schemes, ten viewers evaluated the

Fig. 5. Comparison of the encoding result for the 180st frame of the HD MusicShow 1. (a) An original input image. (b) A magnified original input image cor-responding to the red box in (a). (c) Result of the CBR encoding [7] (PSNR:27.16dB). (d) Result of the existing VBR method [8] (PSNR: 33.76dB). (e) Re-sult of the proposed method (PSNR: 37.82dB).

TABLE ICOMPARISON OF THE SUBJECTIVE VISUAL QUALITY

visual quality of the HD Music Show 1 and HD Music Show 4encoded with the CBR encoding method [7], the existing VBRmethod [8], and the proposed method, respectively, while theencoded videos are displayed on a 24 inch HD monitor. Table Ishows the evaluation results where the score of 0 and 5.0 denotethe worst and the best visual quality, respectively, and we cansee that the proposed method gives the better visual subjectivequality of the encoded videos compared with the existing VBRmethod [8] and the existing CBR method [7] as the proposedmethod reduces the local standard deviation of PSNR of theencoded video.

Table II shows the performance comparison for the proposedmethod with the existing VBR method [8] and the constant QP

Page 8: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

54 IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012

TABLE IIPERFORMANCE COMPARISON OF THE RATE CONTROL METHODS FOR VARIOUS

VIDEO SEQUENCES.

The local standard deviation is calculated within the sliding window of

length �� .

encoding. The proposed method provides a lower average localstandard deviation of the PSNR by 29.2% and maximum localstandard deviation of the PSNR by 35.1% in comparison withthe existing VBR method [8], while the average buffering delayis 0.14 second which is very small in comparison with the con-stant QP encoding. These results show that the proposed method

Fig. 6. Average local standard deviation of PSNR and the buffering delay whenthe HD Music show 1 is encoded with the various lengths of the temporal slidingwindow �� .

provides more locally consistent picture quality of the encodedvideo compared with existing VBR method [8] at the expenseof a small increase of the buffering delay. The constant QP en-coding usually gives good quality consistency for the encodedpictures, but the decoder buffer underflow cannot be consid-ered. Especially when the HD Music Show 3 is encoded withthe constant QP encoding method under the target bit-rate of

, the value of the number of bits residing inthe decoder buffer, , in (3) decreases to bits sinceit contains the severe consecutive picture information changes,resulting in the large buffering delay of 2.27 seconds.

The existing VBR method [8] results in small buffering delayat the expense of the large picture quality fluctuation because itassigns a large quantization parameter to the frame to preventthe underflow when the decoder buffer level is low, resulting inthe sudden picture quality degradation. On the other hands, theproposed method minimizes both the local distortion variationof the encoded pictures and the underflows of decoder buffers,and thus can achieve locally consistent picture quality of theencoded video with small buffering delay.

To observe the effect of choosing different value of the lengthof the temporal sliding window in (1), the HD Music Show 1is encoded with the various where is set to . Fig. 6shows the average local standard deviation of the PSNR and thebuffering delay. As increases, the average local standarddeviation of the PSNR slightly decreases, but the encoding timerapidly increases. In our experiments, is set to 60 frames foracceptable encoding time.

Fig. 7 shows the effect of the various weight values of in(1) on the average local standard deviation of PSNR and thebuffering delay for the HD Music Show 1 is encoded where thelength of the temporal sliding window is set to 60 frames.We can observe that the buffering delay becomes lower whilethe average local standard deviation of PSNR slightly increasesas the weight becomes larger. Therefore, the values of can beadjusted depending on various applications: for example, large

can be used if the buffering delay should be small. In our ex-periments, is set to . Note that, referring to the objec-tive function in (1), even though the size of the temporal slidingwindow is varied, the underflow level term (the second termin the objective function) remains near the local extrema so thatthe margin for the further improvement is small, because theunderflow level term is multiplied by a large weighting factor

Page 9: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

LEE AND SULL: VBR VIDEO ENCODING FOR LOCALLY CONSISTENT PICTURE QUALITY 55

Fig. 7. Encoding results of the proposed method with various values of � forHD Music Show 1. (a) PSNR and decoder buffer level. (b) Average local stan-dard deviation of PSNR and buffering delay.

TABLE IIICOMPARISON OF THE AVERAGE ENCODING TIME (1920� 1080, 600 FRAMES)

in the proposed method. In the meantime, thelocal distortion variation term (the first term in the objectivefunction) decreases if is increased as can be seen in Fig. 6.Thus, the large weighting factor needs not be changed asis increased, but it should be increased as is decreased tomaintain the local distortion variation. For example, shouldbe increased by 3% when decreases from 60 to 30.

To compare the encoding time of each encoding method, wemeasure the average time to encode 600 frames of HD (19201080) video sequences. Table III shows that the encoding time ofthe proposed method is longer than the existing VBR method [8]by 14.6%, but this could not be problem for encoding videos forVOD applications. The encoding time of the CBR encoding [7]is larger than other methods because some frames are encodedmore than one.

IV. CONCLUSION

In this paper, we have proposed a VBR video encodingmethod to maintain locally consistent picture quality of theencoded video with small buffering delay under limited band-width. For locally consistent picture quality of the encodedvideo, the frame-layer bit allocation problem is formulated as aconstrained minimization problem so that an objective functionwhich is defined as a sum of the local distortion variation ofthe encoded pictures and the underflow levels at the decoderbuffer is minimized subject to a constraint on the local targetnumber of bits within a temporal sliding window. Experimentalresults demonstrate that the proposed method provides a lowerlocal standard deviation of PSNR by 29.2% comparing to theexisting buffer-constrained VBR method [8] at the expenseof a small increase of buffering delay. We plan to extend ourmethod to the macroblock level rate control by utilizing objectsegmentation and tracking, for example.

ACKNOWLEDGMENT

The authors would like to thank Korean BroadcastingSystem (KBS) for providing HD video sequences used in theexperiments.

REFERENCES

[1] Program and System Information Protocol for Terrestrial Broadcastand Cable (Revision B), ATSC Standard A/65B, Advanced TelevisionSystems Committee, 2003.

[2] A. Jagmohan and K. Ratakonda, “MPEG-4 One-pass VBR rate controlfor digital storage,” IEEE Trans. Circuits Syst. Video Technol., vol. 13,no. 5, pp. 447–452, May 2003.

[3] Coding of Audio-Visual Objects—Part 2: Visual, International Standard14496-2, ISO/IEC, 2004, ISO/IEC JTC1/SC29/WG11 (MPEG).

[4] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit alloca-tion for the H.264/AVC video coder via cauchy-density-based rate anddistortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 15,no. 8, pp. 994–1006, Aug. 2005.

[5] D. Bagni, B. Biffi, and R. Ramalho, “A constant-quality, single-passVBR control for DVD recorders,” IEEE Trans. Consum. Electron., vol.49, no. 3, pp. 653–662, Aug. 2003.

[6] K. Wang and J. W. Woods, “MPEG motion picture coding withlong-term constraint on distortion variation,” IEEE Trans. CircuitsSyst. Video Technol., vol. 18, no. 3, pp. 294–304, Mar. 2008.

[7] B. Xie and W. Zeng, “A sequence-based rate control framework forconsistent quality real-time video,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 16, no. 1, pp. 56–71, Jan. 2006.

[8] M. Rezaei, M. Hannuksela, and M. Gabbouj, “Semi-fuzzy rate con-troller for variable bit rate video,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 18, no. 5, pp. 633–645, May 2008.

[9] Z. G. Li, L. Xiao, C. Zhu, and P. Feng, “A novel rate control schemefor video over the internet,” in IEEE Int. Conf. Acoust., Speech, SignalProcess., May 2002, vol. 2, pp. 2065–2068.

[10] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVCvideo coding and its application to rate control,” IEEE Trans. CircuitsSyst. Video Technol., vol. 15, no. 12, pp. 1533–1544, Dec. 2005.

[11] J. Y. Lee and H. W. Park, “A rate control algorithm for DCT-basedvideo coding using simple rate estimation and linear source model,”IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp.1077–1085, Sep. 2005.

[12] Z. Chen and K. N. Ngan, “Distortion variation minimization in real-time video coding,” Signal Process.: Image Commun., vol. 21, no. 4,pp. 273–279, Apr. 2006.

[13] L. J. Lin and A. Ortega, “Bit-rate control using piecewise approxi-mated rate-distortion characteristics,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 8, no. 4, pp. 446–459, Aug. 1998.

[14] Y. Yu, J. Zhou, Y. Wang, and C. Chen, “A novel two-pass VBR codingalgorithm for fixed-size storage application,” IEEE Trans. Circuits Syst.Video Technol., vol. 11, no. 3, pp. 345–356, Mar. 2001.

Page 10: IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, … › wp-content › uploads › 2019 › 05 › A-VBR-Vide… · crete minimization problem is converted to an unconstrained

56 IEEE TRANSACTIONS ON BROADCASTING, VOL. 58, NO. 1, MARCH 2012

[15] H. Yin, X. Fang, L. Chen, and J. Hou, “A practical consistent-qualitytwo-pass VBR video coding algorithm for digital storage application,”IEEE Trans. Consum. Electron., vol. 50, no. 4, pp. 1142–1150, Nov.2004.

[16] B. Han and B. Zhou, “VBR rate control for perceptually consistentvideo quality,” IEEE Trans. Consum. Electron., vol. 54, no. 4, pp.1912–1919, Nov. 2008.

[17] H. K. Arachchi and W. A. C. Fernando, “An intelligent rate controlalgorithm to improve the video quality at scene transitions for off-lineMPEG-1/2 encoders,” IEEE Trans. Consum. Electron., vol. 49, no. 1,pp. 210–219, Feb. 2003.

[18] D. Zhang, Z. Chen, and K. N. Ngan, “Constant distortion rate controlfor H.264/AVC high definition videos with scene change,” in IEEE Int.Symp. Circuits Syst., May 2008, pp. 3198–3501.

[19] Information Technology—Generic Coding of Moving Pictures and As-sociated Audio Information: Video, International Standard 13818-2,ISO/IEC, 2000, ISO/IEC JTC1/SC29/WG11 (MPEG).

[20] Coding of Audio-Visual Objects—Part 10: Advanced Video Coding, In-ternational Standard 14496-10, ISO/IEC, 2004, ISO/IEC JTC1/SC29/WG11 (MPEG).

[21] T. Chiang and Y. Q. Zhang, “A new rate control scheme using quadraticrate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol.7, no. 1, pp. 246–250, Feb. 1997.

[22] J. Dong and N. Ling, “On model parameter estimation for H.264/AVCrate control,” in IEEE Int. Symp. Circuits Syst., May 2007, vol. 1, pp.289–292.

[23] W. Yuan, S. Lin, Y. Zhang, W. Yuan, and H. Luo, “Optimum bit al-location and rate control for H.264/AVC,” IEEE Trans. Circuits Syst.Video Technol., vol. 16, no. 6, pp. 705–715, Jun. 2006.

[24] C. H. Lee, Y. H. Jung, S. J. Lee, Y. J. Oh, and J. S. Kim, “Real-timeframe-layer H.264 rate control for scene-transition video at low bitrate,” IEEE Trans. Consum. Electron., vol. 53, no. 3, pp. 1084–1092,Aug. 2007.

[25] Y. Liu, Z. G. Li, and Y. C. Soh, “A novel rate control scheme for lowdelay video communication of H.264/AVC standard,” IEEE Trans. Cir-cuits Syst. Video Technol., vol. 17, no. 1, pp. 68–78, Jan. 2007.

[26] C. Yu, J. Lu, X. Xie, and J. Zheng, “A novel content-based rate controlalgorithm for H.264,” in Proc. IEEE Int. Conf. Commun., Circuits Syst.,Jul. 2007, vol. 1, pp. 611–615.

[27] H. Xiong, J. Sun, S. Yu, J. Zhou, and C. Chen, “Rate control for real-time video network transmission on end-to-end rate-distortion and ap-plication-oriented QoS,” IEEE Trans. Broadcast., vol. 51, no. 1, pp.122–132, Mar. 2005.

[28] G. Sullivan and T. Wiegand, “Rate-distortion optimization for videocompression,” IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90,Nov. 1998.

[29] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery,Numerical recipes in C: The Art of Scientific Computing, 2nd ed.Cambridge, U.K.: Cambridge University Press, 1992.

Hoonjae Lee received the B.S. degree in electronicengineering from the Korea University, Seoul, Korea,in 2006. He is currently working toward the M.S. andPh.D. joint degree in electrical engineering at KoreaUniversity.

His research interests are image and video signalprocessing, digital broadcasting, and other issues onimage and video technologies.

Sanghoon Sull (S’79–M’81) received the B.S.degree (with honors) in electronics engineering fromthe Seoul National University, Korea, in 1981, theM.S. degree in electrical engineering from the KoreaAdvanced Institute of Science and Technology in1983, and the Ph.D. degree in electrical and com-puter engineering from the University of Illinois,Urbana-Champaign, in 1993.

In 1983-1986, he was with the Korea BroadcastingSystems, working on the development of the teletextsystem. In 1994-1996, he conducted research on mo-

tion analysis at the NASA Ames Research Center. In 1996-1997, he conductedresearch on video indexing/browsing and was involved in the development ofthe IBM DB2 Video Extender at the IBM Almaden Research Center. He joinedthe School of Electrical Engineering at the Korea University as an Assistant Pro-fessor in 1997 and is currently a Professor. His current research interests includeimage/video processing, computer vision, video search, and their applicationsto 3D TV and smart phone. Dr. Sull is currently an Associate Editor for IEEETransactions on Circuit and System for Video Technology.