an evaluation of tcp-based rate-control algorithms for adaptive internet streaming … · 2019. 4....

An Evaluation of TCP-based Rate-Control Algorithms forAdaptive Internet Streaming of H.264/SVC

Robert Kuschnig, Ingo Kofler, Hermann HellwagnerInstitute of Information Technology (ITEC)

Klagenfurt University, Austria{firstname.lastname}@uni-klu.ac.at

ABSTRACTRecent work [18] indicates that multimedia streaming viaTCP provides satisfactory performance when the achievableTCP throughput is approx. twice the media bit rate. How-ever, these conditions may not be achievable in the Internet,e.g., when the delivery path offers insufficient bandwidthor becomes congested due to competing traffic. Therefore,adaptive streaming for videos over TCP is required and anumber of rate-control algorithms for video streaming havebeen proposed and evaluated in the literature.In this paper, we evaluate and compare three different rate-control algorithms for TCP in terms of the (PSNR) qualityof the delivered video and in terms of the timeliness of deliv-ery. The contribution of the paper is that, to the best of ourknowledge, this is the first evaluation of TCP-based stream-ing in an Internet-like setting making use of the scalabilityfeatures of H.264/SVC video. Two simple bandwidth esti-mation algorithms and a priority-/deadline-driven approachare described to adapt the bit rates of, and transmit, theH.264/SVC GOPs in a rate-distortion optimal manner. Theresults indicate that the three algorithms perform robustlyin terms of video quality and timely delivery, both on under-provisioned links and in case of competing TCP flows. Thepriority-/deadline-driven technique is even more stable interms of packet delays and jitter; thus, client buffers can bedimensioned more easily.

KeywordsH.264/SVC, TCP video streaming, adaptive video stream-ing, rate control, TCP fairness

1. INTRODUCTIONVideo streaming over the Internet based on the TCPtransport protocol became widespread in the recent past.The reason for this is the popularity of video portals likeYouTube or social networks that allow to share and dis-tribute user generated content [16]. Although, from a tech-nical point of view, the use of TCP for real-time multimedia

Figure 1: Use case

applications is considered controversial in the literature [10,5], there are good arguments for TCP. First, it offers a reli-able end-to-end transport mechanism that makes additionalprovisions like forward error correction unnecessary. Theconnection-oriented design of the protocol further allows foreasy traversal of firewalls and NAT devices that may ex-ist in the network [1]. As a consequence, the developmentand deployment of TCP-based streaming applications turnsout to be much less complex than approaches based on theconnectionless UDP protocol. Finally, TCP also offers con-gestion control that, at the same time, tries to maximizethe throughput while preventing the network from collaps-ing. This assures that only the fair share of the networkbandwidth is consumed by a single connection, which is con-sidered crucial for the stability of the Internet [4].

As the throughput of the TCP connection depends on both,the link capacity and the amount of congestion, the through-put can vary significantly over time. A typical use case isillustrated in Figure 1, where a home network consisting ofseveral computers or consumer electronic products is con-nected to the Internet via a DSL or cable-based access net-work. In this common deployment scenario, the capacity ofthe link and the congestion caused by parallel TCP connec-tions influence the available bandwidth for a video stream-ing application. Variations in the available bandwidth willresult in jerky playback and disruption of the video play-back if the throughput falls below the bit rate requirementof the video. Adaptive video streaming [1] tries to overcomethis, by adjusting the video bit rate to the available networkbandwidth. In adaptive TCP streaming, TCP’s adaptabil-ity to fluctuating network conditions is used to steer the ratecontrol. TCP-based video streaming and especially adaptivestreaming are extensively covered in the literature [17, 11, 8,6, 18, 15]. An approach which uses adaptive video transcod-ing based on the TCP transmission rate, is presented in[15]. Other approaches like the TCP-based priority/priority-progress streaming [17, 11] send the more important parts

of the video before the low-priority parts, in order to delivera rate-distortion optimal video.

Adaptive video streaming requires that the bit rate of thevideo can be adjusted on-the-fly on the server. However, bitrate adaptation based on transcoding or transrating comesalong with high computational cost and scalability issues[1]. An alternative is the use of a scalable video codingformat that offers simple video adaptation at the cost of amore sophisticated encoder/decoder design. Although scal-ability features were included in both standardized and pro-prietary codecs in the past, their success was rather lim-ited due to coding inefficiencies and decoding complexity.The Scalable Video Coding (SVC) extension [21] of the well-known H.264/AVC standard, however, aims at overcomingthe shortcomings of the past codecs. The performance andmerits of H.264/SVC have been successfully demonstratedin various use cases [22]. However, to the best of our knowl-edge, no evaluation w.r.t. adaptive Internet streaming basedon TCP and H.264/SVC has been conducted so far.

Therefore, we provide in this paper an in-depth evalua-tion of three different adaptive TCP-based video streamingapproaches that utilize scalability features of H.264/SVCvideos. The paper is organized as follows. Section 2briefly introduces the H.264/SVC video coding standard andits scalability features. An overview of TCP-based videostreaming, its challenges and existing approaches are pro-vided in Section 3. The use of H.264/SVC in combinationwith the adaptive streaming approaches and a detailed de-scription of the algorithms under evaluation are given in Sec-tion 4. Section 5 describes the evaluation setup and method-ology. The results of our extensive evaluation are presentedand discussed in Section 6. Section 7 concludes the paper.

2. H.264/SVCThe H.264/SVC video coding standard [21] was recentlystandardized as an extension to the well-known H.264/AVCstandard. In this context, a video stream is consideredscalable, if the stream can be adapted by the removal ofparts of the encoded bit stream. Since H.264/SVC extendsH.264/AVC, its design is also based on the distinction be-tween the Video Coding Layer (VCL) and a Network Ab-straction Layer (NAL). While the VCL deals with the com-pression of the video data, the NAL is used for organiz-ing the compressed video for storage and transmission [23].H.264/AVC introduced the concept of a NAL unit whichconsists of a one byte header and the payload data. NALunits can be classified into VCL NAL units, which containcoded video slices, and non-VCL NAL units which provideother decoder-relevant information like picture and sequenceparameter sets. The type of NAL unit is signaled in the onebyte header.

H.264/SVC introduces scalability in three dimensions. Tem-poral scalability is the property of a bit stream that videosequences with different frame rates can be extracted. Gen-erally, temporal scalability can be achieved by restrictingthe motion-compensated prediction to hierarchical predic-tion structures. Hierarchical B-pictures or non-dyadic struc-tures were already possible with coding tools standardizedin H.264/AVC. The only changes introduced by H.264/SVCare related to signaling the different temporal layers.

Spatial scalability, which allows the extraction of video se-quences with different spatial resolutions, is also realized in alayered fashion. In each spatial layer, motion-compensationprediction and intra-prediction can be employed. Addition-ally, SVC introduces inter-layer prediction which means thata picture at a higher spatial layer can be predicted by upsam-pling the signal from its corresponding lower layer picture.

For quality scalability, the H.264/SVC standard offers twoapproaches. Coarse-grain quality scalable coding (CGS) issimilar to spatial scalability, with the difference that thepictures at each layer have the same spatial dimensions.However, this concept comes along with a lower coding effi-ciency for small relative rate differences between successiveCGS layers. An efficient encoding of video bit streams thatcover a variety of bit rates can be realized by using medium-grain quality scalable coding (MGS). The MGS concept en-ables the removal of any NAL unit belonging to a qualityenhancement layer. This can be used to perform a finer-grained adaptation of the bit rate and with a higher flex-ibility. While CGS-based quality adaptation can only beswitched at defined points in the bit stream, MGS allows toswitch the quality and bit rate at every picture of the videostream. Furthermore, it is possible to partition the coeffi-cients of the transformation and distribute the informationamong several NAL units.

In order to signal the layer information in the bit stream,SVC extends the NAL unit concept of H.264/AVC [19].Newly introduced, SVC-specific NAL unit types requireda three-byte header extension. In the extended header, thelayer information is signaled by the fields quality id (QID),dependency id (DID), and temporal id (TID). In addition,the header contains a priority id (PRID) field which can beused to define a suggested adaptation path. This adapta-tion path specifies in which order the NAL units should bediscarded in case of adaptation. The assignment of the pri-ority id to NAL units is not further specified in the standardand can be allocated based on the needs of a certain appli-cation or use case. The Quality Level Assigner tool that isincluded in the JSVM [9] reference software, can be usedto assign priority values to NAL units contained in the bitstream. The assignment is done in a way that the extractionbased on the PRID is optimal w.r.t. rate-distortion.

3. TCP-BASED VIDEO STREAMINGVideo streaming based on TCP has become very popu-lar because of its easy deployment and congestion aware-ness. For these reasons, a lot of content providers useTCP for streaming multimedia content over the Internet[16, 10]. While TCP performs very well in networks withsmall round-trip times (RTTs), it becomes more prob-lematic in networks with high bandwidth-delay products,like the Internet. TCPs congestion control is based onthe additive-increase/multiplicative-decrease (AIMD) algo-rithm. The MD step reduces the TCP window (and there-fore the throughput) in case of packet loss drastically, beforethe AI step tries to increase the window once again. Thisleads to a notable variation of the throughput in networkswith large RTTs. An extensive analysis on TCP streamingwas conducted in [18]. The results of the analysis indicatethat streaming shows good results when the available band-width is twice the media bit rate. This kind of overprovi-

sioning is feasible for streaming media at low bit rates, buthigh-definition media demands significantly higher through-put. While a large number of last-mile networks offer down-link bandwidths greater than 4 Mbps [3], having twice thebit rate of an HD stream is rather an exception. In addition,TCP features no implicit error recovery on connection abortor stalls; this has to be handled on the application layer.New TCP implementations like TCP Cubic [7] are tacklingthese problems by introducing mechanisms for fast recov-ery on packet losses. In contrast to this, unreliable butcongestion-aware transport protocols, like the DatagramCongestion Control Protocol (DCCP) [5], were proposed.While they are able to reduce end-to-end latencies by omit-ting reliability, they suffer from deployment problems in ex-isting network infrastructures.Apart from the systematic problems of TCP, also perfor-mance considerations have to be made regarding the TCPimplementation. The TCP stack is in general implementedin the kernel space of the operating system because of perfor-mance and security issues. Therefore, an interface betweenthe user space and the kernel space is needed, which is rep-resented by the TCP socket API and a socket buffer. TheTCP socket buffer has a direct impact on the TCP perfor-mance, because it limits the maximum throughput by re-stricting the maximum size of the TCP window [24]. So theTCP socket buffer size corresponds directly to the maximumTCP throughput. In general, it is recommended to set theTCP socket buffer size to twice the value of the bandwidth-delay product (BDP) [24]. For adaptive real-time streamingit is not the goal to fully utilize the network link, but totransmit the video data in real-time. Therefore the TCPsocket buffer is dimensioned regarding the maximum videobit rate and not the maximum available bandwidth.

In the literature different approaches for adaptive TCPstreaming exist. We selected the approaches for our evalua-tions, which only observe the TCP channel and use the infor-mation in an adaptive streaming process. So the streamingsystems do not change the behavior of the TCP transmis-sion system. In the following three different approaches foradaptive TCP streaming are introduced; their realisation byusing H.264/SVC will be further explained in Section 4.

Application-layer Bandwidth Estimation: The first approachmeasures the available bandwidth based on the time spentfor the transmission of a specific media block [15]. The de-livery module writes the media data into a blocking TCPsocket and estimates the actual delivery bit rate by sim-ply counting the bytes for a time slot of one second. Afterretrieving an estimation value for the available bandwidth,the adaptation process uses this information to configure themedia transcoder. This approach to adaptive TCP stream-ing is very straight forward and does not use operating sys-tem specific information. As shown in [15] it can adapt tothe available bandwidth, but is very implementation specificbecause it relies on the blocking socket API.

TCP Stack-based Bandwidth Estimation: The current con-gestion window and the estimated RTT of the TCP connec-tion can be used to estimate the available bandwidth [12].In [2] for example a simple model for the expected latencyof packets sent with TCP Reno is presented. The modelalso estimates the available channel rate and the expected

Figure 2: Priority-order within a video segment

distortion. The TCP parameters are directly used in theencoding process of the video, which allows the streamingsystem to adapt to the available network conditions.

Priority Streaming: In contrast to traditional buffering atthe receiver, which only tries to overcome bandwidth short-ages, priority streaming [17, 11] tries to improve the qual-ity of the video over a period of time. A video sequenceis split into fragments, each comprising the same play-outduration (GOP). The idea is to rearrange the video syntaxelements (e.g., slices, frames, layers) in a video fragment de-pending on its priority. For example, using a non-scalablevideo codec, the I-frames would precede the P-frames andthe B-frames (see Figure 2). The reordered video fragmentis called a video segment. As a result of this reordering, it isnot necessary to retrieve the whole video segment; it can betruncated at virtually any point. The quality of the videodepends only on how much of the video segments was re-trieved. The size of the GOP directly influences the qualityvariations between the video segments, which favors largerGOPs. On the other hand, the play-out delay increases withthe size of the GOP. In most cases, the size of the GOP isrestricted by the use case, which dictates maximum play-outdelay. A TCP-based priority streaming system uses a sin-gle TCP connection for transmission. From the server, eachvideo segment is streamed to the client. When reaching thetimeout of a video segment, the server stops transmittingthe segment and switches to the next one. From the re-ceived video segment, the video can be reconstructed anddisplayed to the user.

A different but somehow related paradigm to the approachesunder investigation is the stream-switching technique. Incase of stream switching, the content is available at theserver at different bit rates; depending on the actual net-work conditions the bit rate can be switched. This tech-nique is already deployed in commercial applications likethe HTTP-based streaming used in Apple products [14] orthe HD Internet streaming solution provided by Move Net-works [13]. While the products mentioned above rely onthe client to decide when to switch to a higher or lower bitrate version of the content, it is also possible to implementserver-side stream switching. The most important differenceto the approaches evaluated in this paper is that the stream-switching solution does not scale very well. First, the con-tent has to be encoded using different bit rates which causesan additional effort especially when considering streamingof live content. This would require to encode the contentin parallel with multiple encoders. Second, all the differ-ent bit rate versions of the same video have to be stored

Figure 3: Server-side adaptive streaming system

at the server which causes additional storage demand thatincreases with the number of bit rates offered. Both the im-plications on the encoder and storage demand lead to thefact that stream-switching solutions only offer a handful ofdifferent video bit rates for economical reasons. In contrast,the approaches based on H.264/SVC allow to have a signifi-cant higher number of different bit rate versions of the samecontent. When using MGS quality scalability in combina-tion with bit stream extraction based on the priority id, thebit rate can be adapted in up to 64 discrete steps. This leadsto a much higher dynamic range of available bit rates andthe advantage of having a single bit stream stored at theserver. Additionally, only a single encoder is required in thecase of live content.

In the following section, we will show how these approachescan be used in conjunction with H.264/SVC and describethe implementations used for the evaluation.

4. ADAPTIVE STREAMING OF H.264/SVCFor our evaluation of rate-adaptive streaming algorithms,we adapted the three approaches mentioned above for usingH.264/SVC. We intentionally focused on adaptation mecha-nisms that can be characterized as server-side (server-based)algorithms. The rationale behind that is that this allows aneasier deployment of the adaptive streaming solutions. Anarchitectural figure of such a server-side streaming solutionis given in Figure 3. At the server the video is stored asscalable H.264/SVC bit stream. A rate-adaptive compo-nent adapts the bit stream accordingly and transmits thebit stream to the client. The transmission is performed ona per-GOP basis. At the client the the GOP is received,buffered and decoded for the final playback. Additionally,all three approaches do not rely on explicit feedback fromthe client like notifying the current buffer status or similar.The only kind of feedback that is used at the server is theimplicit feedback via the TCP protocol. For instance, block-ing send operations on the socket on the server side are usedas indication of the connection status.

Another characteristic that is shared by all three approachesis that the timely transmission of the video is controlledat the server side. On the one hand, this means that theserver takes care that not too much of the video content is

Figure 4: GOP timing

streamed out in advance. The transmission of the contenttakes place at the same speed as the video is consumed at theclient side. This real-time behavior clearly distinguishes theapproaches from other paradigms (download and play) thatcan be found in other streaming solutions in the Internet.The advantage of the real-time playout is that a single clientwith a fast network connection cannot consume more thanits fair share, which is at most the maximum bit rate of thevideo, and cannot overload the streaming server. On theother hand, this server-side mechanism is further necessarybecause otherwise the scheduled transmission times of videosegments and the actual transmission times can differ incase of congestion. In case of persistent congestion, thisdifference slightly increases over time and causes the videosegments to arrive too late at the client. Since this behaviordrains the buffer at the client and causes the video playbackto stop at the player, all three approaches employ counter-measures to prevent this drift. In the following the threeimplementations of the approaches are described in detail.

Application-layer Bandwidth EstimationThe first rate-adaptive approach is based on bandwidth es-timation that is done at the application layer (APP-BE).Basically, the server estimates the bandwidth of the TCPconnection by considering the number of bytes sent throughthe socket API during a time interval. In our implementa-tion, the estimation is performed on a per-GOP basis. Thenumber of bytes of the GOP to transmit is known in ad-vance by the server. By measuring the time that is requiredto send the GOP via the socket API the server calculates thecurrent throughput of the TCP connection. For calculatingthe bandwidth estimate for the next GOP to send, the mea-sured throughput values of the last five GOPs are averaged.Based on the estimate, the next GOP is adapted by usingH.264/SVC-based adaptation so that the total bit rate ofthe GOP is below the bandwidth estimate. Additionally,it should be noted that the estimate is further adjusted bythe buffer control algorithm to prevent a drift of the actualtransmission time of each GOP in case of congestion.

More formally, the algorithm can be described as follows.Let dgop be the playback duration of a GOP in secondsand tsch

i the scheduled start time for transmitting the i-thGOP at the server. Without loss of generality, we definetsch0 = 0 and assume the scheduled start of any other GOPi as tsch

i = tsch0 + dgop ∗ i. In contrast to the scheduled start

time we measure the actual start time of each GOP (tacti )

−2 0 2 4 6 8

0.2

0.4

0.6

0.8

1.0

1.2

1.4

delta/dgop

fbct

l(del

ta, d

gop)

Figure 5: Buffer control function

at the moment before the first byte of a GOP is sent via thesocket API. After successfully transmitting the GOP, theactual time is sampled again and the transmission duration(dtrans

i ) of GOP i is determined. The introduced times-tamps and durations are illustrated in Figure 4. Based onthe known length of the GOP in bytes (lgop

i ), the measuredthroughput during transmission of GOP i can be simply cal-culated as thi = lgop

i /dtransi . The estimated bandwidth for

the next GOP i+ 1 is then calculated by averaging over thelast five throughput measurements and adjusting the aver-age by a multiplicative term. This adjustment is requiredfor the buffer control and prevents that the scheduled starttimes tsched

i and the actual start times tacti begin to diverge.

The multiplicative term is determined by the buffer controlfunction fbctl(delta, d

gop) that depends on the difference be-tween both timestamps and the duration of one GOP. Thefunction is illustrated in Figure 5 and described in more de-tail in the Appendix A. Finally, the bandwidth estimation(be) for GOP i+ 1 is calculated as follows.

bei+1 =1

5

iXj=i−4

thj · fbctl(tacti + dtrans

i − tschi+1, d

gop) (1)

The bandwidth estimation for the next GOP is then usedfor adapting the GOP in a way that it matches the targetedbandwidth. For that purpose, the NAL units are optionallydiscarded by using the adaptation path defined by the prior-ity id. As already mentioned above, the approach preventsfrom streaming out faster than real-time by imposing theconstraint that the transmission of a GOP may not startearlier than scheduled, i.e., ∀i : tact

i ≥ tschedi .

A crucial requirement for this approach is that the size ofthe GOP is larger than the TCP socket buffer on the server.Otherwise, the socket implementation simply copies the datato transmit into the TCP socket buffer and immediatelyreturns from the send operation. This implies that in orderto achieve good performance, the TCP socket buffer on theserver should be adjusted according the bit rate of the videoand the envisaged RTT range.

Figure 6: PRID-based NAL unit reordering

TCP Stack-based Bandwidth EstimationThe second approach under investigation is the TCP Stack-based Bandwidth Estimation (TCP-BE). This algorithm hasvery much in common with the application layer mechanismintroduced above. Also in this approach, an estimate of thebandwidth is obtained and the next GOP to transmit isadapted based on that estimate. Furthermore, the estimateis again adjusted by a multiplicative term in order to preventdrift between the scheduled and actual transmission time ofeach GOP. However, the main difference is the way how thebandwidth estimation is calculated. Instead of measuringthe TCP throughput at the application layer, the informa-tion about the TCP connection is obtained directly fromthe network stack. This is done by using the getsockopt

API call in combination with the option TCP_INFO, which issupplied by most *nix operating systems. The informationrelevant for the bandwidth estimation consists of the cur-rent congestion window size cWnd, the maximum segmentsize MSS and the estimate of the round trip time RTT. Thethroughput estimate for the i-th GOP can then be calcu-lated as [12]:

thi =cWnd ·MSS

RTT(2)

The bandwidth estimation (be) for GOP i+ 1 is calculatedbased on Equation 1 by taking into consideration the buffercontrol function. The rest of the approach is identical toApplication-layer Bandwidth Estimation.

Deadline-driven Adaptive StreamingIn contrast to the previous approaches that rely on explic-itly estimating the bandwidth, the Deadline-driven AdaptiveStreaming (DEADLINE) algorithm is based on the prioritystreaming paradigm as introduced in Section 3. For our eval-uation, the video is split up into video segments, where onesegment corresponds to exactly one GOP. In a nutshell, thisapproach rearranges the NAL units of each GOP accordingto their priority before transmitting them. On the server acertain time interval is scheduled for the transmission of therearranged GOP. The server tries to transmit as much dataof the GOP as possible during that time interval. After thescheduled time interval has passed, i.e., the deadline for theGOP has been reached, the server starts transmitting thenext GOP.

Typically, each GOP of the H.264/SVC scalable bit streamconsists of H.264/AVC-compliant NAL units that form thebase layer and H.264/SVC-specific NAL units that representenhancement layers. Normally, these units are transmittedin decoding order which means that both kinds of NAL units

are interleaved. Following the priority streaming paradigm,they become rearranged so that the NAL units are transmit-ted in order of their priority id. NAL units with identicalvalues of their priority id are arranged in decoding order.In this context, a lower numerical value of the priority idsignals a higher priority of the NAL unit according to [20].The basic principles of this reordering are illustrated in Fig-ure 6. For the sake of simplicity the figure shows a GOPconsisting of only four pictures (I, P, B, B) in decoding or-der. Each of the pictures is represented by one NAL unitcarrying the base layer (BL) and two NAL units represent-ing enhancement layers (EL1, EL2). The numerical valuesin the boxes represent the priority id value of each NALunit; in our configuration all NAL units of the base layerhave the highest priority. For transmission, the NAL unitsare ordered according to their priority. As a result of thescalable property of H.264/SVC, the quality of the decodedbit stream at the client monotonically increases with theamount of NAL units transmitted by the server.

As the decoder at the client side requires the NAL units inthe proper decoding order, this reordering at the server hasto be reversed at the client. For supporting the reorder-ing, we embedded proprietary reordering information in thetransmitted bit stream. Although this reconstruction at theclient can also be achieved using standard-compliant meth-ods, we intentionally used a proprietary approach for thesake of simplicity.

The transmission of the rearranged GOPs is based on thetransmission schedule as already defined for the previous twoapproaches. The scheduled start time for the transmissionof the i-th GOP is denoted as tsched

i . Obviously, the dead-line for the i-th GOP is the start time of the next GOPtschedi+1 . During the time period [tsched

i , tschedi+1 ), the server

sends the NAL units via the socket API and checks aftereach send operation if the deadline has been reached. How-ever, the send operation itself is never preempted. In caseof an overprovisioned link, the transmission will be finishedbefore the deadline is reached. The transmission of the nextGOP however will be delayed until the scheduled start timeof the next GOP. If the bandwidth of the link is insufficientto transmit the whole GOP in the given time interval, theserver will skip the remaining NAL units and proceed withthe next GOP. It should be noted that this approach doesnot rely on an explicit H.264/SVC adaptation process, theactual adaptation is performed implicitly by the deadline-driven transmission.

RemarksEach of three approaches relies on H.264/SVC adaptationbased on the priority id mechanism. For our actual research,we used the Quality Level Assigner tool of the JSVM [9] soft-ware to obtain the information about the importance of eachNAL unit. This tool determines the priority information forMGS NAL units in a rate-distortion optimal way. However,it uses a semantic of the priority id that is different from theone specified in [20]. A post-processing step for assigning theproper priority id values based on the output of the QualityLevel Assigner is therefore necessary. As a consequence, ourfurther evaluation is limited to adaptation in the SNR do-main, although the approaches themselves are more generic.This means that they are also applicable for temporal or

Figure 7: Evaluation setup

spatial adaptation by assigning the priority ids appropri-ately. We considered MGS scalability as most suitable forthe envisaged use case in the context of Internet stream-ing. MGS scalability allows a smooth playback at the client(e.g., in contrast to temporal adaptation) and also does notintroduce too severe quality changes that would arise whenswitching between different spatial resolutions. Temporaland spatial scalability could be taken into account as a fall-back solution in case of harder congestion situations andnetwork conditions.

5. EVALUATIONWe evaluated the behavior and performance of the proposedalgorithms under conditions typical for Internet streaming.These include under-provisioned network links and conges-tion due to competing TCP traffic. A further criterion forassessing applications that are intended to be deployed inthe Internet is their TCP friendliness. This is regarded asa necessity for the stability of the Internet and implies thatan application/protocol does not consume more than its fairbandwidth share compared to competing TCP connections.Since all of the proposed approaches only make use of asingle TCP connection, a TCP friendly behavior can be as-sumed for all systems.

Evaluation SetupThe evaluation setup consists of six Linux boxes representingtwo servers, two routers and two clients as illustrated in Fig-ure 7. Each of them runs the Ubuntu Linux operating sys-tem (kernel 2.6.27). All client and server computers are con-figured to use the TCP Reno implementation. The routersare using the Netem1 kernel component for performing emu-lation of network characteristics like delay, jitter and packetloss. The symmetric end-to-end delay of the Internet andthe provider network is emulated by Router 1. The packetdelay is normally distributed with 10% standard deviationand is applied to all packets. Router 2 emulates the asym-metric access network to the clients and limits the up- anddownstream bandwidth (BW), while allowing a maximumqueuing delay of 200 ms. The asymmetry of the networkwas emulated by setting the upstream bandwidth to 1/8 ofthe downstream bandwidth, which is common for DSL/cableaccess networks. The implementation of the three adaptiveTCP streaming approaches is based on Python. For the eval-uation, the video streaming server component is deployed onServer 1 while the client component is located at Client 1.To emulated a congested link, Client 2 issues parallel HTTPdownloads from Server 2, that compete for their networkshare. The Apache2 HTTP Server is used for serving theHTTP downloads.

1http://www.linuxfoundation.org/en/Net:Netem

2http://httpd.apache.org

bitrate [kbit/s]

Ave

rage

PS

NR

[dB

]

32

33

34

35

36

750 1000 1250 1500 1750 2000

GOP soccer (4CIF)

65 frames129 frames257 frames

Figure 8: Rate-distortion curves of the test se-quences

Network ScenariosWe evaluated the adaptive streaming approaches underthree representative network conditions.

The first scenario is an overprovisioned network link, wherethe available network bandwidth substantially exceeds thevideo bit rate. Because TCP’s congestion avoidance tech-nique is based on the AIMD principle, throughput varia-tions may occur in networks with high bandwidth delayproducts. This scenario is used to evaluate the ability ofan approach to fully utilize the bandwidth for video trans-mission and to cope with the throughput variations. Atthe other extreme, the behavior of the approaches is fur-ther analyzed in a scenario of an underprovisioned networklink. This should demonstrate the capability of adapting thevideo to a given limited network bandwidth while maximiz-ing the video quality at the client. In the third scenario,we introduce cross traffic on the network link with one, twoand three concurrent infinite-source TCP streams. On sucha congested network link, an adaptive streaming system hasto cope with high bandwidth fluctuations, because of thecompetition between the concurrent TCP streams. Further,it should provide a TCP-friendly behavior.

ContentThe test sequences used for evaluation were created withthe H.264/SVC codec provided by the Joint Scalable VideoModel (JSVM) [9] 9.15 software. The video soccer in 4CIFresolution was encoded with different GOP sizes. The first65, 129 and 257 frames of the video soccer form a testsequence, each comprising 2.16, 4.30 and 8.57 seconds ofvideo at 30 fps, respectively. Each sequence features anH.264/AVC backward compatible base layer and one MGSquality enhancement layer. Within the MGS quality en-hancement layer, the transform coefficients are uniformlypartitioned into four NAL units. The Quality Level As-signer tool (JSVM) was used to assign 64 different priorityids to the NAL units based on rate-distortion values. InFigure 8 the rate-distortion values of the test sequences areshown, ranging from the lowest bit rate (highest priority) tothe full bit rate (lowest priority). It can be observed thatthe sequences with smaller GOP sizes have a slightly worse

rate-distortion performance, because the prediction struc-tures are restricted to the GOP length.

MethodologyAlthough each test sequence comprises a different number offrames, it can be observed in Figure 8 that the average PSNRvalues of the sequences are similar. The soccer65 sequenceis streamed 400 times in a loop, which results in approx.900 seconds play-out duration. Because each test sequencerepresents a video segment with a different duration, we arestreaming the soccer129 and soccer257 sequences 200 and100 times in a loop, respectively. This results in a similarplay-out duration, which enables a fair comparison of theevaluation results for the different GOP sizes. In additionto the different network scenarios, also the RTT is varied,ranging from 50 ms to 200 ms. Each experiment is repeatedthree times to reduce the influence of the runtime environ-ment on the results. In all evaluation runs, no frames orGOPs are lost, because the streaming systems are based onthe reliable transmission of TCP.

The TCP socket buffer size was determined by taking theconsiderations of Section 3 into account. The streaming sys-tems should be able to utilize overprovisioning to enhancetheir streaming performance, so we allow 50 % higher TCPthroughputs. So the TCP socket buffer size lsock is a resultof the maximum bit rate of the video brmax (≈ 2048 kbps)and the maximum round trip time RTTmax (200 ms), re-sulting in lsock = 1.5 · 2 · brmax ·RTTmax = 153600 bytes.

The metrics for comparing the adaptive streaming systemsare the reconstructed video quality in terms of PSNR andthe deviation between the scheduled and the actual arrivaltime of a GOP on the client, deltaC = tact

i − tschedi . For

the DEADLINE approach the deltaC has to be interpretedwith care, because the approach has to wait for a GOP to becompletely received before the decoding of the GOP on theclient can be started. The reason for this is the reorderingof the GOP and can be interpreted as additional buffering.In contrast to this, the APP-BE and TCP-BE approach canstart the decoding of the GOP immediately after receivingthe first frame of the GOP.In our evaluation, we focus on the dependability of the adap-tation algorithms in terms of timeliness of delivery. Thus, inthe startup phase we transmit a single GOP in full qualityto get a good starting point for the bandwidth estimation.

6. RESULTSIn our evaluation on adaptive Internet streaming, we con-duct measurements for RTTs ranging from 50 to 200 ms.Because the streaming performance of TCP at low RTTs isvery good, we decided to show the evaluation results for anRTT of 200 ms. In our opinion, this would represent theworst case scenario for our use case. In the following, a dis-cussion of the evaluation results for each network scenario ispresented.

The performance of the streaming systems in an overprovi-sioned network is shown in Figure 9. All approaches per-form well and can stabilize the video quality by utilizingthe additional bandwidth. This is supported by the anal-ysis in [18], which states that good streaming performancecan be expected when available bandwidth is about twice

APP−BE TCP−BE DEADLINE

PS

NR

[dB

]

31

32

33

34

35

36

37

38 GOP size



arriv

al ti

me

devi

atio

n (d

elta

C)

[sec

]

0

2

4

6

8

10

12 GOP size


Figure 9: Overprovisioned network: BW = 4096 kbps. PSNR and arrival time deviation on the client (deltaC)for the different algorithms and GOP sizes at 200 ms RTT.


PS

NR

[dB

]

31

32

33

34

35

36

37

38 GOP size



arriv

al ti

me

devi

atio

n (d

elta

C)

[sec

]

0

2

4

6

8

10

12 GOP size


Figure 10: Underprovisioned network: BW = 1536 kbps. PSNR and arrival time deviation on the client(deltaC) for the different algorithms and GOP sizes at 200 ms RTT.


PS

NR

[dB

]

31

32

33

34

35

36

37

38 GOP size



arriv

al ti

me

devi

atio

n (d

elta

C)

[sec

]

0

2

4

6

8

10

12 GOP size


Figure 11: Congested network: BW = 4096 kbps and one concurrent TCP stream. PSNR and arrival timedeviation on the client (deltaC) for the different algorithms and GOP sizes at 200 ms RTT.


PS

NR

[dB

]

31

32

33

34

35

36

37

38 GOP size



arriv

al ti

me

devi

atio

n (d

elta

C)

[sec

]

0

2

4

6

8

10

12 GOP size


Figure 12: Congested network: BW = 4096 kbps and two concurrent TCP streams. PSNR and arrival timedeviation on the client (deltaC) for the different algorithms and GOP sizes at 200 ms RTT.


PS

NR

[dB

]

31

32

33

34

35

36

37

38 GOP size



arriv

al ti

me

devi

atio

n (d

elta

C)

[sec

]

0

2

4

6

8

10

12 GOP size


Figure 13: Congested network: BW = 4096 kbps and three concurrent TCP streams. PSNR and arrival timedeviation on the client (deltaC) for the different algorithms and GOP sizes at 200 ms RTT.

the media bit rate. The small differences in the averagePSNR for the different GOP sizes are mainly caused by thebetter rate-distortion performance of larger GOPs (see Fig-ure 8). The variation of the emulated packet delay leadsto TCP throughput variations, which result in the adapta-tion of some GOPs even in the case of overprovisioning. Forthe evaluation on the timeliness of delivery of the GOPs,we calculate the arrival time deviation of the GOPs on theclient (deltaC) for the different algorithms and GOP sizes at200 ms RTT. If the available bandwidth exceeds the bit rateof the adapted video, the delivery of a single GOP may takeless time than the GOP playout duration. This can be ob-served in Figure 9, where GOPs arrive a little earlier thanexpected. The arrival time deviation is near zero, whichmeans that all GOPs arrive in time and only little bufferingwill be needed at the client. The standard deviation of thePSNR and arrival time deviation is shown as error bars inthe plots.

The adaptability of the streaming systems to a static bot-tleneck link with 1536 kbps bandwidth (underprovisionednetwork) is shown in Figure 10. The PSNR values in Fig-ure 10 are approx. 35 dB, which corresponds to a bit rate of≈ 1400 kbps (see Figure 8). This indicates that all streamingapproaches can adjust their streamout rate to the availablebandwidth. Also the larger GOP sizes help to stabilize thePSNR values, as can be seen from the lower standard devia-tion values. In bandwidth limited networks, the correctnessof the bandwidth estimation is crucial. The figure showsthat larger GOP sizes tend to increase deltaC for APP-BEand TCP-BE. This is mainly because for larger GOP sizesa wrong bandwidth estimate results in a larger deviationof the delivery duration, which directly influences deltaC.Although higher, the deltaC values for APP-BE and TCP-BE are in an acceptable range because of the buffer controlat the server. The DEADLINE approach does not involvebandwidth estimation, so timely delivery can be achieved.

The evaluation results for congested network links are shownin Figures 11, 12 and 13. The average PSNR decreases the

more congestion on the network link occurs. Also the qual-ity changes between the GOPs increase (standard deviation)with the congestion level. The PSNR values for Figures 11,12 and 13 are approx. 36, 34.5 and 33.5 dB, respectively.The PSNR values correspond to bit rates of 1800, 1250,1000 kbps, respectively (see Figure 8). So for all conges-tion scenarios the approaches only use their fair share ofthe available link bandwidth. While all of the approachesare able to adapt to available/fair bandwidth, it can be ob-served that the DEADLINE approach can use larger GOPsto stabilize the overall quality of the video. The findingsare similar to the results in the underprovisioned networkscenario. DEADLINE can achieve a stable performance forall GOP sizes, while APP-BE and TCP-BE suffer from highvariations at large GOP sizes.

The evaluation results indicate that the optimal GOPsize for APP-BE and TCP-BE in our evaluation setupis 65 frames, because of the high arrival time deviation(deltaC) for large GOPs in case of congestion. In contrast tothese approaches, DEADLINE is able to enhance its perfor-mance with larger GOP sizes, so a GOP size of 257 framesis considered to be optimal. Figures 14, 15 and 16 show theaverage PSNR of the adaptive streaming systems with re-spect to the RTT. In case of no congestion it can be observedthat the quality is near constant. In the congested networkscenarios, higher RTTs lead to lower PSNR values, which ismainly because of tough competition on the network link.

7. CONCLUSIONWe evaluated and compared three approaches to adaptiveTCP streaming (rate control) of H.264/SVC video in anInternet-like setting. To the best of our knowledge, this isthe first study making use of H.264/SVC MGS scalability insuch settings. We found that adaptive TCP streaming, de-spite the simplicity of the rate-control algorithms deployed,can effectively cope with bandwidth limitations and con-gestion, even under high RTTs. By utilizing H.264/SVCMGS-based scalability, the bit rate and quality of the de-livered video data can be a adapted in a fine-grained man-

50 100 150 200

31

32

33

34

35

36

37

RTT [ms]

PS

NR

[dB

]

RTT [ms]

1536 kbps4096 kbps4096 kbps − 1 TCP4096 kbps − 2 TCP4096 kbps − 3 TCP

Figure 14: Average PSNR for APP-BE with GOPsize 65 with respect to the RTT.

50 100 150 200

31

32

33

34

35

36

37

RTT [ms]

PS

NR

[dB

]

RTT [ms]


Figure 15: Average PSNR for TCP-BE with GOPsize 65 with respect to the RTT.

50 100 150 200

31

32

33

34

35

36

37

RTT [ms]

PS

NR

[dB

]

RTT [ms]


Figure 16: Average PSNR for DEADLINE withGOP size 257 with respect to the RTT.

ner. In order to stabilize TCP throughput over a periodof time and decrease video quality fluctuations and packetjitter, our adaptive streaming approach utilizes rather largeGOPs. We found, however, that the approaches using band-width estimation suffer from large GOP sizes, since inac-curate bandwidth estimates for GOP delivery may impactquality and increase jitter; client buffer dimensioning thusmay become difficult. In contrast, the priority-/deadline-driven streaming approach that reorders and transfers – upto a pre-determined deadline – GOP picture data accord-ing to their importance, can cope with large GOP sizes andstabilize the quality in a rate-distortion optimal manner.Packet delay jitter is more predictable and client buffer sizescan be determined more easily. However, the client buffermust provide room for hosting and re-ordering a full GOPbefore playout can start. Thus, also startup delay increasesby an additional GOP time. Another finding of our studywas that bandwidth estimation on the application level canwork quite well, comparably to making use of detailed in-formation from the TCP stack. In case an operating systemdoes not provide access to such information, the streamingserver may resort to the application-level method.

8. ACKNOWLEDGMENTThis work was supported in part by the Austrian ScienceFund (FWF) under project “Adaptive Streaming of SecureScalable Wavelet-based Video (P19159)” and by the EC inthe context of the P2P-Next project (FP7-ICT-216217).

9. REFERENCES[1] J. G. Apostolopoulos, W.-T. Tan, and S. J. Wee.

Video streaming: Concepts, algorithms, and systems.Technical report, HP Laboratories, 2002.

[2] A. Argyriou. Real-time and Rate-distortion OptimizedVideo Streaming with TCP. Elsevier Journal onSignal Processing: Image Communication,22(4):374–388, 2007.

[3] M. Dischinger, A. Haeberlen, K. P. Gummadi, andS. Saroiu. Characterizing Residential BroadbandNetworks. In Proceedings of the 7th ACM SIGCOMMConference on Internet Measurement (IMC ’07),pages 43–56, 2007.

[4] S. Floyd and K. Fall. Promoting the Use of End-to-endCongestion Control in the Internet. IEEE/ACMTransactions on Networking, 7(4):458–472, 1999.

[5] S. Floyd, M. Handley, and E. Kohler. ProblemStatement for the Datagram Congestion ControlProtocol (DCCP). RFC 4336 (Informational), 2006.

[6] A. Goel, C. Krasic, K. Li, and J. Walpole. SupportingLow Latency TCP-Based Media Streams. Technicalreport, Oregon Graduate Institute School of Scienceand Engineering, 2002.

[7] S. Ha, I. Rhee, and L. Xu. CUBIC: A NewTCP-friendly High-speed TCP Variant. SIGOPSOperating Systems Review, 42(5):64–74, 2008.

[8] P.-H. Hsiao, H. T. Kung, and K.-S. Tan. Video overTCP with Receiver-based Delay Control. InProceedings of the 11th International Workshop onNetwork and Operating Systems Support for DigitalAudio and Video (NOSSDAV ’01), pages 199–208,2001.

[9] Joint Video Team (JVT) of ISO/IEC MPEG and

ITU-T VCEG. Joint Scalable Video Model. Doc.JVT-X202, 2007.

[10] C. Krasic, K. Li, and J. Walpole. The Case forStreaming Multimedia with TCP. In Proceedings ofthe 8th International Workshop on InteractiveDistributed Multimedia Systems (IDMS ’01), pages213–218, 2001.

[11] C. Krasic, J. Walpole, and W.-C. Feng.Quality-adaptive Media Streaming by Priority Drop.In Proceedings of the 13th International Workshop onNetwork and Operating Systems Support for DigitalAudio and Video (NOSSDAV ’03), pages 112–121,2003.

[12] M. Mathis, J. Semke, J. Mahdavi, and T. Ott. TheMacroscopic Behavior of the TCP CongestionAvoidance Algorithm. ACM SIGCOMM - ComputerCommunication Review, 27(3):67–82, 1997.

[13] Move Networks. Move Adaptive Stream - ProductSheet. http://www.movenetworks.com. Last accessedon 2009-09-22.

[14] R. Pantos. HTTP Live Streaming. Internet Draftdraft-pantos-http-live-streaming-01, 2009.

[15] M. Prangl, I. Kofler, and H. Hellwagner. Towards QoSImprovements of TCP-Based Media Delivery. InProceedings of the Fourth International Conference onNetworking and Services (ICNS ’08), pages 188–193,2008.

[16] M. Saxena, U. Sharan, and S. Fahmy. AnalyzingVideo Services in Web 2.0: A Global Perspective. InProceedings of the 18th International Workshop onNetwork and Operating Systems Support for DigitalAudio and Video (NOSSDAV ’08), pages 39–44, 2008.

[17] W. Feng, M. Liu, B. Krishnaswam, A. Prabhudev. APriority-Based Technique for the Best-Effort Deliveryof Stored Video. In Proceedings of the SPIE/ISTMultimedia Computing and Networking 1999(MMCN’99), 1999.

[18] B. Wang, J. Kurose, P. Shenoy, and D. Towsley.Multimedia Streaming via TCP: An AnalyticPerformance Study. ACM Transactions on MultimediaComputing, Communications and Applications,4(2):16:1–16:22, 2008.

[19] Y. Wang, M. Hannuksela, S. Pateux, A. Eleftheriadis,and S. Wenger. System and Transport Interface ofSVC. IEEE Transactions on Circuits and Systems forVideo Technology, 17(9):1149–1163, 2007.

[20] S. Wenger, Y.-K. Wang, T. Schierl, andA. Eleftheriadis. RTP Payload Format for SVC Video.Internet Draft draft-ietf-avt-rtp-svc-19, 2009.

[21] T. Wiegand, G. Sullivan, H. Schwarz, and M. Wien,editors. ISO/IEC 14496-10:2005/Amd3: ScalableVideo Coding. International StandardizationOrganization, 2007.

[22] M. Wien, H. Schwarz, and T. Oelbaum. PerformanceAnalysis of SVC. IEEE Transactions on Circuits andSystems for Video Technology, 17(9):1194–1203, 2007.

[23] W. Ye-Kui, M. M. Hannuksela, S. Pateux,A. Eleftheriadis, and S. Wenger. System andTransport Interface of SVC. IEEE Transactions onCircuits and Systems for Video Technology,17(9):1149–1163, 2007.

[24] E. Yildirim, D. Yin, and T. Kosar. Balancing TCP

Buffer vs Parallel Streams in Application LevelThroughput Optimization. In Proceedings of theSecond International Workshop on Data-awareDistributed Computing (DADC ’09), pages 21–30,2009.

APPENDIXA. BUFFER CONTROL FUNCTIONThe buffer control function is used for controlling the bufferby adjusting the estimated bandwidth. The rationale behindthis is that without adjustment the scheduled start time andthe actual start time of the transmission of each GOP tendto diverge. This is a consequence of the fact that the esti-mation of the available bandwidth is not perfect and over-or underestimates the actual value. On the other hand, if aTCP connection does not try to increase the throughput overtime and does not participate in the competition of possibleconcurrent TCP connections, it will suffer from starvation.

The function that is given in Equations 3 and 4 is designedby considering the following situations. If the difference be-tween scheduled and actual start time (delta) is less than5 percent of the GOP duration in seconds, the bandwidthestimation is increased by 50 percent. This allows the al-gorithms to be more aggressive and competitive in case oflow delta values. However, when delta exceeds five timesthe GOP duration, the actual transmission of the GOP lagssignificantly behind the scheduled transmission. In that casethe actual rate for video transmission is chosen to be 1/5 ofthe estimated bandwidth. No adjustment takes place if deltais exactly 0.5 times of the GOP duration, i.e., the functionvalue is 1. Based on these three operating points, a curvewith the general form f(x) = a

x+b+ c is fitted to match the

operating points described above. The parameters of theresulting function ψ(x) are given in Equation 4.

fbctl(delta, dgop) :=

8<: 1.5 if delta/dgop < 0.051/5 if delta/dgop >= 5

ψ( deltadgop ) otherwise

(3)

ψ(x) :=1.46

x+ 0.893+ 0.0476 (4)

an evaluation of tcp-based rate-control algorithms for adaptive internet streaming … · 2019. 4....

Documents