
Flexible Dual TCP/UDP Streaming for H.264 HD Video Over WLANs

Jing Zhao and Ben Lee
Oregon State University
School of Electrical Engineering and Computer Science
Corvallis, OR 97331
{zhaoji, benl}@eecs.orst.edu

Tae-Wook Lee, Chang-Gone Kim, and Jone-Keun Shin
LG Display Co. Ltd
LCD Laboratory
Paju-si, Gyeonggi-do, Korea
{twlee, cgkim02, rgbshin}@lgdisplay.com

Jinsung Cho
Kyung Hee University
Department of Computer Engineering
Yongin-si 446-701, Korea
[email protected]

ABSTRACT
High Definition (HD) video streaming over WLANs faces many challenges because video data requires not only data integrity but also frames that meet strict playout deadlines. Traditional streaming methods that rely solely on either UDP or TCP have difficulty meeting both requirements because UDP incurs packet loss while TCP incurs delay. This paper proposes a new streaming method called Flexible Dual-TCP/UDP Streaming Protocol (FDSP) that utilizes the benefits of both UDP and TCP. FDSP takes advantage of the hierarchical structure of the H.264/AVC syntax, using TCP to transmit important syntax elements of H.264/AVC video and UDP to transmit non-important elements. The proposed FDSP is implemented and validated under different wireless network conditions. Both visual quality and delay results are compared against pure-UDP and pure-TCP streaming methods. Our results show that FDSP effectively achieves a balance between delay and visual quality, and thus has an advantage over traditional pure-UDP and pure-TCP methods.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Communications Applications; C.2.1 [Computer-Communication Networks]: Network Architecture and Design—wireless communication

General Terms
Design, Experimentation, Performance

Keywords
HD Video Streaming, H.264/AVC, WLANs, TCP, UDP


1. INTRODUCTION
High Definition (HD) video streaming over WLANs has become a viable and important technology as network bandwidth continues to improve and the use of smartphones, Mobile Internet Devices, and wireless display devices increases. Some notable HD wireless streaming technologies include Apple AirPlay® [1], Intel WiDi® [2], and Cavium WiVu® [3]. These technologies are deployed in an ad hoc mode, and the state-of-the-art video compression standard H.264 facilitates wireless video streaming by providing a more efficient compression algorithm, so that less data needs to be transmitted through the network. Moreover, H.264 provides many error-resilience and network-friendly features, such as Data Partitioning (DP), Flexible Macroblock Ordering (FMO), and the Network Abstraction Layer (NAL) structure [4]. However, wireless video streaming still faces many challenges. This is because, unlike traditional data transfer, video streaming requires not only data integrity but also frames that meet strict playout deadlines in the presence of packet delay and loss. Both of these factors are closely related to the streaming protocols used.

Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are the two fundamental Transport Layer protocols used to transmit video data through the network. TCP is a reliable protocol, but its delay and bandwidth consumption increase due to re-transmissions of lost packets, which further increase the likelihood of packet loss. For example, HTTP-based streaming video relies on TCP. Much work has been done to hide or reduce the delay caused by TCP [5-7], but this remains a major problem for real-time video streaming. By contrast, UDP offers minimal delay but does not guarantee delivery of packets. These lost packets cause errors that propagate to subsequent frames.

Although a considerable amount of research has been done on both TCP and UDP to improve video streaming in general, little attention has been paid to utilizing the advantages of both TCP and UDP for wireless video streaming. Recently, Porter and Peng proposed a Hybrid TCP/UDP streaming method, which relies on TCP to transmit higher priority data and UDP to transmit lower priority data [8]. However, they did not actually implement their method in a realistic network environment; instead, they used a tool to randomly remove data from an encoded video locally in order to simulate packet loss caused by UDP. This evaluation process lacks rigor and prevents them from providing meaningful results, such as the visual quality and buffering time that result from using both TCP and UDP.

Figure 1: Typical GOP structure. Each arrow indicates the relationship between the current (i.e., predicted) frame and its reference frame(s).

In contrast to previous research that relies on either UDP or TCP, this paper presents a new video streaming method called Flexible Dual-TCP/UDP Streaming Protocol (FDSP) that utilizes the benefits of combining TCP and UDP. FDSP achieves a balance between visual quality and buffering time by sending important data via TCP and less important data via UDP. In order to validate our idea, FDSP was implemented and tested on a complete video streaming simulator using HD videos. Our results show that the proposed method has an advantage over traditional pure-UDP and pure-TCP streaming methods. Although this paper focuses on streaming real-time HD video in a wireless ad-hoc network environment, FDSP can also be applied to sub-HD videos and other types of networks.

The rest of the paper is organized as follows: Sec. 2 presents a background on H.264, streaming protocols, and packetization methods. Sec. 3 discusses the related work. The proposed streaming method is presented in Sec. 4, and our experimental study and results are discussed in Sec. 5. Finally, Sec. 6 concludes the paper and discusses possible future work.

2. BACKGROUND
This section provides the background information necessary to understand the proposed FDSP: how H.264 video is encoded and streamed, and how packet delay and loss affect its visual quality.

2.1 Features of H.264
H.264 is the state-of-the-art video compression standard. Compared to its predecessors, H.264 provides a more aggressive compression ratio and has network-friendly features that make it more favorable for mobile video streaming.

Several characteristics of H.264, and of video compression in general, are important for efficient wireless video streaming. The two most important are the syntax for encoding video data into a bitstream and the way the bitstream is packetized and transmitted. These issues are discussed below.

2.1.1 I, P, and B-Frames
An encoded video stream consists of a sequence of Groups of Pictures (GOPs), as shown in Fig. 1. Each GOP consists of an intra-frame (I-frame), predicted frames (P-frames), and bi-predicted frames (B-frames). An I-frame contains all the data required to reconstruct a complete frame and does not refer to other frames. Conversely, P- and B-frames require reference frame(s) during decoding.

Figure 2: H.264 bitstream syntax, from the NAL layer down to the slice and macroblock layers [9].

If a reference frame contains errors, these errors will propagate through subsequent frames that refer to it. Since an I-frame does not depend on any other frame, error propagation ceases when a new I-frame arrives. Consequently, I-frames should be given higher priority during video streaming whenever possible.

2.1.2 H.264 Bitstream Syntax and the Effect of Data Loss on Video
The hierarchical structure of the H.264 bitstream syntax is shown in Fig. 2. The Network Abstraction Layer (NAL) consists of a series of NAL units. Three common NAL units are the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), and the slice. The SPS contains parameters common to an entire video, such as the profile and level to which the coded video conforms. Therefore, if the SPS is lost, the entire video cannot be decoded. A PPS contains common parameters that apply to a sequence of frames, such as the entropy coding mode employed. If the PPS for a sequence of frames is lost, those frames cannot be decoded. A slice is a unit for constructing a frame, and a frame can consist of a single slice or multiple slices. A slice can be an I-slice, a P-slice, a B-slice, or an Instantaneous Decoder Refresh (IDR) slice. An IDR slice is a special form of I-slice indicating that no slice after it can reference any slice before it; it is used to clear the contents of the reference frame buffer. Each slice contains a slice header and slice data comprising a number of macroblocks (MBs). The slice header contains information common to all the MBs within the slice. If a slice header is lost, the entire slice cannot be decoded even if the slice data is properly received [4, 9].
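To make this concrete, the type of a NAL unit can be read from the single header byte that follows its start code, which is how a streamer can recognize the critical units described above. The following is a minimal sketch (assuming Annex B byte-stream framing); the constants follow the H.264 specification, but the helper names are our own.

#include <cstdint>

// NAL unit types relevant to prioritization (H.264, Table 7-1).
enum NalUnitType : uint8_t {
    NAL_SLICE_NON_IDR = 1,  // coded slice of a non-IDR picture
    NAL_SLICE_IDR     = 5,  // coded slice of an IDR picture
    NAL_SPS           = 7,  // Sequence Parameter Set
    NAL_PPS           = 8,  // Picture Parameter Set
};

// The NAL unit header is the first byte after the start code:
// forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5).
inline uint8_t nalUnitType(uint8_t headerByte) {
    return headerByte & 0x1F;
}

// Losing the SPS makes the whole video undecodable, and losing a PPS makes
// the frames that reference it undecodable, so these units deserve priority.
inline bool isParameterSet(uint8_t headerByte) {
    const uint8_t t = nalUnitType(headerByte);
    return t == NAL_SPS || t == NAL_PPS;
}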

Fig. 3 illustrates the effect of packet loss on a frame from an HD video clip called "battlefield", streamed via UDP using the VLC media player [10] and analyzed using Wireshark [11] and Elecard StreamEye Studio [12]. Fig. 3a shows the original transmitted frame, while Fig. 3b shows the received frame with some information missing due to packet loss. In this example, the slice header for Slice 4 is lost; thus, the entire slice cannot be decoded.

Figure 3: Effect of slice header loss. (a) The original frame, divided into Slices 1-8. (b) The received frame: the slice header of Slice 4 is lost, so the entire slice is lost; the last two packets of Slice 5 are dropped, so only the latter part of that slice is lost. Grey areas mark regions that cannot be reconstructed correctly.

In contrast, the slice header for Slice 5 is received but the last two RTP packets are lost, which allows most of the slice to be decoded. Error Concealment (EC) techniques can then be used to recover the lost information with some artifacts (not shown). Therefore, the SPS, PPSs, and slice headers are the most important data, and more care should be given to them during video streaming.

2.1.3 Data Partitioning
DP is an error-resilience feature in H.264. The coded data for each slice is placed in three separate data partitions, A, B, and C. Partition A contains the slice header and a header for each MB (i.e., MB type, quantization parameter, and motion vectors), Partition B contains Coded Block Patterns (CBPs) and coefficients for intra-coded MBs, and Partition C contains CBPs and coefficients for inter-coded MBs [4]. To decode Partition B, Partition A must be present; to decode Partition C, both Partitions A and B must be present. DP can be used with Unequal Error Protection (UEP) methods to improve streaming performance. UEP is discussed further in Sec. 3. Although DP is a powerful tool for error resiliency, it has not yet been widely adopted because it requires videos to be re-encoded and requires 802.11e networks [13].

2.2 Streaming Protocols
Existing streaming protocols include the Real Time Streaming Protocol (RTSP), the HyperText Transfer Protocol (HTTP), Microsoft Media Server (MMS), and the Real-time Transport Protocol (RTP). Note that RTSP, HTTP, MMS, and RTP are Application Layer protocols, so they do not deliver the streams themselves.

Figure 4: Packetization Method 1 (H264→TS→RTP): NAL units are packed into TS packets, which are in turn packed into RTP packets.

For example, RTP uses UDP or TCP to deliver multimedia data. RTSP, HTTP, and MMS add more control features for streaming, but they also use TCP or UDP to deliver the multimedia data.

RTSP allows a client to remotely control a streaming media server. For example, a client can play, pause, and seek a video during streaming. RTSP can be used together with the RTP Control Protocol (RTCP) to obtain statistical data on Quality of Service (QoS). Typically, RTSP uses TCP to deliver control signals and RTP/UDP to deliver multimedia data.

HTTP also allows a client to control streaming, and it uses TCP to transmit both multimedia and control data. Since HTTP uses TCP, packets are never lost. Another advantage of HTTP is that it works across firewalls, as the HTTP port is usually open. However, HTTP incurs high end-to-end delay when lost packets need to be retransmitted.

RTP typically uses UDP to deliver multimedia data. An RTP header contains a sequence number and a timestamp. The sequence number is increased by one for each packet sent and is used for packet-loss detection. The timestamp can be used to synchronize multiple streams, such as video and audio. Note that RTP/UDP alone provides no control functionality.

For our purposes, the focus is on direct RTP/UDP and RTP/TCP streaming, as these are fundamental to all other streaming protocols.

2.3 Packetization Methods
A packetization method defines how an H.264 bitstream is encapsulated into RTP packets. Different packetization methods can affect video streaming performance and thus visual quality. Two basic packetization methods are discussed below.

2.3.1 Method 1 (H264→TS→RTP)
The Transport Stream (TS) is a legacy format designed to transport MPEG-2 video streams and is still used to carry H.264 video. This method is illustrated in Fig. 4. First, each 188-byte TS packet is filled with as much of the H.264 bitstream as possible. Afterwards, each RTP packet is filled with as many TS packets as possible until the Maximum Transmission Unit (MTU) size is reached. Because the MTU size is 1500 bytes for Ethernet, there are typically seven TS packets in one RTP packet. Method 1 does not consider the H.264 NAL unit structure.

2.3.2 Method 2 (H264→RTP)
This method, shown in Fig. 5, is specifically designed for H.264 video. If the size of a NAL unit is less than or equal to the MTU size, one RTP packet contains only one NAL unit.

Figure 5: Packetization Method 2 (H264→RTP): each NAL unit maps to one or more RTP packets.

If the size of a NAL unit is greater than the MTU size, the NAL unit is fragmented into multiple RTP packets. This method also supports NAL unit aggregation, which is used when NAL units are very small. For example, since the SPS and PPS are at most a few bytes each, they can be aggregated with other NAL units into a single RTP packet to reduce header overhead [4].
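As an illustration of Method 2's fragmentation rule, the sketch below splits an oversized NAL unit across consecutive RTP packets that share one timestamp (aggregation of small NAL units is omitted; the struct and the fixed header sizes are simplifications for this sketch).

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kMtu = 1500;          // Ethernet MTU in bytes
constexpr size_t kRtpHeaderSize = 12;  // standard RTP header
constexpr size_t kMaxPayload = kMtu - kRtpHeaderSize;  // IP/UDP headers ignored here

struct RtpPacket {
    uint16_t sequenceNumber;        // incremented by one per packet sent
    uint32_t timestamp;             // shared by all fragments of one NAL unit
    std::vector<uint8_t> payload;
};

// Method 2 (H264->RTP): one NAL unit per RTP packet if it fits within the
// MTU; otherwise the NAL unit is fragmented into multiple RTP packets.
std::vector<RtpPacket> packetizeNalUnit(const std::vector<uint8_t>& nalUnit,
                                        uint16_t& seq, uint32_t timestamp) {
    std::vector<RtpPacket> packets;
    for (size_t offset = 0; offset < nalUnit.size(); offset += kMaxPayload) {
        const size_t len = std::min(kMaxPayload, nalUnit.size() - offset);
        RtpPacket pkt;
        pkt.sequenceNumber = seq++;
        pkt.timestamp = timestamp;
        pkt.payload.assign(nalUnit.begin() + offset,
                           nalUnit.begin() + offset + len);
        packets.push_back(std::move(pkt));
    }
    return packets;
}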

Method 2 has several advantages over Method 1 [14]. The most important one is that intelligently mapping NAL units to RTP packets provides better error resilience, since the loss of one RTP packet affects only the NAL unit inside that packet (unless aggregation is used). In contrast, the size of RTP packets in Method 1 is fixed regardless of the size of NAL units, so a single packet loss can corrupt multiple NAL units. In addition, Method 2 does not use TS packetization, so there is less packet header overhead. For these reasons, Method 2 is used in the proposed FDSP method.

3. RELATED WORK
UDP is generally accepted to be more suitable than TCP for real-time video streaming, since it offers the low end-to-end delay needed for smooth video playout [4, 15]. Although UDP is prone to data loss, multimedia data (unlike traditional data) is loss-tolerant to a certain degree. In addition, a decoder can use EC techniques to reduce the artifacts caused by data loss, and numerous EC techniques have been developed to reduce the impact of packet loss [16, 17]. However, as discussed in Sec. 2.1.2, if lost packets contain important data, such as the SPS, PPSs, and slice headers, the decoder simply cannot reconstruct the video even with the aid of EC.

In order to tolerate packet loss caused by UDP, UEP is often used in UDP-based streaming [4, 18, 19]. UEP prioritizes important data over the rest because some syntax elements are more critical than others. A basic UEP method is to send important packets more than once, which raises the probability that these packets arrive at the receiver [4]. More advanced UEP methods incorporate Forward Error Correction (FEC) [18, 19]. By coding important packets with redundancy, FEC allows a receiver to recover lost packets without retransmission. However, FEC introduces additional overhead, which increases the network bandwidth required to transmit the video.

Despite the conventional wisdom that TCP is not desirable for streaming, a significant fraction of commercial video streaming traffic uses it [20]. TCP provides guaranteed service, so the transmitted packets are always preserved. Nevertheless, TCP's re-transmission and rate control mechanisms incur delay, which can cause packets to arrive after their playout deadline. A typical solution to this problem is to add a buffer in front of the video decoder. At the beginning of video streaming, the decoder waits until the buffer is filled before displaying video, to accommodate initial throughput variability and inter-packet jitter. This waiting time is called initial buffering. After the decoder starts to decode video data in the buffer, a decrease in throughput within a TCP session may cause buffer starvation. When this happens, the decoder stops displaying video until a sufficient number of video packets has been received. This waiting time is called rebuffering [5]. Buffering prevents late packets from being dropped; however, network congestion can cause long initial buffering and frequent rebuffering, which degrade the user's experience. Mok et al. performed a subjective assessment to measure the Quality of Experience (QoE) of video streaming [21]. Their analysis showed that the frequency of rebuffering is the main factor responsible for variations in user experience. Much research has been done on determining the appropriate buffer size to reduce the frequency of rebuffering [5, 6]. Besides buffer size estimation, Brosh et al. presented packet splitting and parallel connection methods to reduce TCP delay [7].

Another approach to improving wireless video streaming is to use IEEE 802.11e networks, which define a set of QoS enhancements through modifications to the Media Access Control (MAC) layer [13]. In an 802.11e network, delay-sensitive data such as video and audio can be assigned to a higher priority class. If contention occurs at the MAC layer, a smaller contention window is used to transmit higher-priority data, and thus lower transmission delay can be achieved. Although 802.11e is specially tailored for multimedia, it has not been widely adopted, perhaps due to the hardware changes required.

To the best of our knowledge, the work closest to ours was done by Porter and Peng [8]. The authors proposed a Hybrid TCP/UDP method that prioritizes video data by using H.264 Data Partitioning (see Sec. 2). However, our proposed FDSP method has a couple of advantages over theirs. First, FDSP is more flexible because it is not tied to DP. FDSP integrates an H.264 syntax parser within the streamer to segregate the SPS, PPSs, and slice headers from the rest of the data. Thus, videos do not have to be re-encoded and the network does not have to support DP. Moreover, any syntax element from a video stream can be segregated and prioritized. For example, some of the slice data (in addition to the SPS, PPSs, and slice headers) can be segregated and prioritized to further improve visual quality. In contrast, DP strictly defines the contents of the three partitions, so what can be prioritized is fixed. Second, the authors did not implement their proposed method as a complete system that includes a network simulator. For example, they used a modified rtp loss utility (provided by the JM reference software) to preserve Partition A and randomly drop the other partitions for a given video in order to simulate network behavior.

In contrast, FDSP was implemented on a complete video streaming simulator (see Sec. 4) that simulates realistic network scenarios and provides more insight into frame-by-frame behavior and buffering time, which are key indicators of video streaming QoE. Moreover, our simulation study is based on HD video, which is state-of-the-art and more demanding in terms of bandwidth and delay requirements.

4. FLEXIBLE DUAL-PROTOCOL VIDEO STREAMING
The system diagram of the proposed FDSP is shown in Fig. 6. The Sender consists of an encoder, MUX, Dual Tunneling (UDP+TCP), and the H.264 Syntax Parser. MUX, together with the H.264 Syntax Parser, is responsible for segregating critical data from the video bitstream and steering the two parts to the UDP and TCP tunnels.


Figure 6: Flexible Dual-protocol Streaming system diagram.

Figure 7: The general structure of OEFMON: the DirectShow multimedia module (Video Source Filter, Video Writer Filter) connects to the QualNet network simulator through the DirectShow and QualNet Connectors.

Dual Tunneling keeps both UDP and TCP sessions active during video streaming. The Receiver consists of Dual Tunneling, DEMUX, and a decoder. DEMUX re-orders the packets, merges UDP packets with TCP packets, and sends them to the decoder, which is implemented using FFmpeg [22].

The proposed FDSP was implemented within the Open Evaluation Framework for Multimedia Over Networks (OEFMON) [23], which integrates the DirectShow multimedia module [24] and the QualNet network simulator [25]. A simplified diagram of OEFMON is shown in Fig. 7. The main components used by FDSP are the QualNet Connector, the Video Source Filter, and the Video Writer Filter. The QualNet Connector is responsible for RTP packetization. The Video Source Filter reads an H.264 file and sends the data to the QualNet Connector, and the Video Writer Filter writes the decoded frame data to a raw video file. A detailed discussion of OEFMON can be found in [23].

The following subsections describe the implementation of the key components of FDSP.

4.1 Dual Tunneling (UDP+TCP)
OEFMON already implements UDP streaming in the QualNet network simulator. In order to implement Dual Tunneling, the existing code for UDP streaming required modification, and a TCP streaming module needed to be implemented. QualNet is a discrete-event simulator, and an event is represented by a data structure called MESSAGE. The original code already contains the MESSAGEs for UDP, which come as a pair (one for the sender and another for the receiver). The changes required for UDP mainly involved restructuring the code to handle the corresponding MESSAGEs. However, the implementation of TCP requires more MESSAGEs because it uses three-way handshaking.

Figure 8: TCP fragmentation. Three RTP packets sent via TCP may be received as five TCP segments, with RTP headers (HDR) split across segment boundaries.

MESSAGEs such as MSG_TRANSPORT_FromAppOpen (a request to open a TCP socket) and MSG_TRANSPORT_FromAppListen (a response to that request) must be properly handled before transmitting video data. To enable Dual Tunneling, the functions for handling both UDP and TCP MESSAGEs were implemented inside a single application file called app_fdspvideo.cpp in QualNet.
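The overall shape of this handler can be pictured as a single dispatch routine that switches on the MESSAGE type. The sketch below uses stand-in types and stand-in event constants (QualNet's own headers define the real Node, Message, and MSG_* values), so it shows the structure of app_fdspvideo.cpp rather than the actual QualNet API:

// Stand-ins for QualNet's types and event constants; the real definitions
// come from QualNet's headers, and the names below are illustrative only.
struct Node { int nodeId; };
enum EventType {
    EVT_TRANSPORT_FromAppOpen,    // request to open the TCP socket
    EVT_TRANSPORT_FromAppListen,  // response to the open request
    EVT_APP_UdpDataReceived,      // video data arrived on the UDP tunnel
    EVT_APP_TcpDataReceived,      // video data arrived on the TCP tunnel
};
struct Message { EventType eventType; /* payload omitted */ };

// One handler serves both tunnels: the UDP MESSAGEs carry video data
// directly, while the TCP MESSAGEs also cover connection setup, since TCP
// performs three-way handshaking before any data can flow.
void AppFdspVideoHandleMessage(Node* node, Message* msg) {
    (void)node;  // node state lookup omitted in this sketch
    switch (msg->eventType) {
        case EVT_TRANSPORT_FromAppOpen:
            // open the TCP connection for the high-priority tunnel
            break;
        case EVT_TRANSPORT_FromAppListen:
            // accept the sender's connection request
            break;
        case EVT_APP_UdpDataReceived:
        case EVT_APP_TcpDataReceived:
            // hand the RTP payload to DEMUX (Sec. 4.4)
            break;
    }
}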

4.2 TCP Segment Reassembly
TCP is a stream-oriented protocol (rather than a packet-oriented one), and its data is viewed as an unstructured, but ordered, stream of bytes [26]. Because TCP does not preserve data message boundaries, an RTP packet carried by TCP may be divided into several segments. Fig. 8 illustrates the fragmentation caused by TCP. In order to recover RTP packets from a TCP byte stream, the TCP segments must be reassembled.

Length information for each RTP packet is required to accomplish TCP segment reassembly. Because the RTP header does not include a length field, the 12-byte RTP header is extended by two bytes that indicate the length of each packet. This allows the receiver to determine whether or not the current TCP segment contains a complete RTP packet.

The reassembly algorithm first checks whether a complete RTP header has been received. If not, it waits for the next TCP segment(s) to complete the RTP header. Once the RTP header is complete, the algorithm uses the RTP length information to determine whether the RTP payload is complete. If not, it waits for the next TCP segment(s) to complete the RTP payload.
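A minimal sketch of this reassembly loop follows. It assumes the two-byte length field occupies bytes 12-13 of the extended header in big-endian order; that placement is our assumption for illustration, as the actual field layout is internal to FDSP.

#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

constexpr size_t kExtHeaderSize = 14;  // 12-byte RTP header + 2-byte length

// Accumulates the TCP byte stream and extracts complete RTP packets.
class TcpReassembler {
public:
    // Called for every received TCP segment.
    void onSegment(const uint8_t* data, size_t len) {
        buffer_.insert(buffer_.end(), data, data + len);
    }

    // Fills `packet` and returns true once a complete RTP packet is
    // buffered; otherwise the caller waits for the next TCP segment(s).
    bool nextPacket(std::vector<uint8_t>& packet) {
        if (buffer_.size() < kExtHeaderSize)
            return false;                        // RTP header incomplete
        const size_t total = (size_t(buffer_[12]) << 8) | buffer_[13];
        if (buffer_.size() < total)
            return false;                        // RTP payload incomplete
        packet.assign(buffer_.begin(), buffer_.begin() + total);
        buffer_.erase(buffer_.begin(), buffer_.begin() + total);
        return true;
    }

private:
    std::deque<uint8_t> buffer_;  // unconsumed bytes from the TCP stream
};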

4.3 H.264 Syntax Parser
The H.264 Syntax Parser was developed based on an open-source library called h264bitstream [27]. The parser was implemented within QualNet and linked to app_fdspvideo.cpp. Before streaming, the parser parses the video bitstream and returns its syntax information (such as the start address, length, and type of each NAL unit) as input to MUX. During streaming, each NAL unit is encapsulated into an RTP packet by the QualNet Connector in OEFMON. At the same time, MUX uses the stored syntax information to determine whether an RTP packet contains an SPS, a PPS, or a slice header. If an RTP packet contains an important NAL unit, MUX steers it to the TCP tunnel; otherwise, the packet is steered to the UDP tunnel.
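Conceptually, MUX's decision reduces to a lookup of the parser's per-NAL-unit record; a minimal sketch is shown below, with illustrative field and function names (the real record layout is internal to FDSP).

#include <cstddef>
#include <cstdint>

// Per-NAL-unit syntax information produced by the H.264 Syntax Parser
// before streaming: start address, length, and type of each NAL unit.
struct NalInfo {
    size_t  startAddress;
    size_t  length;
    uint8_t nalUnitType;         // 7 = SPS, 8 = PPS, 1/5 = coded slice
    bool    carriesSliceHeader;  // true for the packet holding a slice header
};

enum class Tunnel { TCP, UDP };

// MUX: the SPS, PPSs, and slice headers go through the reliable TCP tunnel;
// all remaining packets are steered to the low-delay UDP tunnel.
Tunnel steer(const NalInfo& info) {
    const bool important = info.nalUnitType == 7 ||   // SPS
                           info.nalUnitType == 8 ||   // PPS
                           info.carriesSliceHeader;
    return important ? Tunnel::TCP : Tunnel::UDP;
}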

4.4 MUX/DEMUX
The implementation of MUX and DEMUX is relatively straightforward. At the sender, MUX takes as input the syntax information generated by the H.264 Syntax Parser. Based on this, MUX steers the corresponding RTP packets to either the TCP or the UDP tunnel. At the receiver, DEMUX first checks the timestamps of RTP packets received from the UDP tunnel and drops late RTP packets. If an RTP packet arrives in time, DEMUX merges it with the RTP packets received from the TCP tunnel.


Figure 9: Simulated network scenarios (node placements for Configurations 1, 2, and 3).

DEMUX keeps merging on-time RTP packets to form the final received video file.
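The receive-side logic can be sketched as follows, assuming RTP timestamps have already been converted into playout deadlines; the class and field names are illustrative rather than taken from the implementation.

#include <cstdint>
#include <map>
#include <vector>

struct ReceivedPacket {
    uint16_t sequenceNumber;     // RTP sequence number
    int64_t  playoutDeadlineMs;  // derived from the RTP timestamp
    std::vector<uint8_t> payload;
};

// DEMUX: late UDP packets are dropped, TCP packets (SPS, PPSs, and slice
// headers) are always kept, and on-time packets are merged in sequence
// order to form the final received bitstream.
// (Sequence-number wraparound is ignored in this sketch.)
class Demux {
public:
    void onUdpPacket(const ReceivedPacket& p, int64_t nowMs) {
        if (nowMs > p.playoutDeadlineMs) return;  // late: drop
        merged_[p.sequenceNumber] = p.payload;
    }

    void onTcpPacket(const ReceivedPacket& p) {
        merged_[p.sequenceNumber] = p.payload;    // important data: always kept
    }

    // Concatenates the on-time packets in sequence order for the decoder.
    std::vector<uint8_t> finalBitstream() const {
        std::vector<uint8_t> out;
        for (const auto& entry : merged_)
            out.insert(out.end(), entry.second.begin(), entry.second.end());
        return out;
    }

private:
    std::map<uint16_t, std::vector<uint8_t>> merged_;  // keyed by seq. number
};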

5. EXPERIMENT SETUP AND RESULTS
The primary video selected for our experiments is 360 frames of raw HD YUV video (12 seconds of 1920×1080 at 30 fps) from an "African Cats" trailer. The YUV file is encoded using x264 with an average bitrate of 4 Mbps and a single slice per frame (the current version of OEFMON does not support multiple slices). Using OEFMON, an 802.11g ad-hoc network with 18 Mbps bandwidth was set up, and three network scenarios were created to evaluate video streaming performance. In Scenario 1, one pair of nodes streams the primary video over the network. At the same time, a second pair of nodes generates 10 Mbps of constant bitrate (CBR) data as background traffic. Scenario 2 adds another 4 Mbps CBR flow so that the network becomes saturated. Scenario 3 repeats the network traffic of Scenario 1, but the nodes are now positioned in a classic hidden-node arrangement. The placement of devices for the three scenarios is shown in Fig. 9. After video streaming completes, the sent and received video files are decoded using FFmpeg, and PSNR is computed between the two YUV files using Avisynth [28]. Since the PSNR calculation is not well defined for missing frames or for two identical frames, this paper uses 0 dB as the PSNR value for missing frames and follows Avisynth's convention of using 111 dB to indicate perfect PSNR. In addition to PSNR, initial buffering and rebuffering are recorded to evaluate end-to-end delay.
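Under these conventions, the average PSNR values reported below can be computed as in the following sketch; the per-frame values come from the Avisynth comparison, and the function name is illustrative.

#include <algorithm>
#include <map>

constexpr double kMissingFramePsnr = 0.0;  // convention for missing frames
constexpr double kPerfectPsnr = 111.0;     // Avisynth's marker for identical frames

// framePsnr maps frame number -> PSNR reported by Avisynth; frames absent
// from the received file count as missing (0 dB), and identical frames are
// capped at the 111 dB "perfect" marker.
double averagePsnr(const std::map<int, double>& framePsnr, int totalFrames) {
    double sum = 0.0;
    for (int f = 0; f < totalFrames; ++f) {
        const auto it = framePsnr.find(f);
        sum += (it == framePsnr.end()) ? kMissingFramePsnr
                                       : std::min(it->second, kPerfectPsnr);
    }
    return totalFrames > 0 ? sum / totalFrames : 0.0;
}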

Our main objective for the experiments is to show the advantage of the proposed FDSP over traditional pure-UDP and pure-TCP streaming methods. For FDSP, all the important data (the SPS, PPSs, and slice headers) is sent first via TCP, and then the rest of the data is sent via UDP. The time spent sending all the important data in FDSP is treated as initial buffering. For the pure-TCP method, a buffer is added to simulate initial buffering and rebuffering. In order to compare FDSP and pure-TCP fairly, the buffer size for pure-TCP is adjusted so that both methods have the same initial buffering time.

The PSNR comparison for Scenario 1 is shown in Fig. 10. The figure contains PSNR values as well as frame sizes for the three streaming methods. As expected, pure-UDP has the worst PSNR with an average of 54 dB, while FDSP achieves 79 dB.


Figure 10: PSNR comparison for Scenario 1. PSNR and frame size are plotted per frame for FDSP, pure-UDP, and pure-TCP; I-frames ("I"), transition points ("t"), an exception point ("e"), and frames 134-146 are marked.

Streaming Method | Initial Buffering (sec.) | Rebuffering Count | Avg. Rebuffering Time (sec.)
-----------------|--------------------------|-------------------|-----------------------------
FDSP             | 1.1                      | 0                 | N/A
pure-UDP         | 0.0                      | 0                 | N/A
pure-TCP         | 1.1                      | 3                 | 1.2

Table 1: Buffering comparison for Scenario 1.

This clearly illustrates the advantage of FDSP over pure-UDP in terms of visual quality. Pure-TCP has a perfect PSNR of 111 dB because there is no packet loss.

In Fig. 10, the PSNR trends for FDSP and pure-UDP are generally similar. The transition points (marked as "t") between non-perfect PSNR and perfect PSNR match fairly well with the positions of I-frames (marked as "I"). This is because I-frames usually have a much larger data size than P- or B-frames. If the network does not have enough bandwidth, a certain portion of the packets for an I-frame will most likely be lost or delayed, which reduces the PSNR below 111 dB. On the other hand, if an I-frame is received without any loss, this indicates that the network has enough bandwidth to transmit large frames. The subsequent P- and B-frames are therefore unlikely to be lost under the same network condition, since they are smaller than I-frames; thus, PSNR remains at 111 dB until the next I-frame. An exception to this similarity starts at point "e" and is caused by a packet loss. The packet-level log in OEFMON indicates that the last packet of this I-frame is lost in the pure-UDP stream, so its PSNR is not perfect. Due to error propagation, the subsequent frames also cannot achieve perfect PSNR until the next I-frame. Moreover, in contrast to FDSP, pure-UDP has many missing frames, whose PSNR is 0 dB. The packet-level log indicates that these missing frames are caused by slice header loss. This again shows the importance of the slice header and the advantage of FDSP.

The delay analysis for Scenario 1 is shown in Table 1. Although pure-TCP and FDSP have the same initial buffering time, pure-TCP incurs very frequent rebuffering. During the 12 seconds of video streaming, rebuffering occurs three times, each lasting on average 1.2 seconds. As mentioned in Sec. 3, the frequency of rebuffering is the main factor responsible for variations in the user's experience. Such a high frequency of rebuffering can be very annoying even though pure-TCP provides perfect visual quality. This is a clear advantage of FDSP over pure-TCP.


Figure 11: PSNR comparison for Scenario 2.

Streaming Method | Initial Buffering (sec.) | Rebuffering Count | Avg. Rebuffering Time (sec.)
-----------------|--------------------------|-------------------|-----------------------------
FDSP             | 1.6                      | 0                 | N/A
pure-UDP         | 0.0                      | 0                 | N/A
pure-TCP         | 1.6                      | 6                 | 1.6

Table 2: Buffering comparison for Scenario 2.


For Scenario 2, the network was saturated by introducing another 4 Mbps CBR flow. The PSNR and delay comparisons are shown in Fig. 11 and Table 2, respectively. The pure-UDP method has a very low average PSNR of 16 dB, mainly because only 120 frames out of 360 can be reconstructed by FFmpeg. In contrast, all 360 frames are reconstructed using FDSP, which results in a much better average PSNR of 44 dB. FDSP experiences no missing frames because all the slice headers are properly received. For example, Fig. 11 shows that frames 134-146 are lost for pure-UDP, but these frames are not lost in FDSP. The packet-loss log in OEFMON shows that both pure-UDP and FDSP lose all the packets from the UDP tunnel for these frames. The only difference is that FDSP properly receives the slice headers for these frames from the TCP tunnel. As discussed before, the presence of the slice header is critical for a decoder to reconstruct a frame. Once a slice header is properly received, the decoder can use various EC techniques to conceal missing MBs even if the rest of the data is lost. For example, Fig. 12 shows the decoded frame 134. The upper-left watermark shows the frame number 134, which is the information retrieved from the packet that contains the slice header. The lower-right watermark shows frame number 132, which indicates that FFmpeg's EC copied information from the previous frame 132 to the current frame 134.

Our delay analysis shows that the initial buffering time for both FDSP and pure-TCP increases to 1.6 seconds, which is caused by network saturation. However, rebuffering for pure-TCP occurs six times, each lasting on average 1.6 seconds. This implies that FDSP is very effective in a congested network, where pure-UDP and pure-TCP tend to be unacceptable in terms of visual quality and delay, respectively. Although FDSP incurs delay and results in non-perfect PSNR, it effectively achieves a balance between delay and visual quality.

Figure 12: The decoded Frame 134 for FDSP.

Figure 13: PSNR comparison for Scenario 3.


Fig. 13 and Table 3 show the PSNR and delay comparisons for Scenario 3, respectively. As can be seen, PSNR and delay degrade further compared to Scenarios 1 and 2. This is caused by packet collisions resulting from the hidden-node effect. For pure-UDP, the average PSNR is only 3 dB, and there are a total of 320 missing frames out of 360. FDSP has an average PSNR of 32 dB, which is significantly better than pure-UDP. Again, FDSP experiences no missing frames, due to the prioritization of slice headers via TCP. Our delay analysis shows that the hidden-node effect further increases the initial buffering time to 1.9 seconds for FDSP and pure-TCP. In addition, although pure-TCP has the same rebuffering count as in Scenario 2, each rebuffering event now lasts on average 2.1 seconds.

As mentioned earlier, the current implementation of FDSP first sends all the important data via TCP and then sends the rest of the data via UDP. For the test video, the initial buffering (i.e., the time spent sending all the important data) is less than 2 seconds. However, initial buffering can become quite long when streaming an entire movie. This issue can be solved by adding a special buffer.

Streaming Method | Initial Buffering (sec.) | Rebuffering Count | Avg. Rebuffering Time (sec.)
-----------------|--------------------------|-------------------|-----------------------------
FDSP             | 1.9                      | 0                 | N/A
pure-UDP         | 0.0                      | 0                 | N/A
pure-TCP         | 1.9                      | 6                 | 2.1

Table 3: Buffering comparison for Scenario 3.


The streaming process for FDSP with the special buffer is as follows. First, an entire movie is divided into several sub-streams of a fixed length (i.e., n seconds). FDSP sends all the important data for the first n-second sub-stream via TCP and then the rest of the data for that sub-stream via UDP. Whenever FDSP is not sending UDP packets, it sends the important data for the next n-second sub-streams via TCP. FDSP stops and performs rebuffering only if the important data for an n-second sub-stream is not yet in the special buffer. This solution is similar to pure-TCP, except that the special buffer stores only the important data instead of all the video data. Because the amount of data FDSP must send via TCP is significantly less than that of pure-TCP, FDSP will experience much less rebuffering. In addition, the size of the special buffer is determined by the fixed length (n seconds) of the sub-streams, and the choice of buffer size should be based on network conditions, as discussed in [5]. The implementation of this special buffer is left as future work.
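A sketch of this proposed schedule is given below under the assumptions just stated: the movie is pre-split into n-second sub-streams whose important data is already identified, and the tunnel primitives are hypothetical stand-ins for the send paths in app_fdspvideo.cpp.

#include <cstddef>
#include <deque>
#include <vector>

using Packet = std::vector<unsigned char>;

struct SubStream {
    std::deque<Packet> importantPackets;  // SPS, PPSs, and slice headers
    std::deque<Packet> otherPackets;      // the remaining video data
};

// Hypothetical tunnel primitives; the real send paths live in QualNet.
void sendAllViaTcp(std::deque<Packet>& pkts) { pkts.clear(); /* TCP send elided */ }
void sendViaUdp(const Packet&)               { /* UDP send elided */ }
bool udpTunnelIdle()                         { return true; /* placeholder */ }

// Proposed special-buffer schedule: stream sub-stream i over UDP while
// prefetching the important data of later sub-streams over TCP, and
// rebuffer only when a sub-stream's important data has not arrived yet.
void streamMovie(std::vector<SubStream>& subStreams) {
    size_t prefetched = 0;  // sub-streams whose important data went via TCP
    for (size_t i = 0; i < subStreams.size(); ++i) {
        // Rebuffering case: sub-stream i's important data is not yet in the
        // special buffer, so it must be sent via TCP before playback resumes.
        while (prefetched <= i) {
            sendAllViaTcp(subStreams[prefetched].importantPackets);
            ++prefetched;
        }
        for (const Packet& pkt : subStreams[i].otherPackets) {
            sendViaUdp(pkt);
            // While the UDP tunnel is idle, keep prefetching important
            // data for upcoming sub-streams via TCP.
            if (udpTunnelIdle() && prefetched < subStreams.size()) {
                sendAllViaTcp(subStreams[prefetched].importantPackets);
                ++prefetched;
            }
        }
    }
}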

6. CONCLUSION AND FUTURE WORK
This paper presented a new streaming method called FDSP, which utilizes the benefits of both UDP and TCP. The proposed FDSP was implemented within OEFMON and compared against traditional pure-UDP and pure-TCP methods. A 1920×1080 at 30 fps HD video clip was used to evaluate the three streaming methods in an ad-hoc network environment. Three network scenarios were constructed, and for each scenario both delay and visual quality were analyzed. Our analysis shows that FDSP achieves higher PSNR than pure-UDP and less buffering time than pure-TCP in all three scenarios. This shows that FDSP is effective in striking a balance between delay and visual quality, and thus has an advantage over pure-UDP and pure-TCP methods.

As future work, the special buffer will be added to FDSP to handle rebuffering and will be evaluated by streaming longer HD videos. The frequency of rebuffering for FDSP is expected to be very low because the amount of data sent via TCP is significantly less than with the pure-TCP method. In addition, support for multiple slices per frame will be added to OEFMON. This will allow us to explore whether FDSP can provide better performance in terms of delay and visual quality using videos encoded with multiple slices per frame.

Acknowledgment
This research was supported in part by the LCD Laboratory, LG Display Co. Ltd, Korea.

7. REFERENCES
[1] AirPlay. Available at http://www.apple.com/itunes/airplay/.
[2] How to Connect Laptop to TV with Intel Wireless Display (WiDi). Available at http://www.intel.com/content/www/us/en/architecture-and-technology/intel-wireless-display.html.
[3] WiVu. Available at http://www.cavium.com/PureVu WiVu-Solution.html.
[4] S. Wenger. H.264/AVC over IP. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):645-656, July 2003.
[5] T. Kim and M. H. Ammar. Receiver Buffer Requirement for Video Streaming over TCP. In Proceedings of the Visual Communications and Image Processing Conference, pages 422-431, 2006.
[6] X. Shen, A. Wonfor, R. V. Penty, and I. H. White. Receiver Playout Buffer Requirement for TCP Video Streaming in the Presence of Burst Packet Drops. In London Communications Symposium, 2009.
[7] E. Brosh, S. A. Baset, V. Misra, D. Rubenstein, and H. Schulzrinne. The Delay-Friendliness of TCP. ACM SIGMETRICS Performance Evaluation Review, 36(1):49-60, June 2008.
[8] T. Porter and X. H. Peng. Hybrid TCP/UDP Video Transport for H.264/AVC Content Delivery in Burst Loss Networks. In 2011 IEEE International Conference on Multimedia and Expo (ICME), pages 1-5, July 2011.
[9] I. E. Richardson. The H.264 Advanced Video Compression Standard. John Wiley and Sons, Ltd., second edition, 2010.
[10] VideoLAN Client (VLC). Available at http://www.videolan.org/vlc/index.html.
[11] Wireshark. Available at http://www.wireshark.org/.
[12] Elecard. Available at http://www.elecard.com/en/index.html.
[13] IEEE Standard for Information Technology - Telecommunications and Information Exchange Between Systems - Local and Metropolitan Area Networks - Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 8: Medium Access Control (MAC) Quality of Service Enhancements. IEEE Std 802.11e-2005 (Amendment to IEEE Std 802.11, 1999 Edition (Reaff 2003)), 2005.
[14] A. MacAulay, B. Felts, and Y. Fisher. Whitepaper - IP Streaming of MPEG-4: Native RTP versus MPEG-2 Transport Stream. http://www.envivio.com/files/white-papers/RTPvsTS-v4.pdf, 2005.
[15] D. Wu, Y. T. Hou, W. Zhu, Y. Q. Zhang, and J. M. Peha. Streaming Video over the Internet: Approaches and Directions. IEEE Transactions on Circuits and Systems for Video Technology, 11(3):282-300, March 2001.
[16] Y. Wang and Q. F. Zhu. Error Control and Concealment for Video Communication: A Review. Proceedings of the IEEE, 86(5):974-997, May 1998.
[17] Y. Xu and Y. Zhou. H.264 Video Communication Based Refined Error Concealment Schemes. IEEE Transactions on Consumer Electronics, 50(4):1135-1141, Nov. 2004.
[18] A. Nafaa, T. Taleb, and L. Murphy. Forward Error Correction Strategies for Media Streaming over Wireless Networks. IEEE Communications Magazine, 46(1):72-79, Jan. 2008.
[19] J. Kim, R. M. Mersereau, and Y. Altunbasak. Distributed Video Streaming Using Multiple Description Coding and Unequal Error Protection. IEEE Transactions on Image Processing, 14(7):849-861, July 2005.
[20] B. Wang, J. Kurose, P. Shenoy, and D. Towsley. Multimedia Streaming via TCP: An Analytic Performance Study. ACM Transactions on Multimedia Computing, Communications and Applications, 4(2):16:1-16:22, May 2008.
[21] R. K. P. Mok, E. W. W. Chan, and R. K. C. Chang. Measuring the Quality of Experience of HTTP Video Streaming. In 2011 IFIP/IEEE International Symposium on Integrated Network Management, pages 485-492, May 2011.
[22] FFmpeg. Available at http://ffmpeg.org.
[23] C. Lee, M. Kim, S. J. Hyun, S. Lee, B. Lee, and K. Lee. OEFMON: An Open Evaluation Framework for Multimedia Over Networks. IEEE Communications Magazine, 49(9):153-161, Sept. 2011.
[24] DirectShow. Available at http://msdn.microsoft.com/en-us/library/windows/desktop/dd3754549.aspx.
[25] QualNet. Available at http://www.scalable-networks.com/content/products/qualnet.
[26] J. F. Kurose and K. W. Ross. Computer Networking: A Top-Down Approach. Addison-Wesley Publishing Company, USA, 5th edition, 2009.
[27] h264bitstream. Available at http://h264bitstream.sourceforge.net/.
[28] AviSynth. Available at http://avisynth.org/mediawiki/Main Page.