2008 IEEE International Symposium on Consumer Electronics (ISCE 2008), Vilamoura, Portugal
Delivery of H264 SVC/MDC streams over Wimax and DVB-T networks
Victor Domingo Reguant, Francesc Enrich Prats, Ramon Martín de Pozuelo, Francesc Pinyol Margalef, and Gabriel Fernández Ubiergo
GTAM-Grup de Recerca en Tecnologies Audiovisuals i Multimedia.
ENGINYERIA I ARQUITECTURA LA SALLE. UNIVERSITAT RAMON LLULL.
Quatre Camins 2, 08022 Barcelona (Spain)
Email: {frane,victord,fpinyol,gabrielf}@salle.url.edu
ABSTRACT
This paper introduces an approach for the optimal delivery
(encapsulation and signalling) of video streams coded using
H.264 Scalable Video Coding (SVC) combined with
Multiple Description Coding (MDC). The solution
presented uses optimization and control strategies
depending on the different types of delivered services, the
terminals that will consume these services, the load of the
video servers and the network conditions.
Index Terms— RTP, SVC, MDC, H264, MPEG-4
1. INTRODUCTION
This paper is related to the work performed under the
framework of the SUIT (Scalable, Ultra-fast and
Interoperable Interactive Television) European project
(IST-4-028042). In the SUIT project, two SVC (Scalable
Video Coding) [7,8] descriptions of a given content are
generated, each one sent through a different network
(DVB-T and WiMAX). Due to the use of SVC, each
description contains a scalable version of the encoded
content. Thanks to the convergence of broadband and
broadcast networks, the receiving terminal, which may be
connected to both networks, can then combine the
descriptions while coping with transmission errors or
improving the quality of the received video. SUIT project
includes an end-to-end chain composed of a playout
system, broadcast and broadband networks, and terminals
of different computational and display capacities.
This paper focuses on the development of the playout
system; the system has to encapsulate, signal and
synchronise the different SVC descriptions to be optimally
delivered over the different networks according to network
conditions and the available bit rate.
2. PLAYOUT ARCHITECTURE
A summary of the playout architecture, illustrated in Figure
1, is presented in this section. The playout system includes
several H.264 SVC/MDC video servers delivering
broadcast, unicast and multicast services. All the video
servers are connected to a switch that multiplexes all the
input streams into two output streams, one connected to a
WiMAX base station, and the other one connected to a
DVB-T base station. The DVB-T transmission also
implements an RF return channel using DVB-RCT. The
pair of descriptions for each service are encapsulated and
synchronised using RTP/RTCP (Real-time Transport
Protocol / RTP Control Protocol) [3,4,5] for H.264/SVC
[7,8], in such a way that the terminal is able to correctly
recover and combine the service descriptions. The SUIT
playout has a distributed
architecture, which makes the managing of several video
servers easier. To ensure load sharing, the different services
are automatically distributed among the available video
servers.
Figure 1. SUIT playout architecture
2.1. SUIT services
In order to demonstrate its performance, the playout allows
the management and configuration of several types of
services:
• Scalable robust SVC/MDC broadcast service: one
SVC description is broadcasted over DVB-T and the
other description over WiMAX network.
• Broadcast/QoS on demand: one SVC description is
broadcasted over DVB-T and the other one is sent by
unicast over WiMAX to the user that requests more
video quality.
• Broadband streaming Video on Demand (VoD):
one SVC description is sent on demand over WiMAX
network upon request.
The bit rates for each type of service are shown in table 1.
Table 1. Service bit rates. SVC = HD: 1280x704p, 25 Hz (4 Mbps); SD: 640x352, 25 Hz (1.5 Mbps); CIF: 320x176, 25 Hz (0.5 Mbps)

DVB-T service                       Bit rate (Mbps) | WiMAX service                            Bit rate (Mbps)
1 D SVC HD Real Time Broadcasting   4 - 6           | 2 D SVC HD Real Time Broadcasting        0.5 - 6
1 D SVC HD Recorded Broadcasting    4 - 6           | 2 D SVC HD Broadcasting (on QoS demand)  0.5 - 6
1 D SVC SD Real Time Broadcasting   1.5 - 2         | 2 D SVC SD Real Time Broadcasting        0.5 - 2
1 D SVC SD Recorded Broadcasting    1.5 - 2         | 2 D SVC SD Recorded Broadcasting         0.5 - 2
                                                    | Streaming (VoD)                          0.5 - 6 p.u.
Total                               11 - 20         | Total                                    2.5 - 16
3. RTP ENCAPSULATION AND
SYNCHRONIZATION
The playout system has to correctly encapsulate,
synchronise and signal the SVC/MDC contents delivered
through separate networks. Two H.264/SVC streams are
encapsulated as different descriptions, each one delivered
in a different RTP session. Thus, the receiver is
able to decapsulate the incoming RTP descriptions and
combine them to obtain the main stream.
RTP encapsulation has been done following the most recent
RFCs [3,4,5], taking into account the different video
parameters and services in order to improve transmission
performance.
Figure 2 shows the RTP encapsulation process.
The extractor module gets the SVC NAL (Network
Abstraction Layer) units belonging to the available
description files. Then, thanks to the video CGS (Coarse
Grain Scalability) coding, it delivers the NAL units that
match a specific target bit rate.
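The extractor step described above can be sketched as follows. This is a minimal illustration, assuming a simplified NAL unit record (`NalUnit` and its fields are hypothetical, not the actual SUIT data structures): with CGS coding, hitting a target bit rate amounts to keeping whole layers from the base upward until the bit budget is spent.

```python
from dataclasses import dataclass

@dataclass
class NalUnit:
    layer_id: int    # 0 = base layer, higher = CGS enhancement layers
    size_bits: int
    payload: bytes = b""

def extract_for_bitrate(nal_units, target_bps, duration_s):
    """Keep the lowest layers whose cumulative rate fits target_bps."""
    budget = target_bps * duration_s
    kept, used = [], 0
    # Try layers in ascending order; a layer is kept only whole, since
    # CGS enhancement NAL units depend on the layers below them. The
    # base layer (layer 0) is always kept.
    for layer in sorted({n.layer_id for n in nal_units}):
        layer_bits = sum(n.size_bits for n in nal_units if n.layer_id == layer)
        if layer and used + layer_bits > budget:
            break
        kept.extend(n for n in nal_units if n.layer_id == layer)
        used += layer_bits
    return kept
```

For example, with a 400 kbit base layer and 600 kbit and 900 kbit enhancement layers over one second, a 1 Mbps target keeps only the first two layers.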
The RTP encapsulator has to encapsulate the NAL units
belonging to each description in different RTP sessions,
using the appropriate packetization mode, IP packet length
and RTP packet type.
Figure 2. RTP encapsulator process
According to the RTP payload format for H.264 [3], the
following packetization modes are defined:
• Single NAL unit mode: the transmission order of single
NAL unit packets must comply with the NAL unit decoding
order.
• Non-interleaved mode: NAL units are transmitted in NAL
unit decoding order.
• Interleaved mode: allows transmission of NAL units out
of NAL unit decoding order. This mode is not used in SUIT,
since the different layers of the SVC bitstream are
transported in the same RTP packet stream.
The following NAL unit types are also defined for the RTP
packet payload:
• Single NAL Unit Packet: contains only one NAL unit. A
NAL unit stream composed by decapsulating single NAL
unit packets in RTP sequence number order must conform
to the NAL unit decoding order.
• Aggregation Packets: this packetization mode is
introduced to prevent media transcoding between different
networks and to avoid undesirable packetization overhead.
This packet type is not used in SUIT because the NAL units
can rarely be aggregated due to their large size.
• Fragmentation Units (FU):
This payload type allows fragmenting a NAL unit into
several RTP packets. Doing so on the application layer
instead of relying on lower layer fragmentation has the
advantage that the payload format is capable of transporting
NAL units bigger than 64 kbytes (the largest possible size
for IP packets) over an IPv4 network. This is useful for
delivering High Definition (HD) formats: the number of
slices per picture is limited, which limits the number of
NAL units per picture and may therefore result in very large
NAL units.
Usually in SD (Standard Definition) and CIF (Common
Intermediate Format) services the RTP encapsulator uses
the Single NAL unit packets (one NAL unit per RTP
packet) but for HD services it is better to use fragmentation
following the FU-A mode [3,5] (NAL unit fragmented into
several RTP packets) at the MTU size because some Intra
frames can be bigger than the maximum IP packet size.
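As a sketch of the FU-A mode described above, the fragmentation of one NAL unit into MTU-sized RTP payloads can be illustrated as follows; the code follows the FU indicator/header byte layout of [3], but the function itself is ours, not part of any standard API.

```python
FU_A_TYPE = 28  # FU-A NAL unit type from RFC 3984

def fu_a_fragment(nal_unit: bytes, mtu: int):
    """Split one H.264 NAL unit into FU-A payloads of at most `mtu` bytes."""
    header = nal_unit[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI bits
    nal_type = header & 0x1F                     # original NAL unit type
    body = nal_unit[1:]                          # original header is dropped
    chunk = mtu - 2                              # room left after the 2 FU bytes
    payloads = []
    for i in range(0, len(body), chunk):
        start = 0x80 if i == 0 else 0                    # S bit on first fragment
        end = 0x40 if i + chunk >= len(body) else 0      # E bit on last fragment
        fu_header = start | end | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[i:i + chunk])
    return payloads
```

A 3000-byte Intra-frame NAL unit fragmented at an MTU of 1400 bytes yields three payloads, each carrying the two FU bytes plus up to 1398 bytes of the original NAL unit body.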
Specific to SVC [7,8], it is important to note that the
base layer and the enhancement layers are stamped with the
same timestamp in the RTP header (except for temporal
scalability layers). Therefore, for a given frame-rate we
have several layers (base layer, spatial layers, SNR layers)
which have to be timestamped with the same time, as
depicted in Figure 3.
In order to synchronise the two descriptions delivered by
the RTP encapsulator it is not necessary to send RTCP [4],
due to the fact that both descriptions have been generated
by the same system; the receiver only needs its own clock
at 90 kHz and the timestamp delivered in each RTP packet
to synchronise them. The combining process at the decoder
requires both descriptions to have the same timestamp.
Figure 3. H264/SVC layers
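The timestamping and combining rules above can be sketched as follows, assuming packets are available as simple (timestamp, payload) pairs (an illustrative simplification of the real RTP handling):

```python
from collections import defaultdict

CLOCK_HZ = 90_000  # RTP video clock rate

def rtp_timestamp(frame_index, fps=25):
    """90 kHz timestamp for a given frame (3600 ticks per frame at 25 Hz)."""
    return (frame_index * CLOCK_HZ // fps) & 0xFFFFFFFF  # 32-bit wrap-around

def combine(desc_a, desc_b):
    """Merge two lists of (timestamp, payload) packets frame by frame.

    Because both descriptions come from the same playout, packets of the
    same frame carry the same timestamp, so no RTCP mapping is needed.
    """
    frames = defaultdict(list)
    for ts, payload in desc_a + desc_b:
        frames[ts].append(payload)
    # Frames present in only one description remain decodable on their own.
    return [(ts, frames[ts]) for ts in sorted(frames)]
```

At 25 Hz, consecutive frames are 3600 ticks apart, and a frame received in both descriptions simply yields two payloads under one timestamp.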
4. SIGNALLING OF RTP SVC/MDC STREAMS
The signalling of SVC/MDC streams is based on the SDP
(Session Description Protocol) [10]. The purpose of SDP is
to convey information about media streams in multimedia
sessions. As there is no RFC standard describing SVC and
MDC streams, an IETF draft called draft-schierl-mmusic-
layered-codec [11] has been considered in the SUIT project.
This draft extends the SDP specification to support
signalling of relationships between media. It enables
signalling decoding dependency of different media
descriptions with the same media type in SDP. In SUIT this
covers, for example, the transport of bitstream partitions of
a hierarchical (layered) media coding process, or of a
multiple description coding (MDC) process, in different
network streams. The basic idea in all cases is the
separation of the partitions of a media bitstream to allow
scalability in network elements. The two types of media
dependencies in SUIT are the following:
• Layered/hierarchical decoding dependencies: In
SUIT, one or more layers may be transported over
WiMAX or DVB-T network streams depending on the
available bit-rate of each network. The receiver selects
the required layers conveyed in the RTP session in
response to quality or bit-rate requirements. The base
layer, which is self-contained, can be decoded without
any dependency. In SUIT the signalling of the existing
layers in the media stream is done in-band using
specific NAL unit types.
• Multiple description decoding dependencies: In the
most basic form of multiple description coding (MDC),
each partition forms an independent representation of
the media; in SUIT each partition represents an SVC
stream. That is, decoding any of the partitions yields
useful reproduced media data. The SUIT combiner can
parse the SDP file to learn whether more than one
partition is available; if so, it can process them jointly
and the resulting media quality increases. The
highest reproduced quality is available if all original
partitions are available for decoding. An SDP example
is shown in Figure 4.
v=0
o=mdcsrv 289083124 289083124 IN IP4
s=MULTI DESCRIPTION VIDEO SIGNALING
t=0 0
a=group:DDP 1 2
m=video 40000 RTP/AVP 96
c=IN IP4 224.2.17.12/127
a=mid:1
a=depend:mdc
m=video 40002 RTP/AVP 96
c=IN IP4 224.2.17.13/127
a=mid:2
a=depend:mdc
Figure 4. SDP file example for MDC dependency
5. OPTIMAL BIT RATE MANAGEMENT IN THE
PLAYOUT
Regarding the control of the system bit rates, the aim of the
playout is to maximise the DVB-T and WiMAX network
bit rates, always ensuring the best performance of the
system and providing the best image quality to the user.
As the system is based on the scalable video coding
extension of H.264/AVC (SVC), the data rate of each
service can be easily adapted depending on the available
bandwidth or the terminal capabilities.
The playout can take advantage of SVC techniques to serve
clients of heterogeneous capabilities at the same time while
consuming less bit rate than simulcasting the services
would.
The MDC techniques used together with SVC provide
more robustness to the video sequences at the cost of an
increase in bit rate. The playout is able to decide whether to
sacrifice the bit rate used to send a second description, and
thus the robustness, in favour of accommodating internet or
new VoD requests in the WiMAX network.
The bit rate management is based on a priority policy
applied to the different services; the playout has the ability
to change in real time the service bit rates in order to
provide the best QoS while maximizing the networks
throughput. These policies, defined for each service, are
taken into account by the playout algorithms. As the
intelligent playout also knows the total bit rate delivered in
each network, it is able to control and optimise the load of
the services. The playout deals with different types of
services (broadcast, multicast and unicast) to solve, in real
time, the optimal content distribution according to available
bit rates, service priority, network characteristics and
terminal capabilities.
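A minimal sketch of such a priority policy, assuming each service is described by an illustrative (name, priority, minimum rate, maximum rate) tuple (not the actual SUIT policy format): minimum rates are granted first, and the remaining network capacity is then distributed in priority order up to each service's maximum.

```python
def allocate(services, capacity_bps):
    """Priority-based bit rate allocation for one network.

    services: list of (name, priority, min_bps, max_bps) tuples, where a
    lower priority value means a more important service.
    Returns {name: allocated_bps}.
    """
    # Every service is guaranteed its minimum rate.
    alloc = {name: min_bps for name, _, min_bps, _ in services}
    spare = capacity_bps - sum(alloc.values())
    if spare < 0:
        raise ValueError("capacity cannot cover the minimum rates")
    # Hand out the spare capacity in priority order, up to each maximum.
    for name, _, min_bps, max_bps in sorted(services, key=lambda s: s[1]):
        extra = min(spare, max_bps - min_bps)
        alloc[name] += extra
        spare -= extra
    return alloc
```

For example, with an HD broadcast (4-6 Mbps) and a VoD request (0.5-6 Mbps) sharing 8 Mbps, the broadcast reaches its 6 Mbps maximum and the VoD service receives the remaining 2 Mbps.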
6. RESULTS
In order to demonstrate the benefits of using the FU-A
fragmentation defined in [3,5], the following test bed
configuration has been performed. The video server has
been connected to a terminal through the WiMAX/DVB-T
emulator developed by the Centre for Communication
Systems Research (CCSR) University of Surrey. This
emulator has been developed in the framework of the SUIT
project and uses some error patterns to emulate WiMAX
and DVB modulation conditions. In the tests, a three-
spatial-layer SVC video of 15 s duration has been used
(CIF: 500 kbps; SD: 2 Mbps; HD: 6 Mbps).
The video is delivered using the unicast mode to the
terminal at different bit rates and changing the modulation
parameters. The result of the tests shows that using
fragmentation decreases packet losses. In the tests we
have noticed that the major decrease occurs when the
packet size is less than approximately 4000 bytes. This
result is shown in Figure 5, where the values for three SVC
streams are presented. The threshold depends on the
complexity of the input sequence: high complexity
sequences will imply high values of the threshold.
Figure 5. Packet losses vs. fragmentation packet size for three SVC streams (3 layers, 6 Mbps; 2 layers, 2 Mbps; 1 layer, 500 kbps) in a WiMAX 16-QAM 1/2 transmission at SNR = 8.3 dB.
Figure 6 shows the packet losses against fragmentation
packet size for several transmission conditions.
Fragmentation is only useful when the transmission
conditions are good enough; under bad conditions almost
all packets, even very small ones, are affected by errors.
Figure 6. Packet losses vs. fragmentation packet size in a WiMAX 16-QAM 1/2 transmission at SNR values from 5 to 13.25 dB.
7. CONCLUSIONS
In conclusion, the study of the appropriate RTP
encapsulation and synchronization process presented in this
paper is one step forward to ensure the correct processing
and delivery of the video information to consumers,
providing the best quality of service. In this way it is
possible to avoid undesirable frame delays and packet
losses introduced by the heterogeneous nature of the
transport chain. It is also important to emphasise the
benefits of using an intelligent playout to maximise the use
of the total bandwidth of the system, taking advantage of
the scalability features of H.264 SVC video to optimise the
bit rates.
8. REFERENCES
[1] S. Wenger, "H.264/AVC over IP", IEEE Transactions on Circuits and
Systems for Video Technology, vol. 13, no. 7, July 2003.
[2] IETF RFC 2250, "RTP Payload Format for MPEG1/MPEG2 Video".
[3] IETF RFC 3984, "RTP Payload Format for H.264 Video".
[4] IETF RFC 3550, "RTP: A Transport Protocol for Real-Time
Applications".
[5] S. Wenger, Y.-K. Wang, T. Schierl, "draft-wenger-avt-rtp-svc-03.txt",
June 2006.
[6] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming Protocol
(RTSP)", IETF RFC 2326, 1998.
[7] ITU-T Recommendation H.264, "Advanced video coding for generic
audiovisual services" / ISO/IEC 14496-10 (2005), "Information
Technology - Coding of audio-visual objects - Part 10: Advanced Video
Coding".
[8] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC
JTC1/SC29/WG11 and ITU-T SG16 Q.6), 15-21 July 2006.
[9] Moving Picture Experts Group, "Information technology - Multimedia
framework (MPEG-21) - Part 7: Digital Item Adaptation", ISO/IEC
JTC1/SC29/WG11 FDIS 21000-7, March 2003.
[10] M. Handley, V. Jacobson, C. Perkins, "SDP: Session Description
Protocol", IETF RFC 4566, July 2006.
[11] T. Schierl, "Signaling of layered and multi description media in
Session Description Protocol (SDP)", IETF Internet Draft,
draft-schierl-mmusic-layered-codec-02, December 2006.