2008 IEEE International Symposium on Consumer Electronics (ISCE 2008), Vilamoura, Portugal
Delivery of H264 SVC/MDC streams over Wimax and DVB-T networks
Victor Domingo Reguant, Francesc Enrich Prats, Ramon Martín de Pozuelo, Francesc Pinyol Margalef, and Gabriel Fernández Ubiergo
GTAM-Grup de Recerca en Tecnologies Audiovisuals i Multimedia.
ENGINYERIA I ARQUITECTURA LA SALLE. UNIVERSITAT RAMON LLULL.
Quatre Camins 2, 08022 Barcelona (Spain)
Email: {frane,victord,fpinyol,gabrielf}@salle.url.edu
ABSTRACT
This paper introduces an approach for the optimal delivery
(encapsulation and signalling) of video streams coded using
H.264 Scalable Video Coding (SVC) combined with
Multiple Description Coding (MDC). The solution
presented uses optimization and control strategies
depending on the different types of delivered services, the
terminals that will consume these services, the load of the
video servers and the network conditions.
Index Terms— RTP, SVC, MDC, H264, MPEG-4
1. INTRODUCTION
This paper is related to the work performed under the
framework of the SUIT (Scalable, Ultra-fast and
Interoperable Interactive Television) European project
(IST-4-028042). In the SUIT project, two SVC (Scalable
Video Coding) [7,8] descriptions of a given content are
generated, each one sent through a different network
(DVB-T and WiMAX). Due to the use of SVC, each
description contains a scalable version of the encoded
content. Thanks to the convergence of broadband and
broadcast networks, the receiving terminal, which may be
connected to both networks, can then combine the
descriptions while coping with transmission errors or
improving the quality of the received video. SUIT project
includes an end-to-end chain composed of a playout
system, broadcast and broadband networks, and terminals
of different computational and display capacities.
This paper focuses on the development of the playout
system; the system has to encapsulate, signal and
synchronise the different SVC descriptions to be optimally
delivered over the different networks according to network
conditions and the available bit rate.
2. PLAYOUT ARCHITECTURE
A summary of the playout architecture, illustrated in Figure
1, is presented in this section. The playout system includes
several H.264 SVC/MDC video servers delivering
broadcast, unicast and multicast services. All the video
servers are connected to a switch that multiplexes all the
input streams into two output streams, one connected to a
WiMAX base station, and the other one connected to a
DVB-T base station. The DVB-T transmission also
implements an RF return channel using DVB-RCT. The
pair of descriptions for each service are encapsulated and
synchronised using RTP/RTCP (Real-time Transport
Protocol / RTP Control Protocol) [3,4,5] for H.264/SVC
[7,8], in such a way that the terminal is able to correctly
recover and combine the service descriptions. The SUIT
playout has a distributed
architecture, which makes the managing of several video
servers easier. To ensure load sharing, the different services
are automatically distributed among the available video
servers.
Figure 1. SUIT playout architecture
2.1. SUIT services
In order to demonstrate its performance, the playout allows
the management and configuration of several types of
services:
• Scalable robust SVC/MDC broadcast service: one
SVC description is broadcasted over DVB-T and the
other description over WiMAX network.
• Broadcast/QoS on demand: one SVC description is
broadcasted over DVB-T and the other one is sent by
unicast over WiMAX to the user that requests more
video quality.
• Broadband streaming Video on Demand (VoD):
one SVC description is sent on demand over WiMAX
network upon request.
The bit rates for each type of service are shown in table 1.
Table 1. Service bit rates. SVC = HD: 1280x704p, 25 Hz (4 Mbps); SD: 640x352, 25 Hz (1.5 Mbps); CIF: 320x176, 25 Hz (0.5 Mbps)

DVB-T service                       Bit rate (Mbps) | WiMAX service                            Bit rate (Mbps)
1 D SVC HD Real Time Broadcasting   4 - 6           | 2 D SVC HD Real Time Broadcasting        0.5 - 6
1 D SVC HD Recorded Broadcasting    4 - 6           | 2 D SVC HD Broadcasting (on QoS demand)  0.5 - 6
1 D SVC SD Real Time Broadcasting   1.5 - 2         | 2 D SVC SD Real Time Broadcasting        0.5 - 2
1 D SVC SD Recorded Broadcasting    1.5 - 2         | 2 D SVC SD Recorded Broadcasting         0.5 - 2
                                                    | Streaming (VoD)                          0.5 - 6 p.u.
Total                               11 - 20         | Total                                    2.5 - 16
3. RTP ENCAPSULATION AND
SYNCHRONIZATION
The playout system has to correctly encapsulate,
synchronise and signal the SVC/MDC contents delivered
through separate networks. Two H.264/SVC streams are
encapsulated as different descriptions, each one delivered
in a different RTP session. Thus, the receiver is
able to decapsulate the incoming RTP descriptions and
combine them to obtain the main stream.
RTP encapsulation has been done following the most recent
RFCs [3,4,5], taking into account the different video
parameters and services in order to improve transmission
performance.
Figure 2 shows the RTP encapsulation process.
The extractor module gets the SVC NAL (Network
Abstraction Layer) units belonging to the available
description files. Then, thanks to the video CGS (Coarse
Grain Scalability) coding, it delivers the NAL units that
match a specific target bit rate.
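The extractor step described above can be sketched as follows. This is a minimal illustration, assuming a simplified NAL unit record (`NalUnit` and its fields are hypothetical, not the actual SUIT data structures): with CGS coding, hitting a target bit rate amounts to keeping whole layers from the base upward until the bit budget is spent.

```python
from dataclasses import dataclass

@dataclass
class NalUnit:
    layer_id: int    # 0 = base layer, higher = CGS enhancement layers
    size_bits: int
    payload: bytes = b""

def extract_for_bitrate(nal_units, target_bps, duration_s):
    """Keep the lowest layers whose cumulative rate fits target_bps."""
    budget = target_bps * duration_s
    kept, used = [], 0
    # Try layers in ascending order; a layer is kept only whole, since
    # CGS enhancement NAL units depend on the layers below them. The
    # base layer (layer 0) is always kept.
    for layer in sorted({n.layer_id for n in nal_units}):
        layer_bits = sum(n.size_bits for n in nal_units if n.layer_id == layer)
        if layer and used + layer_bits > budget:
            break
        kept.extend(n for n in nal_units if n.layer_id == layer)
        used += layer_bits
    return kept
```

For example, with a 400 kbit base layer and 600 kbit and 900 kbit enhancement layers over one second, a 1 Mbps target keeps only the first two layers.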
The RTP encapsulator has to encapsulate the NAL units
belonging to each description in different RTP sessions,
using the appropriate packetization mode, IP packet length
and RTP packet type.
Figure 2. RTP encapsulator process
According to the RTP payload format for H.264 [3], the
following packetization modes are defined:
• Single NAL unit mode: the transmission order of single
NAL unit packets must comply with the NAL unit decoding
order.
• Non-interleaved mode: NAL units are transmitted in NAL
unit decoding order.
• Interleaved mode: allows transmission of NAL units out
of NAL unit decoding order. This mode is not used in SUIT,
since the different layers of the SVC bitstream are
transported in the same RTP packet stream.
The following NAL unit types are also defined for the RTP
packet payload:
• Single NAL Unit Packet: contains only one NAL unit. A
NAL unit stream composed by decapsulating single NAL
unit packets in RTP sequence number order must conform
to the NAL unit decoding order.
• Aggregation Packets: this packetization mode is
introduced to prevent media transcoding between different
networks and to avoid undesirable packetization overhead.
This packet type is not used in SUIT because the NAL units
can rarely be aggregated due to their large size.
• Fragmentation Units (FU):
This payload type allows fragmenting a NAL unit into
several RTP packets. Doing so on the application layer
instead of relying on lower layer fragmentation has the
advantage that the payload format is capable of transporting
NAL units bigger than 64 kbytes (the largest possible size
for IP packets) over an IPv4 network. This is useful for
delivering High Definition (HD) formats: the number of
slices per picture is limited, which limits the number of
NAL units per picture and may therefore result in very large
NAL units.
Usually in SD (Standard Definition) and CIF (Common
Intermediate Format) services the RTP encapsulator uses
the Single NAL unit packets (one NAL unit per RTP
packet) but for HD services it is better to use fragmentation
following the FU-A mode [3,5] (NAL unit fragmented into
several RTP packets) at the MTU size because some Intra
frames can be bigger than the maximum IP packet size.
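As a sketch of the FU-A mode described above, the fragmentation of one NAL unit into MTU-sized RTP payloads can be illustrated as follows; the code follows the FU indicator/header byte layout of [3], but the function itself is ours, not part of any standard API.

```python
FU_A_TYPE = 28  # FU-A NAL unit type from RFC 3984

def fu_a_fragment(nal_unit: bytes, mtu: int):
    """Split one H.264 NAL unit into FU-A payloads of at most `mtu` bytes."""
    header = nal_unit[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI bits
    nal_type = header & 0x1F                     # original NAL unit type
    body = nal_unit[1:]                          # original header is dropped
    chunk = mtu - 2                              # room left after the 2 FU bytes
    payloads = []
    for i in range(0, len(body), chunk):
        start = 0x80 if i == 0 else 0                    # S bit on first fragment
        end = 0x40 if i + chunk >= len(body) else 0      # E bit on last fragment
        fu_header = start | end | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[i:i + chunk])
    return payloads
```

A 3000-byte Intra-frame NAL unit fragmented at an MTU of 1400 bytes yields three payloads, each carrying the two FU bytes plus up to 1398 bytes of the original NAL unit body.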
Specific to SVC [7,8], it is important to note that the
base layer and the enhancement layers are stamped with the
same timestamp in the RTP header (except for temporal
scalability layers). Therefore, for a given frame-rate we
have several layers (base layer, spatial layers, SNR layers)
which have to be timestamped with the same time, as
depicted in Figure 3.
In order to synchronise the two descriptions delivered by
the RTP encapsulator it is not necessary to send RTCP [4],
due to the fact that both descriptions have been generated
by the same system; the receiver only needs its own clock
at 90 kHz and the timestamp delivered in each RTP packet
to synchronise them. The combining process at the decoder
requires both descriptions to have the same timestamp.
Figure 3. H264/SVC layers
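The timestamping and combining rules above can be sketched as follows, assuming packets are available as simple (timestamp, payload) pairs (an illustrative simplification of the real RTP handling):

```python
from collections import defaultdict

CLOCK_HZ = 90_000  # RTP video clock rate

def rtp_timestamp(frame_index, fps=25):
    """90 kHz timestamp for a given frame (3600 ticks per frame at 25 Hz)."""
    return (frame_index * CLOCK_HZ // fps) & 0xFFFFFFFF  # 32-bit wrap-around

def combine(desc_a, desc_b):
    """Merge two lists of (timestamp, payload) packets frame by frame.

    Because both descriptions come from the same playout, packets of the
    same frame carry the same timestamp, so no RTCP mapping is needed.
    """
    frames = defaultdict(list)
    for ts, payload in desc_a + desc_b:
        frames[ts].append(payload)
    # Frames present in only one description remain decodable on their own.
    return [(ts, frames[ts]) for ts in sorted(frames)]
```

At 25 Hz, consecutive frames are 3600 ticks apart, and a frame received in both descriptions simply yields two payloads under one timestamp.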
4. SIGNALLING OF RTP SVC/MDC STREAMS
The signalling of SVC/MDC streams is based on the SDP
(Session Description Protocol) [10]. The purpose of SDP is
to convey information about media streams in multimedia
sessions. As there is no RFC standard describing SVC and
MDC streams, an IETF draft called draft-schierl-mmusic-
layered-codec [11] has been considered in the SUIT project.
This draft extends the SDP specification to support
signalling of relationships between media. It enables
signalling decoding dependency of different media
descriptions with the same media type in SDP. In SUIT this
covers, for example, the transport of bitstream partitions of
a hierarchical (layered) media coding process, or of a
multiple description coding (MDC) process, in different
network streams. The basic idea in all cases is the
separation of the partitions of a media bitstream to allow
scalability in network elements. The two types of media
dependencies in SUIT are the following:
• Layered/hierarchical decoding dependencies: In
SUIT, one or more layers may be transported over
WiMAX or DVB-T network streams depending on the
available bit-rate of each network. The receiver selects
the required layers conveyed in the RTP session in
response to quality or bit-rate requirements. The base
layer, which is self-contained, can be decoded without
any dependency. In SUIT the signalling of the existing
layers in the media stream is done in-band using
specific NAL unit types.
• Multiple description decoding dependencies: In the
most basic form of multiple description coding (MDC),
each partition forms an independent representation of
the media; in SUIT each partition represents an SVC
stream. That is, decoding any of the partitions yields
useful reproduced media data. The SUIT combiner can
parse the SDP file to learn whether more than one
partition is available; if so, it can process them jointly
and the resulting media quality increases. The
highest reproduced quality is available if all original
partitions are available for decoding. An SDP example
is shown in Figure 4.
v=0
o=mdcsrv 289083124 289083124 IN IP4
s=MULTI DESCRIPTION VIDEO SIGNALING
t=0 0
a=group:DDP 1 2
m=video 40000 RTP/AVP 96
c=IN IP4 224.2.17.12/127
a=mid:1
a=depend:mdc
m=video 40002 RTP/AVP 96
c=IN IP4 224.2.17.13/127
a=mid:2
a=depend:mdc
Figure 4. SDP file example for MDC dependency
5. OPTIMAL BIT RATE MANAGEMENT IN THE
PLAYOUT
Regarding the control of the system bit rates, the aim of the
playout is to maximise the DVB-T and WiMAX network
bit rates, always ensuring the best performance of the
system and providing the best image quality to the user.
As the system is based on the scalable video coding
extension of H.264/AVC (SVC), the data rate of each
service can be easily adapted depending on the available
bandwidth or the terminal capabilities.
The playout can take advantage of SVC techniques to serve
clients of heterogeneous capabilities at the same time while
consuming less bit rate than simulcasting the services
would.
The MDC techniques used together with SVC provide
more robustness to the video sequences at the cost of an
increase in bit rate. The playout is able to decide whether to
sacrifice the bit rate used to send a second description, and
thus the robustness, in favour of accommodating internet or
new VoD requests in the WiMAX network.
The bit rate management is based on a priority policy
applied to the different services; the playout has the ability
to change in real time the service bit rates in order to
provide the best QoS while maximizing the networks
throughput. These policies, defined for each service, are
taken into account by the playout algorithms. As the
intelligent playout also knows the total bit rate delivered in
each network, it is able to control and optimise the load of
the services. The playout deals with different types of
services (broadcast, multicast and unicast) to solve, in real
time, the optimal content distribution according to available
bit rates, service priority, network characteristics and
terminal capabilities.
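A minimal sketch of such a priority policy, assuming each service is described by an illustrative (name, priority, minimum rate, maximum rate) tuple (not the actual SUIT policy format): minimum rates are granted first, and the remaining network capacity is then distributed in priority order up to each service's maximum.

```python
def allocate(services, capacity_bps):
    """Priority-based bit rate allocation for one network.

    services: list of (name, priority, min_bps, max_bps) tuples, where a
    lower priority value means a more important service.
    Returns {name: allocated_bps}.
    """
    # Every service is guaranteed its minimum rate.
    alloc = {name: min_bps for name, _, min_bps, _ in services}
    spare = capacity_bps - sum(alloc.values())
    if spare < 0:
        raise ValueError("capacity cannot cover the minimum rates")
    # Hand out the spare capacity in priority order, up to each maximum.
    for name, _, min_bps, max_bps in sorted(services, key=lambda s: s[1]):
        extra = min(spare, max_bps - min_bps)
        alloc[name] += extra
        spare -= extra
    return alloc
```

For example, with an HD broadcast (4-6 Mbps) and a VoD request (0.5-6 Mbps) sharing 8 Mbps, the broadcast reaches its 6 Mbps maximum and the VoD service receives the remaining 2 Mbps.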
6. RESULTS
In order to demonstrate the benefits of using the FU-A
fragmentation defined in [3,5], the following test bed
configuration has been performed. The video server has
been connected to a terminal through the WiMAX/DVB-T
emulator developed by the Centre for Communication
Systems Research (CCSR) University of Surrey. This
emulator has been developed in the framework of the SUIT
project and uses some error patterns to emulate WiMAX
and DVB modulation conditions. In the tests, a three-
spatial-layer SVC video of 15 s duration has been used
(CIF: 500 kbps; SD: 2 Mbps; HD: 6 Mbps).
The video is delivered using the unicast mode to the
terminal at different bit rates and changing the modulation
parameters. The result of the tests shows that using
fragmentation decreases packet losses. In the tests we
have noticed that the major decrease occurs when the
packet size is less than approximately 4000 bytes. This
result is shown in Figure 5, where the values for three SVC
streams are presented. The threshold depends on the
complexity of the input sequence: high complexity
sequences will imply high values of the threshold.
Figure 5. Packet losses vs. fragmentation packet size for three SVC streams (3 layers, 6 Mbps; 2 layers, 2 Mbps; 1 layer, 500 kbps) in a WiMAX 16-QAM 1/2 transmission at SNR = 8.3 dB.
Figure 6 shows the packet losses against fragmentation
packet size for several transmission conditions.
Fragmentation is only useful when the transmission
conditions are good enough; under bad conditions almost
all packets, even very small ones, are affected by errors.
Figure 6. Packet losses vs. fragmentation packet size in a WiMAX 16-QAM 1/2 transmission at SNR values from 5 to 13.25 dB.
7. CONCLUSIONS
In conclusion, the study of the appropriate RTP
encapsulation and synchronization process presented in this
paper is one step forward to ensure the correct processing
and delivery of the video information to consumers,
providing the best quality of service. In this way it is
possible to avoid undesirable frame delays and packet
losses introduced by the heterogeneous nature of the
transport chain. It is also important to emphasise the
benefits of using an intelligent playout to maximise the use
of the total bandwidth of the system, taking advantage of
the scalability features of H.264 SVC video to optimise the
bit rates.
8. REFERENCES
[1] S. Wenger, "H.264/AVC over IP", IEEE Transactions on Circuits and
Systems for Video Technology, vol. 13, no. 7, July 2003.
[2] IETF RFC 2250, "RTP Payload Format for MPEG1/MPEG2 Video".
[3] IETF RFC 3984, "RTP Payload Format for H.264 Video".
[4] IETF RFC 3550, "RTP: A Transport Protocol for Real-Time
Applications".
[5] S. Wenger, Y.-K. Wang, T. Schierl, "draft-wenger-avt-rtp-svc-03.txt",
June 2006.
[6] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming Protocol
(RTSP)", IETF RFC 2326, 1998.
[7] ITU-T Recommendation H.264, "Advanced video coding for generic
audiovisual services" / ISO/IEC 14496-10 (2005), "Information
Technology - Coding of audio-visual objects - Part 10: Advanced Video
Coding".
[8] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC
JTC1/SC29/WG11 and ITU-T SG16 Q.6), 15-21 July 2006.
[9] Moving Picture Experts Group, "Information technology - Multimedia
framework (MPEG-21) - Part 7: Digital Item Adaptation", ISO/IEC
JTC1/SC29/WG11 FDIS 21000-7, March 2003.
[10] M. Handley, V. Jacobson, C. Perkins, "SDP: Session Description
Protocol", IETF RFC 4566, July 2006.
[11] T. Schierl, "Signaling of layered and multi description media in
Session Description Protocol (SDP)", IETF Internet Draft,
draft-schierl-mmusic-layered-codec-02, December 2006.