H.265/HEVC video transmission over 4G cellular networks
by
Aman Jassal
Dipl.Ing., Ecole Superieure d’Ingenieurs en Informatique et Genie des Telecommunications, 2008
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
January 2016
© Aman Jassal 2016
Abstract
Long Term Evolution has been standardized by the 3GPP consortium since
2008, with 3GPP Release 12 being the latest iteration of LTE Advanced,
which was finalized in March 2015. High Efficiency Video Coding has been
standardized by the Moving Picture Experts Group since 2012 and is the
video compression technology targeted to deliver High-Definition video con-
tent to users. With video traffic projected to represent the lion’s share of
mobile data traffic in the next few years, providing video and non-video
users with high Quality of Experience is key to designing 4G systems and
future 5G systems.
In this thesis, we present a cross-layer scheduling framework which de-
livers video content to video users by exploiting encoding features used by
the High Efficiency Video Coding standard such as coding structures and
motion compensated prediction. We identify which frames are referenced
the most within the coded video bitstream in order to determine which
frames have higher utility for the High Efficiency Video Coding decoder
located at the user’s device, and we evaluate the performance of best
effort and video users in 4G networks using finite buffer traffic models.
We look into throughput performance for best effort users and packet loss
performance for video users to assess Quality of Experience. Our results
demonstrate that there is significant potential to improve the Quality of
Experience of best effort and video users using our proposed Frame
Reference Aware Proportional Fair scheme compared to the baseline
Proportional Fair scheme.
Preface
I hereby declare that I am the author of this thesis. This thesis is an original,
unpublished work under the supervision of Dr. Cyril Leung. In this work,
I played the primary role in designing and performing the research, doing
data analysis and preparing the manuscript under the supervision of Dr.
Cyril Leung.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Basics of H.265/HEVC . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Syntax Structures and Syntax Elements . . . . . . . . . . . . 4
2.2 Coding Structures and Reference Picture Lists . . . . . . . . 7
2.2.1 Coding Structures . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Reference Picture Lists . . . . . . . . . . . . . . . . . 10
2.3 Motion Compensated Prediction . . . . . . . . . . . . . . . . 13
2.4 Operation with Networking Layers . . . . . . . . . . . . . . . 15
3 Cross-Layer Frame Reference Aware Scheduling Framework 18
3.1 Mathematical Formulation of the Shared Resource Allocation
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Solution to the proposed Shared Resource Allocation Problem 25
4 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 H.265/HEVC Video Content Generation . . . . . . . . . . . 28
4.2 LTE-Advanced System Model . . . . . . . . . . . . . . . . . 30
4.2.1 Network Model . . . . . . . . . . . . . . . . . . . . . 30
4.2.2 Traffic Model . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.3 Channel Model . . . . . . . . . . . . . . . . . . . . . 35
4.2.4 Feedback Model . . . . . . . . . . . . . . . . . . . . . 40
5 Simulation Results and Analysis . . . . . . . . . . . . . . . . 42
5.1 Simulation Assumptions . . . . . . . . . . . . . . . . . . . . . 43
5.2 Simulation Results and Discussion . . . . . . . . . . . . . . . 48
5.2.1 Results for video users . . . . . . . . . . . . . . . . . 49
5.2.2 Results for Best Effort users . . . . . . . . . . . . . . 54
6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . 60
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
List of Tables
2.1 Generic NAL unit syntax, adapted from [3] . . . . . . . . . . 5
2.2 Reference Picture Sets for the Hierarchical-B Coding Struc-
ture of GOP-size 8 . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Reference Picture Lists for the Hierarchical-B Coding Struc-
ture of GOP-size 8 . . . . . . . . . . . . . . . . . . . . . . . . 13
4.1 H.265/HEVC Table of Video Test Sequences . . . . . . . . . 28
4.2 H.265/HEVC Parameters . . . . . . . . . . . . . . . . . . . . 30
4.3 FTP Traffic Model 1 . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 H.265/HEVC Traffic Model . . . . . . . . . . . . . . . . . . . 35
5.1 LTE-Advanced Parameters . . . . . . . . . . . . . . . . . . . 46
5.2 Offered Load and corresponding Resource Utilization . . . . . 49
List of Figures
2.1 Frame dependencies in the reference coding structure. . . . . 9
2.2 Uni- and bi-predictive inter-prediction illustration from adja-
cent pictures, adapted from [4] . . . . . . . . . . . . . . . . . 14
2.3 RTP Single NAL unit packet structure . . . . . . . . . . . . . 16
2.4 H.265/HEVC system layer stack . . . . . . . . . . . . . . . . 17
4.1 Hexagonal Network Grid Layout . . . . . . . . . . . . . . . . 31
4.2 Wrap Around of Hexagonal Network . . . . . . . . . . . . . . 32
4.3 LTE Downlink PRB allocation illustration . . . . . . . . . . . 33
5.1 Video users’ active download time . . . . . . . . . . . . . . . 50
5.2 Satisfied Video User Percentage . . . . . . . . . . . . . . . . . 51
5.3 CRA LDU Loss Ratio . . . . . . . . . . . . . . . . . . . . . . 53
5.4 Average throughput for Best Effort users . . . . . . . . . . . 55
5.5 Coverage throughput for Best Effort users . . . . . . . . . . . 56
5.6 Illustration of the outer 10% of the coverage area . . . . . . . 57
5.7 Average BE user throughput in Cell-Edge region . . . . . . . 58
List of Acronyms
3GPP Third Generation Partnership Project.
ADT Active Download Time.
BE Best Effort.
CB Coding Block.
CDF Cumulative Distribution Function.
CQI Channel Quality Indicator.
CSI Channel State Information.
CVS Coded Video Sequence.
DASH Dynamic Adaptive Streaming over HTTP.
EESM Exponential Effective SNR Mapping.
FDD Frequency Division Duplex.
GOP Group of Pictures.
H.264/AVC Advanced Video Coding.
H.265/HEVC High Efficiency Video Coding.
HTTP Hypertext Transfer Protocol.
IETF Internet Engineering Task Force.
IP Internet Protocol.
ITU-R International Telecommunication Union Radiocommunication Sector.
JCT-VC Joint Collaborative Team on Video Coding.
KPIs Key Performance Indicators.
LDU Logical Data Unit.
LTE Long Term Evolution.
LTE-A LTE Advanced.
MANE Media Aware Network Element.
MIESM Mutual Information Effective SNR Metric.
MIMO Multiple Input Multiple Output.
MOS Mean Opinion Score.
MPEG Moving Picture Experts Group.
MU-MIMO Multi User Multiple Input Multiple Output.
NAL Network Abstraction Layer.
NGMN Next Generation Mobile Networks.
OFDMA Orthogonal Frequency Division Multiple Access.
OSI Open Systems Interconnection.
PB Prediction Block.
PLR Packet Loss Ratio.
PMI Precoding Matrix Indicator.
POC Picture Order Count.
PRB Physical Resource Block.
QAM Quadrature Amplitude Modulation.
QoE Quality of Experience.
QoS Quality of Service.
QPSK Quaternary Phase Shift Keying.
RBSP Raw Byte Sequence Payload.
RI Rank Indication.
RTP Real-time Transport Protocol.
RU Resource Utilization.
SINR Signal to Interference and Noise Ratio.
SNR Signal to Noise Ratio.
SRST Single RTP stream on a single media transport.
SU-MIMO Single User Multiple Input Multiple Output.
TCP Transmission Control Protocol.
UDP User Datagram Protocol.
UMTS Universal Mobile Telecommunications System.
VCL Video Coding Layer.
Wi-Fi Wireless Fidelity.
Acknowledgements
I would like to take this opportunity to express my utmost gratitude and
sincerest thanks to my supervisor, Dr. Cyril Leung, who has given me great
support, encouragement and guidance throughout my work and my M.A.Sc
program. My discussions with him were a constant source of inspiration
and his insights helped make this research work more valuable. Without his
invaluable knowledge and understanding in this research area, this thesis
would have never been possible.
I would also like to thank Dr. Ahmed Saadani for his guidance and
support throughout my engineering program and at Orange Labs where he
gave me the opportunity to do research work on 4G systems. My former
colleagues, Mr. Sebastien Jeux and Dr. Sofia Martinez Lopez, and more
generally all the research community involved in research and standardiza-
tion with the 3GPP, have had a great influence on me and without their
inspiration I would have never undertaken my program at the University of
British Columbia.
All of the work that has been done in this thesis was supported in part by
the Natural Sciences and Engineering Research Council (NSERC) of Canada
under Grant RGPIN 1731-2013.
Dedication
To my parents and my sister
Chapter 1
Introduction
With the emergence of Long Term Evolution (LTE) and its subsequent it-
erations standardized by the Third Generation Partnership Project (3GPP)
consortium, video services are fast becoming the dominant data services
in 4G mobile networks and mobile video traffic is projected to account for
72% of the total mobile data traffic by 2019 [1]. The transmission of video
services over cellular networks is challenging due to the large bandwidth
requirement, the low latency required due to protocol stack inter-operation
and the effect of error propagation within the video sequence in the event
of packet losses. The current dominant standard for video coding is Ad-
vanced Video Coding (H.264/AVC) [2] and is used to deliver a wide range
of video services. However, H.264/AVC requires extremely high bandwidth,
making the delivery of High-Definition (HD) video services impractical. Its
successor, High Efficiency Video Coding (H.265/HEVC) [3], was standard-
ized by the Moving Picture Experts Group (MPEG) in 2012 and is expected
to reduce the bit rate compared to H.264 High Profile by about 50% while
maintaining comparable subjective quality [4]. Therefore H.265/HEVC is a
more practical choice for delivering HD and Ultra High-Definition (UHD)
video content to consumers using wired and wireless networks.
As we move towards 5G, one of the key targets that we need to achieve
is to provide a more consistent user experience across the whole network
as well as higher Quality of Experience (QoE) [5]. Cross-layer QoE-aware
resource allocation schemes have been proposed for Orthogonal Frequency
Division Multiple Access (OFDMA) systems [6], where the scheduling al-
gorithm uses the Mean Opinion Score (MOS) as a way to provide QoE.
Other attributes that the research community has been focusing on in order
to improve the QoE of video users are the playback buffer status and the
rebuffering time [7]-[8]. One of the limitations of these works is their
reliance on video traces that were generated for low-definition video
sequences encoded using H.264/AVC. Such traces are not representative of
the targets that 5G networks are supposed to satisfy, namely the delivery
of HD or UHD video services anywhere and at any time. Other works have
considered H.265/HEVC video streaming over Wi-Fi wireless networks and
shown that the QoE of video sequences, reflected through the use of MOS,
is very sensitive to network impairments such as packet losses.
Nightingale et al. [9] assumed that packet losses are random; however, in
cellular networks this assumption is rarely valid, as the combination of
traffic load, the characteristics of the video sequence and the
individual user’s link quality will dictate the overall performance that
can be achieved.
In this thesis, we focus on the use-case of H.265/HEVC video trans-
mission over 4G networks. Existing works have not used the compression
properties of H.265/HEVC, specifically in terms of exploiting the tempo-
ral inter-dependence between frames within coding structures, or evaluated
how well video services can be delivered in 4G/beyond-4G networks with
dynamic user arrivals. We use performance evaluation methodologies which
use Key Performance Indicators (KPIs) that have been recommended by the
Next Generation Mobile Networks (NGMN) Alliance for 5G networks [5].
The main novel contributions of this thesis are as follows:
1. The definition of a cross-layer scheduling framework exploiting frame
referencing to deliver video content
2. The evaluation of capacity for the delivery of H.265/HEVC video ser-
vices over beyond-4G networks
3. The joint assessment of the QoE of video users and Best Effort users
The remainder of this thesis is organized as follows. Chapter 2 out-
lines the basics of the H.265/HEVC standard that are relevant to this work.
Chapter 3 presents the proposed cross-layer scheduling framework for video
content transmission. The simulation model is presented in Chapter 4. Sim-
ulation results, analysis and discussions are provided in Chapter 5. Conclu-
sions and future work are presented in Chapter 6.
Chapter 2
Basics of H.265/HEVC
In this chapter, we describe the features of the H.265/HEVC standard that
are directly relevant to this thesis and to the problem formulation that
will be presented and developed in Chapter 3. Specifically, we present the
high-level syntax used to represent the video data, the motion prediction
techniques used for video compression and the coding structures and
reference picture lists used to perform motion compensated prediction in
H.265/HEVC [3]. The main point to understand is that the encoder knows the
specifics of the coding structure and has to provide the decoder with the
information needed to reconstitute it. This is done by using a given
coding order (which is implicitly embedded in the way LDUs are ordered)
and by using Reference Picture Sets and Reference Picture Lists (the
former are explicitly transmitted and the latter are derived during the
decoding process). In this chapter we explain how all of these features
work.
2.1 Syntax Structures and Syntax Elements
H.265/HEVC uses so-called syntax structures to represent the encoded video
data. An H.265/HEVC encoder generates syntax structures encapsulated
inside logical data units called Network Abstraction Layer (NAL) units.
An H.265/HEVC decoder decapsulates NAL units and consumes syntax
structures to reconstitute a given picture (in this thesis, we use the
terms "picture" and "frame" interchangeably). The sequence of NAL units
can be viewed as a text written in a specific language with a syntax and
semantics that the decoder can read and understand. The syntax is the set
of words the decoder knows and the semantics tells the decoder how the
syntax is to be used. The information conveyed by the combination of the
syntax and the semantics is recovered through the decoding process, which
is fully specified in [3].

Table 2.1: Generic NAL unit syntax, adapted from [3]

    nal_unit(NumBytesInNalUnit) {
        forbidden_zero_bit
        nal_unit_type
        nuh_layer_id
        nuh_temporal_id_plus1
        NumBytesInRbsp = 0
        for (i = 2; i < NumBytesInNalUnit; i++)
            if (i + 2 < NumBytesInNalUnit && next_bits(24) == 0x000003) {
                rbsp_byte[NumBytesInRbsp++]
                rbsp_byte[NumBytesInRbsp++]
                i += 2
                emulation_prevention_three_byte  /* equal to 0x03 */
            } else
                rbsp_byte[NumBytesInRbsp++]
    }

Table 2.1 illustrates the syntax structure of a generic NAL unit and the
syntax elements it carries (forbidden_zero_bit, nal_unit_type,
nuh_layer_id, nuh_temporal_id_plus1, rbsp_byte and
emulation_prevention_three_byte). Syntax elements have associated
descriptors which are used for parsing purposes; these are not covered in
this thesis and the interested reader is invited to refer to [4]
(Chapter 5) for more details. Every NAL unit carries
NumBytesInNalUnit bytes, which break down into a 16-bit header made of 4
syntax elements and a payload, the Raw Byte Sequence Payload (RBSP) data
structure, which carries NumBytesInRbsp bytes. The first syntax element is
the forbidden zero bit, forbidden_zero_bit, which is written over 1 bit.
The second syntax element is nal_unit_type, which is written over 6 bits
and carries the type of the RBSP contained in the NAL unit. The values
that it can take are specified in Table 7-1 of [3]. NAL unit types belong
either to the Video Coding Layer (VCL) or to the non-VCL category: VCL
types comprise all NAL units that contain coded video data whereas
non-VCL types contain parameter information. The third syntax element is
the layer identifier, nuh_layer_id, which is written over 6 bits. Its
value is always 0, although other values can be specified by future
recommendations of ITU-T that relate to future scalable or 3D video coding
extensions of [3]. The fourth and final syntax element of the header is
the temporal identifier, nuh_temporal_id_plus1, which is written over 3
bits. Its value is typically 1, which means that there is only one
temporal layer. We assume that this is the case throughout the thesis. The
temporal identifier for the NAL unit, TemporalID, is obtained as:

    TemporalID = nuh_temporal_id_plus1 − 1                          (2.1)
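As an illustration of the header layout described above, the four syntax
elements can be unpacked from the two header bytes with a few bit
operations. The following Python snippet is only a minimal sketch: the
function name parse_nal_unit_header and the dictionary layout are
hypothetical and chosen here for readability, and the bit positions simply
follow the field widths given above (1, 6, 6 and 3 bits, most significant
bit first).

```python
def parse_nal_unit_header(nal_unit: bytes) -> dict:
    """Unpack the 16-bit NAL unit header (hypothetical helper, sketch only)."""
    if len(nal_unit) < 2:
        raise ValueError("a NAL unit carries at least a 2-byte header")
    # Read the two header bytes as one big-endian 16-bit integer.
    header = int.from_bytes(nal_unit[:2], "big")
    nuh_temporal_id_plus1 = header & 0x7              # lowest 3 bits
    return {
        "forbidden_zero_bit": (header >> 15) & 0x1,    # 1 bit
        "nal_unit_type": (header >> 9) & 0x3F,         # 6 bits
        "nuh_layer_id": (header >> 3) & 0x3F,          # 6 bits
        "nuh_temporal_id_plus1": nuh_temporal_id_plus1,
        "TemporalID": nuh_temporal_id_plus1 - 1,       # equation (2.1)
    }


# Example header bytes: nal_unit_type = 21, nuh_layer_id = 0,
# nuh_temporal_id_plus1 = 1.
EXAMPLE_HEADER = parse_nal_unit_header(bytes([0x2A, 0x01]))
```

With the assumptions made in this thesis (a single layer and a single
temporal layer), every LDU header is expected to yield nuh_layer_id = 0 and
TemporalID = 0.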
The payload of NAL units is the RBSP, denoted as the rbsp_byte syntax
element, where rbsp_byte contains NumBytesInRbsp bytes and rbsp_byte[i] is
the ith byte of the RBSP. Because there are various types of NAL units,
the RBSP itself can be viewed as a syntax structure carrying syntax
elements. For each nal_unit_type, the H.265/HEVC standard provides the
description of the associated syntax structure. For instance, the RBSP of
a Video Parameter Set has a dedicated syntax structure (Section 7.3.2.1 of
[3]), while the RBSP of a Clean Random Access NAL unit has a dedicated
syntax structure further broken into a slice segment header, slice segment
data and trailing bits (Section 7.3.2.9 of [3]). In order to guarantee
that the byte pattern which marks the start of a NAL unit cannot appear
inside its payload, the H.265/HEVC standard inserts dedicated bytes called
emulation_prevention_three_byte. During the decoding process, these bytes
are discarded. In this thesis, we assume that a bitstream is only made of
generic VCL NAL units and from this point onwards, a NAL unit will be
referred to as a Logical Data Unit (LDU).
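The loop of Table 2.1, which recovers the RBSP from a NAL unit by dropping
the emulation prevention bytes, can be sketched in Python as follows. This
is an illustrative transcription of the syntax table rather than a
conformant parser, and the function name extract_rbsp is hypothetical.

```python
def extract_rbsp(nal_unit: bytes) -> bytes:
    """Return the RBSP of a NAL unit, following the loop of Table 2.1."""
    rbsp = bytearray()
    num_bytes = len(nal_unit)
    i = 2  # the first two bytes are the NAL unit header
    while i < num_bytes:
        if i + 2 < num_bytes and nal_unit[i:i + 3] == b"\x00\x00\x03":
            # Keep the two zero bytes, drop emulation_prevention_three_byte.
            rbsp += b"\x00\x00"
            i += 3
        else:
            rbsp.append(nal_unit[i])
            i += 1
    return bytes(rbsp)


# Example: header (2 bytes) followed by 00 00 03 01; the 0x03 byte is dropped.
EXAMPLE_RBSP = extract_rbsp(bytes([0x2A, 0x01, 0x00, 0x00, 0x03, 0x01]))
```

NumBytesInRbsp corresponds to the length of the returned byte string.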
2.2 Coding Structures and Reference Picture
Lists
An H.265/HEVC bitstream is made up of several entities called Coded Video
Sequences (CVSs). A CVS is the coded representation of a sequence of
pictures which can be decoded using pictures within that sequence.
Similarly, a coded picture is the coded representation of a picture, which
typically consists of multiple LDUs. A coded picture is embedded in a
so-called access unit, which contains all the LDUs associated with that
picture. In this section we present some of the tools used by the
H.265/HEVC standard for motion compensated prediction: coding structures
and reference picture lists.
2.2.1 Coding Structures
H.265/HEVC relies on temporal coding structures to perform its video com-
pression task. A coding structure designates a set of consecutive pictures
with clearly defined dependencies between pictures and a given coding or-
der. The purpose of having pictures depend on others is for prediction, which
can be done from one picture or two pictures (called uni-prediction and bi-
prediction respectively). Coding structures define a coding order, which is
different from the output order: the coding order is the order in which pic-
tures are encoded while the output order is the order in which pictures are
displayed on the screen. Because of this difference, the H.265/HEVC
standard uses a Picture Order Count (POC) to uniquely identify a given
picture in output order. From this point onwards and for the sake of
convenience, we will refer to the picture whose POC is equal to n as
poc_n.
The definition of a coding structure bears a strong similarity to that of a
Group of Pictures (GOP) in H.264/AVC. In earlier video compression stan-
dards such as H.264/AVC, a GOP designates a set of consecutive pictures
with clearly defined dependencies where the first picture is an intra-coded
picture (or equivalently an I-Frame). The difference between a GOP and
a coding structure is that the first picture in a coding structure does not
have to be an I-Frame. When the pictures that belong to a coding structure
only reference other pictures within that coding structure for prediction
purposes, the coding structure is called a closed GOP. The H.265/HEVC
standard also allows cases where a picture within a coding structure
references a picture from another coding structure, in which case the
coding structure is called an open GOP. Throughout this chapter, we will
use the hierarchical-B coding structure that was used by the Joint
Collaborative Team on Video Coding (JCT-VC) for the Main Profile Random
Access encoder configuration as described in [10]. All figures and tables
will refer to that specific coding structure. For simplicity, throughout
the remainder of this thesis, we will refer to this coding structure
simply as the reference coding structure.

Figure 2.1: Frame dependencies in the reference coding structure.

Fig. 2.1 depicts four illustrations of frame dependencies in the reference
coding structure. Referenced pictures are denoted by a (*) and arrows point
from the referenced picture to all of its directly dependent pictures.
Dependent pictures can be either before or after the referenced picture in
display order. The reference coding structure is an open GOP coding
structure and by design it operates with a GOP size of 8. We can see the
open side of the reference coding structure in Fig. 2.1 in the examples
where poc_0, poc_4 and poc_6 are the referenced pictures. They are
referenced by pictures that lie beyond their own GOP: poc_0, poc_4 and
poc_6 are all referenced by poc_16. The reference coding structure uses
I-Frames and B-Frames. The coding order of this coding structure is
defined as {poc_n, poc_{n-4}, poc_{n-6}, poc_{n-7}, poc_{n-5}, poc_{n-2},
poc_{n-3}, poc_{n-1}}, where poc_n is the last picture of the GOP in
output order. poc_0 is a special case and constitutes a GOP on its own
since there are no pictures before poc_0. Using this definition, we can
easily identify that after poc_0, the next GOP comprises {poc_8, poc_4,
poc_2, poc_1, poc_3, poc_6, poc_5, poc_7}. The reference coding structure
is then applied periodically to the succeeding pictures throughout the
video sequence. The encoder can change the coding structure if it yields
better performance but we assume that it remains unchanged throughout the
encoding of a video sequence. The decoder at the receiver side will
extract the information regarding the referenced pictures from Reference
Picture Lists, which we describe in the next section.
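The coding order of the reference coding structure can be generated
programmatically from the offsets given above. The Python sketch below is
only an illustration under the assumptions stated in this section (GOP
size of 8, poc_0 forming a GOP on its own, unchanged coding structure);
the names gop_coding_order and sequence_coding_order are hypothetical.

```python
GOP_SIZE = 8
# Offsets relative to n, the POC of the last picture of the GOP in output
# order, in the coding order given in the text.
CODING_ORDER_OFFSETS = [0, -4, -6, -7, -5, -2, -3, -1]


def gop_coding_order(last_poc: int) -> list:
    """POCs of one GOP, listed in coding order."""
    return [last_poc + offset for offset in CODING_ORDER_OFFSETS]


def sequence_coding_order(num_gops: int) -> list:
    """Coding order of poc_0 followed by num_gops full GOPs."""
    order = [0]  # poc_0 constitutes a GOP on its own
    for gop_index in range(1, num_gops + 1):
        order.extend(gop_coding_order(gop_index * GOP_SIZE))
    return order


FIRST_GOP = gop_coding_order(8)        # [8, 4, 2, 1, 3, 6, 5, 7]
TWO_GOPS = sequence_coding_order(2)    # poc_0, then the GOPs ending at 8 and 16
```

Sorting the returned list gives back the output order, which makes the
difference between coding order and output order easy to inspect.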
2.2.2 Reference Picture Lists
Coding structures specify the coding order and the dependencies between a
given set of pictures. The decoder does not have any knowledge about the
coding structure that was used by the encoder; it must derive this
information from the LDUs that carry the encoded video data. In this
section, we explain how the encoder transmits the information regarding
the dependencies between pictures.

Table 2.2: Reference Picture Sets for the Hierarchical-B Coding Structure
of GOP-size 8

Reference Picture Set   Reference POCs
0                       poc_{n-8}, poc_{n-10}, poc_{n-12}, poc_{n-16}
1                       poc_{n-4}, poc_{n-6}, poc_{n+4}
2                       poc_{n-2}, poc_{n-4}, poc_{n+2}, poc_{n+6}
3                       poc_{n-1}, poc_{n+1}, poc_{n+3}, poc_{n+7}
4                       poc_{n-1}, poc_{n-3}, poc_{n+1}, poc_{n+5}
5                       poc_{n-2}, poc_{n-4}, poc_{n-6}, poc_{n+2}
6                       poc_{n-1}, poc_{n-5}, poc_{n+1}, poc_{n+3}
7                       poc_{n-1}, poc_{n-3}, poc_{n-7}, poc_{n+1}

At the receiver end, as a picture gets decoded, it is either displayed on
the screen or stored in the Decoded Picture Buffer until it is eventually
output. Any picture located in the Decoded Picture Buffer can be reused as
reference for prediction. Pictures that are available for inter prediction are
listed in a so-called Reference Picture Set. The Reference Picture Set is sent
in the Sequence Parameter Set and each picture indexed in there is explicitly
identified using its POC value. Table 2.2 lists the different Reference Picture
Sets defined for the reference coding structure that was used by the JCT-VC
for the Main Profile Random Access encoder configuration as described in
[10]. Eight Reference Picture Sets are defined and, for a given picture
poc_n, the corresponding referenced POCs are given. Since poc_0 is the
first picture of a video sequence, there can be no negative POC; therefore
if poc_i with i < 0 were to be in a Reference Picture Set, the picture
would simply not be included.
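To make the rule on negative POCs concrete, the entries of Table 2.2 can
be resolved for a given picture as sketched below in Python. The offsets
are copied from Table 2.2; the function name reference_pocs is
hypothetical, and the sketch only models the exclusion of negative POCs
(pictures beyond the end of the sequence are not handled).

```python
# POC offsets of each Reference Picture Set, copied from Table 2.2.
RPS_OFFSETS = {
    0: [-8, -10, -12, -16],
    1: [-4, -6, 4],
    2: [-2, -4, 2, 6],
    3: [-1, 1, 3, 7],
    4: [-1, -3, 1, 5],
    5: [-2, -4, -6, 2],
    6: [-1, -5, 1, 3],
    7: [-1, -3, -7, 1],
}


def reference_pocs(n: int, rps_index: int) -> list:
    """POCs listed in the given Reference Picture Set for picture poc_n.

    Entries that would have a negative POC are simply not included.
    """
    return [n + offset for offset in RPS_OFFSETS[rps_index] if n + offset >= 0]


RPS_OF_POC8 = reference_pocs(8, 0)     # only poc_0 exists
RPS_OF_POC16 = reference_pocs(16, 0)   # all four entries exist
```

For poc_8 only one of the four entries of Reference Picture Set 0
survives, whereas for poc_16 all four entries are available.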
The LDUs of a given picture carry a header that specifies which Reference
Picture Set to activate. H.265/HEVC uses two Reference Picture Lists for
inter prediction, called List0 and List1. The decoder reconstructs these
lists from the Reference Picture Sets that were supplied in the Sequence
Parameter Set and this process is specified in Section 8.3.4 of [3]. The
main difference between a Reference Picture Set and a Reference Picture
List is that a Reference Picture List contains only the subset of the
Reference Picture Set that is actually used for inter prediction. For
uni-predicted frames
(P-Frames) only List0 is activated while for bi-predicted frames (B-Frames)
both List0 and List1 are activated. Motion compensated prediction is then
performed using the activated lists. The resulting prediction can be either
made from one picture only or a combination of pictures. Using these lists,
the hierarchy between pictures can be recovered. Table 2.3 depicts the
hierarchical-B coding structure of size 8 that was used by the JCT-VC for
the Main Profile Random Access encoder configuration as described in [10].
This is the reference coding structure that we use throughout this thesis for
all our video sequences. For each picture, we provide the Reference Picture
Set that is used and the POCs of the pictures in the Reference Picture
Lists. The first picture of a coded video sequence is usually an I-Frame
and I-Frames do not use inter prediction. Therefore it does not have any
associated Reference Picture Set and its associated Reference Picture
Lists are empty. poc_8 and poc_16 both use the same Reference Picture Set;
however, for poc_8, three of the referenced pictures do not exist and
poc_8 therefore only references poc_0. By combining the information in
Table 2.2 and Table 2.3, one can easily reconstitute the direct
dependencies that we illustrated earlier in Fig. 2.1 for the reference
coding structure.

Table 2.3: Reference Picture Lists for the Hierarchical-B Coding Structure
of GOP-size 8

POC   RPS used   List0 POCs    List1 POCs
0     -          N/A           N/A
8     0          0             0
4     1          0, 8          8, 0
2     2          0, 4          4, 8
1     3          0, 2          2, 4
3     4          2, 0          4, 8
6     5          4, 2          8, 4
5     6          4, 0          6, 8
7     7          6, 4          8, 6
16    0          8, 6, 4, 0    8, 6, 4, 0
12    1          8, 6          16, 8
10    2          8, 6          12, 16
9     3          8, 10         10, 12
...   ...        ...           ...
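As an illustration of how the direct dependencies can be reconstituted,
the rows of Table 2.3 can be inverted so that each referenced picture is
mapped to the pictures that reference it. The Python sketch below uses
only the rows shown in Table 2.3 (the table is truncated, so the counts
are partial), and the names direct_dependents, DEPENDENTS and
REFERENCE_COUNTS are hypothetical. A picture that appears in both List0
and List1 of the same dependent picture is counted once, which is an
assumption of this sketch.

```python
# (List0 POCs, List1 POCs) of each picture, copied from the rows of Table 2.3.
REFERENCE_LISTS = {
    0: ([], []),
    8: ([0], [0]),
    4: ([0, 8], [8, 0]),
    2: ([0, 4], [4, 8]),
    1: ([0, 2], [2, 4]),
    3: ([2, 0], [4, 8]),
    6: ([4, 2], [8, 4]),
    5: ([4, 0], [6, 8]),
    7: ([6, 4], [8, 6]),
    16: ([8, 6, 4, 0], [8, 6, 4, 0]),
    12: ([8, 6], [16, 8]),
    10: ([8, 6], [12, 16]),
    9: ([8, 10], [10, 12]),
}


def direct_dependents(reference_lists: dict) -> dict:
    """Map each referenced POC to the sorted POCs that directly depend on it."""
    dependents = {}
    for poc, (list0, list1) in reference_lists.items():
        for referenced in set(list0) | set(list1):
            dependents.setdefault(referenced, set()).add(poc)
    return {ref: sorted(pocs) for ref, pocs in dependents.items()}


DEPENDENTS = direct_dependents(REFERENCE_LISTS)
# Number of pictures that directly reference each picture (partial counts).
REFERENCE_COUNTS = {ref: len(pocs) for ref, pocs in DEPENDENTS.items()}
```

Among the rows shown, poc_8 is the most referenced picture, and poc_0,
poc_4 and poc_6 all have poc_16 among their dependents, in line with the
open GOP behaviour discussed in Section 2.2.1.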
2.3 Motion Compensated Prediction
There are two types of prediction used in video compression: Intra(-frame)
Prediction and Inter(-frame) Prediction. Intra prediction is used for intra-
coded frames (I-Frames) whereas inter prediction is used for all other frames,
which can be uni-predicted frames (P-Frames) or bi-predicted frames (B-
Frames). Inter prediction in H.265/HEVC relies on Motion Compensated
Prediction in order to perform efficient compression. The main idea
behind inter prediction is that a given picture uses another picture as
reference, searches for the block in that reference picture that best
matches the predicted area and encodes the information of the motion of
that block between both pictures. In H.265/HEVC, a given picture may use
one or two pictures as reference for inter prediction. Fig. 2.2
illustrates the concept of uni-predictive and bi-predictive inter
prediction. This is achieved using the coding structures that we
introduced in Section 2.2.1. In Fig. 2.2, picture poc does uni-prediction
from picture poc − 2 and does bi-prediction from its adjacent pictures
poc − 1 and poc + 1. Note that bi-prediction does not require the
reference pictures to be adjacent to poc: one CB from poc uses poc − 2 and
poc − 1 for bi-prediction.

Figure 2.2: Uni- and bi-predictive inter-prediction illustration from
adjacent pictures, adapted from [4]

The H.265/HEVC standard operates on a block-basis. The most basic
block used in H.265/HEVC is called a Coding Block (CB). Each picture is
partitioned into multiple CBs. Each CB is further partitioned into smaller
blocks called Prediction Blocks (PBs). After the picture has been partitioned
into PBs, the encoder will then perform prediction on a PB-basis from the
reference pictures whose POCs are given in the Reference Picture Lists.
For each PB, the encoder searches the reference pictures for the area that
best matches the PB according to a rate-distortion criterion. Once it
finds the area which yields the lowest rate-distortion cost, it encodes
the information of the shift as the tuple of the motion vector and the
reference picture’s POC. The motion vector is the shift between the area
corresponding to the PB and the area in the reference picture which
yielded the lowest rate-distortion cost. The basic idea behind
rate-distortion optimization is that the encoder looks for the coding mode
that offers the best trade-off between the loss of video quality, i.e. the
distortion, and the bit rate required to encode that area, i.e. the rate.
It is beyond the scope of this thesis to delve into rate-distortion
algorithms and their specifics and the interested reader is invited to
refer to [11] and to [4] (Chapter 2) for more details on the application
of rate-distortion in video compression.
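The search described above can be illustrated with a toy full-search
block matching routine in Python. This is a deliberately simplified sketch
and not the search performed by an actual H.265/HEVC encoder: the
distortion is measured with the sum of absolute differences, the rate is
approximated by |dx| + |dy|, and the cost is the usual Lagrangian
combination J = D + lambda * R. All names (sad, best_motion_vector, REF,
CUR) and the toy pictures are assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a shifted block of ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def best_motion_vector(cur, ref, bx, by, size, search_range, lam):
    """Full search minimising J = D + lam * R over all candidate shifts.

    D is the SAD and R is |dx| + |dy|, a crude stand-in for the number of
    bits needed to code the motion vector. Returns ((dx, dy), cost).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # candidate block falls outside the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, size) + lam * (abs(dx) + abs(dy))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy example: an 8x8 reference picture with unique sample values and a
# current picture whose 2x2 block at (x=2, y=2) is copied from (x=3, y=4).
REF = [[10 * y + x for x in range(8)] for y in range(8)]
CUR = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        CUR[2 + y][2 + x] = REF[4 + y][3 + x]

MV, COST = best_motion_vector(CUR, REF, bx=2, by=2, size=2, search_range=2, lam=1)
```

In this toy example the search recovers the shift (dx, dy) = (1, 2), for
which the distortion is zero and the cost reduces to the rate term.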
2.4 Operation with Networking Layers
Video compression techniques such as H.264/AVC and H.265/HEVC operate
at the Application layer, which sits at the highest level in the Open Sys-
tems Interconnection (OSI) model [12]. The encoder generates LDUs which
are then sent to the lower layers for transmission over packetized networks
based on the Internet Protocol (IP). One of the commonly used solutions
for delivering video content over IP networks is to use the Real Time Proto-
col (RTP). The Internet Engineering Task Force (IETF) has formulated the
RFC 6184, which details the operation of RTP for delivering H.264/AVC
content [13]. Similarly the IETF has formulated a draft RFC for the op-
15
2.4. Operation with Networking Layers
eration of RTP for delivering H.265/HEVC content [14]. We will look into
the specifics of RTP operation for delivering H.265/HEVC content. In this
thesis, we assume that that for all users we have a Single RTP stream on a
single media transport (SRST) and all LDUs are sent in RTP packets that
use the Single NAL unit packet structure. Fig. 2.3 shows the structure of
such an RTP packet. The PayloadHdr field is the bit-exact copy of the LDU
header, the DONL field is optional and carries the 16 least significant bits
of the Decoding Order Number. We assume that this field does not exist.
The NAL unit payload data field is the payload of the LDU and the last
field is also optional and included for the purpose of padding. We assume
that all RTP packets have a padding field occupying 10 bytes. Given that
the RTP specification for H.265/HEVC is still at a draft-level at the time of
writing, we allow ourselves to make some modifications and introduce a new
field in the Single NAL unit packet structure: the RefCount field. Since the
encoder knows exactly what coding structure is used to compress a video
sequence, it can also keep track of the number of times a given picture is
referenced within the video sequence and propagate that information to the
RTP packets. We assume that the RefCount field occupies 2 bytes.

Figure 2.3: RTP Single NAL unit packet structure

Figure 2.4: H.265/HEVC system layer stack
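
As an illustration of the modified Single NAL unit packet structure, the following is a minimal Python sketch of how an LDU and its RefCount could be packed into an RTP payload. The function names and the position of the RefCount field (placed right after PayloadHdr) are assumptions made for this example; the text only fixes the field sizes (2 bytes for RefCount, 10 bytes of padding, no DONL field). The 2-byte PayloadHdr length follows from the 2-byte H.265/HEVC NAL unit header.

```python
import struct

PAYLOAD_HDR_LEN = 2  # the H.265/HEVC NAL unit header is 2 bytes long
REFCOUNT_LEN = 2     # size of the RefCount field assumed in the text
PADDING_LEN = 10     # size of the padding field assumed in the text


def build_rtp_payload(ldu: bytes, ref_count: int) -> bytes:
    """Pack one LDU into the modified Single NAL unit payload (no DONL)."""
    if len(ldu) < PAYLOAD_HDR_LEN:
        raise ValueError("LDU is shorter than its header")
    if not 0 <= ref_count <= 0xFFFF:
        raise ValueError("RefCount must fit in 2 bytes")
    payload_hdr, nal_payload = ldu[:PAYLOAD_HDR_LEN], ldu[PAYLOAD_HDR_LEN:]
    # Hypothetical field order: PayloadHdr | RefCount | NAL payload | padding.
    return (payload_hdr + struct.pack("!H", ref_count)
            + nal_payload + bytes(PADDING_LEN))


def parse_rtp_payload(payload: bytes):
    """Inverse of build_rtp_payload: return (ldu, ref_count)."""
    payload_hdr = payload[:PAYLOAD_HDR_LEN]
    start = PAYLOAD_HDR_LEN
    (ref_count,) = struct.unpack("!H", payload[start:start + REFCOUNT_LEN])
    nal_payload = payload[start + REFCOUNT_LEN:len(payload) - PADDING_LEN]
    return payload_hdr + nal_payload, ref_count
```

With this layout, each RTP payload is 12 bytes longer than the LDU it carries (2 bytes of RefCount plus 10 bytes of padding).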
For live streaming services, RTP is used in conjunction with the User
Datagram Protocol (UDP) to supply packets to IP. Another solution that
has been developed for buffered streaming services by MPEG is Dynamic
Adaptive Streaming over HTTP (DASH). DASH performs video streaming
over the Hypertext Transfer Protocol (HTTP) using adaptive bit rate and is
codec-agnostic. Since this solution is based on HTTP, packets are supplied
to IP using the Transmission Control Protocol (TCP). IP packets can then
be supplied to different wireless access technologies, such as LTE or Wireless
Fidelity (Wi-Fi). Fig. 2.4 gives an illustration of how the protocol stacks
are set up. In this thesis, we will focus on using video streaming services
to cellular users. We assume the use of RTP and UDP to supply packets
over IP, using the modified Single NAL unit packet structure for the RTP
payload, and LTE-A as the air interface.
Chapter 3
Cross-Layer Frame Reference Aware Scheduling Framework
In the previous chapter, we presented some of the features of the H.265/HEVC
standard that are relevant for video compression. We presented coding
structures, syntax structures and syntax elements, which are used to en-
code video content. We also presented motion compensated prediction for
more bandwidth-efficient encoding and reference picture lists for helping the
decoder track which pictures to use as reference when doing motion
prediction. Using these features, we define a cross-layer scheduling
framework that delivers video frames according to the dependencies
between them. In this chapter, we propose a mathematical
formulation of the shared resource allocation problem for delivering video
content and derive the optimal solution to this problem.
3.1 Mathematical Formulation of the Shared
Resource Allocation Problem
Let us consider S to be the set of users actively sharing resources. Let
us consider a user k and let the channel capacity of user k for time-slot n
be denoted by Ck(n). Kelly [15] has provided a mathematical formulation
of the shared resource allocation problem, which has been widely used by
the research community for tackling rate control problems in communication
networks. This shared resource allocation problem, which we will call SRAP,
is formulated as the following constrained optimization problem and solved
at the beginning of every time-slot n.
SRAP:
$$\text{maximize} \quad F(\vec{r}(n)) \triangleq \sum_{k \in S} U_k(r_k(n)) \tag{3.1}$$
$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \quad k \in S \tag{3.2}$$
F is the objective function that we are trying to maximize, Uk(rk(n)) denotes
the utility function of user k and rk(n) is the average throughput of user k
up to time-slot n. Constraint (3.2) ensures that the rate of the user does
not exceed the channel capacity Ck(n) that user k is experiencing during
time-slot n. Under the assumptions that the objective function F in (3.1)
is strictly concave and differentiable and that the feasible region in (3.2) is
compact, we know from Nonlinear Programming Theory [16] that an optimal
solution exists for SRAP and Kelly has provided an explicit optimal solution
to this problem using Lagrangian methods [15].
In wireless networks, the channel capacity and the number of users actively
sharing resources vary with time. This is due to the random nature
of the wireless channel and the network’s traffic. As a result, the optimal
solution to SRAP also varies with time. Hosein [17] proposed a solution
to SRAP by observing that finding the optimal solution consists in finding
the user which maximizes the gradient of the objective function. Hosein
developed his solution by introducing update equations using exponential
smoothing filters in order to keep track of each user’s throughput, whose
expression is given as follows
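$$r_k(n+1) = \begin{cases} \left(1 - \frac{1}{\tau}\right) r_k(n) + \frac{d_k(n)}{\tau} & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{\tau}\right) r_k(n) & \text{otherwise.} \end{cases} \tag{3.3}$$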
dk(n) is the throughput of user k estimated for time-slot n in bits per sec-
ond. τ > 1 is the time constant of the exponential smoothing filter. rk(n)
is the average throughput of user k up to time-slot n. Because the objec-
tive function is strictly concave, Hosein showed that all we need to find is
the direction, i.e. the user, which maximizes the gradient of the objective
function. If we denote this user as user k∗ then
$$k^* = \arg\max_{k \in S} \{\nabla F(\vec{r})\}. \tag{3.4}$$
As an example, if the utility function Uk of each user k is defined as the
logarithmic function of the rate of that user log(rk), then the maximum
gradient direction, i.e. the user maximizing the gradient function, is given
by:
$$k^* = \arg\max_{k \in S} \left\{ \frac{d_k(n)}{r_k(n)} \right\} \tag{3.5}$$
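
As an illustration, the selection rule (3.5) and the update equation (3.3) can be sketched in Python as follows. The function names and the numerical values are assumptions chosen for this example only. The sketch also checks numerically that, under the logarithmic utility, serving the user with the largest ratio of estimated to average throughput yields the largest total utility after the update.

```python
import math


def pf_select(d, r):
    """Index of the user maximizing d_k(n) / r_k(n), i.e. the metric in (3.5)."""
    return max(range(len(d)), key=lambda k: d[k] / r[k])


def update_throughput(r, d, served, tau):
    """Exponential smoothing update (3.3) applied to every user."""
    return [
        (1 - 1 / tau) * r[k] + (d[k] / tau if k == served else 0.0)
        for k in range(len(r))
    ]


def sum_log_utility(r):
    """Objective (3.1) with U_k(r_k) = log(r_k)."""
    return sum(math.log(x) for x in r)


# Illustrative values only (bits per second); tau is the filter time constant.
d = [4.0e6, 1.0e6, 2.0e6]
r = [2.0e6, 0.25e6, 2.0e6]
tau = 100.0

k_star = pf_select(d, r)
# Serving k_star should also give the largest total utility after the update.
best_by_utility = max(
    range(len(d)),
    key=lambda i: sum_log_utility(update_throughput(r, d, i, tau)),
)
```

In this toy setting the second user has the lowest estimated throughput but the highest ratio, so it is the one selected by both criteria.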
(3.5) is the well-known Proportional Fair metric, widely used for scheduling
in cellular networks such as Universal Mobile Telecommunications System
(UMTS) and LTE. An alternate way of finding this result is as follows2. The
utility function Uk of each user k is defined as the logarithmic function of
the rate of user k and we know how the rate of each user is computed. Let
us assume that user i is selected at time-slot n, the new utility value will be
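$$\sum_{k \in S,\, k \neq i} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left((1 - \tau^{-1}) r_i(n) + \tau^{-1} d_i(n)\right). \tag{3.6}$$
By adding and subtracting $\log((1 - \tau^{-1}) r_i(n))$ in Eq. (3.6), the sum will be
performed for all users and Eq. (3.6) then becomes
$$\sum_{k \in S} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left(\frac{(1 - \tau^{-1}) r_i(n) + \tau^{-1} d_i(n)}{(1 - \tau^{-1}) r_i(n)}\right). \tag{3.7}$$
After some simplifications, Eq. (3.7) eventually boils down to
$$\sum_{k \in S} \log\left((1 - \tau^{-1}) r_k(n)\right) + \log\left(1 + \frac{1}{\tau - 1} \frac{d_i(n)}{r_i(n)}\right). \tag{3.8}$$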
From Eq. (3.8), it is clear that the overall utility is maximized if
user i maximizes $d_i(n)/r_i(n)$, which is the Proportional Fair metric.
Hosein [17] also proposed the use of barrier methods in order to account for
Quality of Service (QoS) constraints. In nonlinear programming, barrier
methods are used on optimization problems in order to force the solutions to
remain in the interior of the feasibility region. An alternative to barrier
methods is the class of penalty methods, which force the solutions to remain
in a certain area of the feasibility region by imposing large penalties on
solutions that lie outside of that area. In this thesis, we propose to use
barrier functions in order to deliver video content by exploiting frame
references. For a detailed discussion of penalty and barrier methods, the
interested reader is invited to refer to Chapter 13 of [16].
2 The author of this simple and elegant proof is Dr. Cyril Leung.
In order to deliver video content, we extend the formulation of SRAP
to account for frame reference awareness and call this new problem SRAP-
FRA. We introduce a new constraint on the frame reference count of user k,
ck(n), to account for the fact that the network does not hold transmission
queues of infinite size. This also prevents the scenario where a video user
watches an infinitely long video sequence. This aspect is modelled through
Finite-Buffer traffic models and these will be discussed in greater detail in
Chapter 4. Just like SRAP, SRAP-FRA is also solved at the beginning of
every time-slot n. The expression of SRAP-FRA is as follows.
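SRAP-FRA:
$$\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} U_k(r_k(n), c_k(n)) \tag{3.9}$$
$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0. \tag{3.10}$$
$$c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.11}$$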
$C'_k(n)$ is the constraint on the number of frame references the transmission
queue of user k can hold at any given time-slot n, ck(n) is the average number
of frame references that user k has been receiving up to time-slot n and
Uk(rk(n), ck(n)) is the combined utility function of user k that we introduce
for our frame reference aware scheduling framework. For our scheduling
framework, we need to track for each user whether its transmission queue
is holding any frame that is referenced within the video sequence user k is
watching and make scheduling decisions based on that. Essentially, we are building
a scheduling framework where users watching video content get sent content
that the decoder needs to perform its task as efficiently as possible and by
incurring as little delay as possible in playback. To that end, we use barrier
functions and express the combined utility function for each user k as
$$U_k(r_k(n), c_k(n)) = U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)), \tag{3.12}$$
where
$$U_{k,1}(r_k(n)) \triangleq \log(r_k(n)), \quad U_{k,2}(c_k(n)) \triangleq -\lambda \exp(-\mu (c_k(n) - c_{\min})). \tag{3.13}$$
In (3.13), $U_{k,2}$ is a generalized expression of a barrier function, and λ
and µ are positive-valued parameters for adjusting the penalty for leaving
the feasible region. Hosein [17] has proposed the use of such functions for
delivering QoS, though there is no indication in the literature that this type
of approach is the best way of accounting for QoE constraints. Other
approaches and methodologies should be investigated for addressing such
issues. Our motivation for using a barrier function based approach is to
provide a simple scheduling framework.
In parallel to the update equation of the rate of user k, we also introduce
an exponentially smoothed update equation for keeping track of the frame
reference count of user k.
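$$c_k(n+1) = \begin{cases} \left(1 - \frac{1}{T}\right) c_k(n) + \frac{t_k(n)}{T} & \text{if user } k \text{ is served,} \\ \left(1 - \frac{1}{T}\right) c_k(n) & \text{otherwise,} \end{cases} \tag{3.14}$$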
where ck(n) is the frame reference count of user k at the beginning of time-
slot n, cmin is the minimum number of frame references that we force the
system to provide to each video user, T > 1 is the time constant of the
exponential smoothing filter and tk(n) is the number of frame references
being transmitted to user k at time-slot n. Due to the assumptions that we
made regarding the proposed combined utility function, the formulation of
SRAP-FRA can be rewritten as
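SRAP-FRA:
$$\text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} \left( U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)) \right) \tag{3.15}$$
$$\text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0. \tag{3.16}$$
$$c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.17}$$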
This is the cross-layer scheduling framework that we propose to solve in this
thesis and for which we derive a solution in the following section.
3.2 Solution to the proposed Shared Resource
Allocation Problem
In this section, we are going to derive the solution to the proposed optimiza-
tion problem SRAP-FRA (3.15). We need to find the user that maximizes
the gradient of the objective function. Since we constructed our combined
utility function as the sum of two separate utility functions (3.12),
maximizing the combined utility can be written as maximizing
$\sum_{k \in S} U_{k,1}(r_k(n))$ and $\sum_{k \in S} U_{k,2}(c_k(n))$
individually. We already know the solution to the
maximization of the sum of the first utility function Uk,1(rk(n)). We will
focus on deriving the solution to the maximization of the sum of the second
utility function Uk,2(ck(n)). Let us call j the user selected to be served at
time-slot n by the network. Let us call β the parameter with which we
parameterize the movement of the sum of the second utility functions in the
direction of serving user j. The objective function F can then be written
as:
$$F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta (c_j(n+1) - c_j(n))\right) + \sum_{k \in S,\, k \neq j} U_{k,2}\left(c_k(n) + \beta (c_k(n+1) - c_k(n))\right) \tag{3.18}$$
User j is served and all other users are not. Given the update equations of
the frame reference count (3.14), (3.18) simplifies to
$$F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta \frac{t_j(n) - c_j(n)}{T}\right) + \sum_{k \in S,\, k \neq j} U_{k,2}\left(c_k(n) - \beta \frac{c_k(n)}{T}\right). \tag{3.19}$$
Taking the partial derivative of Fj,2 with respect to β and setting β to 0, we
get:
$$\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n) - c_j(n)}{T} U'_{j,2}(c_j(n)) - \sum_{k \in S,\, k \neq j} \frac{c_k(n)}{T} U'_{k,2}(c_k(n)). \tag{3.20}$$
Eq. (3.20) can be rewritten as:
$$\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n)}{T} U'_{j,2}(c_j(n)) - \sum_{k \in S} \frac{c_k(n)}{T} U'_{k,2}(c_k(n)). \tag{3.21}$$
Since we are looking to maximize $\partial F_{j,2} / \partial \beta$, we can ignore the second term of
(3.21) as this term is a sum which is common to all users in the network. We
also know the expression of $U_{k,2}$, so the expression of the maximum gradient
direction is
$$k^* = \arg\max_{k} \left\{ \frac{\lambda \mu}{T} t_k(n) \exp(-\mu (c_k(n) - c_{\min})) \right\}. \tag{3.22}$$
Essentially, this means that the system will maximize the utility of users
by prioritizing the transmission of referenced frames ahead of unreferenced
frames. As we saw in Section 2.2.1, there is a clear hierarchy in the way
frames depend upon each other in video sequences. If a video user is provided
frames which the decoder can always decode or if the decoder does not
have to wait for other frames before being able to decode those frames, then
video users can watch video sequences with no perceptible delay and this
will enhance the Quality of Experience of video users. This sort of procedure
helps counter error propagation within the video decoding process, therefore
the proposed cross-layer scheduling framework can be seen as a form of error
resilience. Using (3.5), (3.12) and (3.22), the final expression of the metric
for the proposed scheduling framework (3.15) can then be expressed as:
$$\frac{d_k(n)}{r_k(n)} + \frac{\lambda \mu}{T} t_k(n) \exp(-\mu (c_k(n) - c_{\min})). \tag{3.23}$$
For the rest of this thesis, we shall refer to our proposed scheduling scheme
as Frame Reference Aware Proportional Fair (FRA-PF).
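
The FRA-PF metric (3.23) and the frame reference count update (3.14) can be sketched in Python as follows. This is a minimal illustration, not the simulator implementation: the function names, the parameter values for λ, µ, T and c_min, and the interpretation of t_k(n) as the reference count of the frame at the head of the user's queue are all assumptions made for this example.

```python
import math


def fra_pf_metric(d, r, t, c, lam, mu, T, c_min):
    """Metric (3.23): PF term plus the frame-reference barrier term."""
    pf_term = d / r
    fra_term = (lam * mu / T) * t * math.exp(-mu * (c - c_min))
    return pf_term + fra_term


def update_ref_count(c, t, served, T):
    """Exponential smoothing update (3.14) for a single user."""
    return (1 - 1 / T) * c + (t / T if served else 0.0)


# Hypothetical parameter values, chosen only for illustration.
PARAMS = dict(lam=1.0, mu=0.5, T=100.0, c_min=1.0)

users = [
    dict(d=1.0e6, r=1.0e6, t=0, c=0.5),  # pending frame is unreferenced
    dict(d=1.0e6, r=1.0e6, t=3, c=0.5),  # pending frame is referenced 3 times
]
metrics = [fra_pf_metric(**u, **PARAMS) for u in users]
k_star = max(range(len(users)), key=lambda k: metrics[k])
```

Both users have the same Proportional Fair ratio here, so the second term of (3.23) decides in favour of the user whose pending frame is referenced. For a user with t_k(n) = 0, such as a best effort user, the metric reduces to the Proportional Fair metric (3.5).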
Chapter 4
System Model
In this chapter, we describe our system model and simulation methodology
for evaluating the performance of our proposed scheduling framework. Our
evaluation methodology is centered upon using system-level simulations. In
this chapter we will cover the components that are of utmost relevance to this
thesis. A more in-depth and complete description of system level simulation
methodologies can be found in [18], [19] and [20].
4.1 H.265/HEVC Video Content Generation
Analytical traffic models have been proposed for near-real time video stream-
ing in [18], where the packet sizes and packet inter-arrival times are based
on truncated Pareto distributions. While this analytical model captures the
variability in the packet sizes coming from the video source, it is agnostic to
the specifics of the H.265/HEVC standard and therefore cannot be relied on
for generating realistic video traffic. Moreover, our objective is to evaluate
the application level experience of H.265/HEVC video users and to this end,
we use HM 14.0 to generate video bitstreams [21]. We use different video
test sequences which were used for development and testing purposes by
MPEG: FourPeople, Johnny, KristenAndSara, SlideShow and SlideEditing.
The characteristics of these video test sequences are given in Table 4.1. For
each of these video sequences, we generate the corresponding bitstream and
trace files using HM 14.0 [21], from which we extract the information of the
Reference Picture Lists, as defined in Section 2.2.2, for all frames in order
to determine the frame reference dependence structure.

Table 4.1: H.265/HEVC Table of Video Test Sequences
Sequence          Sequence length (frames)   Frame rate (fps)   Resolution (px x px)
FourPeople        600                        60                 1280x720
Johnny            600                        60                 1280x720
KristenAndSara    600                        60                 1280x720
SlideShow         200                        20                 1280x720
SlideEditing      300                        30                 1280x720
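
To illustrate how a frame reference dependence structure can be derived from Reference Picture Lists, the following Python sketch counts how many frames reference each picture. The input format (a mapping from a frame's POC to the POCs in its reference picture lists) and the toy GoP are assumptions made for this example and do not reproduce the HM 14.0 trace format. The sketch also assumes that a picture appearing in both lists of the same frame is counted once.

```python
from collections import Counter


def count_references(ref_lists):
    """Return, for every POC, the number of frames that reference it.

    ref_lists maps the POC of each frame to the POCs found in its
    reference picture lists (hypothetical input format).
    """
    counts = Counter({poc: 0 for poc in ref_lists})
    for refs in ref_lists.values():
        for ref in set(refs):  # a frame counts at most once per reference
            counts[ref] += 1
    return dict(counts)


# Toy hierarchical structure, for illustration only.
toy_gop = {
    0: [],      # I-Frame, references nothing
    8: [0],
    4: [0, 8],
    2: [0, 4],
    6: [4, 8],
}
ref_counts = count_references(toy_gop)
```

In this toy example, the I-Frame at POC 0 is referenced by three frames, while the frames at POC 2 and POC 6 are not referenced at all.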
For simplicity, we assume that each frame consists of only one slice
segment (see Section 2.1), so that each frame is encoded inside one LDU. The
GoP size is set to 8. The Intra-Period, defined as the interval in frames
between two consecutive I-Frames, is always set so that an I-Frame can be
found approximately every second. Its value depends on the frame rate of the
video sequence: for a frame rate of {20, 24, 30, 50, 60} fps, the
Intra-Period is set to {16, 24, 32, 48, 64} (respectively).
Aside from I-Frames, we use B-Frames only. Using the bitstreams generated
from the video sequences we selected, we create a custom Traffic Model for
each video sequence and use it as input to our LTE-A simulator, which is
described below. The H.265/HEVC parameters used to generate the bit-
streams are summarized in Table 4.2. Other parameters needed to run HM
14.0 are left to their default values as in [10].
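
The Intra-Period settings listed above can be written as a small lookup table. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the first frame of a sequence is an I-Frame. It shows that every listed Intra-Period is a multiple of the GoP size and that the resulting I-Frame spacing stays close to one second.

```python
GOP_SIZE = 8

# Frame rate (fps) -> Intra-Period (frames), as listed in the text.
INTRA_PERIOD = {20: 16, 24: 24, 30: 32, 50: 48, 60: 64}


def i_frame_interval_seconds(fps):
    """Time between two consecutive I-Frames for a given frame rate."""
    return INTRA_PERIOD[fps] / fps


def i_frame_positions(fps, num_frames):
    """Frame indices of the I-Frames, assuming frame 0 is an I-Frame."""
    return list(range(0, num_frames, INTRA_PERIOD[fps]))


intervals = {fps: i_frame_interval_seconds(fps) for fps in INTRA_PERIOD}
```

For the listed frame rates, the interval between I-Frames ranges from 0.8 seconds (20 fps) to about 1.07 seconds (30 and 60 fps).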
Table 4.2: H.265/HEVC Parameters
High Efficiency Video Coding Parameter   Value
Video Sequence Length                    10 seconds
SliceMode                                0
Coding Unit size                         64 pixels x 64 pixels
GoP size                                 8
Quantization Parameter                   32
Frame Structure                          IBB...BIBB...B
Decoding Refresh Type                    Clean Random Access
4.2 LTE-Advanced System Model
In order to evaluate the performance of our proposed scheduling framework,
we use system-level simulations based on openWNS and IMTAphy [22]-
[23]. The performance evaluation methodology is based on the simulation
methodology described in Annex A of the 3GPP Technical Report 36.814
[19] and in the Evaluation Methodology Document of IEEE 802.16m [18].
In this section, we will describe some of the components and features that
we use in our performance evaluation. Evaluation methodologies based on
system-level simulations require many components to capture aspects of the
physical layer and the protocols implemented at the link layer.
4.2.1 Network Model
We consider a downlink LTE Advanced (LTE-A) system using Frequency
Division Duplex (FDD) with N = 19 base stations. Each base station is
assumed to have three sectors in order to provide coverage, so there
is a total of 57 sectors in the network. An illustration of the hexagonal
grid layout is provided in Fig. 4.1.

Figure 4.1: Hexagonal Network Grid Layout

To ensure that all cells experience similar interference and that we
accurately model the impact of outer-cells,
we implement a wrap-around technique. The full system is actually modelled
as a network consisting of 7 clusters, where each cluster is made of N = 19
base stations. The central cluster is where the users are created and where
all of the statistics are collected. Fig. 4.2 illustrates the concept of wrap-
around. Virtual clusters are depicted in grey while the central cluster is
depicted in white; the central base station of each cluster is depicted in
yellow. The surrounding clusters are virtual clusters in the sense that no user
is actually dropped there. All the cells in the virtual clusters are copies of the
original cells in the central cluster: they are identical in terms of antenna
configuration, traffic and fast-fading, and differ only in their location.

Figure 4.2: Wrap Around of Hexagonal Network

Users are dropped independently at uniformly
random locations in the central cluster. For all base stations, we assume that
each sector uses 4 transmit antennas and each user uses 2 receive antennas.
This corresponds to a 4x2 Multiple Input Multiple Output (MIMO) system.
The system bandwidth B is assumed to be 10 MHz. Resource Allocation
Type is assumed to be 0, i.e. we allocate groups of Physical Resource
Blocks (PRBs) to users. For a system bandwidth of 10 MHz, the 3GPP
standard specifies that users are allocated groups of 3 contiguous PRBs.
Fig. 4.3 depicts a PRB allocation with 4 users in a system with 10 MHz of
bandwidth. Note that at 10 MHz, the last group only contains 2 PRBs as
the total number of PRBs at 10 MHz of bandwidth is 50.

Figure 4.3: LTE Downlink PRB allocation illustration
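
The grouping of PRBs described above can be checked with a short Python sketch. The function name is hypothetical; the numbers (50 PRBs, groups of 3) are those given in the text.

```python
def prb_group_sizes(n_prb=50, group_size=3):
    """Sizes of the PRB groups when n_prb PRBs are split into groups."""
    full_groups, remainder = divmod(n_prb, group_size)
    return [group_size] * full_groups + ([remainder] if remainder else [])


groups = prb_group_sizes()
```

At 10 MHz this gives 17 groups: 16 groups of 3 PRBs followed by a last group of 2 PRBs.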
Table 4.3: FTP Traffic Model 1
Parameter                 Statistical Characterization
File size                 2 Megabytes
User arrival rate λbe     Poisson distributed process with rate λbe
Number of downloads       1 (each user downloads a single file)
4.2.2 Traffic Model
We model two types of traffic: Best Effort (BE) traffic and video traffic.
Each user is assigned BE or video traffic with probability 0.5 each.
Usually, users are assumed to be active for the entire duration of the simulation,
i.e. they are created at the beginning of the simulation and dropped at the
end of the simulation, as stated in [18]. In this thesis, we decided to use more
realistic traffic models. Users are created at random time instants accord-
ing to a Poisson distributed random process. Users remain in the network
until they have completed their session or until they are dropped from the
network. For the BE traffic model, we use FTP Traffic Model 1 defined in
the 3GPP Technical Report [19] and whose parameters are summarized in
Table 4.3.
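
A minimal Python sketch of Poisson user arrivals with random traffic type assignment is given below. It is an illustration under stated assumptions rather than the simulator's implementation: the function name, the seed and the use of a single arrival process that is split between BE and video users with probability 0.5 are choices made for this example.

```python
import random


def generate_arrivals(rate, duration, p_video=0.5, seed=0):
    """Return (arrival_time, traffic_type) pairs for a Poisson process.

    rate is the arrival rate in users per second and duration is the
    simulated time in seconds (hypothetical interface).
    """
    rng = random.Random(seed)
    t, users = 0.0, []
    while True:
        # Inter-arrival times of a Poisson process are exponential.
        t += rng.expovariate(rate)
        if t >= duration:
            return users
        kind = "video" if rng.random() < p_video else "BE"
        users.append((t, kind))


arrivals = generate_arrivals(rate=2.0, duration=60.0)
```

Splitting one Poisson process at random in this way yields two independent Poisson processes, each with half the original rate.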
Similarly, we define a traffic model for video users; in this thesis we use
our own custom traffic model. Because we need information about frame
reference dependencies, we turn to HM 14.0 to generate realistic video bit-
streams for use in our performance evaluation. Section 4.1 covers the actual
generation of the video bitstreams in more detail. We wrap the video bit-
streams around six times as each bitstream individually carries 10 seconds’
worth of video data. This helps us generate video traffic representing one
minute’s worth of video data. Video users remain in the network until there
are no more packets left for them to receive. The parameters of our video
traffic model are summarized in Table 4.4.
Table 4.4: H.265/HEVC Traffic Model
Parameter                Statistical Characterization
Video duration           1 minute
User arrival rate λv     Poisson distributed process with rate λv
Number of sessions       1 (each user watches a single video once)
4.2.3 Channel Model
For every user in the network, we need to model the effects of the large-scale
and small-scale fading. Depending on the simulated scenario, the propaga-
tion and fading characteristics of the channel may be different. In this thesis,
we focus on the Urban Macrocell scenario, also referred to as Case 1 by the
3GPP and defined in Table A.2.1.1-1 of [24]. It should be
noted that Urban Macrocell is also a scenario defined by the International
Telecommunication Union Radiocommunication Sector (ITU-R) in report
M.2135 [25]. The ITU-R scenario defines users traveling at vehicular speeds
(30 km/h) whereas the 3GPP Urban Macrocell scenario defines users as
traveling at pedestrian speeds (3 km/h). We use the 3GPP Urban Macrocell
scenario because we consider services which require high
data rates, which are more practical if the users are moving at pedestrian
speed. System-level simulations typically rely on stochastic channel models
such as the Spatial Channel Model [26] to capture these aspects. Typically,
channel models capture the number of clusters (in this thesis, we use the
terms "Cluster" and "Tap" interchangeably) and their spatial characteristics
such as the delay spread, the angular spread and the power carried
by each cluster. The original implementation of the system-level simulation
tool we used, IMTAphy, uses the channel model specified by the ITU-R in
report M.2135 [25]. In [25], the channel model for the Urban Macrocell
scenario is defined as a 20-tap model, whereas the channel model we decided to
use is the Spatial Channel Model [26], which is a 6-tap model. There are two
reasons for choosing the 3GPP Spatial Channel Model. The first reason is
that although the ITU-R Channel Model is more accurate, it requires a large
memory footprint in terms of storing cluster and ray specific information. It
also requires high computational power due to having to sum a large num-
ber of clusters for every link, for every subcarrier and for every time-slot.
The second reason is that we are looking to do a fair comparison between
two different scheduling schemes. For this purpose, the channel model needs
to capture the statistical characteristics of the channel, such as Delay
Spread and Angular Spread, rather than
to provide accurate performance predictions in real environments. The
radio channel can typically be described through its large-scale and small-
scale characteristics. Large-scale characteristics are captured through the
path-loss and the shadow fading distribution. The deterministic path-loss
formula used for the Urban Macrocell scenario is defined in [24] as follows
$$PL(d) = 128.1 + 37.6 \log_{10}(d) \tag{4.1}$$
where PL denotes the mean path loss in dB between a given user and a given
base station and d denotes the distance between the user and the base station
in kilometers. This mean path-loss formula is valid for carrier frequencies
around 2 GHz. The distance between a user and a base station must always
be at least 35 meters. The short-term statistics are characterized by small-
scale parameters. Let us denote the number of clusters in a link by N . The
generation of the parameters required to compute the channel coefficients
is documented in [26] and [20]. The eventual channel impulse responses
account for the aspects of modelling a MIMO channel and are given for a
given pair of antennas s and u (resp. station and user) and a given cluster
n:
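$$h_{u,s,n}(t) = \begin{cases} \sqrt{\frac{1}{K_R + 1}} \, h^{NLoS}_{u,s,n}(t) + \sqrt{\frac{K_R}{K_R + 1}} \, h^{LoS}_{u,s,n}(t) & n = 1, \\ \sqrt{\frac{1}{K_R + 1}} \, h^{NLoS}_{u,s,n}(t) & 2 \leq n \leq N, \end{cases} \tag{4.2}$$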
where $K_R$ is the Ricean factor, $h^{NLoS}_{u,s,n}$ is the non line-of-sight component of
the channel and $h^{LoS}_{u,s,n}$ is the line-of-sight component of the channel, which
is applied only to the first cluster. The way the Spatial Channel Model is
designed, the first cluster is the cluster for which the delay is the shortest.
The non line-of-sight channel component is expressed for a given cluster and
for a given pair of transmit-receive antenna elements as follows [26]:
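$$h^{NLoS}_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M} \begin{bmatrix} F_{rx,u,V}(\theta_{n,m}) \\ F_{rx,u,H}(\theta_{n,m}) \end{bmatrix}^{T} \begin{bmatrix} \exp(j\Phi^{vv}_{n,m}) & \sqrt{\kappa^{-1}} \exp(j\Phi^{vh}_{n,m}) \\ \sqrt{\kappa^{-1}} \exp(j\Phi^{hv}_{n,m}) & \exp(j\Phi^{hh}_{n,m}) \end{bmatrix} \begin{bmatrix} F_{tx,s,V}(\phi_{n,m}) \\ F_{tx,s,H}(\phi_{n,m}) \end{bmatrix} \exp(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{n,m})) \exp(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{n,m})) \exp(j 2\pi \nu_{n,m} t) \tag{4.3}$$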
where $P_n$ is the power of the nth cluster, M is the number of rays within the
cluster, $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the uth antenna element
at the receiver side in the vertical and horizontal polarizations respectively,
$F_{tx,s,V}$ and $F_{tx,s,H}$ are the field patterns of the sth antenna element at the
transmitter side in the vertical and horizontal polarizations respectively,
$\theta_{n,m}$ and $\phi_{n,m}$ are the arrival and departure angles of the mth ray in the
nth cluster, $d_s$ and $d_u$ are the distances between antenna elements at the
transmitter and receiver side respectively, $\nu_{n,m}$ is the Doppler frequency
component of the mth ray of the nth cluster and t is the time instant.
$\Phi^{vv}_{n,m}$, $\Phi^{vh}_{n,m}$, $\Phi^{hv}_{n,m}$ and $\Phi^{hh}_{n,m}$ are uniformly generated random phases used
for initialization purposes.
In a similar fashion to the non line-of-sight channel component, the
line-of-sight channel component for a given pair of transmit-receive antenna
elements is expressed as follows [26]:
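$$h^{LoS}_{u,s,n}(t) = \begin{bmatrix} F_{rx,u,V}(\theta_{LoS}) \\ F_{rx,u,H}(\theta_{LoS}) \end{bmatrix}^{T} \begin{bmatrix} \exp(j\Phi^{vv}_{LoS}) & 0 \\ 0 & \exp(j\Phi^{hh}_{LoS}) \end{bmatrix} \begin{bmatrix} F_{tx,s,V}(\phi_{LoS}) \\ F_{tx,s,H}(\phi_{LoS}) \end{bmatrix} \exp(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{LoS})) \exp(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{LoS})) \exp(j 2\pi \nu_{LoS} t) \tag{4.4}$$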
where $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the uth antenna element
at the receiver side in the vertical and horizontal polarizations respectively,
$F_{tx,s,V}$ and $F_{tx,s,H}$ are the field patterns of the sth antenna element at the
transmitter side in the vertical and horizontal polarizations respectively,
$\theta_{LoS}$ and $\phi_{LoS}$ are the arrival and departure angles of the line-of-sight ray,
$d_s$ and $d_u$ are the distances between antenna elements at the transmitter
and receiver respectively, $\nu_{LoS}$ is the Doppler frequency component of the
line-of-sight ray and t is the time instant. $\Phi^{vv}_{LoS}$ and $\Phi^{hh}_{LoS}$ are uniformly
generated random phases used for initialization purposes.
The channel impulse responses given by (4.2) are expressed in the time-
domain. Since we are considering an LTE-A air interface, which is based on
OFDMA, we need frequency domain channel coefficients. The frequency do-
main channel coefficients are obtained by applying a Fast Fourier Transform
on the time domain channel impulse responses. The equivalent frequency
domain channel matrix at the kth subcarrier for a 4x2 MIMO system is
given as:
$$H(k) = \begin{bmatrix} H_{1,1}(k) & H_{1,2}(k) & H_{1,3}(k) & H_{1,4}(k) \\ H_{2,1}(k) & H_{2,2}(k) & H_{2,3}(k) & H_{2,4}(k) \end{bmatrix}, \quad k \in \{1, 2, \ldots, N_{FFT}\} \tag{4.5}$$
where $N_{FFT}$ is the Fast Fourier Transform size. Let us denote the Fast
Fourier Transform by $\mathcal{F}$. Each individual component of the channel transfer
function H(k) at a given time-instant t is a function of the channel impulse
responses given by (4.2) and is expressed as follows [20]
$$H_{u,s}(k) = \mathcal{F}[h_{u,s,1}(t), h_{u,s,2}(t), \ldots, h_{u,s,N}(t)], \quad k \in \{1, 2, \ldots, N_{FFT}\}. \tag{4.6}$$
In the specific case of LTE, the subcarrier spacing is defined as 15000 Hz.
For a system bandwidth of 10 MHz, we need a sampling rate that is
higher than 10 MHz and that is a multiple of the subcarrier spacing, i.e.
15000 Hz. Since Fast Fourier Transforms are optimized for lengths that are
integer powers of 2, we use a Fast Fourier Transform of size NFFT = 1024.
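
The choice of the Fast Fourier Transform size can be reproduced with a short Python sketch. The function name is hypothetical; the rule implemented is the one stated above, namely the smallest power of 2 for which the sampling rate (size times subcarrier spacing) exceeds the system bandwidth.

```python
SUBCARRIER_SPACING_HZ = 15_000


def fft_size(bandwidth_hz):
    """Smallest power-of-2 FFT size whose sampling rate exceeds the bandwidth."""
    n = 1
    while n * SUBCARRIER_SPACING_HZ <= bandwidth_hz:
        n *= 2
    return n


n_fft = fft_size(10_000_000)
sampling_rate_hz = n_fft * SUBCARRIER_SPACING_HZ
```

A size of 512 would give a sampling rate of 7.68 MHz, which is below 10 MHz, so the next power of 2, 1024, is used, for a sampling rate of 15.36 MHz.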
4.2.4 Feedback Model
Critical to the performance of most wireless communications systems are
mechanisms for delivering Channel State Information (CSI) to the transmit-
ter. It is shown in Chapter 8 of [27] that with CSI knowledge at the trans-
mitter, one can extract the maximum performance available from MIMO
systems. The 3GPP standard has outlined several control signalling mech-
anisms for each of the transmission modes it defines. In this thesis, we use
Transmission Mode 10 with 4-Tx Release 12 linear precoding matrices [28].
The 3GPP standard defines an implicit feedback mechanism to operate the
Uplink control signalling. What is meant by "implicit" is that instead of
sending information about the channel matrix itself, the user sends quan-
tized information about different channel statistics that can help the net-
work make appropriate scheduling decisions. The 3GPP standard defines
the content of the control signalling through 3 indicators [28]:
• Rank Indication (RI),
• Precoding Matrix Indicator (PMI),
• Channel Quality Indicator (CQI).
The RI is the rank of the channel matrix, i.e. the number of degrees of
freedom that it can carry. The PMI is the index of the Precoding Matrix
that maximizes the received power at the receiver and the CQI is the spectral
efficiency that the receiver would be able to achieve. The PMI and CQI
reports are conditioned upon the value of the RI. The reporting mode we
use in this thesis is the Aperiodic CSI Reporting Mode 3-1, as defined in
Section 7.2.1 of [28]. Other reporting modes are also defined by the 3GPP
[28].
Aperiodic CSI Reporting Mode 3-1 consists of a single RI report, a single
PMI report and several subband CQI reports. The size of a subband is
specified by the 3GPP standard to be 6 PRBs for a system bandwidth of 10
MHz in [28]. Thus, a single CSI report from the user will contain one value
for the RI, one value for the PMI and nine values for the CQI (one CQI
value per subband). In this thesis, we assume that the periodicity of the
CSI reports is set to 5 ms. The RI is typically a statistic that is reported less
frequently than the PMI or the CQI and its periodicity is set to 20 ms. For
the subband CQI reports, we assume non-ideal channel estimation, which is
obtained by modelling a noisy sample of the interference covariance matrix
in the equalizer vector using the complex Wishart distribution [29].
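As a quick check of the report size described above, the number of subband CQI values follows from the system bandwidth in PRBs and the subband size. The sketch below assumes that a 10 MHz LTE carrier has 50 PRBs (a standard LTE figure that is not stated in this section); the function name is illustrative only.

```python
import math


def num_subbands(total_prbs: int, subband_size: int) -> int:
    """Number of subbands when the last subband may be smaller than the others."""
    return math.ceil(total_prbs / subband_size)


# Assumption: a 10 MHz LTE carrier has 50 PRBs.
TOTAL_PRBS = 50
SUBBAND_SIZE = 6  # PRBs per subband, as stated in the text

n_cqi = num_subbands(TOTAL_PRBS, SUBBAND_SIZE)
print(n_cqi)  # 9 subband CQI values per CSI report
```

Eight full subbands of 6 PRBs cover 48 PRBs, and the remaining 2 PRBs form the ninth subband, which matches the nine CQI values mentioned above.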
Chapter 5
Simulation Results and Analysis
Some of the key targets specified by the NGMN Alliance for 5G networks
can be broadly summarized as providing consistent user experience and en-
hanced Quality of Experience. These targets are defined and outlined in [5].
As an example, one target is for the network to be able to provide a certain
user throughput for 95% of the time across 95% of the coverage area. This
is typically referred to as the 5th percentile of the Cumulative Distribution
Function (CDF) of the user throughput. We also look at the average user
throughput as an indicator of the overall user experience.
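To make the two indicators concrete, the following minimal sketch computes a low percentile of the user throughput CDF and the average user throughput from a list of per-user samples. The nearest-rank percentile definition and the sample values are assumptions made for illustration; the thesis does not specify which percentile estimator is used.

```python
def percentile_nearest_rank(samples, p):
    """p-th percentile (integer p in 1..100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer form of ceil(p * n / 100)
    return ordered[rank - 1]


def mean(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


# Illustrative per-user throughputs in Mbps (hypothetical values).
throughputs_mbps = [float(x) for x in range(1, 21)]

p5 = percentile_nearest_rank(throughputs_mbps, 5)    # 5th percentile
p10 = percentile_nearest_rank(throughputs_mbps, 10)  # 10th percentile
avg = mean(throughputs_mbps)
```

With only 20 samples the 5th percentile is determined by a single user, which illustrates why low percentiles need many more samples than the average to be estimated reliably.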
In this chapter, our simulation assumptions and results are described,
including insights gained from our results. So far, all the works in the field
of video transmission over wireless networks use Full Buffer methodologies
to evaluate performance. The main problem with Full Buffer methodologies
is that they only capture performance metrics (for instance user throughput
and served cell throughput) in a range where the network is operating at full
load. Since cellular networks experience different types of loads depending
on the time of the day, it is useful for carriers to have a more complete
view of performance at different traffic load points. One motivation for
using traffic models where user arrivals are modelled according to a Poisson
distributed random process is to capture performance at traffic load points
that are meaningful to carriers.
Intuitively, we expect that performance will be good at low traffic load
points because there is a small number of users in the network, which results
in low interference and high user throughputs. This ensures that users that
enter the network are served quickly and leave quickly. This scenario is
not attractive to carriers because although the Quality of Experience is
excellent, they are earning little revenue due to the small number of users.
Conversely, we expect that performance will be bad at high traffic load points
because there is a large number of users in the network, which results in high
interference and low user throughputs. This scenario is also unattractive to
carriers because although revenues are high due to the large number of
users accessing their spectrum, the Quality of Experience is mediocre and
this will lead to customer dissatisfaction. The desirable scenario for carriers
is an intermediate traffic load, where the number of users on the network
yields reasonable revenue for the carrier, the resulting moderate interference
leads to acceptable throughputs, and users enjoy reasonably good Quality
of Experience.
5.1 Simulation Assumptions
In this section, we outline some of the assumptions made in our simulations.
The main components of our system model are described in Chapter 4. Here,
we describe some of the other assumptions made. We assume that the base
station in our LTE-A network is a Media Aware Network Element (MANE).
A MANE is a network node which has the ability to parse an encoded
video bitstream and identify specific LDUs. Since our LTE-A base stations
can parse video bitstreams, they can specifically look for each user’s LDUs
and keep track of the RefCount field in the LDUs. Using the information
carried by the RefCount field, the LTE-A system can then keep track of the
referenced frames being sent to each video user, using exponential smoothing
update equations (3.14) and allocate resources accordingly. In the simulation
of our proposed scheduling framework, the following parameter values are
used: λ = 25, µ = 1, T = 25 and cmin = 50.
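For readers unfamiliar with exponential smoothing, the sketch below shows the generic form of such an update applied to a per-user reference count. It is not a reproduction of equations (3.14), which are defined in Chapter 3; the smoothing factor `alpha`, the observation values and the function name are hypothetical and are not mapped to the parameters λ, µ, T or cmin.

```python
def smooth(previous: float, observation: float, alpha: float) -> float:
    """One step of generic exponential smoothing, with 0 < alpha <= 1."""
    return (1.0 - alpha) * previous + alpha * observation


# Hypothetical RefCount values read from successive LDUs of one video user.
ref_count_observations = [4, 0, 2, 0, 0]
alpha = 0.25   # hypothetical smoothing factor
tracked = 0.0  # smoothed reference count kept by the base station

for ref_count in ref_count_observations:
    tracked = smooth(tracked, ref_count, alpha)
```

A larger `alpha` makes the tracked value react faster to the most recent LDUs, while a smaller one retains more history.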
Our motivation in this work is to model a realistic 4G/beyond-4G sys-
tem. Although several research projects on 5G have been initiated, there is
no air interface specified yet for a 5G system. Therefore we use a 4G air
interface with as many up-to-date features as possible to do our performance
evaluation using metrics which have been proposed for 5G systems. For our
LTE-A system, we decide to model a 4x2 MIMO system. We also assume the
use of Single User Multiple Input Multiple Output (SU-MIMO), as opposed
to Multi User Multiple Input Multiple Output (MU-MIMO). It is shown
in Chapter 7 of [27] that in MIMO systems, the availability of both multi-
ple transmit antennas and multiple receive antennas can provide additional
spatial dimensions for communication. These additional degrees of freedom
can be exploited by spatially multiplexing different data streams onto the
MIMO channel. The main difference between SU-MIMO and MU-MIMO
is that SU-MIMO will focus on sending multiple data streams towards the
same user whereas MU-MIMO will focus on sending data streams towards
spatially separate users. We also assume the use of Transmission Mode 10
and assume the use of 4-Tx Release 12 Precoding Matrices [28]-[30]. Trans-
mission Mode 10 is a mode where the system allows the use of so-called
non-codebook based precoding with up to 8 layers. It is beyond the scope
of this thesis to describe the physical layer procedures and processing
features that are relevant for the operation of Transmission Mode 10. A more
detailed description of Transmission Mode 10 and the associated physical
layer procedures is provided in [28] and [31]. For system-level simulations, we
need link-to-system models that can accurately translate an instantaneous
Signal to Noise Ratio (SNR) value into a corresponding instantaneous block
error rate value. Several methods exist in the literature such as Exponen-
tial Effective SNR Mapping (EESM) [32] and Mutual Information Effective
SNR Metric (MIESM) [33]. In this thesis, we use EESM. The basic idea
behind EESM is as follows: let us assume a user received a transmission
over Nsc subcarriers with instantaneous SNR value γk at the kth subcarrier.
The instantaneous effective SNR γeff using EESM is obtained as:
\[
\gamma_{\mathrm{eff}} = -\beta \ln\left( \frac{1}{N_{sc}} \sum_{k=1}^{N_{sc}} \exp\left( -\frac{\gamma_k}{\beta} \right) \right), \tag{5.1}
\]
where β is a correction parameter used for tuning a specific modulation.
The resulting γeff is then mapped to a corresponding block error rate. The
values of the β parameters depend on the modulation and the code rate,
e.g. β = 1.49 for Quaternary Phase Shift Keying (QPSK) with a code rate
of 1/3 or β = 7.68 for 16-QAM (Quadrature Amplitude Modulation) with a
code rate of 4/5. These values can be found in Table 19.13, Chapter 19 of
[20]. Several sources exist for the values of β that can be applied in an
LTE or LTE-A system; for our simulations we use the β values given in [32].
Parameter values for our LTE-A simulations are summarized in Table 5.1
and reflect those used in 3GPP Release 12 study items.
Table 5.1: LTE-Advanced Parameters
Parameter                   Value
System Bandwidth            10 MHz
Channel Model               Spatial Channel Model [20]
Scenario                    Urban Macro-cell [24]
Carrier Frequency           2 GHz
Link-to-System Interface    Exponential ESM
Traffic Model               Finite Buffer
Receiver Type               Wishart-IRC [29]
MIMO scheme                 4x2 SU-MIMO
Transmission Mode           TM 10
Precoding Codebook          4-Tx Release 12 [30]
CSI Reporting Mode          Aperiodic Mode 3-1 [28]
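The EESM mapping of (5.1) can be sketched in a few lines. The example below assumes that the per-subcarrier SNR values are given in linear scale (not in dB), which is the usual convention for EESM but is not stated explicitly here; the function name and the sample SNR values are illustrative. The final step of mapping the effective SNR to a block error rate requires reference error-rate curves and is not shown.

```python
import math


def eesm_effective_snr(snrs_linear, beta):
    """Effective SNR following (5.1); input and output are in linear scale."""
    if not snrs_linear:
        raise ValueError("at least one subcarrier SNR is required")
    n_sc = len(snrs_linear)
    mean_exp = sum(math.exp(-g / beta) for g in snrs_linear) / n_sc
    return -beta * math.log(mean_exp)


BETA_QPSK_R13 = 1.49  # beta for QPSK with code rate 1/3, as quoted in the text

# Flat channel: every subcarrier sees the same SNR.
flat = eesm_effective_snr([4.0, 4.0, 4.0], BETA_QPSK_R13)

# Frequency-selective channel: one weak and one strong subcarrier.
selective = eesm_effective_snr([1.0, 10.0], BETA_QPSK_R13)
```

For a flat channel the effective SNR equals the common per-subcarrier SNR. For a frequency-selective channel it lies between the weakest SNR and the arithmetic mean, so weak subcarriers weigh more heavily than strong ones.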
As discussed in Section 4.2.2, we use traffic models that generate user
arrivals according to Poisson processes. The traffic assignment probability
is 0.5 for each traffic model, so that in our simulations the user arrival rates
for the two traffic models, i.e. BE and video, are equal. This ensures that the average number
of users generated for each traffic type is the same. The length of the
simulation is chosen such that we generate at least 8000 users for each traffic
type. This was done to ensure that all the metrics that are reported in this
thesis are obtained within a 95% confidence interval of ±10% around the
mean value.
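The arrival model described above can be illustrated with a short sketch: inter-arrival times of a Poisson process are exponentially distributed, and each arriving user is assigned to the video or BE traffic model with probability 0.5. The arrival rate, the duration, the seed and the function name are hypothetical values chosen for illustration and are not the settings used in our simulations.

```python
import random


def generate_arrivals(rate_per_s, duration_s, p_video=0.5, seed=0):
    """Return a list of (arrival_time_s, traffic_type) for a Poisson arrival process."""
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        # Exponential inter-arrival times give Poisson-distributed arrival counts.
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            break
        traffic = "video" if rng.random() < p_video else "BE"
        arrivals.append((t, traffic))
    return arrivals


# Hypothetical setting: 10 users per second on average over 100 seconds.
arrivals = generate_arrivals(rate_per_s=10.0, duration_s=100.0, seed=1)
n_video = sum(1 for _, kind in arrivals if kind == "video")
n_be = len(arrivals) - n_video
```

Randomly splitting a single Poisson stream in this way yields two independent Poisson streams with half the rate each, so the average number of users per traffic type is the same.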
We use offered load per sector and Resource Utilization (RU) as our
reference points. This is because for finite buffer traffic models the 3GPP
consortium decided to evaluate performance based on the RU values a cel-
lular network goes through and we decided to align our methodology with
those assumptions. RU is defined as the ratio of the aggregated number of
radio resource blocks allocated for data traffic to the total number of ra-
dio resource blocks in the system bandwidth available for data traffic [19].
We first ran simulations using the Proportional Fair (PF) scheme and determined
the offered loads corresponding to RU values between 40% and 70%. We then
ran simulations using the proposed scheme at those offered loads and
compared the resulting performance and QoE for both BE users and video
users. These offered loads are listed in Table 5.2. For the PF scheme, the
offered load ranges between 5.88 Mbps per sector and 6.94 Mbps
per sector. The 95% confidence interval for the
reported RU values is within ±3.2% of the reported values.
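The RU definition and the comparison between the two schemes can be written out as follows. The resource block counts in the first example are hypothetical; the RU percentages in the second part are the values listed in Table 5.2.

```python
def resource_utilization(allocated_rbs: int, available_rbs: int) -> float:
    """RU in percent: RBs allocated to data over RBs available for data."""
    if available_rbs <= 0:
        raise ValueError("available_rbs must be positive")
    return 100.0 * allocated_rbs / available_rbs


# Hypothetical counts: 50 RBs per subframe over 1000 subframes, 20000 allocated.
ru_example = resource_utilization(20_000, 50 * 1000)

# RU values (%) from Table 5.2, keyed by offered load (Mbps / sector).
ru_pf = {5.88: 40.0, 6.27: 50.0, 6.58: 60.0, 6.94: 70.0}
ru_fra_pf = {5.88: 35.4, 6.27: 41.9, 6.58: 47.8, 6.94: 53.7}

# Reduction in RU, in percentage points, at each offered load.
ru_saving = {load: round(ru_pf[load] - ru_fra_pf[load], 1) for load in ru_pf}
```

The RU reduction grows with the offered load, which is consistent with resources being released earlier when video users are served faster.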
For video users, we report the Active Download Time (ADT), the satis-
fied video user percentage and the packet loss ratio of Clean Random Access
NAL units. A user is considered to be satisfied if its MOS is greater than
4. Conversely a user is considered to be unsatisfied if its MOS is lower than
3. Nightingale et al. [9] showed that even a slight degradation in radio conditions,
i.e. a packet loss ratio of 3%, is enough to make the Quality of Experi-
ence mediocre. Clean Random Access NAL units carry the encoded video
data of I-Frames and represent the largest percentage of the bitstream in
terms of bit rate. Since the decoding of the whole video sequence is basi-
cally reliant on the correct decoding of these LDUs, the packet loss ratio of
these LDUs provides a good indication of how much video content becomes
non-viewable.
For BE users, we report the absolute values of the average user through-
put and the 10th-percentile of the user throughput CDF. We also report the
average user throughput in the outer region of every cell. We
report the 10th-percentile instead of the 5th-percentile because
much longer simulations would be required to generate 5th-percentile results
within the targeted 95% confidence interval. As an example, simulations generating
on average 16000 users (8000 video users and 8000 BE users) take between 48 and
72 hours of run time. In order to generate results where the 95% confidence
intervals of the 5th-percentile of the user throughput are within ±10%, we
would need to generate possibly over 30000 users. This could potentially
lead to simulation run times of over a week, which is highly impractical. In
this thesis, we will refer to the 10th-percentile of the user throughput CDF
as the coverage user throughput. A given BE user’s throughput is calculated
as the ratio of the total volume of the transferred data to the download time.
For BE users, the download time is defined as the difference between the
time instant of the last packet correctly received by the user and the time
instant of the first packet transmitted to the user.
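The throughput definition for a BE user translates directly into code. In the sketch below the data volume and the time instants are hypothetical, and the function name is illustrative.

```python
def be_user_throughput_mbps(transferred_bits: float,
                            t_first_tx_s: float,
                            t_last_rx_s: float) -> float:
    """Throughput of one BE user: transferred volume over download time."""
    download_time_s = t_last_rx_s - t_first_tx_s
    if download_time_s <= 0:
        raise ValueError("the last reception must occur after the first transmission")
    return transferred_bits / download_time_s / 1e6


# Hypothetical user: 16 Mbit transferred, first packet sent at t = 3.0 s,
# last packet correctly received at t = 5.0 s.
throughput = be_user_throughput_mbps(16e6, 3.0, 5.0)
print(throughput)  # 8.0 Mbps
```

Because the data volume per BE user is fixed, a shorter download time is directly equivalent to a higher throughput.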
5.2 Simulation Results and Discussion
In this section, we present our simulation results and discuss the main find-
ings. We will present our results for video users followed by those for BE
users.
Table 5.2: Offered Load and corresponding Resource Utilization
Scheme    Offered Load (Mbps / Sector)    Resource Utilization (%)
PF        5.88                            40.0
PF        6.27                            50.0
PF        6.58                            60.0
PF        6.94                            70.0
FRA-PF    5.88                            35.4
FRA-PF    6.27                            41.9
FRA-PF    6.58                            47.8
FRA-PF    6.94                            53.7
5.2.1 Results for video users
For the performance evaluation of video users, we consider two metrics. The
first metric that we introduce is the ADT, which is the time a video user
spends actively downloading video content. The second metric is the MOS
provided by users about their viewing experience.
The 95% confidence intervals for the active download time are within
±6% of the reported values. Fig. 5.1 shows the active download times video
users spend downloading video content while they are in the network. Using
the Proportional Fair scheme, video users spend between 3.5 seconds and
8 seconds downloading video content (for offered loads between 5.9 Mbps
per sector and 6.9 Mbps per sector respectively). These numbers can be
explained by the fact that the Proportional Fair algorithm tries to be
fair to all users, video and BE alike. Resources end up being shared by all
users. Using our proposed scheme, video users are given higher importance
if their transmission queues carry referenced frames. This is due to the
barrier functions we introduced in our scheduling framework. Therefore
Figure 5.1: Video users’ active download time (x-axis: Offered Load [Mbps / Sector]; y-axis: Video Active Download Time [s]; curves: PF, FRA-PF)
if a base-station is serving both video and BE users, video users will be
prioritized over BE users as long as they have referenced frames to receive.
Resource allocation is focused on video users first, which results in them
being served more quickly, as Fig. 5.1 shows. For offered loads between 5.9
Mbps per sector and 6.9 Mbps per sector, video users spend between 2.2
seconds and 4.2 seconds downloading video content. This is significant
because the time that video users no longer spend downloading video content
frees resources that can be allocated to BE users.
Possibly the most important aspect in the performance evaluation of
video services is the MOS which reflects the quality of the viewing experience
from the users’ perspective. We are going to look into the MOS that users
would give based on the Packet Loss that they experience, which we denote
Figure 5.2: Satisfied Video User Percentage (x-axis: Offered Load [Mbps / Sector]; y-axis: Satisfied Video User Percentage [%]; curves: PF with MOS > 4, FRA-PF with MOS > 4)
as the satisfied video user percentage. The 95% confidence intervals of the
satisfied video user percentage results are within ±8% of the reported values.
It was shown in [9] that the MOS is very sensitive to the Packet Loss Ratio
(PLR). The findings in [9] were that PLRs below 1.5% correspond to a
MOS above 4 (perceptible degradation but not annoying). Assuming that
a video user’s MOS is only affected by the PLR it experiences, we can state
that the QoE of a video user will be high if the PLR is below 1.5% (i.e.,
its MOS will be greater than 4, and the video user will be satisfied). The
QoE will be low if the PLR is higher than 1.5% (i.e., its MOS will be lower
than 4, and the video user will experience significant degradation). Fig. 5.2
shows the results in terms of video user percentage for which the MOS is
greater than 4.
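Under the assumption stated above, i.e. that a video user's MOS depends only on its PLR, the satisfied video user percentage reduces to a threshold test. The sketch below uses hypothetical per-user PLR values; the function names are illustrative.

```python
PLR_THRESHOLD = 0.015  # 1.5% packet loss ratio, the threshold taken from [9]


def is_satisfied(plr: float) -> bool:
    """A video user is counted as satisfied (MOS > 4) if its PLR is below 1.5%."""
    return plr < PLR_THRESHOLD


def satisfied_percentage(plrs) -> float:
    """Percentage of video users whose PLR is below the threshold."""
    if not plrs:
        raise ValueError("at least one user is required")
    return 100.0 * sum(is_satisfied(p) for p in plrs) / len(plrs)


# Hypothetical per-user packet loss ratios.
user_plrs = [0.0, 0.01, 0.02, 0.05]
print(satisfied_percentage(user_plrs))  # 50.0
```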
Our proposed FRA-PF scheme leads to a higher percentage of satisfied
video users, which is expected as video users with referenced frames in their
queues are prioritized over BE users. As can be seen from Fig. 5.2, for offered loads around 5.9
Mbps per sector, both PF and FRA-PF schemes are able to satisfy over
90% of video users. However the performance of the PF scheme degrades
more quickly as the load increases: for offered loads around 6.8 Mbps per
sector, the FRA-PF scheme can satisfy over 80% of video users whereas the
PF scheme satisfies less than 60% of video users.
Another aspect that we look into is the percentage of Clean Random
Access (CRA) LDUs lost. I-Frames are typically carried inside CRA LDUs
and they represent the most significant portion of the bitstream in terms
of bits. Because of the way the video compression process is defined in
the H.265/HEVC standard, I-Frames are the frames that are referenced the
most throughout a video sequence and the loss of an I-Frame causes error
propagation within the decoding process at the receiver end. We aligned
our settings for the Intra-Period so that two I-Frames are one second apart
from each other [10].
Intuitively, the loss of an I-Frame causes the loss of about one second
of video content to the end user because all subsequent B-Frames reference
an I-Frame, directly or indirectly. Those B-Frames could, strictly speaking,
still be usable by the decoder to produce a picture. The problem is that
those B-Frames could potentially be incomplete, i.e. some sections could
be missing Luminance or Chrominance sample information. The whole idea
behind H.265/HEVC is to use motion compensated prediction in as many
frames as possible. Fig. 5.3 shows the results obtained for CRA LDU loss
ratio. Since the proposed FRA-PF scheme is able to locate referenced frames
Figure 5.3: CRA LDU Loss Ratio (x-axis: Offered Load [Mbps / Sector]; y-axis: CRA LDU Loss Ratio [%]; curves: PF, FRA-PF)
and transmit them with higher priority, FRA-PF has a lower CRA LDU loss
ratio.
Let us consider the bitstream of the video sequence FourPeople as an
example. The original video bitstream contains 9 CRA LDUs and 600 LDUs
in total. Since we wrap the bitstream around 6 times, this results in a total
of 54 CRA LDUs for a given user. With the PF scheme, the CRA LDU
loss ratio goes from 1.6% to 9.0% of the total 54 CRA LDUs as the
offered load changes from 5.9 to 6.9 Mbps per sector. This corresponds to
between 1 and 5 lost LDUs. For offered loads near 7 Mbps per
sector, this means that as much as 5 seconds of video content becomes
non-viewable because of the loss of CRA LDUs. With the proposed FRA-PF
scheme, the CRA LDU loss ratio goes from 0.1% to 1.18% of the total
54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector.
This means that in either case up to 1 LDU is lost. For offered loads near
7 Mbps per sector, this means that as much as 1 second of video content
becomes non-viewable because of the loss of CRA LDUs. This highlights
how the proposed FRA-PF scheme provides the decoder with the reference
frames to facilitate the task of decoding and also how the proposed scheme
locates the packets with greater importance for the H.265/HEVC decoder.
Providing referenced frames with greater priority helps maintain continuous
playback at the end user and contributes to enhancing the viewing experience
of video users. From the user’s perspective, non-continuous video playback
will always constitute a source of dissatisfaction. Our proposed FRA-PF
scheme reduces the loss of packets carrying referenced frames, which will
help maintain continuous playback.
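The FourPeople arithmetic above can be reproduced with the short sketch below. Rounding up to a whole number of lost LDUs is our reading of the text, and equating one lost CRA LDU with about one second of non-viewable content relies on the one-second Intra-Period stated earlier; the function name is illustrative.

```python
import math

CRA_LDUS_PER_PASS = 9  # CRA LDUs in the original FourPeople bitstream
PASSES = 6             # number of times the bitstream is wrapped around
TOTAL_CRA_LDUS = CRA_LDUS_PER_PASS * PASSES  # 54


def lost_cra_ldus(loss_ratio_percent: float, total: int = TOTAL_CRA_LDUS) -> int:
    """Whole number of CRA LDUs lost for a given loss ratio, rounded up."""
    return math.ceil(loss_ratio_percent / 100.0 * total)


# Loss ratios (%) quoted in the text at 5.9 and 6.9 Mbps per sector.
pf_lost = [lost_cra_ldus(r) for r in (1.6, 9.0)]
fra_pf_lost = [lost_cra_ldus(r) for r in (0.1, 1.18)]

print(pf_lost)      # [1, 5]
print(fra_pf_lost)  # [1, 1]
```

With one I-Frame per second, the counts translate to up to about 5 seconds of non-viewable content for PF and up to about 1 second for FRA-PF.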
5.2.2 Results for Best Effort users
For BE users, we report the absolute gains of the average throughput and the
coverage throughput (which we defined in Section 5.1). The 95% confidence
intervals of the average throughput and coverage throughput are within ±3%
and ±9% respectively of the reported values.
The average throughput is plotted as a function of the offered load in
Fig. 5.4. The offered load values of interest to us are in the range of 5.9
to 6.9 Mbps per sector. From Fig. 5.4, it can be seen that with the
PF scheduling scheme, BE users can expect to get throughputs on average
between 15 Mbps and 10 Mbps. With our proposed FRA-PF scheme, users
can expect to get throughputs on average between 16 Mbps and 12 Mbps.
This is explained by the fact that our proposed FRA-PF scheme serves video
Figure 5.4: Average throughput for Best Effort users (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)
traffic more quickly, as shown in Fig. 5.1. As video users are served more
quickly, radio resources then become available to BE users. The availabil-
ity of more radio resources helps BE users leave the network more quickly
and therefore experience higher throughputs. Put simply: allocating the
resources to the right users at the right time will benefit all users. This is
shown by the results we have obtained in terms of the Resource Utilization
by the network and the average throughputs users can get.
Fig. 5.5 shows the coverage throughput results for offered load values
between 5.9 and 6.9 Mbps per sector. As expected, the coverage throughput
is much lower compared to the average throughput. In an LTE-A system us-
ing the PF scheduling scheme, users can expect to get coverage throughputs
between 4.9 Mbps and 1.5 Mbps. In an LTE-A system using our proposed
FRA-PF scheme, for the same offered load values, users can expect to get
Figure 5.5: Coverage throughput for Best Effort users (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)
coverage throughputs between 6.1 Mbps and 3.6 Mbps. This result is a lot
more significant than the average user throughput we have shown earlier.
It shows that 90% of users can expect a throughput of at least 3.6 Mbps,
which is more than double the throughput with the PF scheduling scheme.
Because we model a BE type of service, this improvement in throughput
translates into latency reduction since the volume of data to download is
fixed. For other services, e.g. Web Browsing, higher throughput can trans-
late into noticeably faster loading times and enhanced Quality of Experience.
We stated that the 95% confidence intervals of the coverage throughput are
within ±9% of the reported values. This is because the statistics
of users that experience relatively low Signal to Interference and Noise Ratio
(SINR) are very sensitive. We model random user arrivals in our simulations,
which leads to inter-cell interference that varies with time. For users
experiencing low SINR, even slight improvements or degradations can have
a very significant impact on the eventual throughput they experience.
Figure 5.6: Illustration of the outer 10% of the coverage area
Finally, we examine the statistics for users that are geographically located
within the outer 10% of the coverage area, as depicted in Fig. 5.6. We call
this region the cell-edge region. The area $A$ of a hexagon is calculated as
$A = 2\sqrt{3}\,a^2$, where $a$ is the apothem of the
hexagon. Using a hexagonal network deployment as shown in Fig. 4.1 and
knowing that the inter-site distance is equal to 500 meters, we
find that the apothem is 250 meters. Since the area scales with $a^2$, the
inner hexagon covering 90% of the area has apothem $a' = a\sqrt{0.9}$. The users in the cell-edge
region are those who lie outside the inner hexagon, i.e. outside the hexagon
of apothem $a' \approx 237$ meters. The results are shown in Fig. 5.7. For offered
loads ranging from 5.9 to 6.9 Mbps per sector, users in the cell-edge region
experience throughputs ranging from 11.4 to 8 Mbps with the baseline PF
scheme. With our proposed FRA-PF scheme, users in the cell-edge region
experience throughputs ranging from 12.2 to 9.9 Mbps for the
same offered load values. The trend is consistent with those for the average
throughput and coverage throughput. However it is interesting to note that
the average throughput of users in the cell edge region is higher than the
Figure 5.7: Average BE user throughput in Cell-Edge region (x-axis: Offered Load [Mbps / Sector]; y-axis: Throughput [Mbps]; curves: PF, FRA-PF)
coverage throughput values reported in Fig. 5.5. This is because we generate
users randomly over time, which leads to inter-cell interference varying over
time. As a result, a user located in the cell-edge region but not subject to
interference from neighbouring cells will still experience reasonably high
throughputs, as Fig. 5.7 shows. Usual Full Queue simulation methodologies operate in a range
where every cell in the network is transmitting at all times, so
every cell is always interfering with users in neighbouring cells. As
a result, only an individual user’s radio conditions determine whether
high throughputs are achievable or not. Users located closer to their serving
cell suffer from lower path loss, which translates to higher average
SNR and higher throughput. Using Finite Buffer simulation methodologies,
this is no longer true due to users arriving randomly in the network and be-
ing subject to the inter-cell interference that is present during the time the
user is in the network. Of course, path loss always plays a significant role in
dictating overall performance but this is now tempered by the fact that users
arrive randomly in the network, which affects the inter-cell interference.
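The cell-edge geometry used in this discussion can be checked numerically. The sketch below takes the apothem as half the inter-site distance and scales it so that the inner hexagon covers 90% of the cell area; the function names are illustrative.

```python
import math


def hexagon_area(apothem_m: float) -> float:
    """Area of a regular hexagon with the given apothem: A = 2 * sqrt(3) * a^2."""
    return 2.0 * math.sqrt(3.0) * apothem_m ** 2


def inner_apothem(outer_apothem_m: float, inner_area_fraction: float) -> float:
    """Apothem of the concentric hexagon covering the given fraction of the area."""
    return outer_apothem_m * math.sqrt(inner_area_fraction)


INTER_SITE_DISTANCE_M = 500.0
a = INTER_SITE_DISTANCE_M / 2.0     # 250 m
a_inner = inner_apothem(a, 0.9)     # about 237 m

# Fraction of the cell area lying in the cell-edge region.
edge_fraction = 1.0 - hexagon_area(a_inner) / hexagon_area(a)
```

The cell-edge region is therefore a ring only about 13 meters wide, yet it contains 10% of the coverage area.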
In order to provide better QoE to all users, resource allocation schemes
should target users that require the lowest amount of resources in order to be
satisfied. This will help the system deliver better user experience to all users
in the network. The QoE of all users improves thanks to the departure of
other users and our proposed scheme does that by serving video users faster.
This benefits all users in the network and helps provide a more consistent
user experience across the whole network, which is in line with the objectives
of future 5G networks.
Chapter 6
Conclusions and Future Work
This chapter summarizes the main contributions of the thesis and provides
some suggestions for future work.
6.1 Contributions
In this thesis, we addressed the topic of transmitting video content in 4G and
beyond-4G networks by exploiting information about the way H.265/HEVC
operates. Using knowledge of the coding structures, reference picture lists
and the process through which the H.265/HEVC encoder transmits this
information to the decoder, we proposed a cross-layer scheduling frame-
work which allocates resources to video users that need to receive referenced
frames.
Our performance evaluation of H.265/HEVC video-content delivery was
made in a mixed-traffic environment using random user arrivals and finite-
buffer traffic models. To the best of our knowledge, there is no similar work
reported in the literature. Results showed that both video and BE users
benefit from the proposed scheduling framework. Video users benefit from
reduced losses on packets carrying referenced frames while BE users benefit
from improved throughput. The improvement for video users is achieved by
tracking referenced frames and focusing resource allocation towards video
users whenever their transmission queues have packets carrying referenced
frames in the video sequence. As long as there are such frames in the
transmission queue of a video user, our proposed framework prioritizes that
user and allocates resources to it. This allows video users to download
video content more quickly and allows BE users to access resources more
quickly, leave the network more quickly and enjoy higher throughputs on
average as a result.
As we go towards 5G networks, the expectation from cellular networks
is that they provide a consistent user experience across the coverage area.
Results showed that 90% of BE users can expect to get between 1 Mbps
and 2 Mbps higher throughput using FRA-PF, which can potentially be the
difference between excellent and mediocre in the Quality of Experience the
user is getting. In addition, it was found that BE users in the cell-edge
region of each cell actually experience much higher throughputs than the
10th percentile of the user throughput CDF. This shows that users that
experience lower throughputs are not necessarily located in the cell-edge
region but can in fact be much closer to the base-station.
6.2 Future Work
Several future directions can be pursued, depending on which side of the
problem one wishes to focus on.
If one were to focus on the communications side, one direction for future
work could be to use an air-interface that is actually going to be used in
5G systems. In this work we considered the use of an LTE-A air interface
with some 3GPP Release-12 features such as the Release 12 4-Tx Linear
Precoding. This is because at the time the work was undertaken, 3GPP
was still working on Release 13 and no air-interface had yet been proposed
for 5G systems so we did not have the opportunity to evaluate performance
for such systems. Instead we focused more on performance evaluation using
realistic traffic models over an up-to-date LTE-A air-interface and looked at
the performance metrics to be used in 5G networks.
In our performance evaluation, we did not compare our proposed FRA-
PF scheme with a scheduling scheme that would strictly prioritize users
requesting video services over best effort users. It would be interesting
to see whether such a scheduling scheme achieves improvements for both
video users and best effort users. We also did not consider any admission
control policies in our traffic models, which would regulate traffic arrival in
high load situations and can have a significant impact on user experience.
Another direction for future work could be to look into traffic offloading
schemes. Since 3GPP Release 8, the 3GPP community has been introducing
support for heterogeneous networks. Smaller base-stations can be deployed
in the cell-edge region in order to provide coverage to users with stringent
QoS or QoE requirements. For example: macro base-stations can offload
specific users in the coverage area of small base-stations in order to provide
better QoE to their own users, and therefore provide a more consistent user
experience across the whole network, something that 5G networks will be
required to provide. The more general problem to address is to design
scheduling frameworks which will provide the best user experience and at
the same time maximize revenue for carriers.
If one were to focus on the video encoding or video compression side,
one direction for future work could be the actual evaluation of subjective
quality. No subjective quality testing was performed in our work. The
major stumbling block that needs to be overcome is to get the reference
implementation of the H.265/HEVC decoder to produce a viewable video
sequence of a bitstream with missing LDUs. The reference decoder imple-
mentation is not designed to be robust against any form of packet loss and
aborts the decoding process at the slightest error or absence of an LDU. If we
can reconstitute samples of bitstreams with missing LDUs and output the
corresponding video sequence, it would be possible to do subjective quality
testing and gain insights into how the loss of specific packets impacts the
viewing experience. This will give much clearer insights into how packet loss
and Quality of Experience are related for video services, and more specifi-
cally how much the loss of packets carrying I-Frames hurts the Quality of
Experience.
Bibliography
[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic
Forecast Update, 2014-2019,” February 2015.
[2] ITU-T, Advanced Video Coding for generic audio visual services - Rec-
ommendation ITU-T H.264. February 2014.
[3] ITU-T, High Efficiency Video Coding - Recommendation ITU-T H.265.
April 2013.
[4] M. Wien, High Efficiency Video Coding - Coding Tools and Specifica-
tions. Springer, May 2014.
[5] NGMN Alliance, “NGMN 5G White Paper,” February 2015.
[6] M. Rugelj, U. Sedlar, M. Volk, J. Sterle, M. Hajdinjak, and A. Kos,
“Novel Cross-Layer QoE-Aware Radio Resource Allocation Algorithms
in Multiuser OFDMA Systems,” IEEE Transactions on Communica-
tions, September 2014.
[7] S. Singh, O. Oyman, A. Papathanassiou, D. Chatterjee, and J. G. An-
drews, “Video Capacity and QoE Enhancements over LTE,” IEEE In-
ternational Conference on Communications, June 2012.
[8] M. Salem, P. Djukic, J. Ma, and M. Hawryluck, “QoE-Aware Joint
Scheduling of Buffered Video on Demand and Best Effort Flows,” IEEE
International Symposium on Personal, Indoor and Mobile Radio Com-
munications, September 2013.
[9] J. Nightingale, Q. Wang, C. Grecos, and S. Goma, “The Impact of
Network Impairment on Quality of Experience (QoE) in H.265/HEVC
Video Streaming,” IEEE Transactions on Consumer Electronics, May
2014.
[10] F. Bossen, “Common HM test conditions and software reference con-
figuration,” April 2012.
[11] G. Sullivan and T. Wiegand, “Rate-distortion optimization for video
compression,” IEEE Signal Processing Magazine, pp. 74–90, November
1998.
[12] T. Schierl, M. M. Hannuksela, Y.-K. Wang, and S. Wenger, “System
Layer Integration of High Efficiency Video Coding,” IEEE Transac-
tions on Circuits and Systems for Video Technology, pp. 1871–1884,
December 2012.
[13] Y.-K. Wang, R. Even, T. Kristensen, and R. Jesup, RTP Payload For-
mat for H.264 Video. IETF, May 2011.
[14] Y.-K. Wang, Y. Sanchez, T. Schierl, S. Wenger, and M. Hannuksela,
RTP Payload Format for H.265/HEVC Video. IETF, August 2015.
[15] F. Kelly, “Charging and rate control for elastic traffic,” European
Transactions on Telecommunications, pp. 33–37, 1997.
[16] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming.
Springer, 3rd ed., 2008.
[17] P. A. Hosein, “QoS Control for WCDMA High Speed Packet Data,”
IEEE International Workshop on Mobile and Wireless Communications
Network, 2002.
[18] R. Srinivasan, J. Zhuang, L. Jalloul, R. Novak, and J. Park, “IEEE
802.16m Evaluation Methodology Document (EMD),” July 2008.
[19] “3GPP TR 36.814 v9.0.0 - Technical Specification Group Radio Ac-
cess Network - Evolved Universal Terrestrial Radio Access (E-UTRA)
- Further advancements for E-UTRA physical layer aspects,” March
2010.
[20] F. Khan, LTE for 4G Mobile Broadband. Cambridge University Press,
2009.
[21] “HM 14.0, HEVC Test Model Reference Implementation.”
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed: 2014-09-30.
[22] “IMTAphy, LTE/LTE-Advanced system level simulator.”
http://www.lkn.ei.tum.de/personen/jan/imtaphy/index.php. Accessed: 2014-05-24.
[23] “openWNS, open Wireless Network Simulator, open source system
level simulation platform for performance evaluation and comparison
of wireless and multi-cellular mobile communication systems.”
https://launchpad.net/openwns. Accessed: 2014-05-24.
[24] “3GPP TR 25.814 v7.1.0 - Technical Specification Group Radio Access
Network; Physical layer aspects for evolved Universal Terrestrial Radio
Access (UTRA),” December 2006.
[25] ITU-R, “Guidelines for evaluation of radio interface technologies for
IMT-Advanced,” December 2009.
[26] “3GPP TR 25.996 v9.0.0 - Spatial channel model for Multiple Input
Multiple Output (MIMO) simulations,” December 2009.
[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.
Cambridge University Press, March 2010.
[28] “3GPP TS 36.213 v12.2.0 - Technical Specification Group Radio Ac-
cess Network - Evolved Universal Terrestrial Radio Access (E-UTRA)
- Physical Layer Procedures,” June 2014.
[29] “3GPP TR 36.829 v11.1.0 - Technical Specification Group Radio Access
Network - Enhanced performance requirement for LTE User Equipment
(UE),” December 2012.
[30] A. Roessler, J. Schlienz, S. Merkel, and M. Kottkamp, “LTE-Advanced
(3GPP Rel.12) Technology Introduction - White Paper,” June 2014.
[31] “3GPP TS 36.211 v12.2.0 - Technical Specification Group Radio Ac-
cess Network - Evolved Universal Terrestrial Radio Access (E-UTRA)
- Physical channels and modulation,” June 2014.
[32] J. Olmos, A. Serra, S. Ruiz, M. Garcia-Lozano, and D. Gonzalez, “Ex-
ponential Effective SIR Metric for LTE Downlink,” IEEE International
Symposium on Personal, Indoor and Mobile Radio Communications,
September 2009.
[33] L. Wan, S. Tsai, and M. Almgren, “A fading-insensitive performance
metric for a unified link quality model,” IEEE Wireless Communications
and Networking Conference, April 2006.