UNIVERSITY OF OSLO

Department of Informatics

A statistically optimal algorithm for multimedia receiver buffers

Doctoral thesis

Brita H. Hafskjold Gade

Supervisor: Professor Fritz Albregtsen

20 December 2007

© Brita H. Hafskjold Gade, 2008

Series of dissertations submitted to the Faculty of Mathematics and Natural Sciences, University of Oslo
Nr. 804
ISSN 1501-7710

All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, without permission.

Cover: Inger Sandved Anfinsen.
Printed in Norway: AiT e-dit AS, Oslo, 2008.

Produced in co-operation with Unipub AS. The thesis is produced by Unipub AS merely in connection with the thesis defence. Kindly direct all inquiries regarding the thesis to the copyright holder or the unit which grants the doctorate.

Unipub AS is owned by The University Foundation for Student Life (SiO).

Abstract

For interactive multimedia and for multimedia streams, receiver playoutbuffers are required to smooth network delay variations. The most commonly used playoutbuffer algorithms for voice use a constant playout speed, and give all packets the same end-to-end delay or use a new value of end-to-end delay for each new talkspurt. Controlling the playout speed can give a lower end-to-end delay and fewer packets that are lost due to late arrivals, and the aim of this thesis has been the development of an optimal control algorithm for the playout speed.

The most significant difference between this work and other published playoutbuffer algorithms is that we use a thorough mathematical approach. First, a stringent notation and stringent mathematical models of the media receiver system are developed by using control theory. The developed notation and mathematical models of the media receiver system are fully generic and independent of the networks and protocols used.

We have identified three deviations from perfect playout: 1) Buffering delay, 2) A playout rate different from the sender rate and 3) A change of playout rate. By using optimal control theory and these three deviations from perfect playout, the statistically optimal controller of the playout speed is deduced. The optimal controller minimizes the three deviations from perfect playout, based on their relative importance. The relative importance will vary with user and application requirements, and is thus freely tunable by means of three weight factors. The optimal controller is also independent of the networks and protocols used.

The optimal controller needs an estimate of the network and protocol behavior, and a Kalman Filter based estimator is developed to provide this estimate. An anti-run-dry algorithm is also developed for the optimal control algorithm, to give the user a controllable run-dry probability.

To evaluate the optimal control algorithm, and to compare it to other algorithms, a receiver system environment simulator has been implemented in Matlab. This simulator is network and protocol independent. The notation, the mathematical models and the simulator can thus all be used as a basis for developing and/or evaluating any kind of playoutbuffer algorithm.

The receiver system environment simulator is used to evaluate the optimal controller and anti-run-dry algorithm with a range of different tuning settings, and also to evaluate some of the existing playout control algorithms.

When compared to other algorithms using an objective technique for measuring voice quality (PESQ - Perceptual Evaluation of Speech Quality) and in a subjective listening test (DMOS - Degradation Mean Opinion Score), the optimal controller has demonstrated very good results for both simulated and real network measurement traces.

Acknowledgements

When I first started as a PhD student in February 1999 on the project MULTE (Multimedia Middleware for Low Latency High Throughput Environment), my research area was Gigabit ATM (Asynchronous Transfer Mode) networks and zero-copy protocol stacks. The financing period was 4 years, with 25% duty work such as supervising master students. I would like to thank my original main supervisor Professor Thomas Plagemann at Ifi (Department of Informatics), UiO (University of Oslo) and my original secondary supervisor, Dr. Robert Macdonald at FFI (Norwegian Defence Research Establishment), for their guidance on the original subject.

The midterm evaluation of MULTE concluded that the theme of my work was no longer an open problem. Thus, I needed to find a new theme. Though the searching process was long and stressful, it was rewarding in that it gave insight into the process of searching for an open area, learning about the area and checking that it still has room for improvements. When this process was finished with the new theme “playout buffer control” by the end of January 2002, I had one year left of my financing period to develop a unique solution to playout buffer control.

To develop my algorithms, I used cybernetic theory, and since my original main supervisor did not have a background in cybernetics, I wrote my first two papers alone. However, I am very grateful to Professor Fritz Albregtsen at Ifi, UiO for supervising me from 2004, and to Professor Oddvar Hallingstad, UniK (University Graduate Center at Kjeller), for his fruitful comments.

On 1 February 2003 (when my financing period for the doctoral work ran out), I started working as a full-time scientist at FFI. From April 2005 until the end of December 2005, I was on maternity leave.

I would like to thank UniK and FFI for providing 6 months of financing from January 2006, which allowed me to finish the thesis during working hours. Because I was waiting for the last paper to be published, the thesis was not printed until December 2007.

I am also grateful to Dr. Mohan Krishna Ranganathan at Sasken Communication Technologies Limited, Bangalore, India and Liam Kilmartin at the National University of Ireland, Galway for kindly letting me use their measurement traces to test my algorithms.

I would also like to thank the committee for reading this thesis and learning about my work.

Lastly, I would like to thank my daughters Helene and Alida for bringing joy to my life and my husband Kenneth for his never-ending love and support.

Brita Helene Hafskjold Gade

Table of Contents

1 Introduction
  1.1 The general media transfer system
    1.1.1 The media
    1.1.2 The sender
    1.1.3 The receiver
    1.1.4 User requirements
    1.1.5 The transport segment
  1.2 User perception of timing-related quality
    1.2.1 Fulfilling timing-related user requirements
    1.2.2 Using a playoutbuffer to remove jitter
    1.2.3 The amount of media in the playoutbuffer
  1.3 Existing playoutbuffer control algorithms
    1.3.1 Fixed playout delay
    1.3.2 Adaptive playout delay
    1.3.3 Adapt the playout speed
  1.4 Optimal control of playout speed
  1.5 Overview over the thesis

2 Motivation
  2.1 Why is playoutbuffer control important in the future?
  2.2 Present playoutbuffer control status
  2.3 Aim of the thesis: Develop the theory of playoutbuffer control
    2.3.1 Develop a theoretical foundation
    2.3.2 Optimal solution
    2.3.3 For all future networks and protocols
    2.3.4 For all QoS requirements
  2.4 Contributions of the thesis
    2.4.1 Claim: all user requirements can be expressed in requirements A-C
  2.5 Limitations of the thesis
  2.6 Summary

3 Mathematical modelling
  3.1 Definitions
  3.2 Assumptions
  3.3 Continuous interpretation of the system
    3.3.1 The playoutbuffer contains complete media-units
    3.3.2 The virtual buffer
    3.3.3 The player
  3.4 Notation
  3.5 Mathematical relations
    3.5.1 Mathematical relations between the system quantities
    3.5.2 State space modelling of the system
    3.5.3 Timesteps and timestamps
    3.5.4 Transport segment rate measurement and its measurement noise
    3.5.5 Prediction of MVB
    3.5.6 Discretization of the continuous state space model
  3.6 Observability and controllability
    3.6.1 Observability
    3.6.2 Controllability
    3.6.3 Summary of observability and controllability
  3.7 Summary

4 Optimal control
  4.1 The non-linearity of our system
    4.1.1 Running dry
    4.1.2 Delay spikes
  4.2 Introduction to optimal control theory
  4.3 Optimization criteria
    4.3.1 Requirement A: Minimize the total latency of each media-unit
    4.3.2 Requirement B: Keep the player speed close to the correct media speed
    4.3.3 Requirement C: Minimize the rate of change of playout speed
  4.4 Different phases of optimal control of playout speed
    4.4.1 Start-up/initialization procedure for the optimal controller
    4.4.2 Stopping procedure
    4.4.3 Timeperiod for optimal control
  4.5 Finding the set point vector and the Riccati weight matrixes
  4.6 Finding the optimal controller
  4.7 Summary

5 Transport segment state vector estimation
  5.1 Introduction to Kalman filter theory
    5.1.1 Non-mathematical description of the Kalman filter
    5.1.2 Mathematical description of the Kalman filter
  5.2 Main equations of the transport segment state vector estimator
  5.3 Timing
    5.3.1 How to deal with late measurements
    5.3.2 The KF prediction used as an optimal predictor
  5.4 Finding the transport segment model

6 Anti-run-dry algorithm
  6.1 Finding the anti-run-dry algorithm
  6.2 Finding the estimated receiver buffer level
  6.3 Finding the estimated virtual buffer level
  6.4 Finding the variance of the buffer estimation error
  6.5 Finding the minimum value of MRCV,d
  6.6 Optimal predictor for the anti-run-dry mechanism
  6.7 Total anti-run-dry algorithm
    6.7.1 The length of the prediction period
    6.7.2 Anti-run-dry mechanism in a loop
  6.8 Summary

7 Implementation in Matlab
  7.1 Notation and syntax
    7.1.1 Matlab code sample notation
    7.1.2 Figure syntax
  7.2 Matlab system overview
    7.2.1 Transport segment simulation in Matlab
    7.2.2 The path of a media unit
    7.2.3 The management tool
    7.2.4 Transport segment
    7.2.5 Playoutbuffer
    7.2.6 Player
    7.2.7 Rate measurement
    7.2.8 Controller algorithm
    7.2.9 Send-out-time calculation
  7.3 Controller algorithms deduced in this thesis
    7.3.1 Controller algorithm for optimal controller
    7.3.2 Controller algorithm for optimal control with anti-run-dry algorithm
    7.3.3 Transport segment state estimation
    7.3.4 Virtual buffer estimation
    7.3.5 Transport segment state prediction
    7.3.6 Anti-run-dry algorithm
    7.3.7 Optimal controller

8 Results and discussion
  8.1 Algorithms used
    8.1.1 Algorithm 1
    8.1.2 Algorithm 2
    8.1.3 Algorithm 3
    8.1.4 Algorithm 4
  8.2 Simulated transport segment
  8.3 Parameters used in this chapter
    8.3.1 Use of weight factors
    8.3.2 Use of graphs
  8.4 Results when using existing playout algorithms
    8.4.1 Algorithm 1: Adaptive playout delay
    8.4.2 Algorithm 2
  8.5 Results when using the optimal controller
    8.5.1 Fulfilling requirement A only
    8.5.2 Fulfilling requirement B only
    8.5.3 Fulfilling requirement C only
    8.5.4 Trying to fulfil requirements B and C
    8.5.5 Trying to fulfil requirements A and C
    8.5.6 Trying to fulfil requirements A and B
    8.5.7 Trying to fulfil all three requirements
    8.5.8 Varying the desired receiver buffer level
  8.6 Optimal control with erroneous transport segment model
    8.6.1 The model
    8.6.2 Parameters and results
    8.6.3 Discussion
  8.7 Anti-run-dry algorithm
    8.7.1 Parameters and results
    8.7.2 Discussion
  8.8 Results on real TCP transport segment trace
    8.8.1 Real TCP transport segment trace
    8.8.2 Algorithm 1
    8.8.3 Algorithm 2
    8.8.4 Algorithm 3: Optimal control of playout speed
    8.8.5 Algorithm 4: optimal control with anti-run-dry
  8.9 Quality metrics
    8.9.1 MOS and DMOS
    8.9.2 PESQ
    8.9.3 Packet loss and run-dry incidents
    8.9.4 Arentz dissimilarity measure
  8.10 Results on real UDP transport segment traces
    8.10.1 Trace 1: NUIG-DCU trace with 20 ms packetization interval
    8.10.2 Trace 2: NUIG-UNSW trace with 20 ms packetization interval
    8.10.3 Trace 3: NUIG-DCU trace with 40 ms packetization interval
    8.10.4 Trace 4: NUIG-UNSW trace with 40 ms packetization interval
  8.11 DMOS, PESQ and Arentz tests
    8.11.1 DMOS and PESQ results
    8.11.2 Results for Arentz dissimilarity measure
  8.12 Execution time
  8.13 Comparison with newer algorithms

9 Conclusion
  9.1 Summary of the thesis
  9.2 Assessment of claims
    9.2.1 Claim 1: Development of a stringent notation
    9.2.2 Claim 2: Mathematical modelling
    9.2.3 Claim 3: The optimal controller
    9.2.4 Claim 4: The anti-run-dry algorithm
    9.2.5 Claim 5: The receiver system environment simulator
  9.3 Future work and open problems
    9.3.1 Future work
    9.3.2 Open problems

10 References

Appendix A Notation
  A.1 Definition of terms
  A.2 Time and timesteps
  A.3 General notation rules
  A.4 Specific notation symbols
  A.5 Variable names for Matlab code samples

Appendix B Acronyms

Appendix C System variance
  C.1 System variance of 2nd order Markov model
  C.2 System variance of an oscillator model

Appendix D Deductions
  D.1 Finding
  D.2 Why the distribution curve cannot be integrated
  D.3 Equations for the fraction operator
  D.4 Calculating term 2 of Equation (6.70)
  D.5 The square of the sum of three general scalars

Appendix E Paper presented at CCCT-03

Appendix F Paper presented at ISICT-03

Appendix G Paper accepted by IET Communications

1 Introduction

When a stream of media is sent through a network such as the Internet, the packets in the stream will be individually delayed. Therefore, a reception buffer at the receiver machine is necessary to protect against playout interruptions due to variations in the data arrival rate. While the amount of protection offered grows with the size of the client’s buffer, so does the extra delay that is introduced. Many different algorithms have been used to control the use of the playoutbuffer. The newest of these control the playout speed of the media in order to find a better compromise between playout interruptions and added delay.
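The compromise between protection and added delay can be illustrated with a minimal Python sketch. It assumes the simplest possible receiver, one that gives every packet the same fixed end-to-end playout delay, and a purely hypothetical delay model (a 40 ms base delay plus exponentially distributed jitter). The function name `late_fraction` and all numbers are illustrative assumptions, not values taken from this thesis.

```python
import random


def late_fraction(network_delays_ms, playout_delay_ms):
    """Fraction of packets whose network delay exceeds the fixed playout delay.

    Such packets arrive after their scheduled playout time and are
    counted as lost due to late arrival.
    """
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    return late / len(network_delays_ms)


random.seed(1)
# Hypothetical delay model (assumption): 40 ms base delay plus
# exponentially distributed jitter with a mean of 15 ms.
delays = [40.0 + random.expovariate(1.0 / 15.0) for _ in range(10_000)]

# A larger playout delay gives fewer late packets, at the cost of more delay.
for playout_delay in (50.0, 70.0, 100.0):
    print(playout_delay, round(late_fraction(delays, playout_delay), 3))
```

Increasing the playout delay can only lower (or keep) the fraction of late packets, while every packet that is played out pays for the extra delay.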

We have identified three deviations from the perfect playout: 1) Buffering delay, 2) A playout rate different from the sender rate and 3) A change of playout rate. By using a thorough mathematical approach, we aim at finding the statistically optimal control of the playout speed that minimizes the three deviations from the perfect playout, based on their relative importance. The two main steps towards the statistically optimal controller are the development of a strict notation and of strict mathematical models, which are independent of the network and protocols, and general enough to fit any kind of playoutbuffer algorithm.
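As a rough illustration of how weight factors can express the relative importance of the three deviations, the following Python sketch computes a weighted sum of squared deviations over a sequence of timesteps. The quadratic form, the function name `playout_cost` and the parameters `w_a`, `w_b` and `w_c` are assumptions made for this illustration only; they are not the optimization criteria or the notation developed later in the thesis.

```python
def playout_cost(buffer_delays, playout_rates, sender_rate, w_a, w_b, w_c):
    """Weighted sum of squared deviations from perfect playout.

    buffer_delays: buffering delay at each timestep
    playout_rates: playout rate at each timestep
    sender_rate:   the (constant) rate at which the sender produces media
    w_a, w_b, w_c: hypothetical weights for the three deviations
    """
    cost = 0.0
    previous_rate = playout_rates[0]
    for delay, rate in zip(buffer_delays, playout_rates):
        cost += w_a * delay ** 2                    # 1) buffering delay
        cost += w_b * (rate - sender_rate) ** 2     # 2) rate differs from sender rate
        cost += w_c * (rate - previous_rate) ** 2   # 3) change of playout rate
        previous_rate = rate
    return cost


# Perfect playout: no buffering delay, playout rate equal to the sender rate.
print(playout_cost([0.0, 0.0], [1.0, 1.0], 1.0, 1.0, 1.0, 1.0))
# Speeding up the playout in the second timestep is penalized by w_b and w_c.
print(playout_cost([0.0, 0.0], [1.0, 1.5], 1.0, 0.0, 1.0, 1.0))
```

Setting one weight to zero removes the corresponding deviation from the sum, which is how the sketch mimics a user who does not care about that deviation.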

Figure 1.1 is a guide for reading Chapter 1, and most chapters of this thesis contain a similar guide. In these guides, each section is represented by a rounded rectangle with the following background colour code:

• Sections with white background either contain many details and/or mathematical equations, or are not essential for understanding the chapter. These sections are meant for readers who read the entire thesis, or who have a minimum of mathematical background.

• Sections with yellow background are either important for understanding the chapter, or contain part of the essence of the chapter. These sections should be read by all readers.

• Sections with light yellow background stripes are for readers who do not want to read the entire thesis, but still want to learn more than the essence of each chapter.

The guides also use arrows, where an arrow from section A to B means that A contains information that should be read before B.

As shown in Figure 1.1, Section 1.1 is essential and should be read before any of the following sections. It gives a description of a general media transfer system. Section 1.2 (p. 5) describes the user perception of timing-related quality and introduces three user requirements used in the rest of the thesis. Section 1.3 (p. 9), which is not essential for understanding the thesis, describes traditional and new solutions in the field of playoutbuffer control, and Section 1.4 gives a short description of the new solution presented in this thesis. Section 1.5 (p. 21) is essential, since it gives an overview of the thesis, including a guide for reading the thesis.

1.1 The general media transfer system

In a media transfer system, a sender application is sending media to a receiver system. The media, the sender and the receiver are described in Sections 1.1.1, 1.1.2 and 1.1.3, respectively. The media usually goes through a transport segment consisting of a number of networks and protocols on its way from the sender to the receiver, as described in Section 1.1.5 (p. 4).

1.1.1 The media

The media could be any kind of media that is perceived as being continuous for the user. Inside the network and computers, the media is coded in a digital format (for example by being a continuous stream of discrete samples that are equidistant in time). Examples are:

• Sound (voice, music, movie sound, distributed orchestra, sounds related to games, etc.)

• Video

• Other kinds of continuous media like control signals and sensor signals for soft real-time distant interaction. Examples are control signals for robotic movement, sensor signals from different kinds of joysticks and virtual reality effects.

• Signals to be used in multiplayer games over the internet (e.g. showing the mouse position of one player on the monitor of another player, when the player machines are connected through a network).

Figure 1.1: Guide for reading Chapter 1. The guide shows one box per section: 1.1 The general media transfer system; 1.2 User perception of timing-related quality; 1.3 Existing playoutbuffer control algorithms; 1.4 Optimal control of playout speed; 1.5 Overview over the thesis.

The media application can be either of two types:

• Interactive media application, such as IP telephony and video phones, where the media is sent both ways, and both communicating machines work as both sender and receiver. For most interactive media applications, the total delay of the media should be kept under certain limits to avoid reducing the users’ perception of quality.

• Streaming media application, such as video-on-demand and radio or TV over the internet, where all the media is sent from one sender to one or more receivers. For most streaming media applications, the total delay can be quite long (compared to the requirements for interactive media applications) before the users’ quality perception is affected.

This thesis will focus on the receiver application, and show the development of an algorithm that will work equally well for streaming and interactive media.

This thesis uses the term correct media speed to denote the constant speed of the media when played at the correct speed. This is defined for all types of continuous media, e.g. as a constant number of samples per second for sound and as a constant number of pictures per second for video. Note that the constant correct media speed does not mean that the rate of bits per second has to be constant.

1.1.2 The sender

The sender aims at sending the media at the constant correct media speed. The bit rate and the packet rate out of the sender application depend on e.g. compression and media coding, and thus do not have to be constant.

1.1.3 The receiver

The receiver application receives the media and plays it out through a player. The type of player is dependent upon the media type, and could include for instance a loudspeaker (for sound), a screen (for video) or a moving robot (for control signals).


1.1.4 User requirements

The users’ perception of the playout quality of the media at the receiver depends upon many different quality aspects, depending on the specific type of media. As will be explained in Section 2.5, however, this thesis focuses on the quality aspects relating to timing, such as delay and playout speed (see Section 1.2 (p. 5) for more details).

1.1.5 The transport segment

In all distributed systems, the sender and the receiver are separated by a transport segment, consisting of all protocols and networks that the media goes through on its way from the sender application to the receiver application. This transport segment adds delay to the media. Most combinations of networks and protocols also add delay jitter (in addition to delay) to the media stream. This means that each unit (such as a picture for video and a sample for sound) of the media or each packet gets its own individual delay between the sender and the receiver, as illustrated in Figure 1.2.

If the sender application sends a stream with a constant packet rate, the receiver application at the other side of the transport segment will receive a stream where the packet rate, due to the introduced jitter, is not constant. This is illustrated in the upper part of Figure 1.3, where the packets are drawn narrower than in Figure 1.2, to make it easier to illustrate the jitter.
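As a minimal sketch of how individual packet delays turn a constant sender packet rate into a varying receiver packet rate, the following Python snippet can be used. The 20 ms sending interval, the delay values and all names are assumptions chosen for illustration, not measurements from this thesis.

```python
def arrival_times(send_interval_ms, delays_ms):
    """Arrival time of each packet when packet i is sent at i * send_interval_ms."""
    return [i * send_interval_ms + d for i, d in enumerate(delays_ms)]


def gaps_between(times):
    """Time between consecutive entries of a list of timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical per-packet transport segment delays in ms.
delays = [50, 65, 48, 60, 55]
arrivals = arrival_times(20, delays)
gaps = gaps_between(arrivals)

print("arrival times:", arrivals)
print("arrival gaps :", gaps)
```

With these made-up numbers the sender spacing is 20 ms throughout, while the gaps between arrivals vary from 3 ms to 35 ms.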

Figure 1.2: Each packet gets an individual delay between the sender and the receiver. [Timeline diagram: packets A to F as sent by the sender (media when played at correct media speed) and as received by the receiver (media after going through the transport segment), with an individual delay for each packet.]

In addition to the situations illustrated in Figures 1.2 and 1.3, there are two main situations of packet arrivals at the receiver:

• Two (or more) packets can arrive in the wrong order. Different receiver algorithms treat this problem in different ways. If the packet with the largest delay is still not too late, the receiver can interpret the situation as if both packets had arrived at the time of arrival of the latest packet. Some algorithms drop packets that arrive later than a certain deadline, and continue the playout of the next packet in the sequence.

• Packets can be lost in the transport segment. As described in Section 2.5 (p. 31), the area of packet loss is a research area that is not covered in this thesis.

1.2 User perception of timing-related quality

If the receiver application, in the situation illustrated in the upper part of Figure 1.3, would play out the media contained in each packet immediately after reception, the user of the receiver application would experience a low quality of the media, due to the jitter. Thus, the timing of the media playout is important for the users’ perception of quality. This thesis will focus on the quality-of-service aspects related to timing and playout speed of the media. For the user to perceive a high level of playout quality, several requirements must be fulfilled:

A. The total delay (between the sender application and the playout part of the receiver application) for each part of the media should not be too large.

B. The playout speed should be as close as possible to the correct media speed.

C. The playout speed should be changed as slowly as possible (i.e. the playout speed should be as smooth as possible).

We claim that all user-noticeable quality requirements related to timing and playout speed can be described by the three requirements A, B and C. The reasons for this are described in Section 2.4.1 (p. 30).

Requirements A, B and C are conflicting, and therefore usually cannot be fully satisfied at the same time. One of the aims of this thesis is to give the user the best possible combination of fulfilment of these requirements.

The individual importance of each of the requirements is very much dependent upon the type of media. A few examples are listed below.



• For interactive media, requirement A is very important. If the total delay is too long, the communication between the users can be disturbed. An example of this is when, after a short pause, one user starts communicating (e.g. talking), and shortly after, the other user starts communicating, believing that the first is only listening (since the media from the first user has a large delay). A delay of less than 150 ms will give a high quality voice communication, but if the delay exceeds 400 ms, many users will be unsatisfied [14]. For voice communication, requirements B and C have lower importance (since, as explained in Section 1.3.3 (p. 16), specific algorithms can be used to keep the pitch unchanged while changing the playout speed).

• For streaming of video, requirement A would be of low importance, since most users would accept a delay of several seconds before the video starts playing. Requirement B would be more important, since a large deviation in playout speed could change the meaning of a video scene.

• For streaming of music, requirement A would in many situations be of more importance than for the above example, since a user who only wants to listen to one song, or a part of one song, would like to hear the music shortly after she/he sends the request for it. For many types of pop music, requirement B may not be very important, as long as the pitch is kept unchanged. For other types of music, like classical music, requirement B may be more important. Requirement C would probably be the most important of the three requirements, since a change in pitch is easier to detect for the human brain than the absolute pitch [51]. In 1975, Voss and Clarke [50] found that there is an inverse relationship between the size of a pitch change and the likelihood of a change, which they call the 1/f rule. Melodies that change more abruptly than the 1/f rule would sound unpleasantly erratic. For voice, we have found that a very high-frequency change of playout speed makes the voice less pleasant to listen to, but does not require any extra effort to understand the meaning of the sentences spoken.

We also found that, for voice, the loss of several consecutive packets may lead to loss of information, and thus to a loss of meaning of the sentences spoken.

1.2.1 Fulfilling timing-related user requirements

In an ideal situation, the receiver would receive the media immediately after it was sent from the sender. This would give a situation with no delay and with a playout speed that was constant and equal to the correct media speed. Hence all three requirements (A, B and C) would be fulfilled.

For most implementations, however, the sender and the receiver are separated by a transport segment that introduces delay to the media stream. If this delay had been constant, the best solution would be for the receiver to play out the media immediately after reception. This would give a delay equal to the delay from the transport segment for requirement A, but requirements B and C would be completely fulfilled.

In most systems, the transport segment adds delay jitter to the media stream. As opposed to the delay from the transport segment, which is impossible to subsequently remove from the media stream, the jitter can be subsequently reduced or removed from the media by adding delay. A user would often experience an improved quality perception, even with an increase in delay, if the jitter is reduced or removed. Reducing jitter by adding delay is done by the use of a receiver FIFO buffer, called a playoutbuffer, as shown in Figure 1.3.

Adding delay to reduce or remove jitter is an example of the conflicting nature of requirements A, B and C. The added delay gives a lower quality for requirement A, but a better quality for requirements B and C.

Figure 1.3: A playoutbuffer is used to smooth the jitter. [Diagram: sender, transport segment (networks and protocols), and receiver with playoutbuffer and player.]

1.2.2 Using a playoutbuffer to remove jitter

In the rest of this thesis, the term run dry is used to describe the situation where the playoutbuffer has become empty because the playout speed has been faster than the input speed from the transport segment. When the playoutbuffer becomes empty, the playout speed falls to zero, since there is no media to play out. Requirements B and C are not fulfilled when the playoutbuffer runs dry.

There are several ways in which a playoutbuffer could be used to remove jitter by adding delay. Two examples are given below:

1. The most intuitive and simple way would be to decide in advance which delay (q ms) to accept, and to give each part of the media stream this delay. One solution could be to choose q to be equal to the maximum expected delay from the transport segment or to the maximum acceptable delay for the users. When the first part of the media stream arrives, the receiver waits until q ms have passed since it was sent from the sender, and then starts playing at the constant correct media speed. As long as the playoutbuffer does not run dry, requirements B and C are fully met with this easy solution. One drawback is that the delay would have to be chosen as a quite large value to guarantee that the buffer does not run dry. Thus, requirement A would be poorly met. This solution was chosen by the first Voice-over-IP applications (see Section 1.3.1).

2. The above method assumed constant playout speed. Another possibility is to vary the playout speed, and thus give different parts of the media stream their own total delay. Thus the total delay can be adapted much better to the delay of the transport segment. If the playout speed is allowed to vary, there will be several ways to solve the conflicting nature of requirements A, B and C. The ability to change the playout speed makes it easier to satisfy requirement A, but since the playout speed is changing, user requirements B and C will not be fully met. Different versions of this solution are chosen by more recent Voice-over-IP applications and Voice-over-IP research (see Sections 1.3.2 and 1.3.3).

1.2.3 The amount of media in the playoutbuffer

For the playoutbuffer to be able to smooth the jitter, its amount of media, measured by the time it takes to play the media at the correct media speed, should vary in correspondence with the varying transport segment delay. Thus the amount of media in the playoutbuffer is not fully controllable. However, by using a playoutbuffer control algorithm, it is still possible to control the amount of media in the playoutbuffer to a certain degree, as described in Section 1.3 (p. 9), where the main groups of existing playoutbuffer control algorithms are covered.

The amount of media to keep in the playoutbuffer must be chosen carefully, since there are advantages and disadvantages for both large buffer sizes and small buffer sizes:

• If the amount of media in the playoutbuffer is very small, very little additional delay is introduced, and hence requirement A is almost fulfilled. The drawback of a small playoutbuffer is the high probability that the buffer will run dry. When the buffer runs dry, the playout speed drops to zero, and does not switch back to the correct speed until the next packet arrives from the transport segment. Hence requirements B and C are not fulfilled.

• If the size of the playoutbuffer is very large, the buffer has a very low risk of running dry, and hence almost all the jitter is removed, giving a playout speed that is close to or equal to the correct media speed. Thus, requirements B and C are almost fulfilled. The drawback is the large delay that is introduced, giving a low quality for requirement A.

1.3 Existing playoutbuffer control algorithms

This section will discuss the most important existing algorithms that can be used to partially control the amount of media in the playoutbuffer and at the same time to control the amount of jitter experienced by the user.

The traditional playoutbuffer solutions do not voluntarily change the playout speed. The media in each packet is played out at the correct media speed as long as the playoutbuffer does not run dry. Thus, the traditional playoutbuffer control algorithms can only control the average size of the playoutbuffer by controlling the buffering delay of the first packet of the media stream (or of the talkspurt - explained in Section 1.3.2). If the transport segment delay had been constant, the amount of media in the playoutbuffer would be directly dependent upon the waiting time of the first packet.

The two traditional ways to control playoutbuffers are called fixed playout delay, explained in Section 1.3.1, and adaptive playout delay, explained in Section 1.3.2 (p. 11).

Section 1.3.3 explains a newer way of controlling playoutbuffers by controlling the playout speed. The work in this thesis also controls the playout speed, and as explained in Section 1.4 this thesis deduces the optimal control of playout speed.



1.3.1 Fixed playout delay

For fixed playout delay, all the packets have the same total delay. This corresponds to method number 1 in Section 1.2.2 (p. 8). The receiver plays out each packet exactly q ms after it was sent from the sender. Any packet that arrives at the receiver more than q ms after it was sent from the sender, is considered lost. The parameter q can usually be set by the user or application. The performance of the fixed playout delay algorithm is illustrated in Figure 1.4.

The advantage of fixed playout delay is that the algorithm is easy to understand, and if you have a mechanism to find the time difference between the clocks of the sender machine and the receiver machine, it is also easy to implement.

The disadvantage of the algorithm is the constant total delay. Choosing q too large compared to the transport segment delay gives the disadvantages of a too large playoutbuffer size (see Section 1.2.3 (p. 8)). On the other hand, choosing q too small compared to the transport segment delay gives the disadvantages of a too small playoutbuffer size (see Section 1.2.3 (p. 8)). Since all packets that arrive at the receiver after their deadline are considered lost, a lot of packets will be lost if q is small compared to the average transport segment delay. This is illustrated in Figure 1.4, where all packets in the interval where the magenta graph (the transport segment delay) is above the blue graph (the total delay) are considered lost.
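The trade-off in the choice of q can be illustrated with a small Python sketch of the fixed playout delay rule. The function name, the data and the assumption of perfectly synchronised sender and receiver clocks are ours and serve only as an illustration.

```python
def fixed_playout(sent_ms, delay_ms, q_ms):
    """Apply a fixed playout delay q to a packet stream.

    sent_ms[i]  : time packet i was sent (ms)
    delay_ms[i] : transport segment delay of packet i (ms)
    Returns (playout_times, lost): packets with a delay above q miss
    their deadline and are listed in `lost` by index.
    """
    playout_times, lost = [], []
    for i, (sent, delay) in enumerate(zip(sent_ms, delay_ms)):
        if delay > q_ms:
            lost.append(i)                      # arrived after sent + q
        else:
            playout_times.append(sent + q_ms)   # played exactly q ms after sending
    return playout_times, lost


# Hypothetical stream: one packet every 20 ms, with varying delays.
sent = [0, 20, 40, 60, 80]
delay = [50, 70, 130, 60, 90]

played_100, lost_100 = fixed_playout(sent, delay, q_ms=100)
played_150, lost_150 = fixed_playout(sent, delay, q_ms=150)
print("q = 100 ms: lost packets", lost_100)
print("q = 150 ms: lost packets", lost_150)
```

In this made-up example, q = 100 ms loses the packet with a 130 ms delay, while q = 150 ms loses nothing but adds 50 ms of total delay to every packet.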

It might be difficult to choose a good value for q since the transport segment delay will often vary a lot, both due to time-of-day (where there is increased network traffic at some periods of the day), and due to other influences of the network and the network protocols.

Many Voice-over-IP applications (e.g. the Cisco IP phones [9]) let the user choose between two algorithms: the fixed playout delay algorithm explained above and the adaptive playout delay algorithm explained below.

Figure 1.4: Illustration of the performance of the fixed playout delay algorithm. [Plot of delay against packet number, showing the total delay and the transport segment delay.]

1.3.2 Adaptive playout delay

Adaptive playout delay is an algorithm that is specialized for human speech. The adaptive playout delay algorithm divides human speech into talkspurts. When people speak, they usually say a few words (e.g. a sentence or a part of a sentence), then they make a short pause before saying some more words. A talkspurt is such a collection of words, with a period of silence before and after. The small pauses between the talkspurts can be either shortened or elongated without annoying the users. The adaptive playout delay algorithm can therefore not be used for continuous media without pauses. If a pause is elongated, the following talkspurt will have an increased total delay, and if a pause is shortened, the following talkspurt will have a decreased total delay. This way, each talkspurt j gets its own total delay $q_j$, so that each packet in talkspurt number j is played out exactly $q_j$ ms after it was sent from the sender.

A lot of research has been done on between-talkspurt-adjustment to find a good value of $q_j$, and some of this work is summarized in Sections 1.3.2.1 to 1.3.2.6.

The performance of the adaptive playout delay algorithm is illustrated in Figure 1.5.

The advantage of the adaptive playout delay algorithm is that it handles changes of the transport segment delay better than the fixed playout delay algorithm, since each talkspurt gets its own total delay. The total delay is therefore adapted to the transport segment delay.

The main disadvantage of the adaptive playout delay algorithm is that it can only be used for voice or for media with frequent pauses. Another drawback is that background music or other background sound can make it impossible to divide the sound stream into talkspurts, and thus give very long talkspurts. For such long talkspurts, the adaptive playout delay algorithm has the same disadvantages as the fixed playout delay algorithm.
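How a per-talkspurt delay stretches or shrinks the pauses can be sketched in a few lines of Python. The send times and the delay values per talkspurt are hypothetical; how a real algorithm chooses each delay is the subject of the following subsections.

```python
def adaptive_playout(talkspurts, q_ms):
    """Playout schedule when talkspurt j uses its own total delay q_ms[j].

    talkspurts[j] is the list of send times (ms) of the packets in talkspurt j.
    """
    return [[sent + q for sent in sent_times]
            for sent_times, q in zip(talkspurts, q_ms)]


def pause_changes(q_ms):
    """Change of each pause between talkspurts (ms); positive means elongated."""
    return [q_next - q_prev for q_prev, q_next in zip(q_ms, q_ms[1:])]


# Three hypothetical talkspurts and the total delay chosen for each of them.
talkspurts = [[0, 20, 40], [300, 320], [700, 720, 740]]
q = [100, 140, 90]

schedule = adaptive_playout(talkspurts, q)
changes = pause_changes(q)
print(schedule)
print(changes)
```

Here the first pause is elongated by 40 ms and the second is shortened by 50 ms, while the packets inside each talkspurt keep their original spacing.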

Figure 1.5: Illustration of the performance of the adaptive playout delay algorithm. [Plot against packet number.]

1.3.2.1 Ramjee et al.

In 1994, Ramjee et al. [41] suggested four different algorithms for the calculation of $q_j$. All four algorithms calculate $q_j$ as a function of the average delay and the average deviation of the delay from the sender to the receiver, as shown in Equation (1.1):

$$q_j = d_j + K v_j \qquad (1.1)$$

where K is a positive constant (chosen to be 4 for all four algorithms), $d_j$ is the average delay, and $v_j$ is the average deviation of the delay.

The parameters $d_j$ and $v_j$ are calculated in different ways for the four algorithms:

• Algorithm 1 uses:

$$d_i = \alpha d_{i-1} + (1 - \alpha) n_i, \qquad (1.2)$$

where i is the packet number and $n_i$ is the measured network delay, calculated by:

$$n_i = t_{received,i} - t_{sent,i}, \qquad (1.3)$$

and:

$$v_i = \alpha v_{i-1} + (1 - \alpha) \left| d_i - n_1 \right| \qquad (1.4)$$

where $n_1$ is the measured network delay for the first packet in the talkspurt. The parameter $\alpha$ was chosen to be 0.998002.

• Algorithm 2 uses Equation (1.2), where $\alpha$ had two different values: if $(n_i > d_i)$, $\alpha$ was chosen to be 0.75, else it was chosen to be 0.998002 as in algorithm 1. It also uses Equation (1.4).

• Algorithm 3 uses Equation (1.4) and:

$$d_i = \min_{i \in S_i} \{ n_i \} \qquad (1.5)$$

where $S_i$ is the set of all packets received during the talkspurt prior to the one initiated by i.

• Algorithm 4 is described by the following algorithm:

    if (mode == NORMAL) {
        if (abs(n_i - n_{i-1}) > abs(v_i) * 2 + 800) {
            var = 0;  /* Detected beginning of spike */
            mode = IMPULSE
        }
    }
    else {
        var = var/2 + abs((2n_i - n_{i-1} - n_{i-2})/8)
        if (var <= 63) {
            mode = NORMAL  /* End of spike */
            n_{i-2} = n_{i-1}
            n_{i-1} = n_i
            return
        }
    }
    if (mode == NORMAL)
        d_i = 0.125 * n_i + 0.875 * d_{i-1}
    else
        d_i = d_{i-1} + n_i - n_{i-1}
    v_i = 0.125 * abs(n_i - d_i) + 0.875 * v_{i-1}
    n_{i-2} = n_{i-1}
    n_{i-1} = n_i
    return
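The pseudocode of algorithm 4 can be transcribed into Python roughly as follows. This is a sketch only: the class name, the field names and the way the state is initialised are assumptions made for illustration, and the constants 800 and 63 are copied from the pseudocode without the time unit being stated here. The playout delay follows Equation (1.1) with K = 4.

```python
from dataclasses import dataclass


@dataclass
class SpikeAwareEstimator:
    d: float              # current delay estimate d_i
    prev1: float          # previous measured delay n_{i-1}
    prev2: float          # measured delay before that, n_{i-2}
    v: float = 0.0        # current deviation estimate v_i
    var: float = 0.0      # spike-tracking variable from the pseudocode
    mode: str = "NORMAL"

    def update(self, n: float) -> None:
        """Process one measured network delay n_i."""
        if self.mode == "NORMAL":
            if abs(n - self.prev1) > abs(self.v) * 2 + 800:
                self.var = 0.0            # detected beginning of spike
                self.mode = "IMPULSE"
        else:
            self.var = self.var / 2 + abs((2 * n - self.prev1 - self.prev2) / 8)
            if self.var <= 63:
                self.mode = "NORMAL"      # end of spike
                self.prev2, self.prev1 = self.prev1, n
                return
        if self.mode == "NORMAL":
            self.d = 0.125 * n + 0.875 * self.d
        else:
            self.d = self.d + n - self.prev1   # follow the spike directly
        self.v = 0.125 * abs(n - self.d) + 0.875 * self.v
        self.prev2, self.prev1 = self.prev1, n

    def playout_delay(self, k: float = 4.0) -> float:
        """q = d + K * v, as in Equation (1.1)."""
        return self.d + k * self.v


# Hypothetical delay trace containing one sudden spike.
estimator = SpikeAwareEstimator(d=100.0, prev1=100.0, prev2=100.0)
for measured in [100.0, 2000.0, 2000.0, 2000.0, 2000.0]:
    estimator.update(measured)
    print(measured, estimator.mode, estimator.d)
```

In IMPULSE mode the delay estimate tracks the measured delay directly instead of being smoothed, which is what lets the estimate follow a spike without a long lag.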

1.3.2.2 DeLeon and Sreenan

In 1999, DeLeon and Sreenan [10] suggested an algorithm that performs the following for each packet:

1. Compute the network delay prediction $\hat{n}_j$ given N previous network delays $n_{i-1}, \ldots, n_{i-N}$ using the NLMS (normalized least mean square) algorithm described below.

2. Calculate the variation by using Equation (1.4).

3. Calculate $q_j$ as in Equation (1.1), but use the network delay prediction $\hat{n}_j$ from step 1 instead of $d_j$.

The NLMS algorithm is given by:

$$\mathbf{h}_{i+1} = \mathbf{h}_i + \frac{\mu}{\mathbf{n}_i^T \mathbf{n}_i + a} \mathbf{n}_i e_i \qquad (1.6)$$

where $\mathbf{h}_i$ is an N x 1 vector of adaptive filter coefficients, $\mu$ is the step size, $\mathbf{n}_i$ is an N x 1 vector containing the most recent N network delays and $e_i$ is the estimation error given by:

$$e_i = \mathbf{h}_i^T \mathbf{n}_i - \hat{n}_i, \qquad (1.7)$$

where $\hat{n}_i$ is the network delay estimate for packet i.

Equation (1.6) is used in the adaptive one-step predictor shown in Figure 1.6.

Figure 1.6: Adaptive one-step predictor. [Block diagram with a delay element, the adaptive filter $\mathbf{h}_i$ and a summing junction; signals $n_i$, $n_{i-1}$, $\hat{n}_j$ and $e_i$.]

1.3.2.3 Pinto and Christensen

In 1999, Pinto and Christensen [39] suggested an algorithm with two modes, NORMAL and SPIKE. During NORMAL mode, the following equations are used:

$$q_j = t_{sent,j} + \bar{n}_i + \rho \qquad (1.8)$$

where $t_{sent,j}$ is the time the first packet of talkspurt j was sent from the sender, $\bar{n}_i$ is the mean network delay observed during NORMAL operation and $\rho$ is given by:

$$\rho = \rho + \mathit{Optgap}_{j-1} \qquad (1.9)$$

where $\mathit{Optgap}_{j-1}$ is the optimal gap, which is the authors' name for receiver buffer delay, i.e. the time between the reception timestamp and the playout time of packet i. Given a set of ordered network delays for talkspurt j-1, the optimal gap is given by:

$$\mathit{Optgap} = \mathit{Gap}\left( \sum_{i=0}^{(1 - \mathit{toler})\, n_{i-1}} \mathit{freq}_i \right) \qquad (1.10)$$

where toler is the tolerance to packet loss specified in the range [0,1], $(1 - \mathit{toler})\, n_{i-1}$ is the total number of packets that should be played in the talkspurt to maintain the desired packet loss rate, Gap(k) is the receiver buffer delay of packet number k in the set and $\mathit{freq}_i$ is the number of packets experiencing the same receiver buffer delay.

During SPIKE mode, the SDA (spike detection algorithm) from algorithm 4 in Section 1.3.2.1 takes over and drives the playout of packets for the duration of the spike.

1.3.2.4 Atzori and Lobina

In 2004, Atzori and Lobina [4] calculated $q_j$ by the equation:

$$q_j = d_{net} + d_{buffer}, \qquad (1.11)$$

where $d_{net}$ is the network delay and $d_{buffer}$ is the receiver buffer delay given by:

$$d_{buffer} = b_{opt} \cdot g, \qquad (1.12)$$

where g is the packet interarrival time and $b_{opt}$ is the optimal buffer dimension given by maximizing an R factor given by:

$$
\begin{aligned}
R = {} & 93.2 - \bigl(0.024 \cdot (d_{cod} + b \cdot g + d_{net})\bigr) \\
& - \bigl(0.11 \cdot (d_{cod} + b \cdot g + d_{net} - 177.3) \cdot H(d_{cod} + b \cdot g + d_{net} - 177.3)\bigr) - 11 \\
& - 40 \cdot \ln\left(1 + 10 \cdot \left(e_{net} + (1 - e_{net}) \frac{\sigma_l^2}{g^2 (b - 1)^2}\right)\right)
\end{aligned}
\qquad (1.13)
$$

where $d_{cod}$ is the delay used for speech processing, b is the buffer dimension, H(x) is the step function (H(x) = 0 if x < 0, else H(x) = 1), and $e_{net}$ is the network packet loss ratio.

In 2005, Atzori and Lobina [5] gave a new value of the R factor, by replacing the last term in Equation (1.13) with a much more complex calculation.

1.3.2.5 Jung and Atwood

In 2005, Jung and Atwood [17] suggested an algorithm that uses Equations (1.1) to (1.4), but lets the factor K (which in their paper is called $\beta$) be adaptive, and set by the following equation:

$$
K = \begin{cases}
K_{MAX}, & \text{if } 0 \le v_{j-1} \le J_{LOW} \\
K_{MIN}, & \text{if } v_{j-1} > J_{HIGH} \\
\dfrac{(K_{MIN} - K_{MAX})\, v_{j-1} + K_{MAX} J_{HIGH} - K_{MIN} J_{LOW}}{J_{HIGH} - J_{LOW}}, & \text{if } J_{LOW} < v_{j-1} \le J_{HIGH}
\end{cases}
\qquad (1.14)
$$

where $v_{j-1}$ is the calculated variance for the previous talkspurt, $J_{LOW}$ is the limit (in ms) for low jitter, $J_{HIGH}$ is the limit for high jitter and $K_{MAX} > K_{MIN}$. Jung and Atwood use different values of $K_{MAX}$ and $K_{MIN}$ in their simulations.

1.3.2.6 Narbutt and Murphy

In 2001, Narbutt and Murphy [31] compared 7 different algorithms, including the 4 algorithms in Section 1.3.2.1.

In 2003, Narbutt and Murphy [32] suggested a new algorithm using Equations (1.1) to (1.4), but making $\alpha$ a dynamic parameter based on the estimates of the variance $v_i$:

$$\alpha_i = f(v_i) \qquad (1.15)$$

where the function $f(v_i)$ was chosen experimentally to maximize the performance of the algorithm over their set of network traces. In [33], [34] and [35] the algorithm was further simulated and evaluated.

1.3.3 Adapt the playout speed

To solve the problems related to long talkspurts, some newer work, described in Sections 1.3.3.1 to 1.3.3.4, has been done on the topics of playout speed adjustment and within-talkspurt-adjustment.

For human voice, instead of finding a constant or ‘constant-per-talkspurt’ delay parameter, this new work adjusts the delay even inside of the talkspurts, by changing the playout speed. One clear advantage of this method is that it can be used for all kinds of media, since it does not divide the media stream into talkspurts. By adapting the playout speed, different parts of the media stream can have their own total delay. Thus the total delay can be adapted much better to the transport segment delay. The ability to change the playout speed makes it easier to satisfy user requirement A, but since the playout speed is changing, user demands B and C must also be taken into consideration.

For video, a change of playout speed can be obtained by showing each picture for a longer or shorter period. For sound, one of the best ways to change the playout speed of for instance voice and music is the time-domain interpolation algorithm WSOLA [24], where the playout speed can be changed without changing the pitch¹ of the signal ([24] indicates good voice quality with a stretch or compression of 25% of the inter-packet-time). An illustration of the performance for a general playout speed adjusting algorithm is shown in Figure 1.7.

Chronology: The solution presented in this thesis was developed in 2002 and published in 2003 (in the papers in Appendix E and F). The main part of the thesis was also written in 2003. For part of 2004 and all of 2005 I was on sick leave and maternity leave, which is why the thesis was not finished before 2006.

Researchers from Stanford University were the first to publish within-talkspurt-adjustment of playout speed. Their work is described in Section 1.3.3.1. We found no other published work on within-talkspurt-adjustment before the papers in this thesis were written. Between 2003, when the papers were written, and today, two new groups have published work on adaptation of playout speed. Their work is described in Sections 1.3.3.2 and 1.3.3.4.

1. The definition of pitch in music is “the perception of the frequency of a note”. The pitch varies with the sound wave frequency. Thus, pitch means the lowness or highness of a sound.

Figure 1.7: A general algorithm for adaptation of playout speed, illustration of performance. [Plot against packet number.]

1.3.3.1 Researchers from Stanford University

Liang et al. [24] were the first to publish within-talkspurt adjustment of playout speed. They improved the WSOLA algorithm to work on only one packet, so that no algorithm delay would be introduced during the packet scaling and playout process. At the time (2001 - 2002) when we were searching for algorithms to compare with the algorithms developed in this thesis, we found (in addition to [24]) only submitted (not yet accepted) papers on playout speed adaptation from the researchers at Stanford University. Two of these papers, from 2001, presented the following algorithm:

If the playoutbuffer level is above a target buffer level d, the time used to play each packet or each media-unit is set to:

$$\mathit{play\_time} = f \cdot \mathit{normal\_play\_time} \qquad (1.16)$$

where $f < 1$. If the playoutbuffer level is below the target buffer level d, the time used to play each packet is set to:

$$\mathit{play\_time} = s \cdot \mathit{normal\_play\_time} \qquad (1.17)$$

where $s > 1$. The suggested values to use for the constants s and f are $s = 1.25$ and $f = 0.75$, i.e. a +/- 25% change of the time used to play each packet.

Both papers changed their playout algorithm before [25] was printed in December 2003 and [18] was printed in June 2004. Since [18] discusses non-interactive streaming of video, it is no longer relevant to this thesis. The new algorithm presented in [25] is:

    Calculate the desired length of the packet
    if desired length > expansion threshold
        scale packet with target length min(desired length, max length)
    elseif desired length < -compression threshold
        scale packet with target length max(desired length, min length)
    else
        keep packet without modification

The desired length of the packet is set based on predictions of the network delay. The predictions are based on a sliding window of the last w packets received. A large w will give a more accurate estimation of the delay distribution, but will make the estimator adapt less responsively to the varying network delay. Thus, the choice of w determines how fast the algorithm is in adapting to the network delay variation, and is a trade-off between accuracy and responsiveness. The algorithm lets the user set a parameter for the acceptable loss rate.
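The earlier, buffer-level based rule of Equations (1.16) and (1.17) amounts to a simple two-level decision. A minimal Python sketch follows, in which the function and parameter names are hypothetical and a buffer level exactly at the target is assumed to keep the normal play time (a case the description above does not specify):

```python
def play_time(buffer_level, target_level, normal_play_time, s=1.25, f=0.75):
    """Time used to play one packet, following Equations (1.16) and (1.17).

    Buffer above target: play faster (factor f < 1) to drain the buffer.
    Buffer below target: play slower (factor s > 1) to let it refill.
    """
    if buffer_level > target_level:
        return f * normal_play_time
    if buffer_level < target_level:
        return s * normal_play_time
    return normal_play_time   # assumption: no scaling when exactly on target


# Hypothetical 20 ms packets and a target buffer level of 60 ms.
print(play_time(buffer_level=80, target_level=60, normal_play_time=20.0))
print(play_time(buffer_level=40, target_level=60, normal_play_time=20.0))
```

With the suggested constants, a 20 ms packet is played in 15 ms when the buffer is above the target and in 25 ms when it is below.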

1.3.3.2 Laoutaris et al.

Laoutaris and Stavrakakis [21] describe an algorithm for packet video receivers that uses a Markov Decision Process to control the playout speed. Laoutaris et al. describe another version of the algorithm in [22]. The first paper uses cost assignment for lack of continuity and for latency. The second paper uses only the lack of continuity cost, and the algorithm is therefore no longer suited for interactivity. In the first paper, two different methods are used to control the latency. The first method, limiting the size of the buffer by discarding frames that arrive when the buffer occupancy is N frames, has as a drawback the degraded quality of a stream that lacks some of its packets. The second method, using a latency cost, is reported in [21] as resulting in a very large buffer size (with an expected buffer size of 75 frames), as opposed to the first method (with an expected buffer occupancy of 25.5 frames). The frame rate used was 30 frames/s. Since both methods have buffering delays above 0.8 seconds, neither of them is suited for interactive voice communication.
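The buffering delays referred to above follow directly from the reported buffer occupancies and the frame rate of 30 frames/s:

```latex
\[
\frac{25.5\ \text{frames}}{30\ \text{frames/s}} = 0.85\ \text{s},
\qquad
\frac{75\ \text{frames}}{30\ \text{frames/s}} = 2.5\ \text{s}.
\]
```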

1.3.3.3 Liu et al.

Liu et al. [26] present an adaptation of algorithm 4 from Ramjee et al. [41] (presented in Section 1.3.2.1), to enable it to handle delay spikes that happen within a talkspurt. A threshold range is calculated for each packet, based on the content-based stretching bound. For time-scale modification, Liu et al. use SOLA (synchronized overlap-and-add), and modify it into a packet-based version. In the ‘normal’ mode, if a packet arrival time at the receiver is within the threshold range of $q_j$, $q_j$ will be updated. Within the ‘spike’ mode, $q_j$ is delayed until it is high enough to catch a spike (arrival time < $q_j$). The playout length of spike packets is squeezed after the delay spike. When the total end-to-end delay is below the value it had before the spike, they stop decreasing $q_j$. Their late packet loss rate is between 0.02 and 0.25.

1.3.3.4 Ranganathan and Kilmartin

Ranganathan and Kilmartin [42] presents two different playoutbuffer adaptation solutions. The first solution is a neural network based intertalkspurt playout delay adaptation, which will not be discussed here. The second solution, which is an intratalkspurt playout delay adaptation, use a 6-layer fuzzy network delay trend analysis system (FTAS) to find the slowly varying trend in the network delay. The output of FTAS is used as an input to the decision process for the playout


20

speed, where two different solutions are used to compute , the amount by which the

playout delay is to be adapted for the nest packet:• If the network delay trend is decreasing, the scaled packet length is a linear function of the

network delay trend, by using the equation:

, (1.18)

where tm is the trend measure from FTAS and is the maximum delay change of the

playout process.

• If the network delay trend is increasing, the scaled packet length is a nonlinear function of

the network delay trend, by using the equation:

, (1.19)

where is a parameter used to control the responsiveness of the system for decreasing net-

work delays.

The user or application can choose the history size to be used by the fuzzy network and the

sensitivity parameter used to control the responsiveness of the system for decreasing network delays.

The resulting buffering delay had a tendency to display some instability, i.e., undergo uncon-trolled increases. Therefore, the authors made a mechanism to stabilize the buffering delay, by giving the receiver buffer delay as an extra input to layer 2 of FTAS.

This algorithm is compared to the optimal control algorithm developed in this thesis in Section 8.10 (p. 216).

1.4 Optimal control of playout speed

As with the work described above, we also use adaptation of playout speed, but aim at basing our work on a more thorough study of the receiver buffering. As far as we have seen, there is still no theoretical foundation in the field of playoutbuffer control. We have found no stringent mathematical notation or models. The main goal of the work described in this thesis is to develop a theoretical foundation for multimedia receiver buffer systems, which is transport segment independent, and hence valid for all future transport segments. By developing a stringent mathematical notation to describe the system, we can also develop mathematical models that are transport segment independent. These models are used, together with optimal control theory, to develop a transport segment independent controller that gives the statistically optimal control of playout speed. The optimal controller is general with respect to the user requirements described in Section 1.2 (p. 5), and by letting the user or application set different weight factors, it can be adapted to all timing-related Quality-of-Service requirements.

The optimal control algorithm deduced in this thesis can be used for both interactive applications (such as IP telephony and video conferences) and for streaming of media (e.g. video-on-demand).


Figure 1.8 is a guide for reading the thesis, which uses the same colour codes and arrows as explained for Figure 1.1. As shown in Figure 1.8, the rest of this thesis is organised as follows:

• Chapter 2 gives a motivation for the rest of the thesis (and is therefore colour coded as essential), and presents the contributions and limitations of the thesis.

• Chapter 3 develops a stringent notation to describe the system, and uses this notation to develop transport segment independent mathematical models for the system. Sections 3.1 (Definitions) and 3.4 (Notation) are essential for understanding the rest of the thesis.

• Chapter 4 deduces the transport segment independent optimal controller, using the mathematical models from Chapter 3.

• Chapter 5 gives a brief introduction to Kalman filter theory, and develops a Kalman filter for estimation and prediction of network behaviour.

• Even though the optimal controller deduced in Chapter 4 performs better than existing playoutbuffer algorithms, it can still run dry. Therefore, Chapter 6 develops an anti-run-dry algorithm to use together with the optimal control algorithm. This total algorithm will keep the run-dry probability below a specified level.

• To be able to run and compare different playoutbuffer algorithms, we have developed a network simulator and an implementation of the total system, in Matlab [29]. The total system implementation can be run on both simulated and real transport segment data. This implementation is described in detail in Chapter 7, to make it easy for readers to implement the algorithm(s) in their own systems.

• Chapter 8 presents an evaluation of the performance and flexibility of our approach. The algorithms developed in this thesis were run with a range of different settings, and were also compared with existing algorithms in both a subjective listening test (DMOS - Degradation Mean Opinion Score) and two different objective voice quality evaluations (PESQ - Perceptual Evaluation of Speech Quality and Arentz dissimilarity measure).

• The concluding Chapter 9 gives a summary of the thesis, discusses the claims of our solution and describes open problems and areas for future research.

Figure 1.8: Guide for reading the thesis [diagram of the chapters: Chapter 2: Motivation; Chapter 3: Mathematical modelling (3.1 Definitions, 3.4 Notation); Chapter 4: Deduce optimal controller; Chapter 5: Transport segment state vector estimation; Chapter 6: Anti-run-dry algorithm; Chapter 7: Implementation in Matlab; Chapter 8: Results and discussion; Chapter 9: Conclusion]

The appendixes of the thesis are organized as follows:

• Appendix A gives a summary of all the notation used in the thesis. The notation is alphabetically sorted (by both the English and the Greek alphabet) to make it easier to use. This thesis combines different areas where the standard notation is overlapping. Thus, for some areas, a non-standard notation has been chosen. It may be very helpful to actively use Appendix A, because the thesis uses a wide range of symbols that may be hard to remember.

• Appendix B gives a list of the acronyms used in the thesis.

• Appendix C gives a deduction of system variance for different transport segment models. This appendix will be most interesting for mathematically interested readers.

• Appendix D contains mathematical deductions that are too lengthy to be placed in the main part of the thesis. This is an appendix meant for mathematically interested readers only.

• Appendix E is a paper presented at the International Conference on Computer, Communication and Control Technologies (CCCT '03), July/August 2003. It describes some of the notation, mathematical models and the optimal controller. This paper can be read as a short summary of part of this thesis. Note however that this paper uses a slightly different notation than the thesis.

• Appendix F is a paper presented at the International Symposium on Information and Communication Technologies (ISICT03), September 2003. This paper describes the development of the anti-run-dry algorithm, and can be read as a summary of part of the thesis. This paper also uses a slightly different notation than the thesis.

• Appendix G is a paper printed in IET Communications. This paper describes different quality metrics, and describes the main results from a subjective listening test and from the comparison with other published algorithms by means of an objective technique for measuring voice quality.

The reason that the first two papers use a different notation than the thesis is that the notation has been refined after these papers were written.

The main work in this thesis is presented in the first two extensive papers, instead of in several lighter papers. This is because the different parts of the thesis are closely interwoven, and thus difficult to divide into more than two separate parts.

We believe that using mathematics and cybernetics to solve the playoutbuffer problems gives solid solutions. Since this thesis is in the area of informatics, the drawback (for readers from the non-mathematical area of informatics) is that this thesis contains a large number of equations. To help the reader understand the thesis, many figures and textual descriptions are used, and in situations where the use of equations is not essential (such as in Section 3.6 (p. 67)), a textual description and solution is used.



2 Motivation

This chapter describes the motivation for the work of this thesis. Section 2.1 describes why playoutbuffer control is important also in future networks and with future protocols, Section 2.2 (p. 26) gives an overview of the present playoutbuffer control status and Section 2.3 (p. 27) describes the aim of the thesis. Sections 2.4 (p. 29) and 2.5 (p. 31) describe the contributions and the limitations of the thesis.

2.1 Why is playoutbuffer control important in the future?

In recent years, there has been an increase in digital transfer of media streams and interactive media streams over different network and protocol types, and there are several reasons why this increase is likely to continue in the future:

• IP telephony has become a common household tool, and has the potential of eliminating the need for line-switched telephony.

• According to [47], Nokia predicts that in the future, all speech communication will be over wireless broadband networks, since the current mobile network is much more expensive in use.

• Video cameras such as web cams (used for e.g. videophones) have also become a common tool in homes with relatively high bandwidth internet access.

• Many new mobile phones are equipped with a video camera. Until now, these have been used for storing or streaming the video content [36], but they may be used for videophones in the near future.

• The use of multiplayer games, where the players are situated all over the world and connected through the internet, is growing.

One of the problems for interactive media in today’s large networks (especially the Internet) is the delay jitter that is being introduced by the network itself and by the protocols.

We believe that even though most future networks will have increased bandwidth and future computers will have increased CPU power compared with today’s equipment, the jitter from the network and protocols will still be a problem that needs to be solved. One reason for this is that the amount of data that is transferred per second for a specific media type will increase along with the user’s demand for increased quality (e.g. increased video quality, which requires more data per picture). Another reason is the nature of the protocols used; with the TCP protocol, for example, the protocol itself gives different delays to different packets (this will happen regardless of retransmission, even though retransmission leads to greater differences [30]).

We believe that the increase in recent years in the use of cordless transmission, such as mobile telephones and wireless networks, will continue. Cordless connections often have lower bandwidth and more packet loss and packet corruption than wired solutions. The networks and protocols that need to be used for cordless transmissions often lead to a large amount of jitter, and thus, there is a need to remove jitter also in the future for such connections.

Thus, there is a need to solve the problem of jitter added to the multimedia stream by networks and protocols also for future users. There is a need to give users of interactive multimedia a high level of quality of service, also for networks and protocols that add large amounts of jitter to the multimedia stream. Therefore, we claim that a solution to this jitter problem can aid future interactive and streaming media transfer applications.

2.2 Present playoutbuffer control status

In 1997, Rosenberg [44] wrote about playoutbuffer algorithms: “To date, these algorithms have all been ad-hoc. Attempts have been made to develop some kind of theory or bounds on the performance of such algorithms, but with limited success. We believe that there is room for additional theoretical and practical work in this area. In particular, we believe that adaptation of playoutbuffers need not occur only at the beginning of talkspurts.” and “We also believe that application of traditional estimation techniques may prove fruitful in helping to design better algorithms”. This seems to describe the situation also in 2003.

The development that has been seen in the field of playoutbuffer control, also for playout speed adapting controllers, can roughly be described like this:

• First, a method, either an ad-hoc method or a method found by trial and error, is developed, implemented, and tested (and for some methods, used).

• Some time later, a new method, or a modification to one of the existing methods, is developed without theoretical grounding, implemented and tested. The new method is usually shown to have a better performance than the previous methods, at least in specific situations.

• In this manner, new methods, often ad-hoc methods that are not theoretically grounded, keep showing up.


As far as we have found, no theoretical performance limit has been presented, and thus, the possible future performance improvement is unknown. Since the optimal performance is not known, there exists no common reference for evaluating the different existing methods.

As far as we know, there is still no theoretical foundation in the field of playoutbuffer control. We have found no theoretical studies and no presentations of stringent mathematical notation, models or solutions to the problem.

We believe that a theoretical foundation could bring the field of playoutbuffer control a large step forward. We also believe that the development of this theoretical foundation will benefit the field of playoutbuffer control much more than another set of new non-theoretically-grounded methods.

2.3 Aim of the thesis: Develop the theory of playoutbuffer control

The work described in this thesis is not aimed at finding temporary solutions or ad-hoc methods, but at developing a theoretical foundation for the field of playoutbuffer control, as described in Section 2.3.1. This thesis aims at deducing the mathematically optimal solution to the playoutbuffer control problem, as described in Section 2.3.2 (p. 28). The solution should be valid for all future networks and protocols, as described in Section 2.3.3 (p. 28), and be tunable to all application and user requirements (i.e. all timing-related requirements, as explained in Section 2.5 (p. 31)), as described in Section 2.3.4 (p. 28).

2.3.1 Develop a theoretical foundation

The goal of the work described in this thesis is to develop a theoretical foundation for media receiver buffer systems. To find this theoretical foundation, we first need to formulate a mathematically stringent description of the receiver buffer system. To do this, we first need to develop a stringent notation to describe the system. By using this notation, the next aim is to develop mathematical models of the system.

Since the development of the theoretical foundation and the optimal controller is a large and demanding task, we do not aim at doing a real-time implementation. However, we aim at developing a simulator where the optimal controller can be tested with real-life measurements, which we believe would give close to identical results to a real-time implementation.



2.3.2 Optimal solution

By using the mathematical models mentioned in Section 2.3.1, we aim at deducing a controller that can find the theoretically best possible way to reduce the impact of jitter on the users’ perception of the timing-related QoS. This controller should, at any time, and for any media type, find the best playout speed to use to best satisfy requirements A, B and C from Section 1.2.

This optimal controller will hopefully eliminate the need for further ad-hoc methods in the field of playoutbuffer control. If desirable, it can also be used as a reference to evaluate different ad-hoc methods.

2.3.3 For all future networks and protocols

We aim at making both the theoretical foundation and the mathematical models of the system independent of the transport segment. This way, we can make the optimal controller independent of the transport segment, and hence valid for all present and future networks and protocols.

As opposed to methods without theoretical foundation, that are often outdated when a new and better non-theoretical or ad-hoc method is developed, we hope that the theoretical foundation developed in this thesis can be valuable for all current and future media-transfer applications in all current and future networks and protocols.

2.3.4 For all QoS requirements

We also aim at making the optimal controller general with respect to the user requirements described in Section 1.2 (p. 5). This way, it can be adapted to all timing-related Quality-Of-Service requirements.

We saw in Section 1.2 (p. 5) that these requirements are conflicting, and that the quality of one can usually be improved by reducing the quality of another of the requirements. The individual importance of the three requirements will vary from application to application, and also from user to user, thus we seek an algorithm where it is possible to weight these requirements according to the demands from the users or applications. Thus, with a correct weighting for the different requirements, the algorithm will also be the optimal algorithm for all possible users.


2.4 Contributions of the thesis

This thesis makes the following contributions to the field of playoutbuffer control:

1. Development of a stringent notation for receiver buffer systems

2. Development of mathematical descriptions and transport segment independent state space models of the receiver buffer systems and the transport segment by use of the stringent notation

3. Development of an optimal controller (by using 1 and 2 above) that:

a. Is transport segment independent

b. Is general, so that it can be tuned to any application or user requirements with respect to playout speed and latency. It can also be tuned to act close to identically to many existing controllers. The tuning is done by letting the user put his/her own individual weights on the importance of:

• Keeping the buffer level close to a target level (requirement A)

• Keeping the playout speed close to the sender speed (requirement B)

• Keeping the playout speed smooth (i.e. keeping a low rate of change of playout speed) (requirement C)

4. Development of an anti-run-dry algorithm for the optimal control of playout speed, by the use of 1-3 above. (None of the published algorithms has any guarantee that the buffer will not run dry, and neither will the optimal controller in 3, thus an anti-run-dry algorithm is needed.) As an input to the anti-run-dry algorithm, the user can give a maximum limit for the run-dry probability of the buffer.

5. A receiver system environment simulator and a transport segment simulator will be implemented in Matlab. Both existing algorithms and new algorithms from 3 and 4 above will be implemented, tested and compared in this environment. We claim that the algorithms from 3 and 4:

• Perform better than (or, for specific situations, equal to) existing algorithms

• Are more general and adaptable to transport segment types and user requirements regarding latency and playout speed than existing algorithms.
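To illustrate how the three weights in item 3b trade requirements A to C against each other, the sketch below evaluates a weighted sum of squared deviations over a sampled playout trajectory. The quadratic form, the names w_a, w_b and w_c, and the finite-difference approximation of the rate of change are assumptions made for this illustration only; the controller itself is deduced in Chapter 4, and its criterion may differ from this sketch.

```python
def weighted_cost(buffer_level, r_plr, r_sndr, target_level, dt,
                  w_a, w_b, w_c):
    """Weighted sum of squared deviations from perfect playout (illustrative).

    buffer_level -- sampled buffer levels [media-units]
    r_plr        -- sampled playout speeds [media-units/sec]
    r_sndr       -- correct media speed [media-units/sec]
    target_level -- desired buffer level [media-units]
    dt           -- sampling interval [sec]
    w_a, w_b, w_c -- hypothetical weights for requirements A, B and C
    """
    cost = 0.0
    for k in range(len(r_plr)):
        level_error = buffer_level[k] - target_level        # requirement A
        speed_error = r_plr[k] - r_sndr                     # requirement B
        # Requirement C: finite-difference rate of change of playout speed.
        speed_change = (r_plr[k] - r_plr[k - 1]) / dt if k > 0 else 0.0
        cost += (w_a * level_error ** 2
                 + w_b * speed_error ** 2
                 + w_c * speed_change ** 2) * dt
    return cost
```

Setting one weight much larger than the other two makes the corresponding deviation dominate the cost, which mirrors the idea that different users and applications can emphasise different requirements.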

This thesis describes a theoretical study, thus the main contribution to the field of playoutbuffer control is the theoretical part of the thesis, including development of notation, mathematical models, the optimal controller and an anti-run-dry algorithm for the optimal controller. We believe that this is an important brick in the development of the field of playoutbuffer control.



The practical part of this work includes the development of a simulation system by using Matlab [29]. This simulation system is used to compare the performance of different playoutbuffer control algorithms, for both real and simulated transport segment measurements. In addition, WSOLA is implemented in MATLAB (based on a C++ implementation), and sound files are scaled according to the output of the different playoutbuffer control algorithms.

2.4.1 Claim: all user requirements can be expressed in requirements A-C

The only controllable variable is the playout speed of the media, which can be set to any positive value or to zero. This means that there is only one degree of freedom when it comes to controlling the timing-related aspects of the receiver buffer system. By changing the playout speed over time, all degrees of its derivative and its integration can be changed.

As stated in 3b in the contribution list, the optimal controller can be tuned to any user requirements regarding buffer level, playout speed and rate of change of playout speed. We claim that any user noticeable quality-of-service requirements that can be satisfied by the change of playout speed are included in these three requirements. The reason for this is that the three requirements A-C are dependent upon the playout speed:

• The integration of the playout speed determines the amount of media in the playoutbuffer (if the transport segment delay had been constant, the media content of the playoutbuffer would have been the exact integration of the difference between the playout speed and the correct media speed), which determines the amount of extra latency experienced by the user. Thus requirement A depends on the integration of the playout speed.

• Requirement B depends on the playout speed itself.

• The derivative of the playout speed is the same as the rate of change of the playout speed. Thus requirement C depends on the derivative of the playout speed.
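The dependence of requirements A to C on the playout speed can be made concrete with a small numerical sketch. It assumes, as in the first item above, a constant transport segment delay, so that media enters the playoutbuffer at the correct media speed. The function name, the sampling interval dt, the sign convention (the buffer grows when playout is slower than the sender) and the example numbers are illustrative assumptions.

```python
def timing_quantities(r_plr, r_sndr, dt, initial_level=0.0):
    """Derive the three timing-related quantities from a sampled playout speed.

    Returns three lists of equal length:
      buffer_level   -- integral of (r_sndr - r_plr), relevant to requirement A
      rate_deviation -- r_plr - r_sndr, relevant to requirement B
      rate_change    -- finite-difference derivative of r_plr, requirement C
    """
    buffer_level, rate_deviation, rate_change = [], [], []
    level = initial_level
    previous = r_plr[0]
    for r in r_plr:
        # Integration: media accumulates when playout is slower than the sender.
        level += (r_sndr - r) * dt
        buffer_level.append(level)
        # The playout speed itself, relative to the correct media speed.
        rate_deviation.append(r - r_sndr)
        # Differentiation: rate of change of the playout speed.
        rate_change.append((r - previous) / dt)
        previous = r
    return buffer_level, rate_deviation, rate_change


# Example: playout slows from 50 to 45 media-units/sec for two seconds.
levels, deviations, changes = timing_quantities(
    [50.0, 45.0, 45.0, 50.0], r_sndr=50.0, dt=1.0)
```

In this example the buffer gains 5 media-units for each second of slowed playout, while the rate of change is non-zero only at the two instants where the speed is altered.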

We do not use more than one integration or more than one differentiation of the playout speed. The reasons for this are:

• The double integration of the difference between the playout speed and the media speed into the playoutbuffer is the integration over time of the amount of media in the playoutbuffer. Since the user does not want to compensate for a long period with a too large playoutbuffer (which gives the user a too large delay) with a period of a too small playoutbuffer, merely to make this double integration smaller, the double integration is not important to the quality perception of the user. In the same way, further integrations of the playout speed are not interesting for the user.

• The double differentiation of the playout speed determines the rate of change of the playout acceleration. For sound, we believe that this is not noticeable for the user as long as the pitch remains unchanged. We also believe that this is not important for video.

2.5 Limitations of the thesis

The users’ perception of the playout quality of the media at the receiver depends upon many different quality aspects, and on the specific type of media. Many quality aspects are dependent upon the bandwidth (measured in number of bits per second that can be transferred) between the sender and the receiver. For video, these quality aspects could be picture size, colour depth and resolution. For sound, they could be sample rate and sample quantization [49], and for other types of media yet other quality aspects relating to the available bandwidth could be important. In this thesis, however, we will not concentrate on the bandwidth dependent quality-of-service aspects, but on the aspects relating to timing, such as delay and playout speed (see Section 1.2 (p. 5) for more details).

The algorithms developed in this thesis are implemented in Matlab, but not in a real-time system. Since the focus of the thesis is the development of theory, implementation of the developed algorithms in a real-time system (such as on a sound card or video card) is left as an open area for future researchers or computer engineers.

The theory developed in this thesis is transport segment independent, hence to use the theory, the user or application programmer needs to provide a mathematical model (state space model) of his or her particular transport segment. As always when control theory is used to solve problems in a system, the system needs to be expressed as a state space model. Since networks and protocols may change in the future, a unified mathematical model for all transport segments cannot be made now. Since there exists no exact knowledge of the networks and protocols of the future, the area of finding mathematical models of these transport segments is left as an open area for future researchers.

However, the algorithms developed in this thesis have proven to work well also for approximate transport segment state space models, that can be found for instance by the use of Matlab’s [29] System Identification Toolbox.

This thesis does not cover the area of packet loss, since this is a research area that is outside the scope of our playoutbuffer control area. Like most existing playoutbuffer control algorithms, we have concentrated on solving the problem of latency and jitter, and not looked at all types of errors that can be caused by the network. We are aware that the area of finding solutions to the packet loss problem is an active research area, and that the combination of playoutbuffer control and FEC (Forward Error Correction) has also been explored [8] [45].

Combining solutions to the packet loss problem with optimal control of playout speed may be a fruitful research area that could give a total solution for interactive media transfer, and is left as an open area to future researchers.

This thesis does not look at the problem of packets arriving in the wrong order. The re-ordering of packets is assumed to be performed by one of the protocols in the transport segment. One way for this protocol to perform the reordering is to deliver both packets to the playoutbuffer at the time the latest packet arrives from the protocols below.
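The reordering strategy mentioned above can be sketched as a small stage placed in front of the playoutbuffer. The class name, the use of consecutive integer sequence numbers and the method names are assumptions made for this illustration; no particular protocol is implied, and packet loss is not handled, in line with the limitations stated in this section.

```python
class ReorderingStage:
    """Holds early packets and releases them in order (illustrative sketch)."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # sequence number expected next
        self.held = {}             # packets that arrived ahead of their turn

    def on_arrival(self, seq, payload):
        """Return the packets delivered to the playoutbuffer at this arrival."""
        if seq < self.next_seq:
            return []              # already delivered; ignore duplicates
        self.held[seq] = payload
        released = []
        # Deliver every packet that is now in order, including packets that
        # were held back while waiting for the one that just arrived.
        while self.next_seq in self.held:
            released.append((self.next_seq, self.held.pop(self.next_seq)))
            self.next_seq += 1
        return released


stage = ReorderingStage()
first = stage.on_arrival(1, "b")   # arrives early and is held back
second = stage.on_arrival(0, "a")  # the late packet releases both
third = stage.on_arrival(2, "c")   # in order, delivered immediately
```

Both swapped packets reach the playoutbuffer at the arrival time of the later one, so from the playoutbuffer's point of view the reordering appears only as additional delay on the early packet.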

2.6 Summary

As far as we have seen, there is still no theoretical foundation in the field of playoutbuffer control. We have found no stringent mathematical notation or models. All presented solutions have been without theoretical foundation, and new trial-and-error or ad-hoc methods keep showing up. Since the optimal performance is not known, there exists no reference for evaluating these ad-hoc methods.

The main goal of the work described in this thesis is to develop a theoretical foundation for multimedia receiver buffer systems, which is transport segment independent, and hence valid for all future networks and protocols. By developing a stringent mathematical notation to describe the system, we can also develop mathematical models that are transport segment independent. We also hope to eliminate the need for further ad-hoc methods by deducing the transport segment independent optimal controller, which can be adjusted to any user preferences regarding delay and playout speed. The optimal control algorithm deduced in this thesis can be used for both interactive applications (such as IP telephony and video conferences) and for streaming of media (e.g. video-on-demand).

An anti-run-dry algorithm is developed for the optimal control algorithm. Since both the anti-run-dry algorithm and the optimal control algorithm are transport segment independent, we have assumed that the user or programmer provides the state space model of his/her particular transport segment behaviour. The transport segment state space model is used by both the anti-run-dry algorithm and the optimal control algorithm. Even though the optimal controller is optimal only with a correct state space model, the algorithm has proven to be very robust, thus an approximate model is sufficient (as shown in Chapter 8 (p. 163)).


3 Mathematical modelling

The aim of this chapter is to formulate a mathematically stringent description of the receiver buffer system, and to use this description to find a mathematical state space model of the system, that can later be used to deduce the optimal controller.

This chapter is organized as shown in Figure 3.1. The essential parts of the chapter are Section 3.1, where the system is defined, Section 3.4 (p. 41), which introduces the stringent notation that is developed for this thesis, and is used in the rest of the thesis, and Section 3.7 (p. 69), which gives a summary of the mathematical model developed in this chapter.

Section 3.2 (p. 35) describes the assumptions used in the rest of the thesis, and may be skipped for readers who only want to read the essence of the thesis.

The more theoretical part of this chapter consists of Section 3.3 (p. 36), which gives a continuous interpretation of the system that will help the understanding of the continuous mathematical models for the system, Section 3.5 (p. 44), which first finds the mathematical relations between the entities in the system and uses these relations to develop a mathematical model of the receiver buffer system, and Section 3.6 (p. 67), which describes the observability and controllability of the system.


[Figure 3.1: diagram of the sections of this chapter: 3.1 Definitions; 3.2 Assumptions; 3.3 Continuous interpretation of the system; 3.4 Notation; 3.5 Mathematical relations; 3.6 Observability and controllability; 3.7 Summary]

3.1 Definitions

We need a new term to express the amount of media in a flow. We cannot use the terms bits or bytes to express this, since one second of media in a flow can contain much more data bits than another second in a flow (for instance, due to a different level of compression). An example of this, is that for video, compression can lead to a varying number of bits per second, since the compression of a sequence of a movie is dependent upon the difference between consecutive pictures, and the compression of each picture is dependent upon the level of details in the picture.

Another way to express the amount of media in a flow could be to use the term packet - but then we would have to discuss at which protocol level to choose this packet. Furthermore, since packets at the same protocol level usually contain a constant amount of bytes, we get the same problem as with bits or bytes.

Therefore, we introduce the term media-unit to define the amount of media corresponding to a constant amount of time when playing the media at the correct media speed. A media-unit could be chosen as the smallest amount of media that is used by the player. One example is a 50 pictures/second video, where the most intuitive would be to define a media-unit as 20 ms of media. When played at the correct media speed, all media-units in the same media in a flow need the same amount of time to be played.

In this work, the size of each media-unit is not very important, but rather the fact that each media-unit corresponds to the same amount of ideal playout time. The optimal control algorithm developed in this thesis works equally well for any media-unit size. However, for some equations in this thesis and in the simulation environment, we have assumed that the media-units are sent to the player one by one. In our simulation environment, we have also assumed that the playoutbuffer receives single media-units or collections of media-units from the transport segment. Thus, if a media-unit consists of several packets in the network, it must be assembled before being sent into the playoutbuffer.

If the sender sent a stream with a constant bit rate (as with PCM-encoded sound), each media-unit would contain a constant number of bits. In this work however, we do not make any assumptions regarding the number of bits in each media-unit.
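The arithmetic behind these examples is simple, and is sketched below. The function names are ours, and the 64 kbit/s PCM bit rate used in the usage lines is an assumed value chosen purely for illustration.

```python
def media_unit_duration(units_per_second):
    """Ideal playout time of one media-unit [sec] at the correct media speed."""
    return 1.0 / units_per_second


def bits_per_media_unit(bit_rate, units_per_second):
    """Bits in one media-unit for a constant bit rate stream [bits]."""
    return bit_rate / units_per_second


# A 50 pictures/second video with one picture per media-unit.
video_unit = media_unit_duration(50)

# Assumed example: 64 kbit/s PCM sound divided into 50 media-units per second.
pcm_unit_bits = bits_per_media_unit(64000, 50)
```

For a compressed stream only the first function applies, since the number of bits per media-unit then varies from unit to unit.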

We have the following main components in our system (illustrated in Figure 3.2):

• the sender: This is an ideal sender that sends out data at the correct media speed $r_{SNDR}$ [media-units/sec]. Any jitter from a real-life sender is included in the model for the transport segment component.

• the transport segment: receives data from the ideal sender, and sends the data to the playoutbuffer on the receiver machine with the rate $r_{TRS}(t)$ [media-units/sec]. This component models the entire path from the ideal sender application to the receiver application, including the jitter from the real-life sender and both delay and jitter from all networks and protocols between the sending application on the sender machine and the playoutbuffer on the receiver machine.

• the playoutbuffer: receives data from the transport segment, and queues it in a FIFO buffer. Sends out data (to the player) with the media-unit rate $r_{PB}(t)$ from the FIFO buffer.

• the player: receives data from the playoutbuffer and puts it into its FIFO buffer (the reason that we need to model the player as having a buffer is explained in Section 3.3.3 (p. 39)). The media-units are played with the media-unit rate $r_{PLR}(t)$ (determined by the controller), which should be close to $r_{SNDR}$ for the users to perceive a good quality.

3.2 Assumptions

In this work, we make the following assumptions:

• All media streams have a defined and known constant correct media speed, measured in media-units per second.

• The receiver knows the correct media speed ($r_{SNDR}$) of the media stream.

• We assume that the maximum amount of memory used by our playoutbuffer will not be limited by a small amount of total memory on the receiver machine, or by too little memory given to our process. We assume that the only negative effect of using a large playoutbuffer is the amount of latency it will introduce.

• The receiver has no power to dynamically change the media-unit rate from the sender. One reason for this assumption is that we assume that the latency between the sender and the receiver is too large to make the communication between them effective for attaining our goals (of playoutbuffer control) by dynamically changing the sender media-unit rate. Another reason is that for many applications, e.g. tele or video conferences with more than two participants, the sender will send media to more than one receiver.

• The only available variable that we can control is the instantaneous value of $r_{PLR}$ (and its derivatives).

• The receiver can calculate $r_{TRS}$ by measuring the arrival time to the playoutbuffer of the media-units.

Figure 3.2: Overview of the system [block diagram: Sender, Transport segment, Playoutbuffer and Player in sequence, with the rates $r_{SNDR}$, $r_{TRS}$, $r_{PB}$ and $r_{PLR}$ out of the respective blocks]


We would like to model our system by continuous variables, since this will simplify the equations of the relations between variables, and also give a better understanding of the dynamics of the system, compared to a discrete modelling. By making a continuous state space model of our system, we will also be able to use optimal control theory in Chapter 4 (p. 71). We choose to do most of the deductions in this thesis with continuous models, and find the discrete version of these models before implementation.

To be able to represent our system as a continuous state space model, we first need to express it with continuous variables. Thinking of the system in a continuous way can also help understanding the dynamics of the system. One continuous way to think of it is as if the media-units are lying one after the other on a magnetic tape, with a constant length for each media-unit. (This can be thought of like music tapes or video tapes.) The continuous magnetic-tape system is depicted in Figure 3.3.

The tape is sent out from the sender to the transport segment at a constant speed. From the sender side to the receiver side of the transport segment, there is a positive distance (which means that the delay from the sender to the playoutbuffer can never be zero). The tape is sent from the transport segment into the playoutbuffer at a varying rate, so that the tape inside the transport segment is sometimes tight, but is often accumulated inside the transport segment. When the tape goes out of the transport segment, it goes directly into a playoutbuffer. This buffer has capacity to store any amount of tape before it is sent to the player. The playoutbuffer can also send the tape to the player directly, without storing it, and without adding much delay to the tape. In Figure 3.3, the two pairs of wheels giving tape into the playoutbuffer and pulling tape out of the playoutbuffer are drawn apart for visualization. In an actual system, the distance between these two pairs of wheels would be zero. Thus, if the tape between the two wheel pairs is tight, the playoutbuffer is empty. As long as the playoutbuffer is empty, the speed of media into the player can never exceed the speed of media out of the transport segment. Thus, if the rate from the transport segment is zero, and the playoutbuffer becomes empty, the media speed into the player drops to zero.
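The empty-buffer constraint in the tape picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rate_into_player` and its arguments are hypothetical and do not appear in the thesis.

```python
def rate_into_player(requested_rate, r_trs, buffer_level):
    """Media speed into the player (media-units/s) in the tape model.

    requested_rate: speed at which the player would like to pull tape.
    r_trs:          current speed of tape out of the transport segment.
    buffer_level:   amount of tape stored in the playoutbuffer (media-units).
    All names are illustrative assumptions, not notation from the thesis.
    """
    if buffer_level > 0.0:
        # Stored tape is available, so the player is not limited by r_trs.
        return requested_rate
    # Empty buffer: the tape is tight, so the player cannot pull tape
    # faster than the transport segment delivers it.
    return min(requested_rate, r_trs)
```

With an empty buffer and a stalled transport segment (`r_trs = 0`), the function returns zero, which is the situation described in the last sentence of the paragraph above.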

3.3.1 The playoutbuffer contains complete media-units

Since a playoutbuffer contains complete media-units, we need a total model of the system where we can model the media-stream as continuous and at the same time model the playoutbuffer as containing an integer number of media-units. Thus, we need to add a component that can convert the continuous media-stream from the transport segment into the complete media-units going into the playoutbuffer. This component, called a virtual buffer, is introduced in Section 3.3.2. In the same way, we use the player as a converter between the complete media-units from the playoutbuffer and the continuous media-stream experienced by the user, as explained in Section 3.3.3 (p. 39).

Figure 3.3: Overview of the continuous system (sender, transport segment, playoutbuffer and player).

3.3.2 The virtual buffer

In a physical system, whole media-units are sent from the transport segment to the playoutbuffer, and whole media-units are sent from the playoutbuffer to the player. Hence, the playoutbuffer always contains an integer number of media-units. To be able to develop a mathematical model where the playoutbuffer contains an integer number of media-units, we will need a converter between integer media-units and continuous flow in both the input and the output end of the playoutbuffer.

In the output end of the playoutbuffer, the player will have this converter role (see Figure 3.3 and Figure 3.4), since it accepts whole media-units from the playoutbuffer, and plays them out to the user as continuous media (as described in Section 3.3.3).

In the input end of the playoutbuffer, whole media-units arrive from the transport segment. However, we would like to model the stream out of the transport segment as a continuous stream, thus we need to model the physical transport segment as two mathematical parts; the regular transport segment between the sender and the receiver, and a small virtual buffer at the receiver side. The mathematical virtual buffer (which has no physical counterpart) is filled up with the continuous media stream from the transport segment, and as soon as there is enough media inside the virtual buffer to make up a media-unit, the media-unit is ready to be delivered to the playoutbuffer. In this way, we have a continuous stream from the transport segment to the virtual buffer (see Figure 3.4), while whole media-units are delivered from the virtual buffer to the playoutbuffer.



In the ideal case (where the playoutbuffer can accept the new media-unit from the virtual buffer at the exact same point in time when it is ready to be sent from the virtual buffer), the virtual buffer never contains more than one media-unit of media.
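The converter role of the virtual buffer can be illustrated with a small sketch. It assumes, hypothetically, that the continuous rate out of the transport segment is sampled on a regular grid of step `dt`; the function name and the sampling scheme are not part of the thesis.

```python
def virtual_buffer(rate_samples, dt):
    """Convert a continuous stream into whole media-units.

    rate_samples: values of r_TRS (media-units/s), one per step of length dt.
    Returns the number of whole media-units released in each step.
    The regular sampling grid is an assumption made for this sketch.
    """
    level = 0.0                # continuous fill level of the virtual buffer
    released = []
    for r_trs in rate_samples:
        level += r_trs * dt    # continuous inflow during this step
        whole = int(level)     # complete media-units now available
        level -= whole         # only the fractional remainder stays behind
        released.append(whole)
    return released
```

Because every complete media-unit is handed over immediately, the level left in the sketch's buffer after each step is always below one media-unit, in line with the ideal case described above.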

For the mathematical modelling later in this chapter, we can also think of our system as containing two main mathematical buffers; the transport segment buffer and the receiver buffer, where the receiver buffer is the sum of the virtual buffer, the playoutbuffer and the buffer in the player. This way, both of these mathematical buffers will have continuous stream inputs and outputs, and thus contain a floating-point number of media-units.

3.3.3 The player

In a theoretically ideal situation, we can assume that a media-unit is the smallest amount that the media could be divided into (e.g. a picture for video), and a new media-unit is delivered to the player at the exact point in time when the player has finished the playout of the previous media-unit. Figure 3.5 illustrates an example of the amount of media in the playoutbuffer ($M_{PB}$) and in the player ($M_{PLR}$) in such an ideal situation.

Figure 3.4: Illustration of the use of the virtual buffer. The streams $r_{SNDR}$ (out of the sender), $r_{TRS}(t)$ (transport segment to virtual buffer) and $r_{PLR}(t)$ (out of the player) are continuous media streams, while $r_{VB}(t)$ (virtual buffer to playoutbuffer) and $r_{PB}(t)$ (playoutbuffer to player) consist of whole media-units. The playoutbuffer holds an integer number of media-units; the transport segment, the virtual buffer and the player hold a floating number of media-units.

In such an ideal situation, the amount of media in the player will always be between 0 and 1 media-units.

In a more practical situation, the playoutbuffer may need to send a new media-unit to the player before the player has finished the playout of the previous media-unit, to ensure that the player buffer does not run dry. The playoutbuffer may also need to send several media-units at a time to the player, e.g. because the time used to play each media-unit could be very short. Thus, the player buffer may contain several media-units, as illustrated in Figure 3.6.

Figure 3.5: Buffer fill level in the theoretically ideal situation. Two panels show the media-units in the playoutbuffer ($M_{PB}$) and in the player ($M_{PLR}$) as functions of time, for media-units A, B and C. The time axis is marked in steps of $1/r_{SNDR}$ (e.g. the holding time for one picture).

For video, the playout from the player is mathematically regarded as continuous, even though the player in reality keeps each picture on the screen for a certain number of ms before changing to the next picture. As an example, take a correct media speed of 50 pictures/second, where each picture is shown for 20 ms. If the player contains only one picture, and it has been shown for 5 ms, there are mathematically 15 ms of media left in the player buffer, since the picture is to be shown for 15 more ms before it is time to change the picture.
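The arithmetic of this example can be written as a short sketch. It assumes that every media-unit is held for exactly $1/r_{SNDR}$ seconds (playout at the correct media speed); the function name and arguments are illustrative only.

```python
def media_left_in_player(units_in_player, shown_s, r_sndr):
    """Remaining media in the player, as (seconds, media-units).

    units_in_player: whole media-units held by the player, including
                     the one currently on screen.
    shown_s:         how long the current unit has been shown (seconds).
    r_sndr:          correct media speed (media-units/s).
    Assumes each unit is held for exactly 1 / r_sndr seconds.
    """
    unit_time = 1.0 / r_sndr                     # holding time per unit
    seconds_left = units_in_player * unit_time - shown_s
    return seconds_left, seconds_left * r_sndr   # time and fraction of units
```

For the example in the text (one picture, shown for 5 ms, 50 pictures/second), the call `media_left_in_player(1, 0.005, 50.0)` gives about 0.015 s, i.e. about 0.75 media-units.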

3.4 Notation

To find the optimal controller, we first need a precise mathematical model of the system. To be able to find this precise mathematical model, we need a stringent notation. The notation used to describe our system is presented in Table 3.1. Appendix A (p. 249) gives an overview of all the notation used in the thesis.

Figure 3.6: Buffer fill level in a non-ideal situation. Two panels show the media-units in the playoutbuffer ($M_{PB}$) and in the player ($M_{PLR}$) as functions of time, for media-units A to I. The time axis is marked in steps of $1/r_{SNDR}$.

| Symbol | Unit | Description |
|---|---|---|
| $r_{SNDR}$ | media-units/s | The constant media-unit rate out of the ideal sender, equal to the correct media speed. $r_{SNDR} \ge 0$. |
| $r_{TRS}(t)$ | media-units/s | The media-unit rate out of the transport segment, $r_{TRS}(t) \ge 0,\ \forall t$. |
| $r_{VB}(t)$ | media-units/s | The media-unit rate out of the virtual buffer, $r_{VB}(t) \ge 0,\ \forall t$. |
| $r_{PB}(t)$ | media-units/s | The media-unit rate out of the playoutbuffer, $r_{PB}(t) \ge 0,\ \forall t$. |
| $r_{PLR}(t)$ | media-units/s | The media-unit rate out of the player, $r_{PLR}(t) \ge 0,\ \forall t$. |
| $m$ | | Media-unit number. The first media-unit in the stream has $m = 1$. $m \ge 1$. |
| $M_{TRS}(t)$ | media-units | The number of media-units in the transport segment at time t, $M_{TRS}(t) \ge 0,\ \forall t$. |
| $M_{VB}(t)$ | media-units | The number of media-units in the virtual buffer at time t, $M_{VB}(t) \ge 0,\ \forall t$. |
| $M_{PB}(t)$ | media-units | The number of media-units in the playoutbuffer at time t, $M_{PB}(t) \ge 0,\ \forall t$. |
| $M_{PLR}(t)$ | media-units | The number of media-units in the player at time t, $M_{PLR}(t) \ge 0,\ \forall t$. |
| $M_{PBPLR}(t)$ | media-units | The number of media-units in the playoutbuffer and player at time t, $M_{PBPLR}(t) = M_{PB}(t) + M_{PLR}(t)$, $M_{PBPLR}(t) \ge 0,\ \forall t$. |
| $M_{RCV}(t)$ | media-units | The number of media-units in the receiver (i.e. in the virtual buffer, playoutbuffer and player) at time t, $M_{RCV}(t) = M_{VB}(t) + M_{PB}(t) + M_{PLR}(t)$, $M_{RCV}(t) \ge 0,\ \forall t$. |
| $\lambda_{TRS,m}$ | seconds | The time period that media-unit m spends in the transport segment. $\lambda_{TRS,m} > 0,\ \forall m$, $m \ge 1$. |
| $\lambda_{VB,m}$ | seconds | The time period that media-unit m spends in the virtual buffer. $\lambda_{VB,m} > 0,\ \forall m$, $m \ge 1$. |
| $\lambda_{PB,m}$ | seconds | The time period that media-unit m spends in the playoutbuffer. $\lambda_{PB,m} \ge 0,\ \forall m$, $m \ge 1$. |
| $\lambda_{PLR,m}$ | seconds | The time period that media-unit m spends in the player. $\lambda_{PLR,m} \ge 0,\ \forall m$. |
| $\lambda_{RCV,m}$ | seconds | The total latency that media-unit m experiences in the receiver, $\lambda_{RCV,m} = \lambda_{VB,m} + \lambda_{PB,m} + \lambda_{PLR,m}$. |
| $t_{SNDR,m}$ | seconds | The time that media-unit m leaves the sender. |
| $t_{TRS,m}$ | seconds | The time that media-unit m leaves the transport segment. |
| $t_{VB,m}$ | seconds | The time that media-unit m leaves the virtual buffer. |
| $t_{PB,m}$ | seconds | The time that media-unit m leaves the playoutbuffer. |
| $t_{PLR,m}$ | seconds | The time that media-unit m leaves the player. |
| $t_0$ | seconds | The time the sender starts sending the first media-unit. |
| $t_k$ | seconds | The time of timestep number k. |
| $h_k$ | seconds | The length of the time period between timesteps k and k+1, $h_k = t_{k+1} - t_k$. |

Table 3.1: Notation

The use of the most central notation listed in Table 3.1 is illustrated in Figure 3.7.

Figure 3.7: Illustration of the use of notation. For the sender, transport segment, virtual buffer, playoutbuffer and player, the figure shows the output rates ($r_{SNDR}$, $r_{TRS}$, $r_{VB}$, $r_{PB}$, $r_{PLR}$), the contents ($M_{TRS}$, $M_{VB}$, $M_{PB}$, $M_{PLR}$) and the latencies ($\lambda_{TRS,m}$, $\lambda_{VB,m}$, $\lambda_{PB,m}$, $\lambda_{PLR,m}$), together with the combined quantities $M_{PBPLR}$, $\lambda_{PBPLR}$ and $M_{RCV}$, $\lambda_{RCV}$.

To make the equations easier to read, all variables are represented by italic letters, and all variables throughout the thesis comply with the following rules:

| Description | Example | Meaning |
|---|---|---|
| Non-bold letter | $x$, $M$ | Scalar |
| Lower case bold letter | $\mathbf{x}$ | Vector |
| Upper case bold letter | $\mathbf{A}$ | Matrix |
| Lower right subscript number | $x_k$ | Subscript denotes timestep number |
| A dot on top of a variable | $\dot{x} = \frac{d}{dt}x$ | Time derivative of the variable |
| A hat on top of a variable | $\hat{x}_k$ | Updated Kalman filter estimate of the variable |
| A bar on top of a variable | $\bar{x}_k$ | (Time) prediction of the variable |
| A tilde on top of a variable | $\tilde{x}_k$ | Measured value of the variable |
| A delta in front of a variable | $\delta x = \tilde{x} - x$ | Measurement error of the variable |

Table 3.2: Notation rules

To make the thesis easier to read and understand, the most important equations in the thesis (i.e. the resulting equations of the different deductions, or equations that will be used later in the thesis) have a black frame.

To be able to deduce the transport segment independent optimal controller in Chapter 4, we need a precise mathematical representation of the system. This chapter uses the notation from Section 3.4 to develop transport segment independent mathematical models of our system.

Section 3.5.1 describes mathematical relations between some of the quantities of our system, and Section 3.5.2 (p. 49) uses some of these relations to deduce a state space model of the system. Section 3.5.3 (p. 52) describes the timestamping process for the media-units. The results from this description are used by Section 3.5.4 (p. 54) to deduce a prediction of the transport segment rate measurement and its measurement noise, and by Section 3.5.5 (p. 58) to deduce the prediction of $M_{VB}$ and its variance. Section 3.5.6 (p. 65) discusses how to discretize a continuous state space model, and it also contains an explanation of white noise discretization.

3.5.1 Mathematical relations between the system quantities

The time that media-unit m leaves the sender equals the number of media-units sent (including m) divided by the sender rate, plus the start time:

$$ t_{SNDR,m} = \frac{m}{r_{SNDR}} + t_0 \tag{3.1} $$

where $t_{SNDR,m}$ is the time that media-unit m leaves the sender, and $t_0$ is the time the sender starts sending the first media-unit. The latency from the transport segment will delay the media-units as follows:

$$ t_{TRS,m} = t_{SNDR,m} + \lambda_{TRS,m} \tag{3.2} $$

where $\lambda_{TRS,m}$ is the latency that media-unit m experiences in the transport segment and $t_{TRS,m}$ is the moment when media-unit m leaves the transport segment. In addition, the virtual buffer latency and the playoutbuffer latency give an extra delay:

$$ t_{PB,m} = t_{TRS,m} + \lambda_{VB,m} + \lambda_{PB,m} = t_{SNDR,m} + \lambda_{TRS,m} + \lambda_{VB,m} + \lambda_{PB,m} \tag{3.3} $$

where $t_{PB,m}$ is the time when media-unit m leaves the playoutbuffer, and $\lambda_{VB,m}$ and $\lambda_{PB,m}$ are the latencies that media-unit m experiences in the virtual buffer and the playoutbuffer, respectively. When also including the player latency, we can find the time when the media-unit is sent out of the player (i.e. the playout time of the media-unit):

$$ t_{PLR,m} = t_{PB,m} + \lambda_{PLR,m} = t_{SNDR,m} + \lambda_{TRS,m} + \lambda_{VB,m} + \lambda_{PB,m} + \lambda_{PLR,m} \tag{3.4} $$

where $t_{PLR,m}$ is the time when media-unit m leaves the player and $\lambda_{PLR,m}$ is the latency that media-unit m experiences in the player. Thus, the total latency for media-unit m is:

$$ \lambda_{total,m} = t_{PLR,m} - t_{SNDR,m} = \lambda_{TRS,m} + \lambda_{VB,m} + \lambda_{PB,m} + \lambda_{PLR,m} = \lambda_{TRS,m} + \lambda_{RCV,m} \tag{3.5} $$
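The chain of Equations (3.1) to (3.5) can be traced for a single media-unit with a short sketch. The function name, the dictionary keys and the numerical latencies used in the usage note are hypothetical and chosen only for illustration.

```python
def playout_timeline(m, r_sndr, t0, lam_trs, lam_vb, lam_pb, lam_plr):
    """Departure times of media-unit m from each component (seconds).

    The latencies are taken as given inputs here; in the thesis they
    are consequences of the rates and buffer contents.
    """
    t_sndr = m / r_sndr + t0            # Equation (3.1)
    t_trs = t_sndr + lam_trs            # Equation (3.2)
    t_pb = t_trs + lam_vb + lam_pb      # Equation (3.3)
    t_plr = t_pb + lam_plr              # Equation (3.4)
    lam_total = t_plr - t_sndr          # Equation (3.5)
    return {
        "t_sndr": t_sndr,
        "t_trs": t_trs,
        "t_pb": t_pb,
        "t_plr": t_plr,
        "lam_total": lam_total,
    }
```

For instance, with 50 media-units/s, $t_0 = 0$ and assumed latencies of 80 ms (transport segment), 0 ms (virtual buffer), 40 ms (playoutbuffer) and 20 ms (player), media-unit 100 leaves the sender at 2.0 s and is played out at about 2.14 s, a total latency of about 140 ms.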

The number of media-units in the transport segment is given by:

$$ M_{TRS}(t) = \int_{t_0}^{t} \left( r_{SNDR} - r_{TRS}(\tau) \right) d\tau \ge 0, \tag{3.6} $$

where $M_{TRS}(t)$ is the number of media-units in the transport segment at time t. In the same way, we can find the number of media-units in the virtual buffer:

$$ M_{VB}(t) = \int_{t_0}^{t} r_{TRS}(\tau)\, d\tau - \int_{t_0}^{t} r_{VB}(\tau)\, d\tau \ge 0, \tag{3.7} $$

where $M_{VB}(t)$ is the number of media-units in the virtual buffer at time t. The number of media-units in the playoutbuffer can be found in the same way:

$$ M_{PB}(t) = \int_{t_0}^{t} r_{VB}(\tau)\, d\tau - \int_{t_0}^{t} r_{PB}(\tau)\, d\tau \ge 0 \tag{3.8} $$

where $M_{PB}(t)$ is the number of media-units in the playoutbuffer at time t. The amount of media in the player is:

$$ M_{PLR}(t) = \int_{t_0}^{t} r_{PB}(\tau)\, d\tau - \int_{t_0}^{t} r_{PLR}(\tau)\, d\tau \ge 0 \tag{3.9} $$

where $M_{PLR}(t)$ is the number of media-units in the player at time t. The total number of media-units at the receiver can be written as:

$$ M_{RCV}(t) = M_{VB}(t) + M_{PB}(t) + M_{PLR}(t) = M_{VB}(t) + M_{PBPLR}(t), \tag{3.10} $$

By combining Equations (3.7), (3.8), (3.9) and (3.10), we get:

$$ M_{RCV}(t) = \int_{t_0}^{t} r_{TRS}(\tau)\, d\tau - \int_{t_0}^{t} r_{PLR}(\tau)\, d\tau \ge 0. \tag{3.11} $$

The first term of the above equation is the total number of media-units that at time t have arrived from the transport segment into the receiver buffers. The second term is the total number of media-units that at time t have been played out from the player. The difference between these two terms is the number of media-units at time t in the receiver buffers.
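Equation (3.11) can be approximated numerically when the two rates are available as samples. The sketch below assumes, hypothetically, a regular sampling grid with step `dt` and uses a simple rectangle rule; it does not enforce the constraint $M_{RCV}(t) \ge 0$, which in the real system follows from the buffer running dry.

```python
def receiver_level(r_trs, r_plr, dt):
    """Approximate M_RCV(t) of Equation (3.11) on a regular time grid.

    r_trs, r_plr: sampled rates (media-units/s), one value per step.
    dt:           step length in seconds (an assumption of this sketch).
    Returns the receiver buffer level after each step.
    """
    level = 0.0
    levels = []
    for arrived, played in zip(r_trs, r_plr):
        # Media that arrived minus media that was played out in this step.
        level += (arrived - played) * dt
        levels.append(level)
    return levels
```

With arrivals of 60, 60 and 40 media-units/s against a constant playout of 50 media-units/s in steps of 0.5 s, the level rises to 5, then 10, and falls back to 5 media-units.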

In Section 3.5.2 (p. 49), we will need the time derivative of Equation (3.11). By using the following formula from [46]:

$$ \frac{d}{dx} \int_{a(x)}^{b(x)} f(x,y)\, dy = f(x,b) \frac{d}{dx} b(x) - f(x,a) \frac{d}{dx} a(x) + \int_{a}^{b} \frac{\partial}{\partial x} f(x,y)\, dy, \tag{3.12} $$

the time derivative of Equation (3.11) can be written as:

$$ \begin{aligned} \dot{M}_{RCV}(t) &= \frac{d}{dt} \int_{t_0}^{t} r_{TRS}(\tau)\, d\tau - \frac{d}{dt} \int_{t_0}^{t} r_{PLR}(\tau)\, d\tau \\ &= r_{TRS}(t) \frac{d}{dt} t + \int_{t_0}^{t} \frac{d}{dt} r_{TRS}(\tau)\, d\tau - \left( r_{PLR}(t) \frac{d}{dt} t + \int_{t_0}^{t} \frac{d}{dt} r_{PLR}(\tau)\, d\tau \right). \end{aligned} \tag{3.13} $$

Since:

$$ \frac{d}{dt} r_{TRS}(\tau) = 0, \tag{3.14} $$

and:

$$ \frac{d}{dt} r_{PLR}(\tau) = 0, \tag{3.15} $$

Equation (3.13) reduces to:

$$ \dot{M}_{RCV}(t) = r_{TRS}(t) - r_{PLR}(t) \tag{3.16} $$

The essence of the above equation is that the rate of change of the number of media-units in the receiver buffers equals the difference between the rate into the receiver buffers from the transport segment and the rate out of the receiver buffers, i.e. the rate played out from the player.

The transport segment latency for media-unit number m, $\lambda_{TRS,m}$, can be expressed in two ways:

• As the time used to send to the virtual buffer all the media-units that are inside the transport segment (including m) at the time media-unit m has just entered the transport segment:

$$ \lambda_{TRS,m} = \int_{m - M_{TRS}(t_{SNDR,m})}^{m} \frac{1}{r_{TRS}(t_{TRS,\beta})}\, d\beta \tag{3.17} $$

• or the time it took to send all the media that are inside the transport segment at the time media-unit m has just left the transport segment, from the sender to the transport segment:

$$ \lambda_{TRS,m} = \int_{m}^{m + M_{TRS}(t_{TRS,m})} \frac{1}{r_{SNDR}}\, d\beta = \frac{M_{TRS}(t_{TRS,m})}{r_{SNDR}} \tag{3.18} $$

Likewise, $\lambda_{PB,m}$ can be expressed in two ways:

$$ \lambda_{PB,m} = \int_{m - M_{PB}(t_{VB,m})}^{m} \frac{1}{r_{PB}(t_{PB,\beta})}\, d\beta, \tag{3.19} $$

or:

$$ \lambda_{PB,m} = \int_{m}^{m + M_{PB}(t_{PB,m})} \frac{1}{r_{VB}(t_{VB,\beta})}\, d\beta. \tag{3.20} $$

Also the total latency at the receiver can be expressed in two ways, as:

$$ \lambda_{RCV,m} = \int_{m - M_{RCV}(t_{TRS,m})}^{m} \frac{1}{r_{PLR}(t_{PLR,\beta})}\, d\beta \tag{3.21} $$

or as:

$$ \lambda_{RCV,m} = \int_{m}^{m + M_{RCV}(t_{PLR,m})} \frac{1}{r_{TRS}(t_{TRS,\beta})}\, d\beta \tag{3.22} $$
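The closed form in Equation (3.18) can be checked for consistency with Equations (3.1) and (3.2) using a continuous-flow sketch. The function below is hypothetical: it takes a transport latency as input, computes how much media the ideal sender has pushed into the transport segment by the time media-unit m leaves it, and recovers the latency from the resulting fill level.

```python
def transport_latency_from_fill(m, lam_trs, r_sndr, t0=0.0):
    """Recover the transport latency of media-unit m from M_TRS.

    Continuous-flow consistency check of Equation (3.18); lam_trs is
    an assumed input latency, and the return value should equal it.
    """
    t_sndr = m / r_sndr + t0         # Equation (3.1)
    t_trs = t_sndr + lam_trs         # Equation (3.2)
    sent = r_sndr * (t_trs - t0)     # media sent by the ideal sender so far
    m_trs = sent - m                 # media still inside the transport segment
    return m_trs / r_sndr            # Equation (3.18)
```

With 50 media-units/s and an assumed latency of 80 ms, four media-units are inside the transport segment when media-unit 100 leaves it, and 4/50 s returns the 80 ms.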

3.5.2 State space modelling of the system

To be able to deduce the optimal controller, we need to express our system knowledge in a state space model.

To find this model, we will need Equation (3.16) on page 47. As stated in Section 3.2 (p. 35), our only way to control the system is to control the player rate $r_{PLR}$. Thus, we can use the player rate itself, or any of its derivatives, as our control variable. We choose to use the time derivative of the player rate as our control variable:

$$ u = \dot{r}_{PLR} \tag{3.23} $$

since this will give simpler equations for the optimal control of our system than other choices (deduced in Chapter 4 (p. 71)).

Since our aim is to develop a transport segment independent state space model, we need the user or application programmer to give us a state space model of his/her particular transport segment. Generally, this is given in the form:

$$ \dot{\mathbf{x}}_{TRS} = \mathbf{f}(\mathbf{x}_{TRS}) + \mathbf{C}_{TRS}\mathbf{v}_{TRS}. \tag{3.24} $$

where $\mathbf{x}_{TRS}$ is the state vector and $\mathbf{C}_{TRS}\mathbf{v}_{TRS}$ expresses the system noise, where $\mathbf{v}_{TRS}$ is a vector of uncorrelated Gaussian white noise with a zero mean and a variance of 1. The first state of the state vector $\mathbf{x}_{TRS}$ is $(r_{TRS} - r_{SNDR})$, i.e. the rate out of the transport segment into the playoutbuffer minus the sender rate. The rest of the states are given by the model used:

$$ \mathbf{x}_{TRS} = \begin{bmatrix} r_{TRS} - r_{SNDR} \\ \vdots \end{bmatrix} \tag{3.25} $$

In the rest of this thesis, we assume that the first term on the right hand side of Equation (3.24) is either linear (so that it can be written in the form $\mathbf{A}_{TRS}\mathbf{x}_{TRS}$) or can be linearized by using $\mathbf{A}_{TRS}(t) = \frac{\partial \mathbf{f}(\mathbf{x}_{TRS}(t), t)}{\partial \mathbf{x}_{TRS}(t)}$ at each specific point in time. Thus, we use the transport segment state space equation:

$$ \dot{\mathbf{x}}_{TRS} = \mathbf{A}_{TRS}\mathbf{x}_{TRS} + \mathbf{C}_{TRS}\mathbf{v}_{TRS} \tag{3.26} $$

where ATRS is the system matrix.

To express our total system by a state space model, we first need to choose the states in the system state vector. We choose the receiver buffer level as the first state, the difference between the player rate and the sender rate as the second state, and the last states are chosen as the transport segment state vector. Thus, we get the following total system state vector:

$$ \mathbf{x} = \begin{bmatrix} M_{RCV} \\ r_{PLR} - r_{SNDR} \\ \mathbf{x}_{TRS} \end{bmatrix}. \tag{3.27} $$

By using Equations (3.16) on page 47, (3.23), (3.25) and (3.26), we can find the following equation for the time derivative of the state vector:

$$ \dot{\mathbf{x}} = \begin{bmatrix} \dot{M}_{RCV} \\ \dot{r}_{PLR} \\ \dot{\mathbf{x}}_{TRS} \end{bmatrix} = \begin{bmatrix} -\left( r_{PLR}(t) - r_{SNDR} \right) + \begin{bmatrix} 1 & \mathbf{0}_{1 \times (n_{TRS}-1)} \end{bmatrix} \mathbf{x}_{TRS} \\ u \\ \mathbf{A}_{TRS}\mathbf{x}_{TRS} + \mathbf{C}_{TRS}\mathbf{v}_{TRS} \end{bmatrix}, \tag{3.28} $$

where $n_{TRS}$ is the number of states in the transport segment state space equation (Equation (3.26)), $\mathbf{0}_{a \times b}$ denotes a zero matrix of dimension a by b, and where $r_{SNDR}$ is absent on the left hand side for state number 2, since it is constant ($\dot{r}_{SNDR} = 0$).

One can obtain a good transport segment model for use in the algorithms in this thesis, with a low number of states ($n_{TRS}$) in $\mathbf{x}_{TRS}$. In the simulations of this thesis (see Chapter 8 (p. 163)), we have used $n_{TRS} = 2$.

The equation for a general state space model is:

$$ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}u + \mathbf{C}\mathbf{v} \tag{3.29} $$

Equation (3.28) can be written in this form:

$$ \begin{bmatrix} \dot{M}_{RCV} \\ \dot{r}_{PLR} \\ \dot{\mathbf{x}}_{TRS} \end{bmatrix} = \begin{bmatrix} 0 & -1 & 1 & \mathbf{0}_{1 \times (n_{TRS}-1)} \\ 0 & 0 & \multicolumn{2}{c}{\mathbf{0}_{1 \times n_{TRS}}} \\ \mathbf{0}_{n_{TRS} \times 1} & \mathbf{0}_{n_{TRS} \times 1} & \multicolumn{2}{c}{\mathbf{A}_{TRS}} \end{bmatrix} \begin{bmatrix} M_{RCV} \\ r_{PLR} - r_{SNDR} \\ \mathbf{x}_{TRS} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ \mathbf{0}_{n_{TRS} \times 1} \end{bmatrix} u + \begin{bmatrix} 0 \\ 0 \\ \mathbf{C}_{TRS} \end{bmatrix} \mathbf{v}_{TRS} \tag{3.30} $$

By comparing the above two equations, we can identify the matrices of the state space model as:

$$ \mathbf{B} = \begin{bmatrix} 0 \\ 1 \\ \mathbf{0}_{n_{TRS} \times 1} \end{bmatrix}, \tag{3.31} $$

$$ \mathbf{C} = \begin{bmatrix} 0 \\ 0 \\ \mathbf{C}_{TRS} \end{bmatrix}, \tag{3.32} $$

and

$$ \mathbf{A} = \begin{bmatrix} \mathbf{A}_{1,RCV} & \mathbf{A}_{2,RCV} \\ \mathbf{0}_{n_{TRS} \times 2} & \mathbf{A}_{TRS} \end{bmatrix}, \tag{3.33} $$

where:

$$ \mathbf{A}_{1,RCV} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}, \tag{3.34} $$

and

$$ \mathbf{A}_{2,RCV} = \begin{bmatrix} 1 & \mathbf{0}_{1 \times (n_{TRS}-1)} \\ 0 & \mathbf{0}_{1 \times (n_{TRS}-1)} \end{bmatrix}. \tag{3.35} $$

Equation (3.29) together with Equations (3.31) to (3.35) describes the linear area of our system. It gives a precise description of our system as long as the buffer does not run dry. Section 4.1 (p. 72) will discuss the non-linearity of our system.

3.5.3 Timesteps and timestamps

As mentioned in Section 2.3.2 (p. 28), we will use a controller algorithm to control the playout speed from the player. In a regular computer, the controller process needs to share the CPU (Central Processing Unit) with other processes, and thus cannot run continuously. Instead, we let the controller be run at regular intervals, called timesteps, as described in Section 3.5.3.1.

If the intervals between the runs of the controller are long, we need a separate process to receive the media-units arriving from the transport segment, and find their timestamps. This is explained in Section 3.5.3.2 (p. 53).

3.5.3.1 Timesteps

The controller at the receiver will be run at regular¹ intervals, called timesteps. In the equations in this thesis, we assume a numbering (by using the letter k) of the timesteps. The time between two consecutive timesteps k and k+1 is called $h_k$:

$$ h_k = t_{k+1} - t_k \tag{3.36} $$

where $t_k$ is the time of timestep k and $t_{k+1}$ is the time of timestep k+1.

¹ The length of the intervals might vary, but should be kept under a certain limit if the player receives new media-units only at these intervals. If not, the player buffer might run dry while there is still media left in the playoutbuffer.

It does not matter for the performance of the total system whether it is the timestamping process (see Section 3.5.3.2) or the controller that moves the media-units into the playoutbuffer. The reason for this is that if the timestamping process does not move the media-units into the playoutbuffer, the first task of the controller in a new timestep is to move all new timestamped media-units into the playoutbuffer. When the next tasks for the controller in the same timestep are performed, they will not notice which process put the new media-units into the playoutbuffer.
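To connect the timesteps to the state space model of Section 3.5.2, the noise-free part of Equation (3.29) can be advanced from $t_k$ to $t_{k+1}$ with a forward-Euler step of length $h_k$. This is only an illustrative sketch and not the discretization derived in Section 3.5.6; the function names and the transport segment matrix passed in are hypothetical placeholders.

```python
def build_matrices(a_trs):
    """Assemble A and B following Equations (3.31) and (3.33) to (3.35).

    a_trs: the n_TRS x n_TRS transport segment matrix as nested lists
           (a placeholder supplied by the caller, not a thesis value).
    """
    n = len(a_trs)                       # n_TRS
    size = n + 2
    A = [[0.0] * size for _ in range(size)]
    A[0][1] = -1.0   # A_1,RCV: M_RCV falls with (r_PLR - r_SNDR)
    A[0][2] = 1.0    # A_2,RCV: M_RCV grows with (r_TRS - r_SNDR)
    for i in range(n):
        for j in range(n):
            A[2 + i][2 + j] = a_trs[i][j]    # lower-right block A_TRS
    B = [0.0, 1.0] + [0.0] * n               # u only drives the player rate
    return A, B


def euler_step(A, B, x, u, h_k):
    """Noise-free forward-Euler step: x(t_k + h_k) ~ x + h_k * (A x + B u)."""
    dx = [sum(a * xi for a, xi in zip(row, x)) + b * u
          for row, b in zip(A, B)]
    return [xi + h_k * di for xi, di in zip(x, dx)]
```

For example, with a zero placeholder for $\mathbf{A}_{TRS}$ ($n_{TRS} = 2$), a state of 5 buffered media-units, a player running at the sender rate, a transport segment delivering 2 media-units/s above the sender rate, and $u = 1$, one step of $h_k = 0.5$ s gives a buffer level of 6 media-units and a player rate 0.5 media-units/s above the sender rate.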

3.5.3.2 The timestamping process

We cannot measure $r_{TRS}(t)$, the rate out of the transport segment, directly. To be able to calculate a measurement for $r_{TRS}(t)$, we need to measure $t_{TRS,m}$, the point in time when media-unit m is leaving the transport segment, for all media-units.

This can be done by a timestamping process that measures $t_{TRS,m}$ for each media-unit as soon as it arrives from the transport segment (or, mathematically, from the virtual buffer). The information about the timestamp ($t_{TRS,m}$) should be given to the playout speed controller at the next timestep. The timestamp $t_{TRS,m}$ is the point in time when the reception process has received the entire media-unit, and could, if desired, immediately move the media-unit from the playoutbuffer to the player buffer and play it. Mathematically, $t_{TRS,m}$ is the point in time when the last infinitesimal part of media-unit m has been sent from the transport segment to the virtual buffer.

A media-unit, or a collection of media-units, is regarded as being sent out of the virtual buffer at the time of its timestamp, and hence mathematically, the time measured by the timestamping process is really $t_{VB,m}$. Since the virtual buffer is used as a converter, in an ideal situation, where the media-unit is timestamped at the exact moment in time when it is ready to leave the virtual buffer, we have that:

$$ t_{TRS,m} = t_{VB,m} \tag{3.37} $$

In a real implementation, the timestamping process will have two main errors:

• Since there will always be a small delay from the reception of the media-unit until the time of the timestamp, the measurement of $t_{VB,m}$ will be delayed compared to the time when media-unit m was leaving the virtual buffer, or $t_{VB,m} > t_{TRS,m}$.

• The error of the timestamp itself is never exactly zero (due to error of the receiver machine clock), and thus can be modelled as having a noise component. The machine clock error (also called jitter error) between two adjacent timesteps is varying from timestep to timestep, and can be a random positive or negative value. Even though the clock skew (the sum of the jitter values over time) can be quite large, the jitter in itself is usually very small. For the calculations in this chapter, we use the approximation that the machine clock error (jitter error) is white noise.
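The two error sources of the timestamping process can be mimicked in a small sketch. It assumes, purely for illustration, a constant processing delay and independent zero-mean Gaussian clock jitter; the function name, the constant-delay simplification and the parameter values are not taken from the thesis.

```python
import random


def simulate_timestamps(true_times, delay_s, sigma_s, seed=0):
    """Return noisy timestamps for the given true departure times.

    true_times: ideal times t_TRS,m in seconds.
    delay_s:    assumed constant delay between reception and timestamp.
    sigma_s:    assumed standard deviation of the white clock jitter.
    seed:       seed for the private random generator (reproducibility).
    """
    rng = random.Random(seed)
    # Each timestamp is late by delay_s and perturbed by white noise.
    return [t + delay_s + rng.gauss(0.0, sigma_s) for t in true_times]
```

Setting `sigma_s` to zero isolates the first error (a pure delay), while a non-zero `sigma_s` adds the white-noise component used in the calculations of this chapter.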

3.5.4 Transport segment rate measurement and its measurement noise

This section will find equations for the media-unit rate out of the transport segment and the vari-ance of the measurement noise of this rate. This variance will be used in the measurement noise covariance matrix W in the Kalman filter described in Chapter 5 (p. 91).

Both the optimal controller (to be deduced in Chapter 4 (p. 71)) and the transport segment rate estimator (to be deduced in Chapter 5 (p. 91)) need measurements of the rate from the transport

segment, . As described in Section 3.5.3, cannot be measured directly, but we

can measure , the timestamp of each media-unit m, and use these timestamps to calculate

.

To calculate , a measurement of the mean1 rate of media-unit number m from the trans-

port segment, we use the equation:

(3.38)

If we would like to calculate the mean rate of a collection of n media-units, where media-unit number m is the last media-unit in the collection, we use the equation:

(3.39)

We use Equation (3.38) or (3.39) to calculate the mean rate from the transport segment in the

time period: or . The best timestamp we can give to

this calculated measurement is the middle point of the time period, such that:

(3.40)

where is the time of the rate measurement , and:

1. If the media-unit had been divided into many packets in the transport segment, and these packets arrived in a varying rate from the transport segment, we would need to find the mean rate of these packets, to express the rate of the entire media-unit as one rate.

rTRS t( ) rTRS t( )

tTRS m,

rTRS t( )

rTRS m,

rTRS m,1 media-unit

tTRS m, tTRS m, 1––( )----------------------------------------------------=

rTRS m n– m ],(,n media-units

tTRS m, tTRS m, n––( )----------------------------------------------------=

tTRS m, 1–( tTRS m, ], t( TRS m, n– tTRS m, ],

trTRS m,tTRS m,

tTRS m, tTRS m, 1––( )2

----------------------------------------------------–=

trTRS m,rTRS m,


55

(3.41)

where is the time of the rate measurement .

By using Equations (3.38) and (3.40) or Equations (3.39) and (3.41) to calculate the rate out of the transport segment, there are two types of errors:• Measurement error due to the white noise error (explained in section 3.5.3.2 (p. 53)) of the

time stamping process. This error is discussed in Section 3.5.4.1.• By using the middle point of the time period as the timestamp of the rate measurement, we

assume a linear rate change during the time period. The error introduced by this lineariza-tion error is discussed in Section 3.5.4.2 (p. 56).

3.5.4.1 Measurement error due to timestamping error

If we assume that the standard deviation of the white noise error (explained in section 3.5.3.2

(p. 53)) of the timestamping process is , we can calculate the associating standard devia-

tion of the rate measurement. By using Equation (3.39), we get:

(3.42)

where is the standard deviation of the timestamping error of the rate

measurement , is the measurement error of and

is the measurement error of . Equation (3.42) leads to:

trTRS m n– m ],(,tTRS m,

tTRS m, tTRS m, n––( )2

----------------------------------------------------–=

trTRS m n– m ],(,rTRS m n– m ],(,

σtTRS

σrTRS m n– m ],(, timestamp,2 E rTRS m n– m ],(, rTRS m n– m ],(,–( )2( )

E n media-unitstTRS m, tTRS m, n––( )

---------------------------------------------------- n media-unitstTRS m, tTRS m, n––( )

----------------------------------------------------–� �� 2� ��

n media-units( )2Eδ tTRS m,– δ tTRS m, n–+

tTRS m, tTRS m, n––( ) tTRS m, tTRS m, n––( )---------------------------------------------------------------------------------------------------------� �� 2

� ��

=

=

=

σrTRS m n– m ],(, timestamp,

rTRS m n– m ],(, δ tTRS m, tTRS m, δ tTRS m, n–

tTRS m, n–


$$
\begin{aligned}
\sigma^2_{\hat r_{TRS,(m-n,m]},\,timestamp}
&= (n\ \text{media-units})^2\, E\left[\frac{\left(\delta t_{TRS,m} - \delta t_{TRS,m-n}\right)^2}{(\hat t_{TRS,m} - \hat t_{TRS,m-n})^2\,(t_{TRS,m} - t_{TRS,m-n})^2}\right] \\
&= \frac{2\,(n\ \text{media-units})^2\,\sigma^2_{t_{TRS}}}{(t_{TRS,m} - t_{TRS,m-n})^4}
\end{aligned}
\tag{3.43}
$$

Therefore, an approximate value of the standard deviation is:

$$
\sigma_{\hat r_{TRS,(m-n,m]},\,timestamp}
\approx \frac{(n\ \text{media-units}) \cdot \sqrt{2}\,\sigma_{t_{TRS}}}{(t_{TRS,m} - t_{TRS,m-n})^2}
= \hat r_{TRS,(m-n,m]} \cdot \frac{\sqrt{2}\,\sigma_{t_{TRS}}}{t_{TRS,m} - t_{TRS,m-n}}
\tag{3.44}
$$

Note that only the short-term error of the timestamping process contributes to the rate measurement error discussed in this section. The reason can be seen from the last line of Equation (3.42): only the time difference between the two timestamps and the difference between the errors of the two timestamps appear there, so any error that is common to both timestamps cancels.

3.5.4.2 Measurement error due to linearization error

By using the midpoint of the time period as the timestamp of the mean rate of the time period, we introduce a linearization error, since this is the same as assuming that the rate change during the time period has been constant. The error introduced by this linearization is:

$$
\sigma_{\hat r_{TRS,(m-n,m]},\,linearization} = r_{TRS}\left(t_{\hat r_{TRS,(m-n,m]}}\right) - r_{TRS,(m-n,m],\,mean}
\tag{3.45}
$$

where $\sigma_{\hat r_{TRS,(m-n,m]},\,linearization}$ is the standard deviation of the linearization error of the rate measurement $\hat r_{TRS,(m-n,m]}$ and $r_{TRS,(m-n,m],\,mean}$ is the true mean rate out of the transport segment in the time period $(t_{TRS,m-n}, t_{TRS,m}]$. As can be seen from Figure 3.8, the size of the linearization error depends on the magnitude of the high-frequency part of the transport segment jitter and on the size of the collection being measured.


3.5.4.3 Rate measurement error of different collection sizes

The influence of the timestamping error and the linearization error on the rate measurement error is illustrated in Figure 3.8, which shows the measured rate for collections of different sizes. Figure 3.8 was made with the receiver system environment simulator described in Chapter 7 (p. 137), with rSNDR = 50 media-units/s. In this section, however, it is only meant as an illustration of the rate measurement error of different collection sizes.

The total measurement error is a sum of the error caused by the timestamping process (see Equation (3.44)) and the linearization error (see Equation (3.45)).

As can be seen from Equation (3.44), the error caused by the timestamping process is reduced by increasing the number of media-units in each collection. The linearization error, on the other hand, increases with the collection size: the less frequently we get rate measurements, the more of the high-frequency parts of the transport segment rate go undetected by our rate measurement. Thus, one must be careful when choosing the size of the collections.

Figure 3.8 illustrates three different collection sizes:

Figure 3.8: Illustration of measurement of the rate out of the transport segment. [Plot of rate in media-units/s (axis range 42 to 56) against time in seconds (0 to 7), showing the true rate (rTRS) and the measurements of rTRS,(m-1,m], rTRS,(m-4,m] and rTRS,(m-50,m].]

• The green line illustrates the consequence of measuring too small collections. The linearization error is negligible compared to the error caused by the timestamping process, which dominates the total error.

• The red line illustrates a good choice of collection size, where neither of the two errors dominates the other.

• The blue line illustrates the consequence of having too large collections. The error caused by the timestamping process is negligible compared to the linearization error, which dominates the total error.
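The timestamping part of this trade-off can be illustrated numerically. The following is a minimal sketch of Equation (3.44), not the simulator of Chapter 7: the function name `timestamp_rate_sigma` and the timestamp noise of 1 ms are assumptions made only for illustration, and the time spanned by a collection is taken as its nominal value n / rSNDR.

```python
import math


def timestamp_rate_sigma(n_units: int, t_span: float, sigma_t: float) -> float:
    """Approximate standard deviation of the rate measurement caused by
    timestamp noise, following Equation (3.44):
    n * sqrt(2) * sigma_t / t_span**2."""
    return n_units * math.sqrt(2.0) * sigma_t / t_span ** 2


R_SNDR = 50.0    # media-units/s, the sender rate used for Figure 3.8
SIGMA_T = 0.001  # s, assumed timestamp noise (illustrative value only)

if __name__ == "__main__":
    for n in (1, 4, 50):
        t_span = n / R_SNDR  # nominal time spanned by n media-units
        sigma_r = timestamp_rate_sigma(n, t_span, SIGMA_T)
        print(f"n = {n:2d}: sigma_r = {sigma_r:.3f} media-units/s")
```

Under these assumptions the timestamping error falls in inverse proportion to the collection size, which is why the smallest collection is the noisiest. The linearization error of Equation (3.45) is not modelled here, since it depends on the actual rate dynamics.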

3.5.5 Prediction of MVB

As explained in Section 3.4 (p. 45), the virtual buffer is not a real buffer. It is a mathematical description that is used to convert the modelled continuous stream from the transport segment to the whole media-units being sent into the playoutbuffer.

A media-unit, or a collection of media-units, is regarded as being sent out of the virtual buffer at the time of its timestamp, $t_{VB,m}$.

The use of the virtual buffer is illustrated in Figure 3.9, where:

• The yellow line is the integral of $r_{SNDR}$, i.e. the sum of media-units that would have entered the virtual buffer if there had been no transport segment jitter.

• The green line is the integral of $r_{TRS}(t)$, i.e. the sum of media-units that the virtual buffer has received.

• The magenta line is the integral of $r_{VB}(t)$, i.e. the sum of media-units that the playoutbuffer has received.

Thus, the difference between the green line and the magenta line is the amount of media inside the virtual buffer. This is illustrated in Figure 3.10. Figures 3.9 and 3.10 were made with the simulation environment explained in Chapter 7 (p. 137), with rSNDR = 50 media-units/s.


In this thesis, we assume that the timestamping process is fast enough to give each media-unit a separate timestamp, so that the amount of media in the virtual buffer does not exceed one media-unit (see explanation in Section 3.5.5.1).

Figure 3.9: Illustration of integrated rates

Figure 3.10: Illustration of number of media-units in the virtual buffer, MVB

In Chapter 6 (p. 103), we will need a predicted value of the amount of media in the virtual buffer, and the variance of the prediction error, at the following points in time:

• At the time of the timestamp for the media-unit or collection of media-units. This is deduced in Section 3.5.5.1.

• At the time of the first timestep after the timestamp. This is deduced in Section 3.5.5.2.

• At n timesteps after the arrival of the last media-unit. This is deduced in Section 3.5.5.3 (p. 62).

3.5.5.1 Prediction of MVB at the timestamp

If the timestamping process (explained in Section 3.5.3.2) had been able to give a timestamp $t_{VB,m} = t_{TRS,m}$ to media-unit m at the moment in time when the last infinitesimal part of m arrived from the transport segment, the virtual buffer would be empty at $t_{TRS,m}$ (i.e. we would have $M_{VB}(t_{TRS,m}) = 0$). In a practical implementation, the timestamping process is not that fast, and thus, an amount of media-unit number m+1 will usually have arrived into the virtual buffer before the timestamping process gives a timestamp to media-unit number m. Since this amount is impossible to predict, the best guess we can make is to assume that the amount of media (from media-unit number m+1) in the virtual buffer at $t_{TRS,m}$ is zero:

$$
M_{VB}(t_{TRS,m}) \approx 0
\tag{3.46}
$$

Note that this assumption only means that we let the timestamp define the moment in time when the media-unit has left the transport segment.

3.5.5.2 Prediction of MVB at the first timestep after the timestamp

Since the controller is run only at specific timesteps (as explained in Section 3.5.3.1), while the timestamping process gives a timestamp to the media-units as soon as possible after they arrive from the transport segment, there will usually be a delay from the timestamp of a media-unit to the timestep when it is being handled by the controller. Even if several media-units arrive into the playoutbuffer at the same timestep, there will usually be a delay from the timestamp of the last of the media-units to the time of the timestep.

If a media-unit or a collection of media-units has timestamps in the range $(t_{k-1}, t_k]$, they will arrive into the playoutbuffer at timestep k. The time interval from the timestamp $t_{TRS,m}$ of media-unit m, or of the last media-unit m of the collection of media-units, to the time of the timestep $t_k$ can be written as $(t_{TRS,m}, t_k]$. During this time interval, the amount of media from the next media-unit that has been streaming into the virtual buffer can be written as:

$$
M_{VB}(t_k) - M_{VB}(t_{TRS,m}) = \int_{t_{TRS,m}}^{t_k} r_{TRS}(\tau)\, d\tau
\tag{3.47}
$$

One way to calculate an approximate value of this equation is:

$$
M_{VB}(t_k) - M_{VB}(t_{TRS,m}) \approx (t_k - t_{TRS,m}) \cdot \frac{r_{TRS}(t_k) + r_{TRS}(t_{TRS,m})}{2}
\tag{3.48}
$$

where the mean of the rate during the time period $(t_{TRS,m}, t_k]$ is approximated by the mean of the rate at the endpoints of the time period.

In a practical implementation, we use Equation (3.46). Thus, we can predict the amount of media in the virtual buffer at timestep k by the equation:

$$
\hat M_{VB}(t_k) = (t_k - t_{TRS,m}) \cdot \frac{\hat r_{TRS}(t_k) + \hat r_{TRS}(t_{TRS,m})}{2}
\tag{3.49}
$$

where the prediction error is:

$$
\hat M_{VB}(t_k) - M_{VB}(t_k) = (t_k - t_{TRS,m}) \cdot \frac{\hat r_{TRS}(t_k) + \hat r_{TRS}(t_{TRS,m})}{2} - \int_{t_{TRS,m}}^{t_k} r_{TRS}(\tau)\, d\tau
\tag{3.50}
$$

This prediction error consists of two parts:

1. The linearization error that we make by using Equation (3.48) instead of Equation (3.47). This error depends on the dynamics of $r_{TRS}(t)$ and on the length of the time period $(t_{TRS,m}, t_k]$.

2. The error due to the rate estimation error.

Since we measure $r_{TRS}(t)$ at timestamps that are further apart than $t_{TRS,m}$ and $t_k$, we assume that the dynamics of $r_{TRS}(t)$ that we can estimate are of such low frequency that the linearization error is negligible compared to the error due to the rate estimation error. Therefore, the total error can be written as:


$$
\hat M_{VB}(t_k) - M_{VB}(t_k) \approx (t_k - t_{TRS,m}) \left( \frac{\hat r_{TRS}(t_k) + \hat r_{TRS}(t_{TRS,m})}{2} - \frac{r_{TRS}(t_k) + r_{TRS}(t_{TRS,m})}{2} \right)
\tag{3.51}
$$

The variance of this error can be written as:

$$
\operatorname{var}\left(\hat M_{VB}(t_k) - M_{VB}(t_k)\right) = (t_k - t_{TRS,m})^2\, E\left[\left(\frac{\hat r_{TRS}(t_k) - r_{TRS}(t_k) + \hat r_{TRS}(t_{TRS,m}) - r_{TRS}(t_{TRS,m})}{2}\right)^2\right]
\tag{3.52}
$$

Since:

$$
E\left[\left(\frac{\hat r_{TRS}(t_k) - r_{TRS}(t_k) + \hat r_{TRS}(t_{TRS,m}) - r_{TRS}(t_{TRS,m})}{2}\right)^2\right] \le \max\left(\operatorname{var}(\hat r_{TRS}(t_k)),\ \operatorname{var}(\hat r_{TRS}(t_{TRS,m}))\right)
\tag{3.53}
$$

we get:

$$
\operatorname{var}\left(\hat M_{VB}(t_k) - M_{VB}(t_k)\right) \le (t_k - t_{TRS,m})^2 \cdot \max\left(\operatorname{var}(\hat r_{TRS}(t_k)),\ \operatorname{var}(\hat r_{TRS}(t_{TRS,m}))\right)
\tag{3.54}
$$

Thus, the variance of the prediction error depends on the length of the time period between the timestamp and the timestep, and on the variance of the prediction error of rTRS at the timestamp and at the timestep.

In this section, we first found a prediction of MVB at the first timestep after the timestamp (given by Equation (3.49)) and the error of this prediction (given by Equation (3.51)). Then, we found a bound on the variance of the prediction error (given by Equation (3.54)).

3.5.5.3 Prediction of MVB at n timesteps after the timestamp

If no media-units have arrived since timestep k, the amount of media in the virtual buffer at timestep k+n can be expressed by using the equation:

$$
M_{VB}(t_{k+n}) = M_{VB}(t_k) + \sum_{i=0}^{n-1} \int_{t_{k+i}}^{t_{k+i+1}} r_{TRS}(\theta)\, d\theta
\tag{3.55}
$$


In a practical implementation, we need to discretize the integral in Equation (3.55). This can be done in many different ways. We could use the forward or backward Euler method, or the mean of these two methods, i.e. the trapezoidal method:

$$
\hat M_{VB,k+n} = \hat M_{VB,k} + \sum_{i=0}^{n-1} h_{k+i}\, \frac{\hat r_{TRS,k+i} + \hat r_{TRS,k+i+1}}{2}
\tag{3.56}
$$

To get a more accurate prediction, we divide each timestep in two, and use the trapezoidal method:

$$
\hat M_{VB,k+n} = \hat M_{VB,k} + \sum_{i=0}^{n-1} h_{k+i}\, \frac{\hat r_{TRS,k+i} + \hat r_{TRS,k+i+1} + 2\,\hat r_{TRS,k+i+1/2}}{4}
\tag{3.57}
$$

where $\hat r_{TRS,k+i+1/2} = \hat r_{TRS}(t_{k+i+1/2}) = \hat r_{TRS}\left(\frac{t_{k+i} + t_{k+i+1}}{2}\right)$. We choose to use Equation (3.57), since it gives a more accurate estimate of $M_{VB,k+n}$ than Equation (3.56).

In Equation (3.49), we used a simpler expression, since the integration period of the expression to be discretized was shorter (smaller than one timestep). For Equation (3.57) we need a more accurate expression, since the prediction period can amount to several timesteps.

Next, we need to find the variance of the estimation error in Equation (3.57). By using Equation (3.57), the estimation error can be written as:

$$
\hat M_{VB}(t_{k+n}) - M_{VB}(t_{k+n}) = \hat M_{VB}(t_k) - M_{VB}(t_k) + \sum_{i=0}^{n-1} \left( h_{k+i}\, \frac{\hat r_{TRS,k+i} + \hat r_{TRS,k+i+1} + 2\,\hat r_{TRS}(t_{k+i+1/2})}{4} - \int_{t_{k+i}}^{t_{k+i+1}} r_{TRS}(\theta)\, d\theta \right)
\tag{3.58}
$$

This can also be written as:


$$
\begin{aligned}
\hat M_{VB}(t_{k+n}) - M_{VB}(t_{k+n}) ={}& \hat M_{VB}(t_k) - M_{VB}(t_k) \\
&+ \sum_{i=0}^{n-1} h_{k+i}\, \frac{\Delta r_{TRS,k+i} + \Delta r_{TRS,k+i+1} + 2\,\Delta r_{TRS}(t_{k+i+1/2})}{4} \\
&+ \sum_{i=0}^{n-1} \left( h_{k+i}\, \frac{r_{TRS,k+i} + r_{TRS,k+i+1} + 2\,r_{TRS}(t_{k+i+1/2})}{4} - \int_{t_{k+i}}^{t_{k+i+1}} r_{TRS}(\theta)\, d\theta \right)
\end{aligned}
\tag{3.59}
$$

where $\Delta r_{TRS} = \hat r_{TRS} - r_{TRS}$ is the rate estimation error.

As can be seen from Equation (3.59), the estimation error consists of three parts:

Error 1: The error $(\hat M_{VB}(t_k) - M_{VB}(t_k))$. This is given by Equation (3.51) on page 62.

Error 2: The error of $\hat r_{TRS}$ at the points in time $t_{k+i}$, $t_{k+i+1}$ and $t_{k+i+1/2}$. We can calculate the variance of this error by using the equation:

$$
\operatorname{var}(error_2) \le \sum_{i=0}^{n-1} h_{k+i}^2\, \max\left(\operatorname{var}(\hat r_{TRS,k+i}),\ \operatorname{var}(\hat r_{TRS,k+i+1}),\ \operatorname{var}(\hat r_{TRS}(t_{k+i+1/2}))\right)
\tag{3.60}
$$

Error 3: The linearization error we introduce by approximating the integration of $r_{TRS}(t)$ by the integration period times the trapezoidal mean of $r_{TRS}(t)$ (as shown in Equation (3.57)). This error consists of the last line of Equation (3.59). The size of this error depends on the transport segment state space model and on the time between consecutive timesteps. By using the trapezoidal rule (page 54 in [7]), we can write the linearization error of Equation (3.57) as:

$$
error_3 = -\sum_{i=0}^{n-1} \frac{2\,(h_{k+i}/2)^3}{12}\, \ddot r_{TRS} = -\sum_{i=0}^{n-1} \frac{h_{k+i}^3}{48}\, \ddot r_{TRS}
\tag{3.61}
$$
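The discretized prediction of Equation (3.57) and a variance bound of the form used for Error 1 and Error 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `predict_mvb` and `variance_bound`, and the representation of the rate estimate as a callable `rate(t)`, are hypothetical and not taken from the thesis implementation.

```python
from typing import Callable, Sequence


def predict_mvb(m_vb_k: float,
                t: Sequence[float],
                rate: Callable[[float], float]) -> float:
    """Predict M_VB at t[-1] from its value at t[0], as in Equation (3.57).

    t holds the timestep instants t_k, ..., t_{k+n}; rate(time) returns the
    estimated transport segment rate at that time.
    """
    m = m_vb_k
    for t0, t1 in zip(t, t[1:]):
        h = t1 - t0                    # timestep length h_{k+i}
        r_mid = rate(0.5 * (t0 + t1))  # rate estimate at the midpoint
        m += h * (rate(t0) + rate(t1) + 2.0 * r_mid) / 4.0
    return m


def variance_bound(var_mvb_k: float,
                   h: Sequence[float],
                   var_rate: Sequence[Sequence[float]]) -> float:
    """Upper bound on the prediction error variance: the variance at t_k
    (Error 1) plus the bound of Equation (3.60) (Error 2).

    var_rate[i] holds the variances of the three rate estimates used in
    interval i (start, end and midpoint).
    """
    return var_mvb_k + sum(hi ** 2 * max(v) for hi, v in zip(h, var_rate))


if __name__ == "__main__":
    # Assumed example: a rate estimate that rises linearly from 50 media-units/s.
    steps = [0.0, 0.02, 0.04, 0.06]
    print(predict_mvb(0.0, steps, lambda time: 50.0 + 2.0 * time))
```

For a rate estimate that is linear in time, the rule reproduces the integral exactly, so Error 3 vanishes in that case and only the estimation errors remain.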


In our implementation, we have assumed that the linearization error is negligible compared to the other errors, and hence, we have used the following equation for the variance of the prediction error of $\hat M_{VB,k+n}$:

$$
\operatorname{var}\left(\hat M_{VB}(t_{k+n}) - M_{VB}(t_{k+n})\right) \le \operatorname{var}\left(\hat M_{VB}(t_k) - M_{VB}(t_k)\right) + \sum_{i=0}^{n-1} h_{k+i}^2\, \max\left(\operatorname{var}(\hat r_{TRS,k+i}),\ \operatorname{var}(\hat r_{TRS,k+i+1}),\ \operatorname{var}(\hat r_{TRS,k+i+1/2})\right)
\tag{3.62}
$$

3.5.6 Discretization of the continuous state space model

In our mathematical model, given by Equation (3.29) on page 50, we use a continuous-media-stream and continuous-time state space model. Since this model will be used by a computer program that runs at specific timesteps (as explained in Section 3.5.3.1 (p. 52)), we need to calculate the time-discrete version of the state space model. Thus, we need to find a model that has a continuous media stream, but is time-discrete.

Our system state space model is given by Equation (3.29) on page 50 as the continuous state space model:

$$
\dot x(t) = A x(t) + B u(t) + C v(t)
\tag{3.63}
$$

where the matrixes A, B and C are constant.

To discretize this, we start with:

$$
x_{k+1} = x_k + \int_{t_k}^{t_{k+1}} \left(A x(\tau) + B u(\tau)\right) d\tau + \int_{t_k}^{t_{k+1}} C v(\tau)\, d\tau
\tag{3.64}
$$

For the second term on the right hand side of Equation (3.64), we assume that the period between the timesteps is short enough for us to write:

$$
\int_{t_k}^{t_{k+1}} \left(A x(\tau) + B u(\tau)\right) d\tau \approx h_k A x_k + h_k B u_k
\tag{3.65}
$$


For the third term on the right hand side of Equation (3.64), the discrete noise vector $v_k$ can be written as:

$$
v_k = \int_{t_k}^{t_{k+1}} v(\tau)\, d\tau
\tag{3.66}
$$

We need an expression for the covariance matrix of $v_k$ as a function of the spectral density matrix of the continuous process noise $v(t)$. The spectral density matrix $V(t)$ of the continuous white noise $v(t)$ can be expressed by the equation:

$$
E\left(v(t) \cdot v^T(t + \tau)\right) = V(t)\,\delta(\tau)
\tag{3.67}
$$

where $\delta(t)$ is the Dirac delta function, defined as [7]:

$$
\delta(t) = 0,\ t \ne 0; \qquad \int_{-\infty}^{\infty} \delta(t)\, dt = 1
\tag{3.68}
$$

By using Equation (3.67), the covariance matrix $V_k$ of the discrete noise vector $v_k$ can be written as:

$$
V_k = E\left(v_k v_k^T\right) = E\left( \int_{t_k}^{t_{k+1}} \left( \int_{t_k}^{t_{k+1}} v(\tau_1)\, v^T(\tau_2)\, d\tau_2 \right) d\tau_1 \right)
\tag{3.69}
$$

$$
V_k = \int_{t_k}^{t_{k+1}} \left( \int_{t_k}^{t_{k+1}} E\left(v(\tau_1)\, v^T(\tau_2)\right) d\tau_2 \right) d\tau_1
\tag{3.70}
$$

Further, we get:


$$
V_k = \int_{t_k}^{t_{k+1}} \left( \int_{t_k}^{t_{k+1}} V(\tau_2)\,\delta(\tau_1 - \tau_2)\, d\tau_2 \right) d\tau_1
\tag{3.71}
$$

Equation (3.71) can then be written as:

$$
V_k = \int_{t_k}^{t_{k+1}} V(\tau_1)\, d\tau_1
\tag{3.72}
$$

If we assume that the time period between the timesteps is chosen small enough compared to the rate of change of V(t), we can use the simplified expression:

$$
V_k \approx h_k V(t_k)
\tag{3.73}
$$

By inserting Equations (3.65) and (3.66) into Equation (3.64), we get:

$$
x_{k+1} \approx x_k + h_k A x_k + h_k B u_k + C v_k = \Phi_k x_k + \Lambda_k u_k + \Gamma v_k
\tag{3.74}
$$

where: $\Phi_k = (I + A h_k)$, $\Lambda_k = h_k B$ and $\Gamma = C$. We have now found the discrete version of the continuous state space model in Equation (3.63). Since it is discrete, Equation (3.74) can be used as the state space model in a computer program.


The observability and controllability of a state space model are usually calculated mathematically. However, we choose to find the observability and controllability by reasoning, since this will give a better understanding of the system.

Each state in a system or in a state space model can be in one of four modes:

1. Controllable and observable

2. Controllable but not observable

3. Observable but not controllable

4. Neither controllable nor observable


The observability of our system is discussed in Section 3.6.1, the controllability is discussed in Section 3.6.2, and a summary is given in Section 3.6.3.

3.6.1 Observability

The observability of each of the states in our model is:

• State 1, $M_{RCV}$, can be estimated, since at each timestep, we can measure the value of $M_{PBPLR}$ and calculate an estimate of $M_{VB}$.

• State 2, $r_{PLR} - r_{SNDR}$, is known, since our controller decides the value of $r_{PLR}$, and the constant value of $r_{SNDR}$ is known in advance.

• State 3, $r_{TRS} - r_{SNDR}$, can be measured, since we measure the value of $r_{TRS}$ (as explained in Section 3.5.4), and the constant value of $r_{SNDR}$ is known in advance.

The rest of the states (in addition to state 3) are part of the transport segment state space model given by the user. There is no use in having states in $x_{TRS}$ that are not observable by measuring $r_{TRS}$ (i.e. by measuring the first state in $x_{TRS}$). Therefore, we assume that $x_{TRS}$ is observable by measuring $r_{TRS}$.

Thus, all the states in our state space model are observable.

3.6.2 Controllability

The controllability of each of the states in our model is:

• State 1, $M_{RCV}$, can be indirectly controlled, since the control of $r_{PLR}$ influences the receiver buffer level $M_{RCV}$.

• State 2, $r_{PLR} - r_{SNDR}$, can be directly controlled, since our control variable is $\dot r_{PLR}$ and $r_{SNDR}$ is a constant.

• State 3, $r_{TRS} - r_{SNDR}$, and the rest of the states in the transport segment state space model cannot be controlled, since one of the assumptions of this thesis was that we do not influence the transport segment (as explained in Section 3.2).

3.6.3 Summary of observability and controllability

To sum it up, states 1 and 2 are controllable and observable, while state 3 and the rest of the states of the transport segment state vector are observable, but not controllable. Thus, all states are observable, and since we can control the first two states, we can control the variables that we need to control. This is illustrated in Figure 3.11.
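The reasoning above can be cross-checked numerically with the standard rank tests for controllability and observability. The sketch below is only an illustration under stated assumptions: it uses a single transport segment state with a hypothetical first-order dynamic (`A_TRS = -1`), and a hypothetical measurement matrix `D` that reads states 1 and 3; neither is taken from the thesis.

```python
from fractions import Fraction


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def rank(m):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# States: [M_RCV, r_PLR - r_SNDR, r_TRS - r_SNDR]; control u acts on state 2.
A = [[0, -1, 1],
     [0, 0, 0],
     [0, 0, -1]]          # last row: assumed first-order transport segment
B = [[0], [1], [0]]
D = [[1, 0, 0],
     [0, 0, 1]]           # assumed measurement of states 1 and 3

AB = matmul(A, B)
A2B = matmul(A, AB)
ctrb = [B[i] + AB[i] + A2B[i] for i in range(3)]   # [B, AB, A^2 B]

DA = matmul(D, A)
obsv = D + DA + matmul(DA, A)                      # [D; DA; DA^2]

if __name__ == "__main__":
    print("controllability rank:", rank(ctrb))
    print("observability rank:", rank(obsv))
```

With these assumed matrices, the controllability rank is lower than the number of states because the transport segment state cannot be reached by the control, while the observability matrix has full rank.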

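Figure 3.11: Controllability and observability for our system. [Diagram of the four modes, with the control signal as input and the measurement as output: controllable and observable (states 1 and 2); controllable but not observable (empty); observable but not controllable (all transport segment states); neither controllable nor observable (empty).]

3.7 Summary

This chapter has deduced the total state space model for our system as:

$$
\dot x = A x + B u + C v_{TRS}
$$

where:

$$
x = \begin{bmatrix} M_{RCV} \\ r_{PLR} - r_{SNDR} \\ x_{TRS} \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 1 \\ 0_{n_{TRS} \times 1} \end{bmatrix}, \quad
C = \begin{bmatrix} 0 \\ 0 \\ C_{TRS} \end{bmatrix} \quad \text{and} \quad
A = \begin{bmatrix} A_{1,RCV} & A_{2,RCV} \\ 0_{n_{TRS} \times 2} & A_{TRS} \end{bmatrix}
$$

where:

$$
A_{1,RCV} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}, \quad
A_{2,RCV} = \begin{bmatrix} 1 & 0_{1 \times (n_{TRS}-1)} \\ 0 & 0_{1 \times (n_{TRS}-1)} \end{bmatrix}
$$

and $n_{TRS}$ is the number of states in the state-space equation for the transport segment, and we have selected the control variable u as $u = \dot r_{PLR}$. The user or application programmer has given a state space model for the transport segment as: $\dot x_{TRS} = A_{TRS}\, x_{TRS} + C_{TRS}\, v_{TRS}$, where xTRS is the state vector, ATRS is the system matrix and CTRS vTRS expresses the system noise, where vTRS is a vector of uncorrelated Gaussian white noise with zero mean and a variance of 1. The first state of the state vector $x_{TRS}$ is $(r_{TRS} - r_{SNDR})$, and the rest of the states are given by the specific model used:

$$
x_{TRS} = \begin{bmatrix} r_{TRS} - r_{SNDR} \\ \vdots \end{bmatrix}
$$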
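As a small illustration of how the summarized model can be used in a program, the sketch below assembles A and B for a given transport segment matrix and applies the discretization of Equation (3.74). The function names, the first-order transport segment `[[-0.5]]` and the 20 ms timestep are assumptions chosen for the example only; the noise term is left out.

```python
def build_model(a_trs):
    """Assemble A and B of the total model for a transport segment
    system matrix a_trs (n x n, given as a list of rows)."""
    n = len(a_trs)
    a = [[0.0] * (2 + n) for _ in range(2 + n)]
    a[0][1] = -1.0   # A1_RCV: buffer level falls with (r_PLR - r_SNDR)
    a[0][2] = 1.0    # A2_RCV: buffer level rises with (r_TRS - r_SNDR)
    for i in range(n):
        for j in range(n):
            a[2 + i][2 + j] = a_trs[i][j]
    b = [0.0, 1.0] + [0.0] * n
    return a, b


def discretize(a, b, h):
    """Discretization of Equation (3.74): Phi = I + A h, Lambda = h B."""
    size = len(a)
    phi = [[(1.0 if i == j else 0.0) + a[i][j] * h for j in range(size)]
           for i in range(size)]
    lam = [h * bi for bi in b]
    return phi, lam


def step(phi, lam, x, u):
    """One noise-free step x_{k+1} = Phi x_k + Lambda u_k."""
    return [sum(p * xi for p, xi in zip(row, x)) + l * u
            for row, l in zip(phi, lam)]


if __name__ == "__main__":
    A, B = build_model([[-0.5]])          # assumed first-order transport segment
    PHI, LAM = discretize(A, B, 0.02)     # assumed timestep of 20 ms
    # Buffer at 5 media-units, playout at sender rate, arrival rate 2 above it.
    print(step(PHI, LAM, [5.0, 0.0, 2.0], 0.0))
```

The example state shows the buffer level growing while media arrives faster than it is played out, which is the behaviour the first row of A encodes.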


4 Optimal control

In Chapter 3 (p. 33), a stringent mathematical notation was developed and used to deduce the transport segment independent mathematical model of the receiver buffer system. This chapter will use both the notation and the mathematical model from Chapter 3 to deduce the optimal controller, given certain optimization criteria. The mathematical model from Chapter 3 is valid only in the linear area of our system, i.e. when there is some media in the receiver buffers. This is treated in Section 4.1.

Figure 4.1 shows that the essential part of this chapter consists of Sections 4.1 and 4.7. The first part of Section 4.1 (i.e. Section 4.1.1) describes when the optimal controller will and will not work, and thus motivates the need for the anti-run-dry algorithm developed in Chapter 6. Section 4.7 (p. 89) shows how the optimal controller works together with the total system, and also gives a summary of the main equations of the chapter.

To get a deeper understanding of the optimal controller without having to read many equations, the reader should read Section 4.3 (p. 76), which finds the optimization criteria to use.

In the more mathematical part of the chapter, we will use optimal control theory to find the optimal controller. We therefore give a brief introduction to optimal control theory in Section 4.2 (p. 75). Although this introduction is mainly meant for readers not familiar with optimal control theory, other mathematically interested readers should at least skim through the equations of Section 4.2, to get to know the notation used for optimal control theory in this thesis. Section 4.5 (p. 80) uses the optimization criteria from Section 4.3 to find the set point vector and the Riccati weight matrixes. By using the results from Section 4.5, Section 4.6 (p. 81) deduces the optimal controller, and describes a few implementation guidelines.

Figure 4.1: [Overview of the chapter: 4.1 The non-linearity of our system (4.1.1 Running dry); 4.2 Introduction into optimal control theory; 4.3 Optimization criteria; 4.4 Different phases of optimal control of playoutspeed; 4.5 Finding the set point vector and the Riccati weight matrixes; 4.6 Finding the optimal controller; 4.7 Summary.]

4.1 The non-linearity of our system

4.1.1 Running dry

The state space model of our system, Equation (3.29) on page 50, shows a linear system, where $M_{RCV}$, the amount of media in the receiver buffers, can have both positive and negative values. However, our system is non-linear because our receiver buffers can only contain positive (or zero) amounts of media. As shown in Figure 4.2, when our buffer runs dry, the buffer level decreases until it stops at $M_{PBPLR} = 0$, and rPLR suddenly drops to zero, and jumps back when the buffer again contains enough media for the playout to start. Our linear state space model is valid only as long as $M_{PBPLR} \ge 0$.

Figure 4.2: Illustration: When the buffer runs dry, the playout speed drops to zero. [Plot of player rate [media-units/s] and buffer level [media-units] against time.]

We would like to find a statistically optimal controller for our non-linear system, but as far as we know, control theory does not contain such controllers for non-linear systems of our kind. Therefore, the aim of this chapter is to find a controller that behaves optimally in the linear area of our system, i.e. when there are media-units in the receiver buffers. We need to use an exception when the buffer runs dry, e.g. by filling the buffer to a desired buffer level before the player starts playing again, resetting the controller, and then starting to play at the speed that the optimal controller determines.

Since we would like the linear model to be valid almost all the time, we need a controller that gives a low probability of running dry. The anti-run-dry algorithm deduced in Chapter 6 (p. 103) lets the user control the probability that the buffer runs dry.

4.1.2 Delay spikes

Some transport segments experience infrequent, but large delay spikes. A delay spike means a long period (lasting up to several seconds) during which the receiver does not receive media-units, and the receiver buffer thus runs dry. After this period, all the media-units that should have arrived suddenly arrive very fast, and if they are not discarded, they result in a high receiver buffer level.

For applications and users that do not want to lose any information (such as video streaming and TV broadcast), our algorithm will let these media-units be played out when they arrive, possibly at a faster speed than normal, to reduce the buffering delay to a more normal level. This way, it may take some time to reduce the buffering delay, depending on the allowed deviation of the playout speed from rSNDR, which depends on the weight factors to be described in this chapter. For other applications (such as interactive applications), users may prefer to lose some data (which can mathematically be seen as using an infinite playout speed) rather than endure a long period with a large delay. For such applications, we use MPB,MAX as a maximum level of the playoutbuffer, which corresponds to the maximum delay accepted; if the buffer level exceeds this limit, loss of data is preferred. The playoutbuffer level for a case with no jitter, but one large delay spike, is illustrated in Figure 4.3.

Many playout control algorithms (such as Fixed Playout Delay and Adaptive Playout Delay) give each media-unit a deadline, and discard media-units that arrive after their deadline. One drawback of this approach is that many media-units may be discarded.

If the first data from the delay spike arrive and fill the playoutbuffer to a level below MPB,MAX, algorithms using deadlines will discard these media-units, but our algorithm will let them be played out. We have thus chosen to play these delayed media-units, instead of discarding them. If the data from the delay spike arrive at a rate that is low enough to never fill the playoutbuffer above MPB,MAX, we choose to let all the data be played. If, however, the playoutbuffer is filled above MPB,MAX, the oldest media-units are discarded (since they are regarded as being played at an infinite speed from the output side of the FIFO playoutbuffer).

Figure 4.3: Illustration of using MPB,MAX to avoid a high buffer level after a delay spike. [Plot of the playoutbuffer level MPB against time, with the level MPB,MAX and the interval where packets are lost marked.]
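The discard rule for delay spikes can be sketched as a FIFO buffer with an upper level. This is a simplified illustration, assuming that the buffer level is counted in whole media-units; the class name `PlayoutBuffer` and its fields are hypothetical.

```python
from collections import deque


class PlayoutBuffer:
    """FIFO playout buffer that drops the oldest media-units when the
    level would exceed max_level (playing the role of MPB,MAX)."""

    def __init__(self, max_level: int):
        self.max_level = max_level
        self.units = deque()
        self.discarded = 0

    def push(self, unit) -> None:
        """Add a newly arrived media-unit at the input side."""
        self.units.append(unit)
        while len(self.units) > self.max_level:
            # The oldest unit leaves at the output side without being
            # played, i.e. as if played at infinite speed.
            self.units.popleft()
            self.discarded += 1

    def pop(self):
        """Hand the oldest media-unit to the player, or None if dry."""
        return self.units.popleft() if self.units else None


if __name__ == "__main__":
    buf = PlayoutBuffer(max_level=3)
    for unit in range(5):      # a burst of five units after a delay spike
        buf.push(unit)
    print(list(buf.units), buf.discarded)
```

Unlike a deadline-based scheme, nothing is discarded as long as the burst fits below the maximum level; late media-units are simply queued for playout.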

4.2 Introduction to optimal control theory

A good state space model describes the real system quite accurately. For a system that is described by such a model, there is only one optimal way to control the system. As long as the state space model is linear, optimal control theory offers the optimal solution. All other solutions will be sub-optimal.

This section gives a very brief introduction to optimal control theory. For more thorough information on optimal control theory, linear quadratic methods and the Riccati equation, two good textbooks are [1] and [6].

According to [6], the term optimal control refers to “systems that are designed to generate control variables that minimize or maximize a scalar objective functional that includes the process states and the control variables”. If the system to be controlled is linear and time-invariant, that is, if it can be written in the form:

$$
\dot x = A x + B u + C v
\tag{4.1}
$$

where v is a Gaussian white noise vector, and A, B and C are constant matrixes, and the objective functional is a scalar quadratic function, this is a linear quadratic problem. Let us assume the system model of Equation (4.1) and the optimization criterion:

$$
J = \int_{t_0}^{\infty} \left( (x(\tau) - x_0)^T Q\, (x(\tau) - x_0) + u(\tau)^T P\, u(\tau) \right) d\tau
\tag{4.2}
$$

where $x_0$ is a constant or slowly changing set point vector. The optimal control vector can then be found by [6]:

$$
u(t) = G(t)\,(x(t) - x_0)
\tag{4.3}
$$

where:

$$
G = -P^{-1} B^T R(t)
\tag{4.4}
$$

where:

$$
\dot R = -RA - A^T R + R B P^{-1} B^T R - Q
\tag{4.5}
$$

Equation (4.5) is known as the Riccati matrix differential equation. According to [6], it can be shown that when $t \to \infty$, $R(t) \to R$ = constant, which gives the stationary case:


$$
RA + A^T R - R B P^{-1} B^T R + Q = 0
\tag{4.6}
$$

The constant Riccati weight matrixes P and Q must satisfy the following [1]:

• P must be positive definite

• Q must be non-negative definite

A guideline from [6] is to give the elements on the diagonals of the weight matrixes a magnitude relative to the expected (or nominally acceptable) value of the corresponding variable. For example:

$$
q_{ii} = \frac{1}{(\Delta x_i)^2}
\tag{4.7}
$$

where $\Delta x_i$ is a nominally acceptable variation relative to a reference of the state variable $x_i$.

4.3 Optimization criteria

As described in Section 1.2 (p. 5), there are three requirements that must be met for the user to experience a high degree of quality of service:

A. The total delay of each media-unit should be as small as possible, and at the same time, the buffer should not run dry.

B. The playout speed should be as close as possible to the correct media speed.

C. The change in playout speed should be as slow as possible.

The following three subsections will elaborate on each of these three requirements.

4.3.1 Requirement A: Minimize the total latency of each media-unit

Mathematically, the requirement to minimize the total latency can be written as:

$$
\begin{aligned}
&\text{Minimize}\left(\lambda_{total,m}\right),\ \forall m \\
={}& \text{Minimize}\left(\lambda_{TRS,m} + \lambda_{VB,m} + \lambda_{PB,m} + \lambda_{PLR,m}\right),\ \forall m \\
={}& \text{Minimize}\left(\lambda_{TRS,m} + \lambda_{RCV,m}\right),\ \forall m
\end{aligned}
\tag{4.8}
$$


One of our assumptions was that $\lambda_{TRS,m}$ cannot be controlled (explained in assumption number 3 in Section 3.2 (p. 35)), hence to minimize the total latency of each bit, $\lambda_{RCV,m}$ should be minimized.

From Equation (3.21) on page 48, it can be deduced that $\lambda_{RCV,m}$, the latency that media-unit m experiences in the receiver buffers, can be minimized in two ways:

1. By minimizing the integrand. This is done by minimizing $1/(r_{PLR}(t))$, i.e. by maximizing $r_{PLR}(t)$, the rate out of the player. In practice, this would mean to minimize the latency of the last media-unit in the receiver FIFO queue by speeding up the playout of all the media-units in the receiver buffers.

2. By minimizing the length of the integral. This is done by minimizing $M_{RCV}(t_{TRS,m})$ for all $t_{TRS,m}$. In practice, this would mean to minimize the latency of the last media-unit in the receiver FIFO queue by making sure that the buffer level is as low as possible.

We cannot maximise $r_{PLR}(t)$, since this would be in direct conflict with requirement B, that $r_{PLR}(t)$ should be as close as possible to $r_{SNDR}$. Thus, $M_{RCV}(t_{TRS,m})$, the length of the integral, needs to be minimized. If the desired level of the playoutbuffer $M_{RCV,d}$ had been set to zero, this would mean to minimize the first state of our state vector, $x_1 = M_{RCV}$.

However, by minimizing $M_{RCV}$, the buffer would run dry very frequently. Therefore, we choose the desired buffer level $M_{RCV,d}$ to be a positive value that gives an acceptable run-dry-probability. In Chapter 6 (p. 103), $M_{RCV,d}$ will be dynamically controlled by an anti-run-dry mechanism that takes the acceptable run-dry-probability as an input from the user.

Thus, the requirement from this section is to minimize the absolute value of the difference between $x_1 = M_{RCV}$ and $M_{RCV,d}$, i.e. to:

$$
\text{minimize}\left(\left|x_1 - M_{RCV,d}\right|\right),\ \forall t \in (t_{TRS,1}, t_{TRS,last})
\tag{4.9}
$$


4.3.2 Requirement B: Keep the player speed close to the correct media speed

Mathematically, the requirement to keep the player speed as close as possible to the correct media speed can be written as:

$$
\text{Minimize}\left(\left|r_{PLR}(t) - r_{SNDR}\right|\right),\ \forall \left(t \in (t_{PLR,1}, t_{PLR,last})\right)
\tag{4.10}
$$

where $t_{PLR,1}$ is the playout time of the first media-unit and $t_{PLR,last}$ is the playout time of the last media-unit in the stream. Since the second state in our system state vector is $x_2 = r_{PLR} - r_{SNDR}$, this means to minimize the absolute value of our second state, i.e. to:

$$
\text{minimize}\left(\left|x_2(t)\right|\right),\ \forall \left(t \in (t_{PLR,1}, t_{PLR,last})\right)
\tag{4.11}
$$

4.3.3 Requirement C: Minimize the rate of change of playout speed

Mathematically, the requirement to minimize the rate of change of the player speed can be written as:

$$
\text{Minimize}\left(\left|\frac{d r_{PLR}(t)}{dt}\right|\right),\ \forall \left(t \in (t_{PLR,1}, t_{PLR,last})\right)
\tag{4.12}
$$

Since our control variable is $u = \dot r_{PLR}$, this means to minimize the absolute value of the control variable, i.e. to:

$$
\text{minimize}\left(\left|u(t)\right|\right),\ \forall t \in (t_{PLR,1}, t_{PLR,last})
\tag{4.13}
$$

4.4 Different phases of optimal control of playout speed

We use three phases for the optimal control of the playout speed of a media transfer:

• The start-up period, described in Section 4.4.1 (this is also used when the buffer runs dry).

• The main period, when the optimal controller is used. This is described in Section 4.4.3.

• The stopping period, described in Section 4.4.2.

4.4.1 Start-up/initialization procedure for the optimal controller

To function properly, the system needs a good initialization procedure, to be used at the start of the transfer. To avoid start-up problems after the buffer has run dry, the initialization procedure should also be run each time the buffer runs dry.

We recommend the following initialization procedure:

• Set the playout rate $r_{PLR}$ to zero, and keep it equal to zero until the media amount in the receiver buffers reaches a certain limit, e.g. $M_{RCV,d}$. This way, the risk that the buffer will run dry in the near future is lowered, and more importantly, we avoid the chance of running dry several times in a row.

• When the media amount in the receiver buffers reaches the limit, $r_{PLR}$ should be set equal to $r_{SNDR}$, and then the optimal controller can be started.

The transport segment state estimator (explained in Chapter 5 (p. 91)) should run as usual during the initialization procedure and during run-dry periods, so that all states in $x_{TRS}$ are updated.

4.4.2 Stopping procedure

When the last media-unit has arrived from the transport segment (at $t_{TRS,last}$) into the playoutbuffer, requirement A can no longer be used, since we no longer want the buffer level to be close to $M_{RCV,d}$. Instead, the media in the receiver buffers should be played at the correct media speed.

Thus, we recommend the following stopping procedure:

• At the first timestep after $t_{TRS,last}$, set the playout rate $r_{PLR}$ to the correct media speed $r_{SNDR}$, and keep it equal to $r_{SNDR}$ until the last media-unit has been played from the player.

• After the last media-unit has been played out, the playout rate $r_{PLR}$ drops to zero, since there is no more media to play.
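The start-up and stopping procedures of Sections 4.4.1 and 4.4.2 can be sketched as a small state machine. This is only an illustrative sketch: the class `PlayoutPhases`, its method `step` and the argument `r_plr_optimal` (the rate that the optimal controller would request in the main period) are hypothetical names that do not appear in the thesis, and the start-up limit is assumed to be $M_{RCV,d}$, which the text gives only as an example.

```python
from dataclasses import dataclass

START_UP, MAIN, STOPPING = "start-up", "main", "stopping"


@dataclass
class PlayoutPhases:
    """Hypothetical helper that selects the playout rate r_PLR per phase."""
    r_sndr: float           # correct media speed r_SNDR
    m_rcv_d: float          # desired buffer level M_RCV,d, assumed as start-up limit
    phase: str = START_UP

    def step(self, m_rcv: float, last_unit_arrived: bool,
             r_plr_optimal: float) -> float:
        """Return the playout rate r_PLR to use at this timestep."""
        if m_rcv <= 0.0:
            # Nothing to play. If the stream has not ended, the buffer has
            # run dry and the start-up procedure is run again.
            if not last_unit_arrived:
                self.phase = START_UP
            return 0.0
        if last_unit_arrived:
            # Stopping procedure: play the remaining media at r_SNDR.
            self.phase = STOPPING
            return self.r_sndr
        if self.phase == START_UP:
            if m_rcv < self.m_rcv_d:
                return 0.0          # keep waiting until the limit is reached
            self.phase = MAIN
            return self.r_sndr      # r_PLR = r_SNDR when the controller starts
        return r_plr_optimal        # main period: rate from the optimal controller
```

The optimal controller itself is deliberately left outside this helper, which mirrors the observation in Section 4.4.3 that the stream is started and stopped outside the controller.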

4.4.3 Timeperiod for optimal control

We will use the start-up and stopping procedures described in Sections 4.4.1 and 4.4.2, respectively. The controller will be started right after the initialization procedure and stopped right before the stopping procedure. The controller is thus used in the timeperiod $(t_{PLR,1}, t_{TRS,last})$.

Since the media stream is started and stopped outside the controller, there is no need to use boundary conditions (explained in section 2.11 of [6]) for the controller.

4.5 Finding the set point vector and the Riccati weight matrixes

This section finds the set point vector and the Riccati weight matrixes Q and P for our system.

Since all the minimization timeperiods from Section 4.3 (p. 76) cover the controller time period $(t_{PLR,1}, t_{TRS,last})$, and the start-up and stopping procedures are taken care of outside the controller time period, the three requirements from Section 4.3 can be written as:

• $\text{minimize } |x_1 - M_{RCV,d}|, \quad \forall t$

• $\text{minimize } |x_2(t)|, \quad \forall t$

• $\text{minimize } |u(t)|, \quad \forall t$

These three requirements result in conflicting demands on the playout speed and on the amount of media in the playoutbuffer. Thus, the user or application programmer should state the importance of each of the three demands. Different applications (and for some applications, different users of the application) will weight the importance of each of these requirements in different ways. To let the optimal controller reflect this, we let the user or application programmer give us the following three weight factors:

• $w_1$: The importance of keeping the buffer level close to the target buffer level, i.e. the importance of minimizing $|x_1 - M_{RCV,d}|$.

• $w_2$: The importance of keeping the playout speed close to the correct media speed, i.e. the importance of minimizing $|x_2(t)|$.

• $w_3$: The importance of changing the playout rate as slowly as possible, i.e. the importance of minimizing $|u(t)|$.

By using Equation (4.2) on page 75, the optimization criterion can be written as:

$$J = \int_{t_0}^{\infty} \left[ (x(\tau) - x_0)^T \begin{bmatrix} w_1 & 0 & 0_{1 \times n_{TRS}} \\ 0 & w_2 & 0_{1 \times n_{TRS}} \\ 0_{n_{TRS} \times 1} & 0_{n_{TRS} \times 1} & 0_{n_{TRS} \times n_{TRS}} \end{bmatrix} (x(\tau) - x_0) + u(\tau)^T w_3 u(\tau) \right] d\tau \qquad (4.14)$$

where the set point vector is $x_0 = \begin{bmatrix} M_{RCV,d} \\ 0 \\ 0_{n_{TRS} \times 1} \end{bmatrix}$. Since in our case, u is a scalar, not a vector, the relation $u^T P u = p u^2 = w_3 u^2$ can be used. Thus, in our case, we can use the Riccati weight matrixes:

$$P = p = w_3 \quad \text{and} \quad Q = \begin{bmatrix} Q_{RCV} & 0_{2 \times n_{TRS}} \\ 0_{n_{TRS} \times 2} & 0_{n_{TRS} \times n_{TRS}} \end{bmatrix} \qquad (4.15)$$

where:

$$Q_{RCV} = \begin{bmatrix} w_1 & 0 \\ 0 & w_2 \end{bmatrix} \qquad (4.16)$$

Section 4.2 (p. 75) gave an introduction into optimal control theory and Section 4.5 found the set point vector and the Riccati weight matrixes that we will use. This section will use the results from these two sections, together with the notation from Chapter 3 (p. 33) to deduce the optimal controller for our system.

This section will use the state space model of the total system, given in Equation (3.29) on page 50.

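By using Equations (4.3) on page 75 and (4.4) on page 75, the optimal control of our system can be expressed as:

$$u(t) = -\frac{1}{p} B^T R \, (x(t) - x_0) \qquad (4.17)$$

since in our case, $P = p$ is a scalar. By inserting the matrix B from Equation (3.31) on page 51, and the state vector $x(t)$ from Equation (3.31) on page 51, Equation (4.17) can be written as:

$$u = -\frac{1}{p} \begin{bmatrix} 0 & 1 & 0_{1 \times n_{TRS}} \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{1B} \\ r_{12} & r_{22} & r_{2B} \\ r_{1B}^T & r_{2B}^T & R_{BB} \end{bmatrix} \cdot \left( \begin{bmatrix} M_{RCV} \\ r_{PLR} - r_{SNDR} \\ x_{TRS} \end{bmatrix} - \begin{bmatrix} M_{RCV,d} \\ 0 \\ 0_{n_{TRS} \times 1} \end{bmatrix} \right) \qquad (4.18)$$

where:

$$r_{1B} = \begin{bmatrix} r_{13} & \dots & r_{1(n_{TRS}+2)} \end{bmatrix}, \qquad (4.19)$$

$$r_{2B} = \begin{bmatrix} r_{23} & \dots & r_{2(n_{TRS}+2)} \end{bmatrix} \qquad (4.20)$$

and

$$R_{BB} = \begin{bmatrix} r_{33} & \dots & r_{3(n_{TRS}+2)} \\ \vdots & \ddots & \vdots \\ r_{(n_{TRS}+2)3} & \dots & r_{(n_{TRS}+2)(n_{TRS}+2)} \end{bmatrix} \qquad (4.21)$$

Equation (4.18) can now be written as:

$$u = -\frac{1}{p} \left( r_{12} (M_{RCV} - M_{RCV,d}) + r_{22} (r_{PLR} - r_{SNDR}) + r_{2B} x_{TRS} \right) \qquad (4.22)$$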

Thus, it is not necessary to calculate the entire matrix R. Only row number 2, consisting of $r_{12}$, $r_{22}$ and $r_{2B}$, needs to be calculated.

By using Equation (4.6) on page 76, our Riccati equation can be written as:

$$R A + A^T R - R B \left( \frac{1}{p} \right) B^T R + Q = 0_{(n_{TRS}+2) \times (n_{TRS}+2)} \qquad (4.23)$$

since in our case, $P = p$ is a scalar.

In the rest of this section, we will first insert our system matrixes into Equation (4.23) to divide it into four matrix equations of a smaller dimension than Equation (4.23). These smaller dimension matrix equations will then be used to find expressions for $r_{12}$, $r_{22}$ and $r_{2B}$, and to find the equation for the optimal controller.

To calculate Equation (4.23) for our system, the system matrix A (given by Equation (3.33)) and the control matrix B (given by Equation (3.31)) need to be inserted. For readability, each of the four terms of Equation (4.23) will be treated separately. First, an expression for $RA$ is calculated:

$$RA = \begin{bmatrix} R_1 & R_2 \\ R_2^T & R_4 \end{bmatrix} \begin{bmatrix} A_{1,RCV} & A_{2,RCV} \\ 0_{n_{TRS} \times 2} & A_{TRS} \end{bmatrix} = \begin{bmatrix} R_1 A_{1,RCV} & R_1 A_{2,RCV} + R_2 A_{TRS} \\ R_2^T A_{1,RCV} & R_2^T A_{2,RCV} + R_4 A_{TRS} \end{bmatrix} \qquad (4.24)$$

where:

$$R_1 = \begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix}, \qquad (4.25)$$

$$R_2 = \begin{bmatrix} r_{1B} \\ r_{2B} \end{bmatrix} \qquad (4.26)$$

and

$$R_4 = R_{BB}. \qquad (4.27)$$

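$A^T R$ is now easily found by:

$$A^T R = (RA)^T = \begin{bmatrix} A_{1,RCV}^T R_1 & A_{1,RCV}^T R_2 \\ A_{2,RCV}^T R_1 + A_{TRS}^T R_2^T & A_{2,RCV}^T R_2 + A_{TRS}^T R_4 \end{bmatrix} \qquad (4.28)$$

Next, an expression for $R B \left( \frac{1}{p} \right) B^T R$ is calculated:

$$R B \left( \frac{1}{p} \right) B^T R = \begin{bmatrix} R_1 & R_2 \\ R_2^T & R_4 \end{bmatrix} \cdot \begin{bmatrix} B_1 \\ 0_{n_{TRS} \times 1} \end{bmatrix} \cdot \frac{1}{p} \begin{bmatrix} B_1^T & 0_{1 \times n_{TRS}} \end{bmatrix} \cdot \begin{bmatrix} R_1 & R_2 \\ R_2^T & R_4 \end{bmatrix} = \frac{1}{p} \cdot \begin{bmatrix} R_1 B_1 B_1^T R_1 & R_1 B_1 B_1^T R_2 \\ R_2^T B_1 B_1^T R_1 & R_2^T B_1 B_1^T R_2 \end{bmatrix} \qquad (4.29)$$

where:

$$B_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \qquad (4.30)$$

The last term in the Riccati equation, Q, is given by Equation (4.15) on page 81 as:

$$Q = \begin{bmatrix} Q_{RCV} & 0_{2 \times n_{TRS}} \\ 0_{n_{TRS} \times 2} & 0_{n_{TRS} \times n_{TRS}} \end{bmatrix}.$$

Inserting Equations (4.24), (4.28), (4.29) and (4.15) into Equation (4.23), gives the following 4 equations:

$$R_1 A_{1,RCV} + A_{1,RCV}^T R_1 - \frac{1}{p} R_1 B_1 B_1^T R_1 + Q_{RCV} = 0_{2 \times 2} \qquad (4.31)$$

$$R_1 A_{2,RCV} + R_2 A_{TRS} + A_{1,RCV}^T R_2 - \frac{1}{p} R_1 B_1 B_1^T R_2 = 0_{2 \times n_{TRS}} \qquad (4.32)$$

$$R_2^T A_{1,RCV} + A_{2,RCV}^T R_1 + A_{TRS}^T R_2^T - \frac{1}{p} R_2^T B_1 B_1^T R_1 = 0_{n_{TRS} \times 2} \qquad (4.33)$$

$$R_2^T A_{2,RCV} + R_4 A_{TRS} + A_{2,RCV}^T R_2 + A_{TRS}^T R_4 - \frac{1}{p} R_2^T B_1 B_1^T R_2 = 0_{n_{TRS} \times n_{TRS}} \qquad (4.34)$$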

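Since the Riccati matrix R is symmetric, Equation (4.33) is the transpose of Equation (4.32). Thus, the above four equations represent only 3 different equations.

First, Equation (4.31) is solved:

$$\begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} - \frac{1}{w_3} \begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} + \begin{bmatrix} w_1 & 0 \\ 0 & w_2 \end{bmatrix} = 0_{2 \times 2} \qquad (4.35)$$

which gives:

$$\begin{bmatrix} -\frac{1}{w_3} r_{12}^2 + w_1 & -r_{11} - \frac{1}{w_3} r_{12} r_{22} \\ -r_{11} - \frac{1}{w_3} r_{12} r_{22} & -2 r_{12} - \frac{1}{w_3} r_{22}^2 + w_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \qquad (4.36)$$

Element (1,1) of Equation (4.36) gives:

$$r_{12}^2 = w_3 w_1, \qquad (4.37)$$

element (2,2) gives:

$$r_{22}^2 = w_3 (w_2 - 2 r_{12}) \qquad (4.38)$$

and element (1,2) (equal to element (2,1)) gives:

$$r_{11} = -\frac{1}{w_3} r_{12} r_{22} \qquad (4.39)$$

From Equation (4.37), $r_{12}$ can be positive or negative. Its sign is chosen by looking at Equation (4.22) on page 82: if $r_{12}$ is positive, $-r_{12}/w_3$ is negative (since the weight factors are always positive), and a large value of $(M_{RCV} - M_{RCV,d})$ (the amount of media in the receiver buffers minus the desired amount) will give a smaller $u = \dot{r}_{PLR}$. This means that when the buffer content is larger than desired, the playout rate from the buffer is decreased, and thus the buffer content gets even larger. This is contrary to what we want. Therefore, we choose $r_{12}$ to be negative:

$$r_{12} = -\sqrt{w_3 w_1} \qquad (4.40)$$

Like $r_{12}$, $r_{22}$ can also be positive or negative, and its sign will be chosen by looking at Equation (4.22) on page 82: if $r_{22}$ is positive, $-r_{22}/w_3$ is negative, and a positive value of $x_2 = (r_{PLR} - r_{SNDR})$ will decrease the value of $u = \dot{r}_{PLR}$. This means that when the rate out of the player is too large, the derivative of the playout rate should be decreased. This is just what we want, so we choose $r_{22}$ to be positive:

$$r_{22} = \sqrt{w_3 (w_2 - 2 r_{12})} = \sqrt{w_3 \left( w_2 + 2 \sqrt{w_3 w_1} \right)} \qquad (4.41)$$

By inserting Equations (4.40) and (4.41) into (4.39), an expression for $r_{11}$ is found:

$$r_{11} = -\frac{1}{w_3} r_{12} r_{22} = \sqrt{w_1 \left( w_2 + 2 \sqrt{w_3 w_1} \right)} \qquad (4.42)$$

Next, Equation (4.32) is solved:

$$\begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} \begin{bmatrix} 1 & 0_{1 \times (n_{TRS}-1)} \\ 0 & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + \begin{bmatrix} r_{1B} \\ r_{2B} \end{bmatrix} A_{TRS} + \begin{bmatrix} 0 & 0 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} r_{1B} \\ r_{2B} \end{bmatrix} - \frac{1}{w_3} \begin{bmatrix} r_{11} & r_{12} \\ r_{12} & r_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} r_{1B} \\ r_{2B} \end{bmatrix} = 0_{2 \times n_{TRS}} \qquad (4.43)$$

which gives:

$$\begin{bmatrix} \begin{bmatrix} r_{11} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + r_{1B} A_{TRS} - \frac{1}{w_3} r_{12} r_{2B} \\ \begin{bmatrix} r_{12} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + r_{2B} A_{TRS} - r_{1B} - \frac{1}{w_3} r_{22} r_{2B} \end{bmatrix} = \begin{bmatrix} 0_{1 \times n_{TRS}} \\ 0_{1 \times n_{TRS}} \end{bmatrix} \qquad (4.44)$$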

We can solve $r_{1B}$ from the bottom equation of Equation (4.44):

$$r_{1B} = \begin{bmatrix} r_{12} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + r_{2B} A_{TRS} - \frac{1}{w_3} r_{22} r_{2B} \qquad (4.45)$$

Inserting Equation (4.45) into the top equation of Equation (4.44), gives:

$$\begin{bmatrix} r_{11} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} - \frac{1}{w_3} r_{12} r_{2B} + \left( \begin{bmatrix} r_{12} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + r_{2B} A_{TRS} - \frac{1}{w_3} r_{22} r_{2B} \right) A_{TRS} = 0_{1 \times n_{TRS}} \qquad (4.46)$$

From the above equation, an expression for $r_{2B}$ can be found:

$$r_{2B} = w_3 \left( \begin{bmatrix} r_{11} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + \begin{bmatrix} r_{12} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} A_{TRS} \right) \cdot \left( r_{12} I_{n_{TRS} \times n_{TRS}} - w_3 A_{TRS}^2 + r_{22} A_{TRS} \right)^{-1} \qquad (4.47)$$

where $r_{11}$, $r_{12}$ and $r_{22}$ are given by Equations (4.40)-(4.42).

To sum up, the optimal controller is given by the equation:

$$u(t) = G(t) \, (x(t) - x_0) \qquad (4.48)$$

where the gain vector G is given by:

$$G = \begin{bmatrix} g_1 & g_2 & g_{TRS} \end{bmatrix} = \begin{bmatrix} \frac{-r_{12}}{w_3} & \frac{-r_{22}}{w_3} & r_{2B} \cdot \left( \frac{-1}{w_3} \right) \end{bmatrix} \qquad (4.49)$$

and:

$$(x(t) - x_0) = \begin{bmatrix} M_{RCV} - M_{RCV,d} \\ r_{PLR} - r_{SNDR} \\ x_{TRS} \end{bmatrix} \qquad (4.50)$$

where $r_{12}$ is given by Equation (4.40) on page 86, $r_{22}$ is given by Equation (4.41) on page 86, and $r_{2B}$ is given by Equation (4.47) on page 87.

The essence of the optimal controller in Equation (4.48) is that the time derivative of the player rate (i.e. $u(t)$) is set to a value that is dependent upon the three weight factors (given by $G(t)$), the difference between the desired and the actual receiver buffer level, the difference between the player rate and the correct media speed, and the transport segment state vector.

Since $M_{RCV,d}$ is known, the first state of Equation (4.50) can be estimated by measuring $M_{PBPLR}$ and estimating $M_{VB}$. The second state is known, since both $r_{SNDR}$ and the controlled $r_{PLR}$ are known. The last states, represented by the transport segment state vector $x_{TRS}$, must be estimated. The estimation procedure for the transport segment estimator is deduced in Chapter 5 (p. 91).
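The closed-form expressions (4.40)-(4.42) and (4.47) can be checked numerically by inserting them back into Equations (4.36) and (4.44). The sketch below does this for scalar weights. The function names are hypothetical, and `r2b_scalar` covers only the assumed special case $n_{TRS} = 1$, in which $A_{TRS}$ reduces to a scalar `a_trs` and the zero blocks $0_{1 \times (n_{TRS}-1)}$ vanish.

```python
import math


def riccati_2x2(w1: float, w2: float, w3: float):
    """Closed-form r11, r12, r22 from Equations (4.40)-(4.42)."""
    r12 = -math.sqrt(w3 * w1)
    r22 = math.sqrt(w3 * (w2 - 2.0 * r12))
    r11 = -r12 * r22 / w3
    return r11, r12, r22


def riccati_residuals(w1: float, w2: float, w3: float):
    """Elements (1,1), (1,2) and (2,2) of the left-hand side of Equation (4.36)."""
    r11, r12, r22 = riccati_2x2(w1, w2, w3)
    e11 = -r12 ** 2 / w3 + w1
    e12 = -r11 - r12 * r22 / w3
    e22 = -2.0 * r12 - r22 ** 2 / w3 + w2
    return e11, e12, e22


def gains(w1: float, w2: float, w3: float):
    """Scalar gains g1 and g2 of Equation (4.49)."""
    _, r12, r22 = riccati_2x2(w1, w2, w3)
    return -r12 / w3, -r22 / w3


def r2b_scalar(w1: float, w2: float, w3: float, a_trs: float) -> float:
    """Equation (4.47) for the assumed special case n_TRS = 1."""
    r11, r12, r22 = riccati_2x2(w1, w2, w3)
    return w3 * (r11 + r12 * a_trs) / (r12 - w3 * a_trs ** 2 + r22 * a_trs)
```

For example, `gains(1.0, 1.0, 1.0)` gives a positive `g1` and a negative `g2`, which agrees with the sign discussion above: a buffer level above $M_{RCV,d}$ speeds the playout up, and a playout rate above $r_{SNDR}$ slows it down.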

4.7 Summary

The total system with the optimal controller is illustrated in Figure 4.4. The Kalman filter, which will be used to estimate the transport segment state vector, is treated in Chapter 5 (p. 91).

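Figure 4.4: Total system with optimal controller (block diagram with the sender, the transport segment, the receiver, the Kalman filter, the optimal controller and the user, who supplies $M_{RCV,d}(t)$, $w_1$, $w_2$ and $w_3$).

The optimal controller is given by $u(t) = G(t) \, (x(t) - x_0)$, where

$$G = \begin{bmatrix} \frac{-r_{12}}{w_3} & \frac{-r_{22}}{w_3} & r_{2B} \cdot \left( \frac{-1}{w_3} \right) \end{bmatrix},$$

where $r_{12} = -\sqrt{w_3 w_1}$, $r_{22} = \sqrt{w_3 \left( w_2 + 2 \sqrt{w_3 w_1} \right)}$ and

$$r_{2B} = w_3 \left( \begin{bmatrix} r_{11} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} + \begin{bmatrix} r_{12} & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} A_{TRS} \right) \cdot \left( r_{12} I_{n_{TRS} \times n_{TRS}} - w_3 A_{TRS}^2 + r_{22} A_{TRS} \right)^{-1},$$

where $r_{11} = -\frac{1}{w_3} r_{12} r_{22}$.

The weight factors $w_1$, $w_2$ and $w_3$ are given by:

• $w_1$: the importance of minimizing $|M_{RCV}(t) - M_{RCV,d}|$

• $w_2$: the importance of minimizing $|r_{PLR}(t) - r_{SNDR}|$

• $w_3$: the importance of minimizing $|\dot{r}_{PLR}(t)|$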
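To see how the summarized controller behaves, the following sketch simulates the first two states with a simple Euler integration. It rests on assumptions that are not part of the summary: the transport segment is taken to deliver media exactly at $r_{SNDR}$ (so $x_{TRS} = 0$ and the $g_{TRS}$ term vanishes), the buffer level is assumed to follow $\dot{M}_{RCV} = r_{TRS} - r_{PLR}$, and the function name `simulate` and its step size are hypothetical.

```python
import math


def simulate(w1, w2, w3, m_rcv0, m_rcv_d, r_sndr, t_end=20.0, dt=0.001):
    """Euler simulation of buffer level and playout rate under the optimal gains."""
    r12 = -math.sqrt(w3 * w1)                # Equation (4.40)
    r22 = math.sqrt(w3 * (w2 - 2.0 * r12))   # Equation (4.41)
    g1, g2 = -r12 / w3, -r22 / w3            # first two elements of G
    m_rcv, r_plr = m_rcv0, r_sndr
    for _ in range(int(round(t_end / dt))):
        # Control law with x_TRS = 0: u = g1*(M_RCV - M_RCV,d) + g2*(r_PLR - r_SNDR)
        u = g1 * (m_rcv - m_rcv_d) + g2 * (r_plr - r_sndr)
        m_rcv += (r_sndr - r_plr) * dt       # assumed: r_TRS = r_SNDR
        r_plr += u * dt                      # u is the time derivative of r_PLR
    return m_rcv, r_plr
```

Starting with more media than desired, e.g. `simulate(1.0, 1.0, 1.0, m_rcv0=8.0, m_rcv_d=5.0, r_sndr=1.0)`, the playout rate first rises above $r_{SNDR}$ and then settles back while the buffer level approaches $M_{RCV,d}$. Increasing $w_3$ relative to $w_1$ makes the rate changes gentler at the cost of a slower return to the desired level.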

5 Transport segment state vector estimation

As was shown in Figure 4.4 (p. 89), the optimal controller deduced in Chapter 4 needs an estimate of the transport segment state vector. This chapter will find a transport segment state vector estimator, where the main component is a Kalman filter.

As shown in Figure 5.1, the essence of this chapter is Section 5.2 (p. 94), which describes the main Kalman filter equations that will be used for transport segment estimation.

Section 5.1 gives a brief introduction into Kalman filtering. Although this introduction is mainly meant for readers not familiar with Kalman filtering, all mathematically interested readers should at least skim through the equations of Section 5.1, to get to know the notation used for Kalman filters in this thesis. Section 5.3 (p. 96) discusses different timing issues for the state vector estimation, and Section 5.4 (p. 101) describes one way to find the transport segment model.

Figure 5.1: Chapter 5. Transport segment state vector estimation (overview of Sections 5.1 Introduction to Kalman filter theory, 5.2 Main equations of the transport segment state vector estimation, 5.3 Timing, and 5.4 Finding the transport segment model).

5.1 Introduction to Kalman filter theory

This section gives a brief introduction into Kalman filter theory. For more thorough information on Kalman filter theory, two good textbooks are [12] and [6].

A Kalman filter is a recursive algorithm for estimating states in a system. Examples of states are media rate and buffer level for our system, or pH-value and temperature for a chemical process.

A Kalman filter utilizes two sorts of information:

• Measurements from relevant sensors

• A mathematical model of the system (describing how the different states depend on each other, and how the measurements depend on the states)

The Kalman filter also needs to know the accuracy of the measurements and the model.

5.1.1 Non-mathematical description of the Kalman filter

The steps of the recursive Kalman filter algorithm when starting at $t_0$ are:

1. Initialization: At $t_0$, an initial estimate and its uncertainty (covariance matrix) are used to initialize the Kalman filter. The initial estimate is provided from outside the Kalman filter. After initialization, the Kalman filter itself produces all estimates and uncertainties.

2. Prediction: The mathematical model (a state space model) and the initial estimate are used to predict a new estimate valid at $t_1$. The initial uncertainty and the accuracy of the model (i.e. the process noise) are used to calculate the uncertainty of this predicted estimate. The uncertainty is increased during prediction: the longer the prediction step ($t_1 - t_0$) is, the greater the uncertainty increase is.

3. Measurement update: Measurements valid at $t_1$ give new information about the states. Based on the accuracy of the measurements (measurement noise) and the uncertainty in the predicted estimate, the two sources of information are weighted and a new updated estimate valid at $t_1$ is calculated. The uncertainty of this estimate is calculated based on the measurement noise and the uncertainty in the predicted estimate. A measurement update leads to decreased uncertainty: the smaller the measurement noise is, the greater the uncertainty reduction is.

4. Prediction: At $t_2$ a new estimate is predicted as in step 2, but now based on the updated estimate from $t_1$.

5. ...

The prediction and the following measurement update are repeated each time a new measurement arrives. If the models are correct, the Kalman filter will deliver optimal estimates (i.e. it minimizes the variance of the estimation error expressed as the trace of the covariance matrix [6]).

5.1.2 Mathematical description of the Kalman filter

We assume a general discrete state space system (for our system, this is given by Equation (3.74)):

$$x_{k+1} = \Phi_k x_k + \Lambda_k u_k + \Gamma_k v_k \qquad (5.1)$$

$$y_k = D_k x_k + w_k, \qquad (5.2)$$

where the general discrete state space model in Equation (5.1) can be found by the discretization described in Section 3.5.6 (p. 65), and $y_k$ is the measurement vector, $D_k$ is the measurement matrix and $w_k$ is the measurement noise.

For the state space system described by Equations (5.1) and (5.2), the discrete Kalman filter equations are:

• Initialization (i.e. the initial estimate, and its covariance matrix):

$$\hat{x}_0 = E(x_0) \qquad (5.3)$$

$$\hat{X}_0 = E\left[ (x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T \right] \qquad (5.4)$$

• Prediction:

$$\bar{x}_{k+1} = \Phi_k \hat{x}_k + \Lambda_k u_k \qquad (5.5)$$

$$\bar{X}_{k+1} = \Phi_k \hat{X}_k \Phi_k^T + \Gamma_k V_k \Gamma_k^T \qquad (5.6)$$

• The Kalman filter gain matrix K:

$$K_k = \bar{X}_k D_k^T \left( D_k \bar{X}_k D_k^T + W_k \right)^{-1} \qquad (5.7)$$

• Measurement update:

$$\hat{x}_k = \bar{x}_k + K_k (y_k - D_k \bar{x}_k) \qquad (5.8)$$

$$\hat{X}_k = (I - K_k D_k) \bar{X}_k \qquad (5.9)$$

where a bar denotes a predicted value and a hat denotes a measurement updated value, $X_k$ is the covariance matrix of $x_k$, $W_k$ is the covariance matrix of the measurement noise $w_k$ and $V_k$ is the covariance matrix of the discrete process noise $v_k$.
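The prediction, gain and measurement update steps of Equations (5.5)-(5.9) can be illustrated for the simplest possible case, a single state and a single measurement, where all matrices reduce to scalars. This is a sketch under that assumption only; the function names `kf_predict` and `kf_update` are hypothetical.

```python
def kf_predict(x_hat, X_hat, phi, lam, u, gam, V):
    """Equations (5.5)-(5.6) for a scalar state."""
    x_bar = phi * x_hat + lam * u
    X_bar = phi * X_hat * phi + gam * V * gam
    return x_bar, X_bar


def kf_update(x_bar, X_bar, y, D, W):
    """Equations (5.7)-(5.9) for a scalar state and a scalar measurement."""
    K = X_bar * D / (D * X_bar * D + W)     # Kalman gain, Equation (5.7)
    x_hat = x_bar + K * (y - D * x_bar)     # Equation (5.8)
    X_hat = (1.0 - K * D) * X_bar           # Equation (5.9)
    return x_hat, X_hat


# One cycle: the prediction increases the variance, the update decreases it.
x_bar, X_bar = kf_predict(0.0, 1.0, phi=1.0, lam=0.0, u=0.0, gam=1.0, V=1.0)
x_hat, X_hat = kf_update(x_bar, X_bar, y=3.0, D=1.0, W=2.0)
```

Calling `kf_update` with `D=0.0` returns the predicted values unchanged, since the gain then becomes zero.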

For timesteps with no measurements, the measurement matrix $D_k$ will be a zero matrix ($D_k = 0$), and thus the measurement update Equations (5.8)-(5.9) will not influence the state vector estimate and its covariance matrix. Only the prediction Equations (5.5)-(5.6) will change the state vector estimate and its covariance matrix. For such timesteps, the Kalman filter will work as an optimal predictor.

5.2 Main equations of the transport segment state vector estimator

As can be seen from Figure 4.4 (p. 89), the Kalman filter should find an optimal estimate of the transport segment state vector $x_{TRS,k}$. Using the transport segment state space model from Equation (3.26) on page 49 in the Kalman filter gives:

$$\dot{x}_{TRS}(t) = A_{TRS} x_{TRS}(t) + C_{TRS} v_{TRS}(t) \qquad (5.10)$$

Since the Kalman filter uses a discrete state space model, Equation (5.10) needs to be discretized. By using the discretization equations from Section 3.5.6 (p. 65), the discrete version of Equation (5.10) can be written as:

$$x_{TRS,k+1} = \Phi_{TRS,k} x_{TRS,k} + \Gamma_{TRS} v_{TRS,k} \qquad (5.11)$$

where: $\Phi_{TRS,k} \approx I + A_{TRS} h_k$, $\Gamma_{TRS} = C_{TRS}$, and the covariance matrix of $v_{TRS,k}$ is:

$$V_{TRS,k} \approx h_k V_{TRS}(t_k) \qquad (5.12)$$

where $V_{TRS}(t)$ is the spectral density matrix of the continuous white noise vector $v_{TRS}(t)$.

Usually, a measurement of $r_{TRS,m}$ that is valid at the exact same point in time as the timestep is not available. However, to find the Kalman filter equations of our system, we first assume that we receive a measurement of $r_{TRS}$ at timestep k, and go into the details of the timing in Section 5.3.

We can use our knowledge of the correct media speed $r_{SNDR}$ to find a measurement of the first state of the transport segment state vector (given in Equation (3.25) on page 49) by using the equation:

$$y_{TRS}(t_k) = r_{TRS}(t_k) - r_{SNDR} \qquad (5.13)$$

Thus, the measurement equation becomes:

$$y_{TRS,k} = D_{TRS} x_{TRS,k} + w_{TRS,k} \qquad (5.14)$$

where:

$$D_{TRS} = \begin{bmatrix} 1 & 0_{1 \times (n_{TRS}-1)} \end{bmatrix} \qquad (5.15)$$

However, since the measurements are not received at the points of time of the timesteps and since the measurements are delayed in time when they are received, the Kalman filter needs some adjustments, as described in Section 5.3.

By using Equations (5.11) and (5.14) together with the general Kalman filter equations from Section 5.1, the following Kalman filter equations to use for our system can be found:

• Initialization (i.e. the initial estimate, and its covariance matrix):

$$\hat{x}_{TRS,0} = E(x_{TRS,0}) \qquad (5.16)$$

$$\hat{X}_{TRS,0} = E\left[ (x_{TRS,0} - \hat{x}_{TRS,0})(x_{TRS,0} - \hat{x}_{TRS,0})^T \right] \qquad (5.17)$$

• Prediction:

$$\bar{x}_{TRS,k+1} = \Phi_{TRS,k} \hat{x}_{TRS,k} \qquad (5.18)$$

$$\bar{X}_{TRS,k+1} = \Phi_{TRS,k} \hat{X}_{TRS,k} \Phi_{TRS,k}^T + \Gamma_{TRS} V_{TRS,k} \Gamma_{TRS}^T \qquad (5.19)$$

since $\Gamma_{TRS}$ is a constant matrix in our system.

• The Kalman filter gain matrix $K_{TRS,k}$:

$$K_{TRS,k} = \bar{X}_{TRS,k} D_{TRS,k}^T \left( D_{TRS,k} \bar{X}_{TRS,k} D_{TRS,k}^T + W_{TRS,k} \right)^{-1} \qquad (5.20)$$

• Measurement update:

$$\hat{x}_{TRS,k} = \bar{x}_{TRS,k} + K_{TRS,k} (y_{TRS,k} - D_{TRS,k} \bar{x}_{TRS,k}) \qquad (5.21)$$

$$\hat{X}_{TRS,k} = (I - K_{TRS,k} D_{TRS,k}) \bar{X}_{TRS,k} \qquad (5.22)$$

In the Kalman filter Equations (5.16)-(5.22) for our system, a bar denotes a predicted value and a hat denotes a measurement updated value, $X_{TRS}$ is the covariance matrix of $x_{TRS}$, $W_{TRS,k}$ is the covariance matrix of the measurement noise $w_{TRS,k}$ and $V_{TRS,k}$ is the covariance matrix of the discrete process noise $v_{TRS,k}$.

Before the first measurement update of the Kalman filter, the state vector needs to be initialized with values that are as correct as possible. If there is a large uncertainty in the initial state vector $\hat{x}_{TRS,0}$, this should be reflected by having large values in its covariance matrix $\hat{X}_{TRS,0}$. If this is the case, the Kalman filter might not have acceptable accuracy of its estimates until it has received a few of the first measurements.
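Because $D_{TRS}$ in Equation (5.15) only picks out the first state, the gain and update Equations (5.20)-(5.22) need no matrix inversion. The sketch below shows this specialisation together with the approximation $\Phi_{TRS,k} \approx I + A_{TRS} h_k$ and the prediction Equations (5.18)-(5.19), using plain nested lists. The function names are hypothetical, the measurement noise is assumed to be a scalar variance `W`, and `GVG` stands for the precomputed product $\Gamma_{TRS} V_{TRS,k} \Gamma_{TRS}^T$.

```python
def discretize(A, h):
    """Phi ~ I + A*h for a square matrix A given as nested lists."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) + A[i][j] * h for j in range(n)]
            for i in range(n)]


def predict(x_hat, X_hat, Phi, GVG):
    """Equations (5.18)-(5.19); GVG is Gamma*V_k*Gamma^T, precomputed."""
    n = len(x_hat)
    x_bar = [sum(Phi[i][j] * x_hat[j] for j in range(n)) for i in range(n)]
    PX = [[sum(Phi[i][k] * X_hat[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # (PX * Phi^T)[i][j] = sum_k PX[i][k] * Phi[j][k]
    X_bar = [[sum(PX[i][k] * Phi[j][k] for k in range(n)) + GVG[i][j]
              for j in range(n)] for i in range(n)]
    return x_bar, X_bar


def update_first_state(x_bar, X_bar, r_trs, r_sndr, W):
    """Equations (5.20)-(5.22) specialised to D_TRS = [1 0 ... 0]."""
    n = len(x_bar)
    y = r_trs - r_sndr                                   # Equation (5.13)
    # X_bar * D^T is the first column of X_bar; D * X_bar * D^T is X_bar[0][0].
    K = [X_bar[i][0] / (X_bar[0][0] + W) for i in range(n)]
    x_hat = [x_bar[i] + K[i] * (y - x_bar[0]) for i in range(n)]
    X_hat = [[X_bar[i][j] - K[i] * X_bar[0][j] for j in range(n)]
             for i in range(n)]
    return x_hat, X_hat
```

Note that a single rate measurement also corrects the states that are not measured directly, through the off-diagonal elements of the predicted covariance matrix.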

5.3 Timing

This section will discuss some timing related issues for the Kalman filter.

Section 5.3.1 explains how to handle the fact that most of our measurements are valid at other points in time than at the timesteps. Section 5.3.2 explains how the prediction part of the Kalman filter can be used as an optimal predictor.

5.3.1 How to deal with late measurements

The optimal controller needs an estimate of the transport segment state vector and its covariance matrix at each timestep. If all measurements of $r_{TRS}$ had been valid at the exact same points in time as the timesteps, and a new measurement had been received at each timestep, the Kalman filter would be run as shown in Figure 5.2, where a wait-box is a symbol for the waiting time between each timestep.

In a real implementation, however, one of the following situations can occur at timestep k:

• No media-units have arrived since timestep k-1, thus there are no new measurements. This is discussed in Section 5.3.1.1.

• At least one new media-unit has arrived since timestep k-1. This is discussed in Section 5.3.1.2.

Figure 5.2: Kalman filter implementation for the ideal world (block diagram of the running Kalman filter, with the blocks Kalman filter prediction, Kalman filter update and wait, the measurement $\tilde{r}_{TRS}(t_k)$ as input, and $h_k = t_k - t_{k-1}$).

5.3.1.1 No media-units have arrived since last timestep

If no media-units have arrived between timestep k-1 and timestep k, a Kalman filter prediction is used to predict the values at time $t_k$. The Kalman filter estimates at time $t_{k-1}$ are used as input to the prediction equations, as shown in Figure 5.3. This is the same as incrementing the Kalman filter with no measurements available.

Figure 5.3: Kalman filter to use when no media-units arrived since last timestep (block diagram: the estimates valid at $t_{k-1}$ are predicted over $h_k = t_k - t_{k-1}$ to give $x_{TRS}(t_k)$ and $X_{TRS}(t_k)$).

5.3.1.2 At least one media-unit has arrived since last timestep

When a new media-unit, numbered m, arrives from the transport segment, it is timestamped with $t_{TRS,m}$. Its rate, $r_{TRS,m}$, is measured as shown in Section 3.5.4 (p. 54), and timestamped with $t_{r_{TRS},m}$, where $t_{r_{TRS},m} < t_{TRS,m}$, since the timestamp is placed halfway between the arrival times of packet number m-1 and packet number m (see Equations (3.40) and (3.41) on page 55). An example of the relation between the rate measurement time, the media-unit timestamps and the time of the timesteps is illustrated in Figure 5.4.

As described in Section 3.5.3.2 (p. 53), and illustrated in Figure 5.4, the timestamps of media-units can be at other points in time than the timesteps.

If a media-unit arrives between timesteps k-1 and k, its rate measurement is received at timestep k. However, as illustrated in Figure 5.4, the time of the media-unit rate measurement $t_{r_{TRS},m}$ may be several timesteps old.

To find optimal estimates of the state vector and its covariance matrix at time $t_k$ in this situation, there is a need to go through three steps, as illustrated in Figure 5.5.

Figure 5.4: Timesteps, media-unit arrivals and rate measurements (time axis showing the timesteps $t_{k-10}, \dots, t_k$, the media-unit arrival times $t_{TRS,m-2}$, $t_{TRS,m-1}$, $t_{TRS,m}$ and the rate measurement times $t_{r_{TRS},m-1}$ and $t_{r_{TRS},m}$).

The current time is $t_k$.

1. The previous measurement updated value, which is stored in memory (valid at $t_{r_{TRS},m-1}$), is used as input to the Kalman filter prediction equations, and predicted until $t_{r_{TRS},m}$. If necessary, this should be done in several steps, as described in Section 5.3.2.

2. A prediction at time $t_{r_{TRS},m}$ exists. This is used together with the rate measurement as input to the Kalman filter measurement update equations, to find a measurement updated value at time $t_{r_{TRS},m}$.

3. As a last step, the measurement updated value (valid at $t_{r_{TRS},m}$) is used as input to the Kalman filter prediction equations, and predicted until $t_k$. As with the previous prediction, this should also be done in several steps if necessary (as described in Section 5.3.2).

The result is the optimal estimates at both $t_{r_{TRS},m}$ and $t_k$.

Figure 5.5: Illustration of the timing used when a media-unit arrived since last timestep (time axis as in Figure 5.4, with step 1: Kalman filter prediction from $t_{r_{TRS},m-1}$ to $t_{r_{TRS},m}$, step 2: Kalman filter update at $t_{r_{TRS},m}$, and step 3: Kalman filter prediction from $t_{r_{TRS},m}$ to $t_k$).
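The three steps above can be sketched for a scalar transport segment model. Everything model-specific here is an assumption made for illustration: a single state with dynamics `a`, a process noise spectral density `q`, a measurement noise variance `W` and $D = 1$. The function names are hypothetical, and `rate_timestamp` only encodes the statement that the rate timestamp lies halfway between two arrival times.

```python
def predict(x, X, a, q, dt):
    """Scalar prediction over dt with Phi ~ 1 + a*dt and noise variance ~ q*dt."""
    phi = 1.0 + a * dt
    return phi * x, phi * X * phi + q * dt


def update(x, X, y, W):
    """Scalar measurement update with D = 1."""
    K = X / (X + W)
    return x + K * (y - x), (1.0 - K) * X


def rate_timestamp(t_prev_arrival, t_arrival):
    """Rate measurement time: halfway between the two arrival times."""
    return 0.5 * (t_prev_arrival + t_arrival)


def handle_late_measurement(x_prev, X_prev, t_prev_meas, t_meas, y, t_k, a, q, W):
    """Predict to the measurement time, update, then predict to the current time."""
    # Step 1: from the stored updated value (valid at t_prev_meas) to t_meas.
    x, X = predict(x_prev, X_prev, a, q, t_meas - t_prev_meas)
    # Step 2: measurement update at t_meas; this value is stored for next time.
    x_m, X_m = update(x, X, y, W)
    # Step 3: from t_meas to the current timestep t_k.
    x_k, X_k = predict(x_m, X_m, a, q, t_k - t_meas)
    return (x_m, X_m), (x_k, X_k)
```

The first returned pair is the value to keep in memory for the next late measurement, and the second pair is the estimate that the optimal controller would use at $t_k$.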

The total filter algorithm to use when at least one media-unit has arrived since last timestep is shown in Figure 5.6.

Figure 5.6: Filter to use if at least one media-unit arrived since last timestep (block diagram: the stored estimate valid at $t_{r_{TRS},m-1}$ is predicted over $t_{r_{TRS},m} - t_{r_{TRS},m-1}$, updated with the measurement $\tilde{r}_{TRS}(t_{r_{TRS},m})$, and then predicted over $t_k - t_{r_{TRS},m}$ to give $x_{TRS}(t_k)$ and $X_{TRS}(t_k)$).

5.3.2 The KF prediction used as an optimal predictor

If there is a long time between measurements, the prediction time period for the Kalman filter prediction can be too long compared to the dynamics of the mathematical model of the transport segment. In these situations, the equations used for discretization of the continuous state space equation may not be accurate enough, since the discretization error increases when the length of the timestep increases. One simple way to solve this problem is to divide the prediction time period in n parts, and run the Kalman filter prediction for each of these parts (with a duration that is 1/n of the original duration), as shown in Figure 5.7. This is often referred to as a semi-continuous Kalman filter.
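The effect of dividing the prediction period into n parts can be illustrated with a scalar model, for which the exact state transition over a period `dt` is `exp(a*dt)`. The sketch assumes a single state with dynamics `a` and process noise spectral density `q`, and uses the first-order approximation `phi = 1 + a*h` in each sub-step; the function name `predict_substeps` is hypothetical.

```python
import math


def predict_substeps(x, X, a, q, dt, n):
    """Run the scalar Kalman filter prediction n times, each over dt/n."""
    h = dt / n
    phi = 1.0 + a * h              # first-order discretization over one sub-step
    for _ in range(n):
        x, X = phi * x, phi * X * phi + q * h
    return x, X


# Compare the predicted state with the exact value exp(a*dt) for x = 1.
a, dt = -1.0, 1.0
exact = math.exp(a * dt)
errors = [abs(predict_substeps(1.0, 0.0, a, 0.0, dt, n)[0] - exact)
          for n in (1, 10, 100)]
```

In this example the discretization error shrinks as n grows, which is the motivation for the sub-stepping described above.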


Figure 5.7: The prediction part of the Kalman filter used as an optimal predictor (block diagram: the optimal predictor steps $x_{TRS,k+i}$ and $X_{TRS,k+i}$ to $x_{TRS,k+i+1}$ and $X_{TRS,k+i+1}$, with a wait between steps, starting from $x_{TRS,k}$, $X_{TRS,k}$ and ending at $x_{TRS,k+n}$, $X_{TRS,k+n}$).

5.4 Finding the transport segment model

When using control theory to control a certain system, a common assumption is that there exists a state space model of the system. Control theory is widely used for real-life physical systems, and the real physical world is then modelled by a state space model. Also in our case, the assumption that there exists a state space model for the transport segment is essential to be able to find the optimal solution.

As stated in Section 2.5 (p. 31), the area of finding the transport segment model is an open area where we suggest future research. Still, we will need a transport segment model to use now (before any research has been done on finding transport segment models).

One way to find a transport segment model is to measure the arrival time of each media-unit, and save these measurements to a file. The transport segment rate can then be calculated by using the equations from Section 3.5.4 (p. 54). By using Matlab's [29] System Identification Toolbox, a fairly good state space model of the transport segment behaviour is automatically found.

6 Anti-run-dry algorithm

In Chapter 3 (p. 33), a theoretical foundation of the field of playoutbuffer control, with introduction of a stringent mathematical notation and a development of transport segment independent mathematical models, was given. If the playoutbuffer does not run dry, the optimal control algorithm developed in Chapter 4 (p. 71) gives an optimal performance based upon the user's quality preferences.

No algorithm can guarantee not running dry without lowering the playout speed to an unacceptable level (since periods without data might be very long). Thus, there will always be a certain probability that the buffer will run dry. None of the existing playoutbuffer control algorithms (including the optimal control algorithm) gives the user direct control over the run-dry probability. Thus, we aim at finding an anti-run-dry algorithm for the optimal control algorithm, that gives the user a controllable run-dry probability.

The paper "Anti-Run-Dry Algorithm for Optimal Control of Playoutbuffers" in Appendix F (p. 291) can be read for a quick overview of the anti-run-dry algorithm. Note however, that for some of the variables, the paper uses a different notation than this thesis, and that the calculations are performed in a slightly different way.

Another way to get an overview of the anti-run-dry algorithm without having to read all the equations in this chapter, is to read the rest of the introduction, Section 6.1 (p. 107) and the summary in Section 6.8.

The optimal control algorithm deduced in Chapter 4 is dependent upon the desired buffer level $M_{RCV,d}$ given by the user. The user needs to be careful when choosing the value of $M_{RCV,d}$ to use, because:

• If $M_{RCV,d}$ is chosen as a too large value, the average buffer level will be larger than what is needed, and thus, unnecessary delay is introduced.

• If $M_{RCV,d}$ is chosen as a very low value, compared to the behaviour of the transport segment, the buffer can run dry too often. When the buffer runs dry, the playout speed drops to zero, as was illustrated in Figure 4.2 (p. 73), and thus the user's perception of the quality is lowered.

Since it is a difficult task for the user to decide the best value of $M_{RCV,d}$, we would like to automate this task, by making an algorithm that dynamically finds the best value of $M_{RCV,d}$. Such an algorithm could for instance have the run-dry-probability as its input value. This way, the user gets an easier task: to decide the tolerable run-dry-probability, instead of having to decide the value of $M_{RCV,d}$. A black box view of such an algorithm is illustrated in Figure 6.1.

There are several ways in which the anti-run-dry algorithm in Figure 6.1 can use MRCV,d to prevent the buffer from running dry. Four possible solutions are:

1. Static control by keeping MRCV,d at a constant high value. Even though the desired buffer level can be set to a lower value for the optimal controller than for traditional controllers (as can be seen in Chapter 8 (p. 163)), the desired level must have a quite high value to prevent the buffer from running dry. Although this will lead to a low run-dry-probability, it will introduce an unnecessary extra delay. This solution does not use any system information.

2. Dynamic control by increasing MRCV,d when the buffer content gets low. This solution can give a lower run-dry-probability than alternative number 1, since it uses some of the system information, i.e. the buffer level. However, the playout speed cannot be slowed down too much, or else the user will perceive the quality as too low (since user requirement B will not be fulfilled).

3. Using Model Predictive Control (MPC). MPC is described in the papers [40] and [43] and in the textbook [28]. MPC lets the user specify upper and lower limits within which the controller should try to keep the controlled variable. This may sound like a solution to the playout speed control problem.

4. Dynamic control by predicting the future transport segment behaviour, and using this to predict the future behaviour of the optimal controller, and thus predict the future buffer level. This way, a run-dry incident can be predicted before it happens, and thereby prevented. This means that MRCV,d is increased when a lower rate from the transport segment is expected in the near future, and decreased when a higher rate from the transport segment is expected in the near future. This solution uses all available system information.

Since alternative number 1 introduces unnecessary delay, we do not choose it.

For alternative number 2, rTRS typically needs to be low for a while before the buffer level gets low and makes this algorithm increase MRCV,d to decrease rPLR. Alternatively, rTRS needs to be high for a period before the buffer level gets high and makes this algorithm decrease MRCV,d to increase rPLR. Thus, the drawback of this solution is that the change of MRCV,d typically appears a while after the change in rTRS.

Figure 6.1: Black box view of an anti-run-dry algorithm that dynamically decides the value of MRCV,d(t). [The user supplies the run-dry-probability as input to the anti-run-dry algorithm, which outputs MRCV,d(t).]

Alternative number 3, Model Predictive Control, allows the user to set maximum and minimum limits that the controller should try to keep a state within. For our system, this would mean to set the minimum limit of the buffer level equal to zero or a small value. However, MPC does not give any guarantee for how often the controlled variable will go outside the limits, which in our case means that MPC cannot guarantee to keep the run-dry probability below a limit. The user cannot specify how often she/he would allow the controlled variable to go outside the limits. In addition, nonlinearities such as ours, where the playout speed drops to zero when the buffer runs dry, cannot be modelled by MPC.

For alternative number 4, the knowledge of the transport segment behaviour is used to predict the buffer level. This lets the value of MRCV,d be changed much earlier than for alternative number 2, since the time at which the buffer level will reach a low value can now be predicted in advance. In cases of low predictability, the algorithm will return a higher MRCV,d to keep the run-dry probability as low as specified. If MRCV,d is increased before the buffer level reaches the low value, the buffer level does not get as low as for alternative number 2. It is therefore possible to prevent the buffer from running dry without changing the playout speed (rPLR) as much as would be needed for alternative number 2.

Solution number 4 is the statistically optimal solution, hence we choose to use this solution. The total system when using alternative number 4 is illustrated in Figure 6.2.

As shown in Figure 6.2, the anti-run-dry algorithm will work as an ‘outer’ controller, outside the inner system (with the optimal controller) found in Chapters 4 and 5, where this outer controller determines the optimal value of MRCV,d(t).

As shown in Figure 6.3, the essence of the chapter is Section 6.1, which explains how we find the anti-run-dry mechanism and discusses the statistical limit of MRCV,d, and Section 6.8 (p. 135), which gives a summary of the chapter.

A more detailed understanding of the anti-run-dry algorithm can be gained by reading Section 6.5 (p. 129), which presents and explains the equation for the minimum value of MRCV,d, and Section 6.7 (p. 131), which discusses the loop that we use for the anti-run-dry algorithm and explains why this loop is needed.

Sections 6.2 (p. 113), 6.3 (p. 124) and 6.4 (p. 125) calculate mathematical expressions for $\hat{M}_{RCV,k+n}$, $\hat{M}_{VB,k+n}$ and $\operatorname{var}(\hat{M}_{RCV,k+n} - \hat{M}_{VB,k+n})$, respectively. These expressions are used in Section 6.5 (p. 129) to find the minimum value of MRCV,d. Sections 6.2 to 6.4 contain long mathematical deductions, and even though a large part of these deductions is placed in the appendix, they can be skipped without losing much of the understanding of the chapter.

Figure 6.2: System structure with anti-run-dry algorithm. The area with grey background is from Chapters 4 and 5, while the rest is the anti-run-dry algorithm added in this chapter. [Blocks: Sender, Transport segment, Receiver, Kalman filter, Transport segment state predictor, anti-run-dry algorithm, Optimal controller, user. Signals: run-dry probability, r̃TRS(t), x̂TRS(t), MPBPLR(t), future predictions of xTRS(t) and their covariances, MRCV,d(t), rPLR(t).]

In Section 6.6 (p. 131) an optimal predictor needed to run the anti-run-dry algorithm is presented.


6.1 Finding the anti-run-dry algorithm

We will now find the anti-run-dry algorithm illustrated by the green box in Figures 6.1 (p. 104) and 6.2 (p. 106). The task of this algorithm is to find the dynamical minimum value of MRCV,d, given that the run-dry-probability of the buffer is to be kept below a specified level.

As mentioned in Section 5.3.2 (p. 100), the Kalman filter can, in addition to finding the best prediction of the present transport segment state, also be used as an optimal predictor of the future behaviour of the transport segment, i.e. future values of rTRS(t) and of all the other states in the transport segment state vector xTRS(t). Since our transport segment state space model contains some process noise (expressing the uncertainty in the model), the uncertainty of the prediction of rTRS(t) increases over time, as illustrated in Figure 6.4.

[Figure 6.3. Box labels recovered from the figure: 6.2 Finding the estimated receiver buffer level; 6.3 Finding the estimated virtual buffer level; 6.4 Finding the variance of the buffer estimation error; 6.5 Finding the minimum value of MRCV,d; 6.6 Optimal predictor for the anti-run-dry mechanism; 6.7 Total anti-run-dry algorithm; 6.8 Summary.]
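The growth of the prediction uncertainty can be illustrated with a minimal sketch. It assumes a scalar discrete-time model x[k+1] = a·x[k] + v[k] with process noise variance q, which is a simplification chosen for illustration and not the transport segment model of this thesis. The names `p0`, `a` and `q` and their values are hypothetical.

```python
def predicted_variances(p0, a, q, steps):
    """Prediction error variance 1..steps timesteps ahead.

    Assumed scalar model (illustration only): x[k+1] = a * x[k] + v[k],
    where v has variance q and p0 is the variance of the current estimate.
    """
    variances = []
    p = p0
    for _ in range(steps):
        # Each step propagates the old uncertainty and adds new process noise.
        p = a * a * p + q
        variances.append(p)
    return variances


growth = predicted_variances(p0=0.1, a=1.0, q=0.05, steps=5)
print(growth)
```

With a = 1 the variance grows linearly with the prediction horizon, which corresponds to the widening uncertainty band sketched in Figure 6.4.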

The optimal controller deduced in Chapter 4 is given by Equations (4.48), (4.49) and (4.50) on page 88. By using these equations, the optimal control equation can be written as:

$$u = \begin{bmatrix} g_1 & g_2 & g_{TRS} \end{bmatrix} \begin{bmatrix} M_{RCV} - M_{RCV,d} \\ r_{PLR} - r_{SNDR} \\ x_{TRS} \end{bmatrix} \qquad (6.1)$$

By using our knowledge of the optimal controller and of the predicted future values of xTRS(t), we can, for a specified value of MRCV,d, predict the future control values ($\dot{r}_{PLR}$) from the optimal controller, and thereby predict future values of MRCV(t). The predicted values of rTRS(t) can be used to predict the contents of the virtual buffer, MVB(t), and by subtracting this from the prediction of MRCV(t), a predicted value of MPBPLR(t) can be found.

Figure 6.4: The uncertainty of the transport segment rate prediction: illustration of uncertainty growth over time, due to the process noise. [Axes: time, starting at the present time, versus XTRS(1,1) = uncertainty of rTRS, in units/s.]

This is illustrated in Figure 6.5 for two different values of MRCV,d: MRCV,d = 1.5 media-units (green line) and MRCV,d = 10 media-units (blue line). The influence that the desired buffer level has on the actual buffer level is illustrated in the figure, where the predicted buffer level is higher for MRCV,d = 10 media-units than for MRCV,d = 1.5 media-units. Figure 6.5 also illustrates the normal distribution of the prediction error. As illustrated in Figure 6.4, the uncertainty of the transport segment state vector prediction (due to vTRS) increases over time. Therefore, with a given MRCV,d, the uncertainty of the prediction of MPBPLR also increases over time. Thus, the probability that the buffer will run dry gets higher the further into the future we predict. The growth of the prediction error variance over time is illustrated at two different timesteps; at the first timestep, the distribution is quite narrow compared to the next timestep. The probability that the buffer will run dry at a particular point in time is equal to the area under the illustrated distribution curve that lies below the line given by MPBPLR = 0. As can be seen from Figure 6.5, the probability that the buffer will run dry depends upon time and upon MRCV,d.
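The area below the line MPBPLR = 0 can be evaluated numerically even though it has no closed symbolic form. The following minimal sketch assumes a normally distributed prediction error; the function name and the example means and standard deviations are hypothetical and only illustrate how a wider distribution at a later timestep raises the run-dry probability.

```python
import math


def run_dry_probability(mean, std):
    """P(M_PBPLR < 0) for a normal prediction with the given mean and std."""
    # Area of the normal density below zero, written with the
    # complementary error function from the standard library.
    return 0.5 * math.erfc(mean / (std * math.sqrt(2.0)))


# Same predicted buffer level, narrow distribution (near future)
# versus wide distribution (further into the future).
near = run_dry_probability(mean=3.0, std=1.0)
far = run_dry_probability(mean=3.0, std=2.0)
print(near, far)
```

With the same predicted buffer level, doubling the standard deviation raises the probability of an empty buffer considerably, which is the effect sketched in Figure 6.5.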

An intuitive way to seek the anti-run-dry algorithm now would be to try to find a symbolic expression for the area under the distribution curve that lies below the line given by $M_{PBPLR} = 0$, by solving the integration symbolically. That way, one could hope to find an invertible mathematical function of the form:

$$M_{RCV,d} = f(t, v_{TRS}(t), \text{run-dry-probability}) \qquad (6.2)$$

Unfortunately, the integration of the probability density distribution cannot be solved symbolically, since it has an integrand of the form $e^{-x^2}$. This is further explained in Appendix D.2 (p. 275).

Figure 6.5: Illustration: Prediction of MPBPLR and prediction error distribution. [Axes: time, starting at the present time, versus MPBPLR in media-units. Curves: prediction of MPBPLR when MRCV,d = 10 media-units and when MRCV,d = 1.5 media-units, with the distribution of the prediction error drawn at two timesteps. The area of the distribution below the line MPBPLR = 0 equals P(empty buffer).]

To find another way to solve our problem, we study the density of the normal (Gauss) distribution, as shown in Figure 6.6. Even though a symbolic solution to an integral of a part of the Gauss distribution shown in the figure cannot be found, the area of selected parts of the total area can be calculated numerically, and can also be found in tables, for instance in [20], page 91. Thus, for a specified c (number of sigma) or probability, the following relation exists:

$$P(x < \mu + c\sigma) = \text{probability} \qquad (6.3)$$

where $\mu$ is the mean value and $\sigma$ is the standard deviation of the general variable x.

The symmetric (bell-shaped) form of the normal probability distribution gives the relationship:

$$P(x < \mu + c\sigma) = 1 - P(x > \mu + c\sigma) = 1 - P(x < \mu - c\sigma) \qquad (6.4)$$

By numerical calculations (see Section E.1 (p. 282)) or by use of Gauss distribution tables (for instance the one in [20]) and Equation (6.4), the relation between c (number of sigma) and p (probability) in the general equation can be found:

$$P(x < \mu - c\sigma) = p \qquad (6.5)$$

Figure 6.6: Density of the normal distribution, f(x). $\mu = E(x)$, $\sigma$ = standard deviation of x. [Shares of the total area: 34.13% between $\mu - \sigma$ and $\mu$, 13.59% between $\mu - 2\sigma$ and $\mu - \sigma$, 2.15% between $\mu - 3\sigma$ and $\mu - 2\sigma$, and 0.13% below $\mu - 3\sigma$.]

where $\mu = E(x)$ and σ is the standard deviation of x. Thus, by choosing a specific run-dry probability p, we can find the c that fits into Equation (6.5). By combining the equations for the erf and erfc functions (found in [1]) with the density function of the Gauss distribution, the following equation for p can be found:

$$p = P(x < \mu - c\sigma) = \frac{1}{2}\operatorname{erfc}\left(\frac{c}{\sqrt{2}}\right) = \frac{1}{2}\left(1 - \operatorname{erf}\left(\frac{c}{\sqrt{2}}\right)\right) \qquad (6.6)$$

If

$$\mu - c\sigma \ge 0 \qquad (6.7)$$

then:

$$p = P(x < \mu - c\sigma) \ge P(x < 0) \qquad (6.8)$$

and thus, the probability that x is zero or negative is equal to or lower than p. Equation (6.8) can be used to conclude that if:

$$\mu_{M_{PBPLR}}(t) - c\,\sigma_{M_{PBPLR}}(t) \ge 0, \qquad (6.9)$$

then the probability that $M_{PBPLR}(t) < 0$ is equal to or lower than p. In Equation (6.9), t is a future point in time where a prediction of MPBPLR exists, and where $\mu_{M_{PBPLR}}(t) = E(M_{PBPLR}(t))$ and $\sigma_{M_{PBPLR}}(t)$ is the standard deviation of the prediction error. According to Equation (6.9), to keep the run-dry-probability at time $t + \tau$ equal to or lower than p, the following relation must be fulfilled:

$$\mu_{M_{PBPLR}}(t + \tau) - c\,\sigma_{M_{PBPLR}}(t + \tau) \ge 0, \qquad (6.10)$$

Since our anti-run-dry algorithm will run in a computer, with discrete timesteps, a discrete version of Equation (6.10) is needed: To keep the run-dry-probability at timestep k+n equal to or lower than p, the following relation must be fulfilled:

$$\mu_{M_{PBPLR},k+n} - c\,\sigma_{M_{PBPLR},k+n} \ge 0, \qquad (6.11)$$

where:

$$\tau = t_{k+n} - t_k \qquad (6.12)$$
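The relation between c and p in Equations (6.5) and (6.6) can be evaluated with the Python standard library. The sketch below is only an illustration: the function names are hypothetical, and `statistics.NormalDist` is used here as a stand-in for the Gauss distribution tables mentioned above.

```python
import math
from statistics import NormalDist


def p_from_c(c):
    """Equation (6.6): probability that x lies below mu - c*sigma."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def c_from_p(p):
    """Number of sigma c such that P(x < mu - c*sigma) = p, for 0 < p < 1."""
    # inv_cdf(p) is the standard normal quantile, which equals -c.
    return -NormalDist().inv_cdf(p)


c = c_from_p(0.01)
print(c, p_from_c(c))
```

For c = 3 the function `p_from_c` returns roughly 0.13%, in agreement with the outermost area shown in Figure 6.6.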


Since $\mu_{M_{PBPLR},k+n} = \hat{M}_{PBPLR,k+n}$ and $\sigma_{M_{PBPLR},k+n} = \sqrt{\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})}$, Equation (6.11) can be written as:

$$\hat{M}_{PBPLR,k+n} \ge c\sqrt{\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})}. \qquad (6.13)$$

Equation (6.13) can also be written as:

$$\widehat{\left(M_{RCV,k+n} - M_{VB,k+n}\right)} \ge c\sqrt{\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})}. \qquad (6.14)$$

Since:

$$\widehat{\left(M_{RCV,k+n} - M_{VB,k+n}\right)} = \hat{M}_{RCV,k+n} - \hat{M}_{VB,k+n}, \qquad (6.15)$$

we get:

$$\hat{M}_{RCV,k+n} - \hat{M}_{VB,k+n} \ge c\sqrt{\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})} \qquad (6.16)$$

We would now like to use Equation (6.16) to find the lower limit for MRCV,d, given that the probability that the buffer runs dry is to be kept lower than or equal to p. If we find expressions for $\hat{M}_{RCV,k+n}$, $\hat{M}_{VB,k+n}$ and $\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})$ that include only MRCV,d and values that are known or predicted at timestep k, we can find an expression for the lower limit for MRCV,d to use at timestep k.
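As a minimal sketch, the condition in Equation (6.16) can be checked for one future timestep once the two estimates and the error variance are available. The function and argument names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math


def satisfies_run_dry_limit(m_rcv_hat, m_vb_hat, error_variance, c):
    """Check Equation (6.16) for one future timestep k+n.

    m_rcv_hat, m_vb_hat: predicted receiver and virtual buffer levels.
    error_variance: variance of the prediction error of M_PBPLR.
    c: number of sigma that corresponds to the tolerated probability p.
    """
    return (m_rcv_hat - m_vb_hat) >= c * math.sqrt(error_variance)


print(satisfies_run_dry_limit(5.0, 0.5, 1.0, 3.0))
print(satisfies_run_dry_limit(3.0, 0.5, 1.0, 3.0))
```

The first call passes because the predicted margin of 4.5 media-units exceeds three standard deviations; the second fails because a margin of 2.5 media-units does not.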


Expressions for $\hat{M}_{RCV,k+n}$, $\hat{M}_{VB,k+n}$ and $\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})$ are calculated in Sections 6.2, 6.3 (p. 124) and 6.4 (p. 125), respectively. The results are combined in Section 6.5 (p. 129) to find the lower limit for MRCV,d. This is illustrated in Figure 6.7.


6.2 Finding the estimated receiver buffer level

As shown in Figure 6.7, we need to find $\hat{M}_{RCV,k+n}$ in order to deduce the equation for the anti-run-dry algorithm.

Figure 6.7: Flow diagram for deduction of anti-run-dry algorithm. [Boxes: Find first equation for minimum MRCV,d (Section 6.1); Find estimated MRCV, $\hat{M}_{RCV,k+n}$ (Section 6.2); Find estimated MVB, $\hat{M}_{VB,k+n}$ (Section 6.3); Find estimation error variance of MPBPLR, $\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})$ (Section 6.4); Calculate detailed equation for minimum MRCV,d (Section 6.5).]

According to Equation (3.16) on page 47, our continuous time buffer equation is:

$$\dot{M}_{RCV}(t) = r_{TRS}(t) - r_{PLR}(t) \qquad (6.17)$$

Integrating from $t_k$ to $t_{k+n}$ gives:

$$M_{RCV}(t_{k+n}) = M_{RCV}(t_k) + \int_{t_k}^{t_{k+n}} r_{TRS}(\beta)\,d\beta - \int_{t_k}^{t_{k+n}} r_{PLR}(\beta)\,d\beta \qquad (6.18)$$

Changing the variables of integration gives:

$$M_{RCV}(t_{k+n}) - M_{RCV}(t_k) = \int_0^{\tau} r_{TRS}(t_k + \beta)\,d\beta - \int_0^{\tau} r_{PLR}(t_k + \beta)\,d\beta \qquad (6.19)$$

where $\tau$ is given by Equation (6.12) on page 111.
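Equation (6.19) states that the change in the receiver buffer level is the integrated difference between the rate in and the rate out. A minimal numerical sketch, assuming the two rates are available as functions of the offset β from tk (the function name, the midpoint rule and the example rates are illustration choices, not part of the thesis):

```python
def buffer_change(r_trs, r_plr, tau, steps=1000):
    """Approximate M_RCV(t_k + tau) - M_RCV(t_k) from Equation (6.19).

    r_trs and r_plr are callables that take the offset beta from t_k
    (in seconds) and return a rate in media-units per second.
    The integral is approximated with the midpoint rule.
    """
    width = tau / steps
    total = 0.0
    for i in range(steps):
        beta = (i + 0.5) * width
        total += (r_trs(beta) - r_plr(beta)) * width
    return total


# Hypothetical example: the transport segment rate drops linearly from
# 25 units/s while the player keeps consuming 25 units/s.
change = buffer_change(lambda beta: 25.0 - 2.0 * beta, lambda beta: 25.0, tau=2.0)
print(change)
```

In this example the buffer loses about 4 media-units over two seconds, since the rate deficit grows linearly from 0 to 4 units/s.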

We now define a delta operator that works this way:

$$\Delta\text{variable}(t, \tau) = \text{variable}(t + \tau) - \text{variable}(t) \qquad (6.20)$$

where ‘variable’ is the variable that the delta operator works on.

We now define three new variables that use this delta operator:

$$\Delta M_{RCV}(t, \tau) = M_{RCV}(t + \tau) - M_{RCV}(t), \qquad (6.21)$$

$$\Delta r_{PLR}(t, \tau) = r_{PLR}(t + \tau) - r_{PLR}(t) \qquad (6.22)$$

and

$$\Delta r_{TRS}(t, \tau) = r_{TRS}(t + \tau) - r_{TRS}(t) \qquad (6.23)$$

For now, t in these equations is considered a constant, equal to the “now” point in time (where we try to find the minimum value of MRCV,d to make sure that the probability that the buffer will run dry $\tau$ seconds from “now” is less than p).

By inserting the new variables, Equation (6.19) becomes:

$$\Delta M_{RCV}(t_k, \tau) = \int_0^{\tau} \Delta r_{TRS}(t_k, \beta)\,d\beta - \int_0^{\tau} \Delta r_{PLR}(t_k, \beta)\,d\beta + \tau \cdot r_{TRS}(t_k) - \tau \cdot r_{PLR}(t_k) \qquad (6.24)$$

We would like to express $\Delta M_{RCV}(t_k, \tau)$ as a function of variables that are known or easy to predict by the end of timestep k. Since the equations for the optimal predictor are known, future values of MRCV and rPLR can be expressed as functions of current and future values of the states in the transport segment state vector. The necessary calculations to find this expression are complex when done in the time domain, but are much easier to solve in the Laplace plane. Therefore, the main part of the rest of this section will use Laplace transformation theory. For an introduction to Laplace transformation theory, see Chapter 5 of [20].

We transform Equation (6.24) to the Laplace plane, with $\tau$ (not tk) as the time operator (since tk = ‘now’ is frozen in time for these calculations):

$$\Delta M_{RCV}(t_k, s) = \frac{\Delta r_{TRS}(t_k, s)}{s} - \frac{\Delta r_{PLR}(t_k, s)}{s} + \frac{r_{TRS}(t_k)}{s^2} - \frac{r_{PLR}(t_k)}{s^2} \qquad (6.25)$$

To save space, the deduction of $\Delta r_{PLR}(t_k, s)$ is placed in Appendix D.1 (p. 273), where it is deduced that:

$$\begin{aligned}
\Delta r_{PLR}(t_k, s) = \frac{1}{s - g_2}\Big( & g_1 \Delta M_{RCV}(t_k, s) + \frac{g_1}{s} M_{RCV}(t_k) - \frac{g_1}{s} M_{RCV,d} - \frac{g_2}{s} r_{SNDR} \\
& + \frac{g_2}{s} r_{PLR}(t_k) + g_{TRS}\,\Delta x_{TRS}(t_k, s) + \frac{1}{s}\, g_{TRS}\, x_{TRS}(t_k) \Big)
\end{aligned} \qquad (6.26)$$

where the constants g1 and g2, and the vector gTRS are defined by Equations (4.40), (4.41) on page 86, (4.47) on page 87 and (4.49) on page 87.

Inserting Equation (6.26) into Equation (6.25), and placing the terms containing $\Delta M_{RCV}$ on the left hand side, gives:

$$\begin{aligned}
\Delta M_{RCV}(t_k, s)\left(1 + \frac{g_1}{s(s - g_2)}\right) = {} & \frac{\Delta r_{TRS}(t_k, s)}{s} \\
& - \frac{1}{s(s - g_2)}\Big( \frac{g_1}{s} M_{RCV}(t_k) - \frac{g_1}{s} M_{RCV,d} - \frac{g_2}{s} r_{SNDR} + g_{TRS}\,\Delta x_{TRS}(t_k, s) + \frac{1}{s}\, g_{TRS}\, x_{TRS}(t_k) \Big) \\
& + \frac{r_{TRS}(t_k)}{s^2} - r_{PLR}(t_k)\left(\frac{g_2}{s^2(s - g_2)} + \frac{1}{s^2}\right)
\end{aligned} \qquad (6.27)$$

By introducing the variable:

$$l(s) = \frac{1}{s^2 - s g_2 + g_1} \qquad (6.28)$$

and using the elementary calculations:

$$1 + \frac{g_1}{s(s - g_2)} = \frac{s^2 - s g_2 + g_1}{s(s - g_2)}, \qquad (6.29)$$

and:


$$\frac{g_2}{s^2(s - g_2)} + \frac{1}{s^2} = \frac{1}{s(s - g_2)}, \qquad (6.30)$$

Equation (6.27) can be written as:

$$\begin{aligned}
\Delta M_{RCV}(t_k, s) = {} & s \cdot l(s)\,\Delta r_{TRS}(t_k, s) - g_2\, l(s)\,\Delta r_{TRS}(t_k, s) \\
& - \frac{l(s)}{s}\big( g_1 M_{RCV}(t_k) - g_1 M_{RCV,d} - g_2 r_{SNDR} + g_{TRS}\, x_{TRS}(t_k) \big) \\
& - l(s)\, g_{TRS}\,\Delta x_{TRS}(t_k, s) + l(s)\, r_{TRS}(t_k) - \frac{l(s)}{s}\, g_2\, r_{TRS}(t_k) - l(s)\, r_{PLR}(t_k)
\end{aligned} \qquad (6.31)$$

By the use of calculations in the Laplace plane, we have now been able to express $\Delta M_{RCV}(t_k, \tau)$ as a function of variables that, except for the transport segment state vector, are either easy to predict or known by the end of timestep k, or constants. The player rate $r_{PLR}(t_k)$ is known by the end of timestep k. The receiver buffer level $M_{RCV}(t_k)$ can be calculated from a sum of a measurement (of $M_{PBPLR}(t_k)$) and a prediction (of $M_{VB}(t_k)$), and the rate out of the transport segment, $r_{TRS}(t_k)$, can be either measured or predicted. The terms that depend upon the transport segment state vector, $x_{TRS}(t_k)$, $\Delta x_{TRS}(t_k, s)$ and $\Delta r_{TRS}(t_k, s)$, must be predicted, either because they are not measurable or because they represent future values. The constants are the correct media speed $r_{SNDR}$ and the parts of the optimal control gain vector, $g_1$, $g_2$ and $g_{TRS}$. The desired buffer level $M_{RCV,d}$ is, for this anti-run-dry calculation, constant in the time period $(t_k, t_k + \tau]$.

We do an inverse Laplace transform of Equation (6.31) to get back to the time domain:

$$\begin{aligned}
\Delta M_{RCV}(t_k, \tau) = {} & \int_0^{\tau} \Delta r_{TRS}(t_k, \tau - \beta)\,\frac{d}{d\beta}\big(l(\beta)\big)\,d\beta - g_2 \int_0^{\tau} l(\beta)\,\Delta r_{TRS}(t_k, \tau - \beta)\,d\beta \\
& - \left(\int_0^{\tau} l(\beta)\,d\beta\right) \cdot \big( g_1 M_{RCV}(t_k) - g_1 M_{RCV,d} - g_2 r_{SNDR} + g_{TRS}\, x_{TRS}(t_k) + g_2\, r_{TRS}(t_k) \big) \\
& - g_{TRS} \int_0^{\tau} l(\beta)\,\Delta x_{TRS}(t_k, \tau - \beta)\,d\beta + l(\tau)\big( r_{TRS}(t_k) - r_{PLR}(t_k) \big)
\end{aligned} \qquad (6.32)$$
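The function l(β) inside the integrals of Equation (6.32) is the inverse Laplace transform of l(s) from Equation (6.28). By the standard transform rules, this is the solution of l'' = g2·l' − g1·l with l(0) = 0 and l'(0) = 1, so it can also be evaluated numerically. The sketch below does this with a classical fourth-order Runge-Kutta scheme; the function name and the gain values in the usage line are hypothetical and are not taken from the thesis.

```python
def impulse_response(g1, g2, t_end, steps=1000):
    """Numerically evaluate l(t_end), the inverse transform of
    l(s) = 1 / (s**2 - s*g2 + g1).

    Integrates l'' = g2*l' - g1*l with l(0) = 0 and l'(0) = 1
    using the classical fourth-order Runge-Kutta method.
    """
    def deriv(l, dl):
        return dl, g2 * dl - g1 * l

    h = t_end / steps
    l, dl = 0.0, 1.0
    for _ in range(steps):
        k1l, k1d = deriv(l, dl)
        k2l, k2d = deriv(l + 0.5 * h * k1l, dl + 0.5 * h * k1d)
        k3l, k3d = deriv(l + 0.5 * h * k2l, dl + 0.5 * h * k2d)
        k4l, k4d = deriv(l + h * k3l, dl + h * k3d)
        l += h * (k1l + 2.0 * k2l + 2.0 * k3l + k4l) / 6.0
        dl += h * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
    return l


# Hypothetical gains chosen so that the response is stable.
print(impulse_response(g1=2.0, g2=-3.0, t_end=1.0))
```

A numerical evaluation like this can serve as an independent check of any closed-form expression for l(τ).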

Since $l(s)$, the Laplace transformation of $l(t)$, can be written as (see Equation (6.28)):

$$l(s) = \frac{1}{s^2 - s g_2 + g_1} = \frac{1}{(s - a)(s - b)} \qquad (6.33)$$

where:

$$a = \frac{1}{2}\left(g_2 + \sqrt{g_2^2 - 4 g_1}\right) \qquad (6.34)$$

and

$$b = \frac{1}{2}\left(g_2 - \sqrt{g_2^2 - 4 g_1}\right), \qquad (6.35)$$

$l(\tau)$ can (according to [7], page 36) be written as:

$$l(\tau) = \frac{e^{b\tau} - e^{a\tau}}{b - a}. \qquad (6.36)$$

Inserting Equations (6.21)-(6.23) on page 114 (for the delta-variables) into Equation (6.32), gives:

$$\begin{aligned}
M_{RCV}(t_k + \tau) - M_{RCV}(t_k) = {} & \int_0^{\tau} \big( r_{TRS}(t_k + \tau - \beta) - r_{TRS}(t_k) \big)\,\frac{d}{d\beta}\big(l(\beta)\big)\,d\beta \\
& - g_2 \int_0^{\tau} l(\beta)\big( r_{TRS}(t_k + \tau - \beta) - r_{TRS}(t_k) \big)\,d\beta \\
& - \left(\int_0^{\tau} l(\beta)\,d\beta\right) \cdot \big( g_1 M_{RCV}(t_k) - g_1 M_{RCV,d} - g_2 r_{SNDR} + g_{TRS}\, x_{TRS}(t_k) + g_2\, r_{TRS}(t_k) \big) \\
& - g_{TRS} \int_0^{\tau} l(\beta)\big( x_{TRS}(t_k + \tau - \beta) - x_{TRS}(t_k) \big)\,d\beta \\
& + l(\tau)\big( r_{TRS}(t_k) - r_{PLR}(t_k) \big)
\end{aligned} \qquad (6.37)$$

Equation (6.36) gives the relation:


$$\int_0^{\tau} \left(\frac{d}{d\beta}\big(l(\beta)\big)\right) d\beta = l(\tau) - l(0) = l(\tau) \qquad (6.38)$$

By using the above equation, introducing the vector:

$$g_{SUM} = g_{TRS} + \begin{bmatrix} g_2 & 0_{1 \times (n_{TRS} - 1)} \end{bmatrix} \qquad (6.39)$$

and remembering that $r_{TRS}(t) - r_{SNDR}$ is the first state of the state vector xTRS, Equation (6.37) can be written as:

$$\begin{aligned}
M_{RCV}(t_k + \tau) - M_{RCV}(t_k) = {} & \int_0^{\tau} r_{TRS}(t_k + \tau - \beta)\left(\frac{d}{d\beta}\big(l(\beta)\big)\right) d\beta - r_{TRS}(t_k)\, l(\tau) \\
& - g_{SUM} \int_0^{\tau} l(\beta)\, x_{TRS}(t_k + \tau - \beta)\,d\beta \\
& - \left(\int_0^{\tau} l(\beta)\,d\beta\right) \cdot \big( g_1 ( M_{RCV}(t_k) - M_{RCV,d} ) \big) + l(\tau)\big( r_{TRS}(t_k) - r_{PLR}(t_k) \big)
\end{aligned} \qquad (6.40)$$

Equation (6.36) gives the relation:

$$\int_0^{\tau} l(\beta)\,d\beta = \frac{a\,(e^{b\tau} - 1) - b\,(e^{a\tau} - 1)}{ab\,(b - a)} = \frac{q_1(\tau)}{(b - a)} \qquad (6.41)$$

where the variable

$$q_1(\tau) = \frac{1}{ab}\big( a\,(e^{b\tau} - 1) - b\,(e^{a\tau} - 1) \big) \qquad (6.42)$$

is introduced to save space in the further equations.

We also find the relation:


$$\frac{d}{d\tau}\, l(\tau) = \frac{b e^{b\tau} - a e^{a\tau}}{b - a} \qquad (6.43)$$

Inserting Equations (6.36), (6.41) and (6.43) into Equation (6.40), and letting two terms of $r_{TRS}(t_k)\, l(\tau)$ on the right hand side of Equation (6.40) cancel each other, gives:

$$\begin{aligned}
M_{RCV}(t_k + \tau) = {} & M_{RCV}(t_k) + \int_0^{\tau} r_{TRS}(t_k + \tau - \beta)\left(\frac{b e^{b\beta} - a e^{a\beta}}{b - a}\right) d\beta \\
& - g_{SUM} \int_0^{\tau} \frac{e^{b\beta} - e^{a\beta}}{b - a}\, x_{TRS}(t_k + \tau - \beta)\,d\beta \\
& - \frac{q_1(\tau)}{(b - a)} \cdot \big( g_1 ( M_{RCV}(t_k) - M_{RCV,d} ) \big) - \frac{e^{b\tau} - e^{a\tau}}{b - a}\, r_{PLR}(t_k)
\end{aligned} \qquad (6.44)$$

By assuming that the timestep length is small compared to the dynamics of $x_{TRS}(t)$, we can do the approximations:

$$\int_{hj}^{h(j+1)} f(t)\, x_{TRS}(t)\,dt = x_{TRS,j} \int_{hj}^{h(j+1)} f(t)\,dt \qquad (6.45)$$

and:

$$\int_{hj}^{h(j+1)} f(t)\, r_{TRS}(t)\,dt = r_{TRS,j} \int_{hj}^{h(j+1)} f(t)\,dt \qquad (6.46)$$

By using Equations (6.45) and (6.46), the discrete-time version of Equation (6.44) can be written as:

$$\begin{aligned}
M_{RCV,k+n} = {} & M_{RCV,k} + \sum_{i=0}^{n-1} r_{TRS,k+n-i} \int_{hi}^{h(i+1)} \left(\frac{b e^{b\beta} - a e^{a\beta}}{b - a}\right) d\beta \\
& - g_{SUM} \sum_{i=0}^{n-1} x_{TRS,k+n-i} \int_{hi}^{h(i+1)} \frac{e^{b\beta} - e^{a\beta}}{b - a}\,d\beta \\
& - \frac{q_1(hn)}{(b - a)} \cdot \big( g_1 ( M_{RCV,k} - M_{RCV,d} ) \big) - \frac{e^{bhn} - e^{ahn}}{b - a}\, r_{PLR,k}
\end{aligned} \qquad (6.47)$$


where h is the expected/ideal length of a timestep (since these timesteps occur in the future, we have no better prediction for the length of the timesteps than their ideal lengths). Since:

$$\int_{hi}^{h(i+1)} \frac{e^{b\beta} - e^{a\beta}}{b - a}\,d\beta = \frac{q_2(h, i)}{(b - a)} \qquad (6.48)$$

where the variable:

$$q_2(h, i) = \frac{1}{ab}\big( a e^{bhi}(e^{bh} - 1) - b e^{ahi}(e^{ah} - 1) \big) \qquad (6.49)$$

is introduced to save space in the further equations, and:

$$\int_{hi}^{h(i+1)} \left(\frac{b e^{b\beta} - a e^{a\beta}}{b - a}\right) d\beta = \frac{q_3(h, i)}{(b - a)} \qquad (6.50)$$

where the variable:

$$q_3(h, i) = e^{bhi}(e^{bh} - 1) - e^{ahi}(e^{ah} - 1) \qquad (6.51)$$

is also introduced to save space in the further equations, Equation (6.47) can be written as:

$$\begin{aligned}
M_{RCV,k+n} = {} & M_{RCV,k} + \sum_{i=0}^{n-1} \frac{q_3(h, i)}{(b - a)}\, r_{TRS,k+n-i} - g_{SUM} \sum_{i=0}^{n-1} \frac{q_2(h, i)}{(b - a)}\, x_{TRS,k+n-i} \\
& - \frac{q_1(hn)}{(b - a)} \cdot \big( g_1 ( M_{RCV,k} - M_{RCV,d} ) \big) - \frac{e^{bhn} - e^{ahn}}{b - a}\, r_{PLR,k}
\end{aligned} \qquad (6.52)$$

Another way to write this is:


$$\begin{aligned}
M_{RCV,k+n} = {} & \left(1 - \frac{q_1(hn)}{(b - a)}\, g_1\right) M_{RCV,k} + \frac{q_1(hn)}{(b - a)} \cdot g_1 M_{RCV,d} + \sum_{i=0}^{n-1} \frac{q_3(h, i)}{(b - a)}\, r_{TRS,k+n-i} \\
& - g_{SUM} \sum_{i=0}^{n-1} \frac{q_2(h, i)}{(b - a)}\, x_{TRS,k+n-i} - \frac{e^{bhn} - e^{ahn}}{b - a}\, r_{PLR,k}
\end{aligned} \qquad (6.53)$$

By introducing the vector e:

$$e = \begin{bmatrix} 1 & 0_{1 \times (n_{TRS} - 1)} \end{bmatrix}, \qquad (6.54)$$

the sum of terms 3 and 4 of Equation (6.53) can be written as:

$$\sum_{i=0}^{n-1} \frac{q_3(h, i)}{(b - a)}\, r_{TRS,k+n-i} - g_{SUM} \sum_{i=0}^{n-1} \frac{q_2(h, i)}{(b - a)}\, x_{TRS,k+n-i} = r_{SNDR} \sum_{i=0}^{n-1} \frac{q_3(h, i)}{(b - a)} + \frac{1}{(b - a)} \sum_{i=0}^{n-1} \big( q_3(h, i)\, e - q_2(h, i)\, g_{SUM} \big)\, x_{TRS,k+n-i} \qquad (6.55)$$

Thus, Equation (6.53) can be written as:

$$\begin{aligned}
M_{RCV,k+n} = {} & \left(1 - \frac{q_1(hn)}{(b - a)}\, g_1\right) M_{RCV,k} + \frac{q_1(hn)}{(b - a)} \cdot g_1 M_{RCV,d} + r_{SNDR} \sum_{i=0}^{n-1} \frac{q_3(h, i)}{(b - a)} \\
& + \frac{1}{(b - a)} \sum_{i=0}^{n-1} q_4(h, i)\, x_{TRS,k+n-i} - \frac{e^{bhn} - e^{ahn}}{b - a}\, r_{PLR,k}
\end{aligned} \qquad (6.56)$$

where the term:

$$q_4(h, i) = q_3(h, i)\, e - q_2(h, i)\, g_{SUM} \qquad (6.57)$$

is introduced to save space in later calculations.

At a timestep k, our anti-run-dry algorithm must be run before the optimal controller, since its output MRCV,d will be an input to the optimal controller. Thus, when the anti-run-dry algorithm is run at timestep k, rPLR,k has not yet been decided by the optimal controller. Thus, at the time


our equation is to be used, our newest knowledge of rPLR is rPLR,k-1. To make use of rPLR,k-1, we substitute all k’s in the above equation with (k-1). This gives an equation for $M_{RCV,k-1+n}$. However, we still want to have an equation for $M_{RCV,k+n}$. Therefore, we also substitute all n’s in the above equation with n+1, and get:

$$\begin{aligned}
M_{RCV,k+n} = {} & \left(1 - \frac{q_1(h(n+1))}{(b - a)}\, g_1\right) M_{RCV,k-1} - \frac{e^{bh(n+1)} - e^{ah(n+1)}}{b - a}\, r_{PLR,k-1} \\
& + \frac{q_1(h(n+1))}{(b - a)} \cdot g_1 M_{RCV,d} + r_{SNDR} \sum_{i=0}^{n} \frac{q_3(h, i)}{(b - a)} + \frac{1}{(b - a)} \sum_{i=0}^{n} q_4(h, i)\, x_{TRS,k+n-i}
\end{aligned} \qquad (6.58)$$

The essence of the above equation is that the future receiver buffer level at timestep k+n is dependent upon the current buffer level, the current rate out of the player, the playout algorithm (given by g1, g2 and gTRS), the desired buffer level and the future rate from the transport segment to the receiver buffer. The first term on the right hand side of the above equation shows the dependency of the receiver buffer level at timestep k+n upon the receiver buffer level at time k-1. The second term shows that a higher player rate at time k-1 gives a lower future receiver buffer level. The third term shows the dependency of the desired buffer level: a higher desired buffer level will, as expected, give a higher receiver buffer level. The fourth term is constant, since the correct media speed rSNDR is constant. The fifth term shows that the receiver buffer level at timestep k+n is dependent upon the transport segment state vector, including the rate out of the transport segment, for all timesteps between k and k+n.

We have measurements of $M_{PBPLR,k-1}$ and $r_{PLR,k-1}$. Therefore, the predicted value of MRCV for timestep k+n is:


$$\begin{aligned}
\hat{M}_{RCV,k+n} = {} & \left(1 - \frac{q_1(h \cdot (n+1))}{(b - a)}\, g_1\right) \big( M_{PBPLR,k-1} + \hat{M}_{VB,k-1} \big) - \frac{e^{bh(n+1)} - e^{ah(n+1)}}{b - a}\, r_{PLR,k-1} \\
& + \frac{q_1(h \cdot (n+1))}{(b - a)}\, g_1 M_{RCV,d} + r_{SNDR} \sum_{i=0}^{n} \frac{q_3(h, i)}{(b - a)} + \frac{1}{(b - a)} \sum_{i=0}^{n} q_4(h, i)\, \hat{x}_{TRS,k+n-i}
\end{aligned} \qquad (6.59)$$

The aim of this section, to find the estimated receiver buffer level, which is the first term of the left hand side of Equation (6.16) on page 112, is fulfilled by Equation (6.59). We have been able to express the estimation of the receiver buffer level by constants and by variables that are available before the controller is run at timestep k, by predictions of future transport segment behaviour, and by the desired buffer level $M_{RCV,d}$. A more detailed description for each term on the right hand side of the above equation is:

• In the first term, $M_{PBPLR,k-1}$ can be measured without measurement noise. The number of media-units in the playoutbuffer is given by the playoutbuffer, and the number of media-units in the player buffer can either be given by the player or by a calculation module that keeps track of the player buffer level (by using information about the player rate and the transfer of media-units from the playoutbuffer to the player). The predicted virtual buffer level at the previous timestep, $\hat{M}_{VB,k-1}$, is given by equations in Section 3.5.5 (p. 58).

• In the second term, $r_{PLR,k-1}$ is known, since this was decided by the optimal controller at the previous timestep.

• The third term depends upon the desired buffer level $M_{RCV,d}$, which is the variable to be determined by this chapter.

• The fourth term is constant, since the correct media speed $r_{SNDR}$ is constant.

• The fifth term contains future predictions of the transport segment state vector $x_{TRS}$. The Kalman filter described in Chapter 5 can be used to find these predictions.
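The terms listed above can be put together in a short sketch of Equation (6.59), using Equations (6.34), (6.35), (6.39), (6.42), (6.49), (6.51) and (6.57). It is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the state predictions are passed in as plain lists, the gains satisfy g1 ≠ 0 and g2² ≠ 4·g1 (so that a·b and b − a are nonzero), and complex arithmetic is used so that oscillatory gains (g2² < 4·g1) are also handled.

```python
import cmath


def roots(g1, g2):
    """a and b from Equations (6.34) and (6.35)."""
    d = cmath.sqrt(g2 * g2 - 4.0 * g1)
    return 0.5 * (g2 + d), 0.5 * (g2 - d)


def q1(a, b, tau):
    """Equation (6.42)."""
    return (a * (cmath.exp(b * tau) - 1) - b * (cmath.exp(a * tau) - 1)) / (a * b)


def q2(a, b, h, i):
    """Equation (6.49)."""
    return (a * cmath.exp(b * h * i) * (cmath.exp(b * h) - 1)
            - b * cmath.exp(a * h * i) * (cmath.exp(a * h) - 1)) / (a * b)


def q3(a, b, h, i):
    """Equation (6.51)."""
    return (cmath.exp(b * h * i) * (cmath.exp(b * h) - 1)
            - cmath.exp(a * h * i) * (cmath.exp(a * h) - 1))


def predict_m_rcv(m_pbplr_prev, m_vb_prev, r_plr_prev, m_rcv_d, r_sndr,
                  g1, g2, g_trs, x_trs_pred, h, n):
    """Predicted receiver buffer level at timestep k+n, Equation (6.59).

    x_trs_pred[i] is the predicted state vector for timestep k+n-i,
    i = 0..n; its first element is r_TRS - r_SNDR.
    """
    a, b = roots(g1, g2)
    g_sum = [g_trs[0] + g2] + list(g_trs[1:])       # Equation (6.39)
    horizon = h * (n + 1)
    gain = q1(a, b, horizon) * g1 / (b - a)
    total = (1 - gain) * (m_pbplr_prev + m_vb_prev)                     # term 1
    total -= (cmath.exp(b * horizon) - cmath.exp(a * horizon)) / (b - a) * r_plr_prev  # term 2
    total += gain * m_rcv_d                                             # term 3
    for i in range(n + 1):
        x = x_trs_pred[i]
        total += r_sndr * q3(a, b, h, i) / (b - a)                      # term 4
        # q4(h, i) x = q3(h, i) e x - q2(h, i) g_SUM x, Equation (6.57)
        g_sum_x = sum(g * xi for g, xi in zip(g_sum, x))
        total += (q3(a, b, h, i) * x[0] - q2(a, b, h, i) * g_sum_x) / (b - a)  # term 5
    return total.real


# Hypothetical steady state: buffer at its desired level, player at the
# sender rate, and a predicted transport segment rate equal to r_SNDR.
steady = predict_m_rcv(4.7, 0.3, 25.0, 5.0, 25.0, 2.0, -3.0,
                       [0.5, 0.1], [[0.0, 0.0]] * 11, h=0.02, n=10)
print(steady)
```

In the steady-state example the second and fourth terms cancel and the prediction stays at the desired buffer level, which is a useful sanity check on the signs of the individual terms.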


6.3 Finding the estimated virtual buffer level

As shown in Figure 6.7, we need to find $\hat{M}_{VB,k+n}$ in order to deduce the equation for the anti-run-dry algorithm.

The virtual buffer is continuously filled at the rate rTRS from the transport segment. At any given point in time, we assume that only a fraction of a media-unit, or no media, is contained in the virtual buffer, since all whole media-units leave the virtual buffer at the time of their timestamps. The reason for this can be seen from Equation (3.46) on page 60. This will also apply to the points in time of timesteps, and thus, at timestep k+n, the amount of media in the virtual buffer is:

$$M_{VB,k+n} = \left\{ M_{VB,k} + \sum_{i=k}^{k+n-1} \int_{ih}^{(i+1)h} r_{TRS}(\beta)\,d\beta \right\} \qquad (6.60)$$

where {x} is the fractional part of the variable x. Since in our case x is always positive, we have the relation:

$$\{x\} = x - \lfloor x \rfloor \qquad (6.61)$$

where the notations $\{x\}$ and $\lfloor x \rfloor$ are used according to [7]:

• $\{x\}$ is the fractional part of the variable x

• $\lfloor x \rfloor$ is the floor of the variable x, i.e. the highest integer value that is smaller than or equal to x

By assuming that the timestep length is small compared to the dynamics of $r_{TRS}(t)$, we can approximate the integral in Equation (6.60) with the value of $r_{TRS}(t)$ at the start of the interval times the interval length. Thus, Equation (6.60) can be written as:

$$M_{VB,k+n} = \left\{ M_{VB,k} + h \cdot \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} \qquad (6.62)$$

where h is the ideal length of a timestep. Thus, the predicted value of MVB,k+n is:

$$\hat{M}_{VB,k+n} = \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} \qquad (6.63)$$

In the above equation, we first calculate the sum of the current virtual buffer level and the number of media-units flowing from the transport segment into the virtual buffer during the time period between timesteps k and k+n. This is the amount that the virtual buffer would contain if no media-units were sent out of the virtual buffer in the time period. Since media-units are sent out of the virtual buffer when they are full, we use the fractional operator ({x}) to find a prediction of the virtual buffer level at timestep k+n. We have now found the second term on the left hand side of Equation (6.16) on page 112.

6.4 Finding the variance of the buffer estimation error

As shown in Figure 6.7, we need to find $\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n})$ in order to deduce the equation for the anti-run-dry algorithm.

To find an equation for the variance of the prediction error of MPBPLR,k+n, we first need an equation for the predicted value of MPBPLR,k+n, and before that, we need an expression for MPBPLR,k+n:

$$M_{PBPLR,k+n} = M_{RCV,k+n} - M_{VB,k+n} \qquad (6.64)$$

Inserting Equations (6.58) on page 122 and (6.62) on page 124 into Equation (6.64), gives:

$$\begin{aligned}
M_{PBPLR,k+n} = {} & \left(1 - \frac{q_1(h(n+1))}{(b - a)}\, g_1\right) \big( M_{PBPLR,k-1} + M_{VB,k-1} \big) - \frac{e^{bh(n+1)} - e^{ah(n+1)}}{b - a}\, r_{PLR,k-1} \\
& + \frac{q_1(h(n+1))}{(b - a)} \cdot g_1 M_{RCV,d} + r_{SNDR} \sum_{i=0}^{n} \frac{q_3(h, i)}{(b - a)} + \frac{1}{(b - a)} \sum_{i=0}^{n} q_4(h, i)\, x_{TRS,k+n-i} \\
& - \left\{ M_{VB,k} + h \cdot \sum_{i=0}^{n-1} r_{TRS,k+i} \right\}
\end{aligned} \qquad (6.65)$$
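The virtual buffer prediction of Equation (6.63) and the difference in Equation (6.64) are simple to evaluate once the predicted rates are available. A minimal sketch, with hypothetical function names and made-up example values (a timestep of 0.02 s and a predicted rate of 25 media-units per second):

```python
import math


def frac(x):
    """Fractional part {x} = x - floor(x), Equation (6.61)."""
    return x - math.floor(x)


def predict_m_vb(m_vb_hat_k, r_trs_hat, h):
    """Equation (6.63): predicted virtual buffer level at timestep k+n.

    r_trs_hat holds the predicted rates for timesteps k .. k+n-1 and
    h is the ideal timestep length in seconds.
    """
    return frac(m_vb_hat_k + h * sum(r_trs_hat))


def predict_m_pbplr(m_rcv_hat, m_vb_hat):
    """Equation (6.64) applied to the predicted buffer levels."""
    return m_rcv_hat - m_vb_hat


m_vb = predict_m_vb(0.3, [25.0] * 5, h=0.02)
print(m_vb, predict_m_pbplr(5.0, m_vb))
```

In this example 2.5 media-units arrive during the five timesteps, so the virtual buffer holds about 0.8 of a media-unit after the two whole units have left it.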


The predicted value of MPBPLR is thus:

$$\begin{aligned}
\hat{M}_{PBPLR,k+n} = {} & \left(1 - \frac{q_1(h(n+1))}{(b - a)}\, g_1\right) \big( M_{PBPLR,k-1} + \hat{M}_{VB,k-1} \big) - \frac{e^{bh(n+1)} - e^{ah(n+1)}}{b - a}\, r_{PLR,k-1} \\
& + \frac{q_1(h(n+1))}{(b - a)} \cdot g_1 M_{RCV,d} + r_{SNDR} \sum_{i=0}^{n} \frac{q_3(h, i)}{(b - a)} + \frac{1}{(b - a)} \sum_{i=0}^{n} q_4(h, i)\, \hat{x}_{TRS,k+n-i} \\
& - \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\}
\end{aligned} \qquad (6.66)$$

The variance of this prediction is:

$$\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n}) = E\left[ \big( \hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n} \big)^2 \right] \qquad (6.67)$$

Inserting Equations (6.65) and (6.66) into the above equation, gives:

$$\begin{aligned}
\operatorname{var}(\hat{M}_{PBPLR,k+n} - M_{PBPLR,k+n}) = E\Bigg[ \Bigg( & \left(1 - \frac{q_1(h(n+1))}{(b - a)}\, g_1\right) \big( \hat{M}_{VB,k-1} - M_{VB,k-1} \big) \\
& + \frac{1}{(b - a)} \sum_{i=0}^{n} q_4(h, i) \big( \hat{x}_{TRS,k+n-i} - x_{TRS,k+n-i} \big) \\
& - \left( \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} - \left\{ M_{VB,k} + h \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} \right) \Bigg)^2 \Bigg]
\end{aligned} \qquad (6.68)$$

In Appendix D, Section D.5 (p. 280), it is deduced that:

$$(c_1 + c_2 + c_3)^2 \le 3\,(c_1^2 + c_2^2 + c_3^2) \qquad (6.69)$$

Using the above equation together with Equation (6.68), gives:


127

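$$\begin{aligned}
\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right) \le {} & 3E\left[\left(\left(1 - \frac{q_1(h(n+1))}{b-a}\, g_1\right)\left(M_{VB,k-1} - \hat{M}_{VB,k-1}\right)\right)^2\right] \\
& + 3E\left[\left(\frac{1}{b-a} \sum_{i=0}^{n} q_4(h,i)\left(\mathbf{x}_{TRS,k+n-i} - \hat{\mathbf{x}}_{TRS,k+n-i}\right)\right)^2\right] \\
& + 3E\left[\left( \left\{ M_{VB,k} + h \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} - \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} \right)^2\right]
\end{aligned} \tag{6.70}$$

The second term on the right hand side of the above equation is calculated in Appendix D, Section D.4 (p. 277). Inserting the result from this calculation (Equation (D.31)) into the above equation, gives:

$$\begin{aligned}
\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right) \le {} & 3\left(1 - \frac{q_1(h(n+1))}{b-a}\, g_1\right)^2 \sigma^2_{M_{VB,k-1}} \\
& + \frac{3}{(b-a)^2} \left(\sum_{i=0}^{n} q_4(h,i)\,(\mathbf{I} + \mathbf{A}h)^{n-i}\right) \mathbf{X}_{TRS,k} \left(\sum_{i=0}^{n} q_4(h,i)\,(\mathbf{I} + \mathbf{A}h)^{n-i}\right)^T \\
& + \frac{3}{(b-a)^2} \sum_{i=0}^{n} q_4(h,i) \left(\sum_{j=1}^{n-i} (\mathbf{I} + \mathbf{A}h)^{n-i-j}\, \mathbf{C}\, \mathbf{V}_{TRS,k+j-1} \left((\mathbf{I} + \mathbf{A}h)^{n-i-j}\, \mathbf{C}\right)^T\right) q_4(h,i)^T \\
& + 3E\left[\left( \left\{ M_{VB,k} + h \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} - \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} \right)^2\right]
\end{aligned} \tag{6.71}$$

Appendix D.3 (p. 276) deduces different relations for the fraction operator. According to Equation (D.19) on page 277, if x and y are positive, we have the relation:

$$\left(\{x\} - \{y\}\right)^2 < 1 \tag{6.72}$$

Equation (6.72) gives: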

$$\left( \left\{ M_{VB,k} + h \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} - \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} \right)^2 < 1 \tag{6.73}$$

and therefore:

$$E\left[\left( \left\{ M_{VB,k} + h \sum_{i=0}^{n-1} r_{TRS,k+i} \right\} - \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\} \right)^2\right] < 1 \tag{6.74}$$

Inserting Equation (6.74) into Equation (6.71) and using the fact that $\mathbf{V}_{TRS,k+j-1} = \mathbf{V}_{TRS}$ is constant, gives:

$$\begin{aligned}
\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right) < {} & 3\left(1 - \frac{q_1(h(n+1))}{b-a}\, g_1\right)^2 \sigma^2_{M_{VB,k-1}} \\
& + 3\, q_5(n,h)\, \mathbf{X}_{TRS,k}\, \left(q_5(n,h)\right)^T + 3\, q_6(n,h) + 3
\end{aligned} \tag{6.75}$$

where the variables $q_5(n,h)$, given by:

$$q_5(n,h) = \frac{1}{b-a} \sum_{i=0}^{n} q_4(h,i)\,(\mathbf{I} + \mathbf{A}h)^{n-i}, \tag{6.76}$$

and $q_6(n,h)$, given by:

$$q_6(n,h) = \frac{1}{(b-a)^2} \cdot \sum_{i=0}^{n} q_4(h,i) \left(\sum_{j=1}^{n-i} (\mathbf{I} + \mathbf{A}h)^{n-i-j}\, \mathbf{C}\, \mathbf{V}_{TRS} \left((\mathbf{I} + \mathbf{A}h)^{n-i-j}\, \mathbf{C}\right)^T\right) q_4(h,i)^T, \tag{6.77}$$

are introduced to simplify the equations in this chapter, and where $\sigma^2_{M_{VB,k-1}}$ is given by Equations (3.54) on page 62 and (3.62) on page 65, and $\mathbf{X}_{TRS,k}$ is given by the Kalman filter explained in Chapter 5 (p. 91).

The essence of Equation (6.75) is that the variance of the prediction error of the number of media-units in the playoutbuffer and player at a future timestep k+n is dependent upon the current prediction error variance of the virtual buffer level and the current prediction error variance of the transport segment state vector. The first term on the right hand side shows that a higher certainty of the current virtual buffer level prediction will give a higher certainty of the future playoutbuffer and player level prediction. The second term shows that a higher certainty in the current transport segment state vector prediction also gives a higher certainty of the playoutbuffer and player level prediction. The third and fourth terms are constant.

The aim of this section, to find the variance of the buffer estimation error, which is the right hand side of Equation (6.16) on page 112, is fulfilled by Equation (6.75). As expected, the equation shows that the variance of the prediction error of $M_{PBPLR,k+n}$ is dependent upon the estimation error variance of the modules that the media-units go through on their way to the playoutbuffer, namely the transport segment (terms 2, 3 and 4 on the right hand side of the above equation) and the virtual buffer (the first term on the right hand side of the above equation).
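As a hedged illustration of how the bound in Equation (6.75) could be evaluated numerically, the Matlab sketch below computes q5(n,h) and q6(n,h) from Equations (6.76) and (6.77) and then the right hand side of Equation (6.75). All numerical values are made-up placeholders and not values from this thesis; in particular, q1_val and the rows of q4 only stand in for q1(h(n+1)) and q4(h,i), whose defining equations are given elsewhere in this chapter.

```matlab
% Sketch only: every numerical value below is a hypothetical placeholder.
n = 3; h = 0.02;                 % number of prediction steps and timestep length [s]
a = -2; b = -0.5; g1 = 1.5;      % assumed constants a, b and g1
q1_val = 0.4;                    % stands in for q1(h*(n+1))
q4 = [0.3 0.1; 0.2 0.1; 0.1 0.05; 0.05 0.02];  % row i+1 stands in for q4(h,i), i = 0..n
A = [0 1; -4 -0.4];              % assumed transport segment system matrix
C = [0; 1];                      % assumed process noise input matrix
V_TRS = 0.1;                     % assumed constant process noise covariance
X_TRS_k = 0.05*eye(2);           % assumed state error covariance from the Kalman filter
sigma2_M_VB = 0.01;              % assumed variance of the virtual buffer level estimate

F = eye(2) + A*h;                % the factor (I + A*h) in Equations (6.76) and (6.77)
q5 = zeros(1,2);
q6 = 0;
for i = 0:n
    q4_i = q4(i+1,:);
    q5 = q5 + q4_i * F^(n-i);            % sum in Equation (6.76)
    S = zeros(2);
    for j = 1:(n-i)
        G = F^(n-i-j) * C;
        S = S + G * V_TRS * G';          % inner sum in Equation (6.77)
    end
    q6 = q6 + q4_i * S * q4_i';          % outer sum in Equation (6.77)
end
q5 = q5 / (b-a);
q6 = q6 / (b-a)^2;

% Right hand side of Equation (6.75)
var_bound = 3*(1 - q1_val*g1/(b-a))^2 * sigma2_M_VB ...
          + 3*q5*X_TRS_k*q5' + 3*q6 + 3;
```

Since the first three terms are non-negative, var_bound can never fall below the constant 3 that stems from the fraction operator bound in Equation (6.74).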


As shown in Figure 6.7, this section will combine the results from the previous three sections to find an equation for the minimum value of $M_{RCV,d}$. From Equation (6.13) on page 112, we know that the relationship $\hat{M}_{PBPLR,k+n} \ge c \sqrt{\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right)}$ must be fulfilled if the probability that the buffer runs dry at timestep k+n is to be kept lower than p. Equations for $\hat{M}_{RCV,k+n}$ and $\hat{M}_{VB,k+n}$ were found in Sections 6.2 (p. 113) and 6.3 (p. 124), and combined to find an expression for $\hat{M}_{PBPLR,k+n}$ in Equation (6.66) on page 126.

Inserting Equation (6.66) into Equation (6.13) on page 112 and solving for $M_{RCV,d}$ gives:

$$\begin{aligned}
M_{RCV,d} \ge {} & \frac{b-a}{q_1(h(n+1))\, g_1}\, c \sqrt{\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right)} \\
& - \left(\frac{b-a}{q_1(h(n+1))\, g_1} - 1\right)\left(M_{PBPLR,k-1} + \hat{M}_{VB,k-1}\right) \\
& + \frac{e^{bh(n+1)} - e^{ah(n+1)}}{q_1(h(n+1))\, g_1}\, r_{PLR,k-1} - \frac{r_{SNDR}}{q_1(h(n+1))\, g_1} \sum_{i=0}^{n} q_3(h,i) \\
& - \frac{1}{q_1(h(n+1))\, g_1} \sum_{i=0}^{n} q_4(h,i)\, \hat{\mathbf{x}}_{TRS,k+n-i} \\
& + \frac{b-a}{q_1(h(n+1))\, g_1} \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\}
\end{aligned} \tag{6.78}$$

where $\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right)$ is given by Equation (6.75) on page 128.

The above equation calculates the minimum value of $M_{RCV,d}$ that assures that the run-dry-probability at timestep k+n is lower than or equal to p (where the relation between p and c in the above equation is given by Equation (6.5)). The minimum value of $M_{RCV,d}$ depends on the variance of the prediction of the future (at timestep k+n) number of media-units in the playoutbuffer and player, on the current number of media-units in the playoutbuffer and player, on the current prediction of the virtual buffer level, on the current player rate and on the prediction of the transport segment state vector for all timesteps between k and k+n.

The first term on the right hand side of Equation (6.78) shows that if the variance of the prediction of the number of media-units in the playoutbuffer and player at timestep k+n is increased, $M_{RCV,d}$ needs to increase. The second term shows that the lower the sum of the predicted virtual buffer level and the current level of the playoutbuffer and player is, the higher $M_{RCV,d}$ needs to be. The third term shows that the higher the current player rate is, the higher $M_{RCV,d}$ needs to be. The fourth term is constant since the correct media speed $r_{SNDR}$ is constant. The fifth term shows the dependency upon the predicted future transport segment state vector. The lower the predicted future rate out of the transport segment is, the higher $M_{RCV,d}$ needs to be. The sixth term shows that the higher the predicted value of $M_{VB,k+n}$ is, the higher $M_{RCV,d}$ needs to be. The reason for this is that the level of the playoutbuffer and player is the difference between the total receiver buffer level and the virtual buffer level, and thus the total receiver buffer level needs to be high if the virtual buffer level is high and if the playoutbuffer and player should not run dry.
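The six terms of Equation (6.78) can be evaluated term by term, as in the following Matlab sketch. It is only an illustration under stated assumptions: all numerical values are hypothetical placeholders, q1_val, q3 and q4 stand in for q1(h(n+1)), q3(h,i) and q4(h,i), var_bound stands in for the bound from Equation (6.75), and the predicted states are generated by a noise-free propagation with the factor (I + Ah) that appears in Equations (6.76) and (6.77), which is not necessarily how the optimal predictor of Chapter 5 produces them. The sketch also assumes that q1(h(n+1)) g1 is positive, so that the direction of the inequality is preserved.

```matlab
% Sketch only: every numerical value below is a hypothetical placeholder.
n = 3; h = 0.02;                 % number of prediction steps and timestep length [s]
a = -2; b = -0.5; g1 = 1.5; c = 2;
q1_val = 0.4;                    % stands in for q1(h*(n+1))
q3 = [0.02 0.015 0.01 0.005];    % stands in for q3(h,i), i = 0..n
q4 = [0.3 0.1; 0.2 0.1; 0.1 0.05; 0.05 0.02];  % row i+1 stands in for q4(h,i)
var_bound = 3.2;                 % stands in for the right hand side of Equation (6.75)
r_SNDR = 50;                     % sender rate [media-units/s]
r_PLR_k_minus_1 = 50;            % player rate at timestep k-1
M_PBPLR_k_minus_1 = 2.4;         % playoutbuffer and player level at timestep k-1
M_VB_k_minus_1 = 0.3;            % estimated virtual buffer level at timestep k-1
M_VB_k = 0.35;                   % estimated virtual buffer level at timestep k
A = [0 1; -4 -0.4];              % assumed transport segment system matrix
x_TRS_k = [-2; 0.5];             % assumed state estimate (first state = r_TRS - r_SNDR)

% Predicted states x_TRS,k ... x_TRS,k+n (assumption: noise-free propagation)
x_pred = zeros(2, n+1);
x_pred(:,1) = x_TRS_k;
for j = 1:n
    x_pred(:,j+1) = (eye(2) + A*h) * x_pred(:,j);
end
r_TRS_pred = x_pred(1,:) + r_SNDR;       % predicted transport segment rates

% Sums appearing in Equation (6.78)
sum_q4x = 0;
for i = 0:n
    sum_q4x = sum_q4x + q4(i+1,:) * x_pred(:, n-i+1);   % q4(h,i) * x_TRS,k+n-i
end
M_VB_pred = mod(M_VB_k + h*sum(r_TRS_pred(1:n)), 1);    % fractional operator {x}

K = q1_val * g1;                 % common denominator, assumed positive
M_RCV_d_min = (b-a)/K * c * sqrt(var_bound) ...
    - ((b-a)/K - 1) * (M_PBPLR_k_minus_1 + M_VB_k_minus_1) ...
    + (exp(b*h*(n+1)) - exp(a*h*(n+1)))/K * r_PLR_k_minus_1 ...
    - r_SNDR/K * sum(q3) ...
    - 1/K * sum_q4x ...
    + (b-a)/K * M_VB_pred;
```

A convenient self-check is to insert M_RCV_d_min back into Equation (6.66): the predicted level of the playoutbuffer and player should then equal c times the square root of var_bound.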


From Equations (6.78) and (6.75), we can see that we need a predictor for the transport segment that gives us the following matrix:

$$\operatorname{FuturePred}(\hat{\mathbf{x}}_{TRS}) = \begin{bmatrix} \hat{\mathbf{x}}_{TRS,k} & \hat{\mathbf{x}}_{TRS,k+1} & \dots & \hat{\mathbf{x}}_{TRS,k+n} \end{bmatrix} \tag{6.79}$$

The optimal predictor described in Section 5.3.2 (p. 100) can be used to calculate the matrix in Equation (6.79), by saving the results for each iteration of the loop in Figure 5.7 (p. 101) into the matrices.


Using the calculated value of $M_{RCV,d}$ from Equation (6.78) assures that the run-dry-probability in exactly n timesteps does not exceed p. Section 6.7.1 discusses different choices of n, and concludes that the total run-dry-probability may exceed p if $r_{TRS}$ is oscillating. Section 6.7.2 describes a solution to this problem, namely to run Equation (6.78) in a loop.

6.7.1 The length of the prediction period

The length of the prediction period, $\tau = n \cdot h$, is a tunable parameter. This section will discuss the advantages and disadvantages of the different choices.

One example of a transport segment prediction is illustrated in Figure 6.8, together with three different prediction periods. Note that this is a simplified illustration; the behaviour of the simulated and real transport segments in Chapter 8 is more complex.

If the prediction period is chosen too small, such as $\tau_a$ in Figure 6.8, we will have the correct run-dry-probability at time $t_k + \tau_a$, but if the predicted transport segment rate decreases after $t_k + \tau_a$, the run-dry-probability will increase after $t_k + \tau_a$. Thus, the prediction period should at least be long enough to reach the first minimum of the transport segment rate prediction.

$\tau_b$ in Figure 6.8 is an example of a prediction period that reaches the first minimum point of the transport segment rate prediction, at time $t_k + \tau_b$. However, as illustrated by time $t_{k_2}$ and time $t_{k_2} + \tau_b$ in Figure 6.8, the minimum point may not be reached when using the same prediction period at another timestep.

To make sure that the minimum point is covered in the prediction period, the prediction period should be chosen to be equal to or longer than the oscillation period of the transport segment rate prediction (where the oscillation period is given by the specific transport segment model in use). This is illustrated by the prediction period $\tau_c$ in Figure 6.8, where the prediction period always covers the first minimum point of the transport segment rate prediction.

[Figure 6.8: Illustration: Different prediction periods for the transport segment rate. Horizontal axis: time; vertical axis: $r_{TRS}$, with the level $r_{SNDR}$ marked. The prediction periods $\tau_a$, $\tau_b$ and $\tau_c$ start at $t_k$; $\tau_b$ is also shown starting at $t_{k_2}$.]

By using Equation (6.78) to calculate $M_{RCV,d}$, the run-dry-probability at exactly n timesteps ahead (i.e. at exactly $\tau$ seconds ahead) is assured to be below p. However, there is no such assurance for the timesteps between k and k+n (i.e. for the time period between $t_k$ and $t_k + \tau$). If $r_{TRS}$ has a lower value at a timestep between k and k+n than at timestep k+n, the run-dry-probability at this timestep can be higher than p. This is illustrated in Figure 6.8, where the run-dry-probability at time $t_k + \tau_b$ is higher than the run-dry-probability at time $t_k + \tau_c$, because $r_{TRS}(t_k + \tau_b)$ is lower than $r_{TRS}(t_k + \tau_c)$. Thus, by using Equation (6.78) to calculate $M_{RCV,d}$, the run-dry-probability will be higher than p if $r_{TRS}$ has a minimum point somewhere between timesteps k and k+n.

As we have seen, all the different choices of prediction periods have drawbacks that can lead to a higher run-dry-probability than initially expected. Thus, we need a new solution, described in Section 6.7.2, to be able to guarantee that the run-dry-probability will not exceed p.

6.7.2 Anti-run-dry mechanism in a loop

As explained in Section 6.7.1, all the different choices of prediction periods have drawbacks that can lead to a higher run-dry-probability than initially expected. One solution to this problem is to calculate the minimum value of $M_{RCV,d}$ for many different prediction periods, smaller than or equal to $\tau$, and choose the largest of the calculated values of $M_{RCV,d}$. To guarantee that the run-dry-probability does not exceed p, one of the prediction periods should, like $\tau_b$ in Figure 6.8, hit the lowest value of $r_{TRS}$ when added to the time at timestep k. Thus, there should be enough prediction periods to observe the fastest of the modelled oscillations of $r_{TRS}$.

Choosing n different prediction periods (i.e. one for each timestep), with lengths $1 \cdot h$, $2 \cdot h$, ..., $n \cdot h$, would guarantee that the run-dry-probability is kept lower than or equal to p during the time period between $t_k$ and $t_k + n \cdot h$. The value of n should be chosen according to the behaviour of the transport segment. If n is chosen too small, we get the same disadvantages as by choosing the prediction period equal to $\tau_a$ in Figure 6.8. Choosing n as a large value has no disadvantages for the result of the anti-run-dry algorithm, since the calculation of the desired buffer level to use can be seen as a prediction of a stable process with a limited uncertainty. Thus, as long as n is chosen large enough, an increase in n will give no significant increase in the desired buffer level. The only drawback of choosing n too large is that a large amount of CPU power would be needed to calculate $M_{RCV,d}$ for n different prediction periods.

Consequently, the value of n should be chosen so that $n \cdot h$ is at least as long as the oscillation period of the oscillation with the largest amplitude. Chapter 8 shows simulations with a range of values for n.

In a practical implementation, one could make a loop, and run the anti-run-dry mechanism expressed by Equation (6.78) with a different number of prediction steps for each iteration of the loop, as illustrated in Figure 6.9.

The anti-run-dry mechanism is run for all numbers of prediction steps from 1 and up to n (i.e. with prediction periods from $1 \cdot h$ and up to $\tau = n \cdot h$). By saving $M_{RCV,d}$ for each of these different prediction periods, and choosing the largest of the $M_{RCV,d}$ after the loop is finished, we can guarantee that the run-dry-probability is always kept lower than p. The reason for this is that all of the different prediction periods from now and until n timesteps ahead will have a lower run-dry-probability than p.

[Figure 6.9: Illustration of running Equation (6.81) in a loop. Solid lines are used for all iterations of the loop, dotted lines are used only at iteration number n.]

Expressed mathematically, $M_{RCV,d}$ is set to:

$$M_{RCV,d,\min} = \operatorname{Max}\left( M_{RCV,d,\; j \cdot h,\; \text{anti-run-dry}} \;\; \forall \left( j \in [0, 1, \dots, n] \right) \right) \tag{6.80}$$

where j is the number of prediction steps for each run of the anti-run-dry mechanism. The parameter n should be set large enough to cover the first minimum of the oscillation of $r_{TRS}$ with the largest amplitude, but there is no need to use a larger n. One solution is to set n equal to the number of timesteps in the oscillation of $r_{TRS}$ with the largest amplitude.

6.8 Summary

The total system, including the anti-run-dry algorithm deduced in this chapter, is illustrated in Figure 6.10.

[Figure 6.10: Total system including the anti-run-dry algorithm]

The anti-run-dry mechanism finds $M_{RCV,d}$ according to the equation:

$$\begin{aligned}
M_{RCV,d} \ge {} & \frac{b-a}{q_1(h(n+1))\, g_1}\, c \sqrt{\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right)} \\
& - \left(\frac{b-a}{q_1(h(n+1))\, g_1} - 1\right)\left(M_{PBPLR,k-1} + \hat{M}_{VB,k-1}\right) \\
& + \frac{e^{bh(n+1)} - e^{ah(n+1)}}{q_1(h(n+1))\, g_1}\, r_{PLR,k-1} - \frac{r_{SNDR}}{q_1(h(n+1))\, g_1} \sum_{i=0}^{n} q_3(h,i) \\
& - \frac{1}{q_1(h(n+1))\, g_1} \sum_{i=0}^{n} q_4(h,i)\, \hat{\mathbf{x}}_{TRS,k+n-i} \\
& + \frac{b-a}{q_1(h(n+1))\, g_1} \left\{ \hat{M}_{VB,k} + h \sum_{i=0}^{n-1} \hat{r}_{TRS,k+i} \right\}
\end{aligned} \tag{6.81}$$

where $\operatorname{var}\left(M_{PBPLR,k+n} - \hat{M}_{PBPLR,k+n}\right)$ is given by Equation (6.75) on page 128, $q_1(\tau)$ is given by Equation (6.42) on page 118, $q_3(h,i)$ is given by Equation (6.51) on page 120, $q_4(h,i)$ is given by Equation (6.57) on page 121, a is given by Equation (6.34), b is given by Equation (6.35) on page 117, c is given by Equation (6.5) on page 110, and $g_1$ is given by Equation (4.49) on page 87.

Figure 6.10 (p. 135) shows that we need a transport segment state predictor that provides $\operatorname{FuturePred}(\hat{\mathbf{x}}_{TRS}) = \begin{bmatrix} \hat{\mathbf{x}}_{TRS,k} & \hat{\mathbf{x}}_{TRS,k+1} & \dots & \hat{\mathbf{x}}_{TRS,k+n} \end{bmatrix}$. Since the Kalman filter is also an optimal predictor, it is used in the transport segment state predictor.

The minimum value of $M_{RCV,d}$ given above ensures that the run-dry-probability in exactly n timesteps does not exceed p. However, if half the time period of the transport segment delay oscillations is smaller than $\tau = h \cdot n$, the run-dry-probability at $t_{k+m}$, 1<m<n, could be higher than p. To avoid this problem, Equation (6.78) on page 130 is calculated for each timestep [$t_{k+1}$ ... $t_{k+n}$], and the resulting $M_{RCV,d}$ is set to the maximum $M_{RCV,d}$ of these timesteps.
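The loop summarised above can be sketched in a few lines of Matlab. This is only an illustration: the function handle M_RCV_d_for_steps is a hypothetical stand-in for evaluating Equation (6.78) with j prediction steps (here replaced by an arbitrary oscillating expression so that the example is self-contained), and the loop runs over j = 1, ..., n as described in Section 6.7.2.

```matlab
% Sketch only: the stand-in function and all numbers are hypothetical.
n = 5;                           % largest number of prediction steps
h = 0.02;                        % timestep length [s]

% Hypothetical stand-in for Equation (6.78) evaluated with j prediction steps.
% A real implementation would compute the six terms of Equation (6.78) here.
M_RCV_d_for_steps = @(j) 3 + sin(2*pi*j*h/0.08);

M_RCV_d_saved = zeros(1, n);     % one saved value per prediction period
for j = 1:n
    M_RCV_d_saved(j) = M_RCV_d_for_steps(j);
end

% The desired receiver buffer level is the largest of the saved values
M_RCV_d = max(M_RCV_d_saved);
```

Taking the maximum means that the prediction period with the most demanding requirement decides the desired buffer level, which is why a larger n can only increase the computational cost and never lower the resulting level.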

7 Implementation in Matlab

This chapter describes the implementation of a receiver system environment simulator and a transport segment simulator in Matlab. It also describes the implementation of the optimal control algorithm deduced in Chapter 4 and the anti-run-dry algorithm deduced in Chapter 6.

As shown in Figure 7.1, none of the sections in this chapter are essential for understanding the thesis, but Section 7.1 is essential for understanding the rest of the chapter. It gives an overview of the notation and syntax used in the chapter.

Section 7.2 (p. 139) first gives an overview of the total implementation in Matlab, and then gives a more detailed description of the implementation of the receiver system environment simulator. This simulator can be used to test all playoutbuffer algorithms, and may thus be interesting for all readers wanting to test new algorithms.

Section 7.3 (p. 148) describes the implementation of the controller algorithms deduced in this thesis, and may therefore not be interesting for all readers.


This section describes the code sample notation (in Section 7.1.1) and the figure syntax (in Section 7.1.2) used in the rest of this chapter.



[Figure 7.1: Overview of the sections in this chapter. Recoverable box labels: 7.2 Matlab system overview; 7.3 Controller algorithms deduced in this thesis.]

7.1.1 Matlab code sample notation

All the code samples in this chapter are written in Matlab language, and can be run in Matlab without modifications. The variables whose names are not self-explanatory are explained in the text. Table 7.1 summarizes the Matlab syntax and the variable name syntax used in this chapter.

To save space in the code samples in this chapter, the variable names use mu as an abbreviation for media-unit(s). The symbolic versions of the variable names used in this chapter are shown in Table A.3 (p. 258).

In the code samples, the Matlab code is written with a black font, while the comments are written with a green font.

| Matlab code | Equation | Explanation |
|---|---|---|
| X' | $X^T$ | The transpose of a matrix X |
| inv(X) | $X^{-1}$ | The inverse of a matrix or a scalar X |
| exp(x) | $e^x$ | The exponential of x (e to the x) |
| eye(n) | $I_{n \times n}$ | Identity matrix of size n by n |
| zeros(n,m) | $0_{n \times m}$ | Zero matrix of size n by m |
| length(x) | | The number of elements in the vector x |
| sqrt(x) | $\sqrt{x}$ | The square root of a variable x |
| X^2 | $X^2$ | The square of a matrix or scalar X |
| mod(x,1) | $\{x\}$ | The fractional part of the variable x |
| mod(x,y) | | Residue of x modulo y (i.e. the signed remainder after the division $x/y$) |
| ... | | The command line continues on the next line |
| x_k | $x_k$ | The value of x at timestep k |
| x_k_minus_1 | $x_{k-1}$ | The value of x at timestep k-1 |

Table 7.1: Matlab notation and syntax
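A few of the entries in Table 7.1 can be tried out directly. The short sketch below uses arbitrary example values (not taken from this thesis) to show the fractional operator written as mod(x,1), the variable naming for consecutive timesteps, and the line continuation.

```matlab
% Sketch only: the values of x_k and x_k_minus_1 are arbitrary examples.
x_k = 3.75;                      % value of x at timestep k
x_k_minus_1 = 12.25;             % value of x at timestep k-1

frac_k = mod(x_k, 1);            % fractional part {x_k}
frac_k_minus_1 = mod(x_k_minus_1, 1);

% Squared difference of two fractional parts, written over two lines
% by means of the continuation marker ...
d2 = (frac_k - frac_k_minus_1) ...
     ^2;
```

Because both fractional parts lie between 0 and 1, the squared difference d2 is always smaller than 1, which is the relation used in Equation (6.72).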

7.1.2 Figure syntax

Most of the figures in this chapter illustrate the connections between different modules in the Matlab program. In these illustrations, the arrows symbolizing data transfer between the different modules are coloured according to when the transfer is made, by the following rules:
• The magenta arrows symbolize transfer of data at each timestep.
• The yellow arrows symbolize transfer of data each time a packet is transferred (either into or out of the playoutbuffer).
• The black arrows symbolize transfer of data only initially (i.e. once, before the media transfer is started).


An overview of the total system Matlab implementation is shown in Figure 7.2.

As shown in Figure 7.2, the implementation can be run with either simulated or measured transport segment data. Since the transport segment data are presented to the total system simulator as a transport segment trace file, the total system simulator will not know whether it uses simulated or real transport segment data.

In this thesis, the transport segment trace file can have two different origins:
• Transport segment measurements: A transport segment trace file can be made from real measurements of a media stream by measuring the arrival times of the media-units at the receiving application.
• Transport segment simulation: A transport segment trace file can also be made from a transport segment simulator. The transport segment simulator that we have programmed for the use in this thesis is described in Section 7.2.1.

Section 7.2.2 will describe the path of a media-unit through our system, and the flow of control through the management tool is described in Section 7.2.3. Sections 7.2.4 to 7.2.9 explain the different modules shown in Figure 7.2.



7.2.1 Transport segment simulation in Matlab

The transport segment simulator consists of two parts. The first part, making a vector of $r_{TRS}$ by use of the transport segment state space model, is explained in Section 7.2.1.1. The second part, using this vector to make a transport segment trace file, is described in Section 7.2.1.2.

[Figure 7.2: Matlab program structure. The broad gray arrows correspond to transfer of media-unit(s). The thinner arrows correspond to transfer of information. Modules shown: transport segment simulator or transport segment measuring (producing the transport segment trace file), Transport segment (see Section 7.2.4), Playoutbuffer (see Section 7.2.5), Player (see Section 7.2.6) and the Management tool, containing Rate measurement (see Section 7.2.7), Controller algorithm (see Section 7.2.8) and Send-out-time calculation (see Section 7.2.9). Arrow legend: all timesteps; only when media-unit transferred; only initially.]

7.2.1.1 Making a vector of transport segment rate

The transport segment simulator receives the continuous state space model to use (consisting of the matrices $A_{TRS}$, $C_{TRS}$ and $V_{TRS}(t)$) as an input, and calculates the discrete matrices $\Phi_{TRS}$ and $V_{TRS,k}$ by using the equations in Section 3.5.6 (p. 65) with a value of delta_t that is much smaller than the playtime of one media-unit (to make the discretization error negligible).

The discrete transport segment state space model is used to calculate $x_{TRS}(t)$ for a range of t's (placed delta_t seconds apart). The first state of $x_{TRS}(t)$ (equal to $r_{TRS}(t) - r_{SNDR}$) is saved for later use. This is done by Code sample 7.1, where whitevector is a vector with Gaussian white noise elements generated by Matlab with mean equal to zero and standard deviation equal to 1. The variable std_dev_v_TRS is the standard deviation of the discrete process noise (calculated from $V_{TRS,k}$) and $x_{TRS}$ is initialized as a zero vector.

Code sample 7.1 Making a vector of r_TRS

    for i=1:n_of_r_TRS_samples,
        x_TRS_k_minus_1 = x_TRS_k;                                     % keep the previous state
        v_TRS_discrete = std_dev_v_TRS * whitevector(i);               % scaled white noise sample
        x_TRS_k = PHI_TRS * x_TRS_k_minus_1 + C_TRS * v_TRS_discrete;  % discrete state update
        x1_vector(i) = x_TRS_k(1);                                     % save the first state
    end

By adding $r_{SNDR}$ to all elements in x1_vector, and then replacing all negative elements with zero (since the transport segment rate can never be negative), we get a vector consisting of $r_{TRS}(t)$ at a range of closely spaced t's.

7.2.1.2 Making the transport segment trace file

By time-integrating the vector of $r_{TRS}(t)$ from the previous section, we get the number of media-units that have left the transport segment. Each time this integral exceeds one media-unit, the media-unit is subtracted from the integral, and saved in the transport segment trace file, by saving its number m and its timestamp $t_{TRS,m}$, equal to the time that the integral exceeded one. This is done by Code sample 7.2 (where delta_t has the same small value as in the previous section).
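Code sample 7.1 stores the first state in x1_vector, while Code sample 7.2 reads a vector named r_TRS_vector. The step between the two, i.e. adding the sender rate and replacing negative rates with zero as described in Section 7.2.1.1, is not shown as a code sample; one possible way to write it is sketched below. The numerical values are arbitrary examples.

```matlab
% Sketch only: example values, not taken from the simulations in this thesis.
r_SNDR = 50;                           % sender rate [media-units/s]
x1_vector = [-60 -10 0 5.5 20];        % example first-state values (r_TRS - r_SNDR)

% Add the sender rate and replace all negative rates with zero
r_TRS_vector = max(x1_vector + r_SNDR, 0);
```

With these example values, only the first element (-60 + 50 = -10) is negative after the sender rate has been added, so only that element is set to zero.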

Code sample 7.2 Making transport segment trace file

    time = t0;      % [s] the first unit will have a timestamp = t0
    m = 1;          % m = media-unit number
    i = 1;
    integral = 0;
    for i=1:n_of_r_TRS_samples,
        integral = integral + delta_t*r_TRS_vector(i);
        if integral >= 1
            % one media-unit is gathered, and can be sent out of the transport segment
            transport_segment_trace_file(:,m)= [time m]';
            integral = integral - 1;
            m = m + 1;
        end
        time = time + delta_t;
    end

7.2.2 The path of a media unit

This section will describe the path of a media-unit from the moment it arrives from the transport segment, until it is being played out by the player, as illustrated by Figure 7.2.

When a media-unit is received from the transport segment in a real-time implementation, its media content is placed in memory (either on the network interface card or in the machine memory, depending on the receiver machine used), and a pointer to this place in memory is given to the controller system together with the timestamp and ID number of the media-unit. When it is time for the player to play the media-unit, the pointer is used to find the actual media, and the player plays the media at the speed decided by the controller. Thus, the actual media is not used in the Management tool (see Figure 7.2), but the pointer to the media is transferred through the receiver system together with the timestamp and ID number of the media-unit.

Thus, the Matlab implementation only contains the timestamp and ID number of each media-unit. The reason for this is that the use of a pointer to a media content will not add any new information.

The transport segment module (described in Section 7.2.4 (p. 144)) uses the contents of the transport segment trace file to make media-units, containing a timestamp (for the time of arrival) and an ID number. At the first timestep after the timestamp of a media-unit, it is sent to the playoutbuffer. At the same time, its ID number and timestamp are sent to the rate measurement module in the management tool.

When the playoutbuffer (described in Section 7.2.5 (p. 145)) receives a new media-unit, it places the media-unit in its FIFO buffer. At each timestep, the playoutbuffer sends a message containing the number of media-units in its buffer ($M_{PB,k}$) to the controller algorithm in the management tool.

When the management tool decides that a packet should be sent out from the playoutbuffer, it sends a message to the playoutbuffer. When receiving this message, the playoutbuffer first checks if it has any media-units in its buffer. If the playoutbuffer has no media-units in its buffer, it takes no action. Otherwise, it takes a media-unit out of its buffer, and sends it to the player. At the same time, a message is sent to the management tool, to inform that a new media-unit has been sent to the player.

The player (described in Section 7.2.6 (p. 145)) receives the media-unit and puts it into its buffer. Each timestep, the player receives a message from the management tool, containing the playout speed to use. The player then plays the media stream at this rate. The player keeps track of the amount of media-units in its buffer.

7.2.3 The management tool

The management tool contains all the algorithms and logic used to control the playoutbuffer and the player. In our implementation, it is run each timestep.

If a new media-unit (or several new media-units) has arrived from the transport segment since last timestep, the rate measurement module (described in Section 7.2.7 (p. 146)) receives one message for each arrived media-unit, containing the timestamp and ID number of the new media-unit. The rate measurement module uses this information to calculate the media-unit rate from the transport segment, $\tilde{r}_{TRS,m}(t_{rTRS,m})$. It sends this calculated rate and its timestamp to the controller algorithm. It is not strictly necessary to use ID numbers for the media-units, but in our implementation, they are used to check that the media-units are put into the playoutbuffer in the right order, and to calculate the ideal time difference between two non-adjacent media-units.

Each timestep, the controller algorithm (described in Section 7.2.8 (p. 147)) receives a message from the playoutbuffer containing the number of media-units in the playoutbuffer and a message from the send-out-time calculation module containing the player buffer size. If one or more new media-units have arrived from the transport segment since last timestep, it also receives the calculated rate measurement from the rate measurement module. Initially (before the transfer of media is started), the controller algorithm receives the weight factors w1, w2 and w3 from the user. It also receives the maximum run-dry-probability p if the anti-run-dry algorithm is used, or the desired buffer level $M_{RCV,d}$ if the anti-run-dry algorithm is not used. The controller algorithm uses all the received information to determine the playout rate to be used by the player. It sends a message, containing the playout rate to use, to the player and to the send-out-time calculation module.

The send-out-time calculation module (described in Section 7.2.9 (p. 147)) uses the received information about the playout rate to calculate the current and future buffer level of the player. Each timestep, it sends a message containing the current player buffer level to the controller algorithm module. Its purpose is to make sure that the player receives a new media-unit from the playoutbuffer before the player buffer runs dry. When the send-out-time calculation module decides that the player needs a new media-unit, it gives the playoutbuffer a send-out message.
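The decision made by the send-out-time calculation module can be illustrated by a small Matlab sketch. The decision rule used here (send a media-unit when the player buffer is predicted to be empty at the next timestep) is an assumption made for the illustration; the rule actually used is the one described in Section 7.2.9. The variable names and values are hypothetical.

```matlab
% Sketch only: the decision rule and all values are assumptions.
delta_t = 0.01;                  % time between two timesteps [s]
r_PLR_k = 50;                    % playout rate received from the controller [media-units/s]
M_PLR_k = 0.3;                   % current player buffer level [media-units]

% Predicted player buffer level at the next timestep if no media-unit is added
M_PLR_next = M_PLR_k - r_PLR_k * delta_t;

% Assumed rule: request a media-unit if the player would otherwise run dry
send_out = (M_PLR_next <= 0);
if send_out
    M_PLR_k = M_PLR_k + 1;       % one media-unit will be added to the player buffer
end
```

With these values the predicted level is negative, so a send-out message would be given to the playoutbuffer in this timestep.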

7.2.4 Transport segment

As shown in Figure 7.2 (p. 140), the transport segment used in the Matlab program can be represented by either a real transport segment trace or by a trace made from a transport segment simulator.

The transport segment simulator (see Section 7.2.1 (p. 140)) creates a transport segment trace file with the same format as trace files made from real transport segment measurements. Thus, the total system simulator will not know whether the transport segment trace file was made from simulated or real transport segment data.

In the Matlab program implementation, the transport segment module uses the transport segment trace file to build a matrix where each row, corresponding to one media-unit, contains the media-unit's timestamp ($t_{TRS,m}$) and ID number. In our implementation, the ID number is identical to the number m of the media-unit. Each timestep, all media-units in the transport segment matrix with a timestamp lower than the time of the timestep are sent out of the transport segment and into the playoutbuffer.
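The release of media-units described above can be sketched as follows. The matrix contents, the variable names and the counter sent_per_timestep are hypothetical and only serve the illustration; the matrix is assumed to be sorted by timestamp, as it is when it is built from a trace file in arrival order.

```matlab
% Sketch only: hypothetical trace with one row [timestamp id] per media-unit.
transport_segment_matrix = [0.010 1; 0.024 2; 0.031 3; 0.052 4];
delta_t = 0.02;                  % time between two timesteps [s]
n_timesteps = 3;

next_row = 1;                    % first media-unit that has not yet been sent
sent_per_timestep = zeros(1, n_timesteps);
for k = 1:n_timesteps
    time_k = k * delta_t;        % time of timestep k
    % Send all media-units with a timestamp lower than the time of the timestep
    while next_row <= size(transport_segment_matrix, 1) && ...
          transport_segment_matrix(next_row, 1) < time_k
        % (here the media-unit would be handed to the playoutbuffer)
        sent_per_timestep(k) = sent_per_timestep(k) + 1;
        next_row = next_row + 1;
    end
end
```

With this example trace, one media-unit is released at the first timestep, two at the second and one at the third.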



7.2.5 Playoutbuffer

The playoutbuffer module simulates a playoutbuffer with a possibly very large size, so that the only limitation to the number of media-units in the playoutbuffer is the need for a low latency communication. Each time a media-unit arrives from the transport segment, it is put into the input end of the FIFO queue in the playoutbuffer. The playoutbuffer receives a message from the send-out-time calculation module (in the management tool) when a new media-unit should be sent to the player. When such a message is received, the playoutbuffer checks if there are any media-units in its buffer, and if the playoutbuffer is not empty, a media-unit is taken from the playoutbuffer FIFO queue and sent to the player.

The playoutbuffer keeps track of the number of media-units it contains, and each timestep, it sends a message containing this number to the controller algorithm in the management tool, which can use it to control the player rate rPLR(t).

In our Matlab program, the playoutbuffer is implemented as a circular buffer.

When the Matlab program is started, the circular buffer in the playoutbuffer is initialized. When the playoutbuffer module is called because a media-unit is received from the transport segment, this media-unit is put into the FIFO buffer. Similarly, when the playoutbuffer is called because it is time to send a media-unit to the player, and the playoutbuffer is not empty, the media-unit at the output end of the FIFO queue is sent out from the playoutbuffer. If the playoutbuffer is empty, an empty-buffer message is sent out.
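A circular buffer used as a FIFO queue could look like the following minimal sketch. The capacity, the use of ID numbers as buffer content and all variable names (buffer, head, count, sent_id, buffer_empty_message) are assumptions made for illustration; the thesis does not show its own circular-buffer code.

```matlab
% Minimal circular-buffer FIFO (illustrative; all names are assumptions).
capacity = 8;                      % maximum number of stored media-units
buffer   = zeros(1, capacity);     % stores media-unit ID numbers
head     = 1;                      % index of the output end (oldest media-unit)
count    = 0;                      % number of media-units currently stored

% Three media-units arrive from the transport segment.
for id = [11 12 13]
    if count < capacity
        tail = mod(head - 1 + count, capacity) + 1;   % index of the input end
        buffer(tail) = id;
        count = count + 1;
    end
end

% A send-out message arrives: hand over the oldest media-unit, if there is one.
if count > 0
    sent_id = buffer(head);
    head    = mod(head, capacity) + 1;   % wrap around at the end of the array
    count   = count - 1;
    buffer_empty_message = 0;
else
    sent_id = [];
    buffer_empty_message = 1;            % corresponds to the empty-buffer message
end
```

The value of count is the number that the playoutbuffer would report to the controller algorithm each timestep.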

7.2.6 Player

In the Matlab program, the player is modelled as a buffer that can contain a floating-point number of media-units. The player module receives media-units from the playoutbuffer, usually before the player buffer runs dry. This way, the player buffer runs dry only if the playoutbuffer runs dry first. Each timestep, the player receives information about the playout speed to use from the controller algorithm (in the management tool), and plays out the media-units at this playout speed, by removing a floating-point number of media-units, equal to the playout rate times the length of the timestep, from the player buffer (to simulate the playout of this amount of media).

The player keeps track of the floating-point number of media-units inside its buffer (in the Matlab implementation, this number is saved for use in graphs).



When the Matlab program is started, the player is initialized by setting its size to zero. If the player receives a new media-unit, it is added to the size of the player buffer.

Each timestep, the player receives information about the playout speed to use in the time period between this timestep and the next timestep. The following code is used to calculate the new size of the player buffer at the next timestep (i.e. after playing at the playout speed for the duration of delta_t, where delta_t is the time between this timestep and the next timestep):

Code sample 7.3 Running the player

```matlab
if playoutspeed <= 0
    % security in case the controller has no mechanism to avoid a negative playout speed:
    no_sound = 1;
elseif number_of_mu_in_player >= (playoutspeed * delta_t)
    % playing at speed = playoutspeed for the duration of delta_t:
    number_of_mu_in_player = number_of_mu_in_player - (playoutspeed * delta_t);
else % number_of_mu_in_player < (playoutspeed * delta_t)
    number_of_mu_in_player = 0;
end
```

The variable number_of_mu_in_player is the information that is sent from the player to the controller algorithm in the management tool.

The modules that the media-units go through on their way through the system are now explained. The next three sections will describe the modules in the management tool.

7.2.7 Rate measurement

The rate measurement module receives the timestamps and ID numbers of the media-units that arrive from the transport segment. If the rate is to be calculated for a collection of n media-units, where media-unit m is the last one, $r_{TRS}(t)$ is calculated for the time interval $(t_{TRS,m-n},\, t_{TRS,m}]$. If n=1, this means that the rate is calculated for the time interval $(t_{TRS,m-1},\, t_{TRS,m}]$. Equations (3.39) and (3.41) on page 55 are used to calculate the rate for a collection of n media-units. This corresponds to the following code sample (where t_TRS denotes the timestamp from the timestamping process):

Code sample 7.4 Rate measurement of a collection of media-units

```matlab
% n media-units divided by the time between media-unit m-n and media-unit m:
rate = n/(media_unit_m.t_TRS - media_unit_m_minus_n.t_TRS);
% the rate measurement is timestamped at the middle of the interval:
timestamp_r_TRS_collection = (media_unit_m.t_TRS + media_unit_m_minus_n.t_TRS)/2;
```

7.2.8 Controller algorithm

Any controller algorithm can be used in the ‘Controller algorithm’ module. The algorithms implemented in this thesis are:

1. Fixed playout delay

2. One of the existing playout speed adjusting algorithms

3. Optimal control algorithm

4. Optimal control algorithm with anti-run-dry algorithm

Algorithm number 1 is explained in Section 8.1.1 (p. 165), while algorithm number 2 is explained in Section 8.1.2 (p. 165). The last two algorithms are explained in Section 7.3 (p. 148).

7.2.9 Send-out-time calculation

As explained earlier, the send-out-time calculation module keeps track of the amount of media in the player. This information is used to ensure that the player receives a new media-unit from the playoutbuffer before the player buffer runs dry, and also to send a message, containing the player buffer level, each timestep to the controller. The only information the send-out-time calculation module uses is a message from the controller algorithm, received each timestep, containing the playout speed that the player should use until the next timestep.

At the beginning of each timestep, the send-out-time calculation module is called only to decide whether it is time for the playoutbuffer to send a new media-unit to the player. If the calculated number of media-units to be played out in the time period between this timestep and the next timestep, is larger than the calculated number of media-units in the player, a message is sent to the playoutbuffer, to make it send a new media-unit to the player.
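The decision taken at the beginning of a timestep can be sketched as follows. This is a minimal illustration rather than the thesis code: the numeric values and the names needed_in_timestep and send_out_message are assumptions, while playout_speed, delta_t and calculated_number_of_mu_in_player follow the naming used in the code samples of this chapter.

```matlab
% Illustrative values (assumptions):
playout_speed = 50;                        % media-units per second, from the controller
delta_t = 0.01;                            % length of the timestep in seconds
calculated_number_of_mu_in_player = 0.3;   % current calculated player buffer level

% Number of media-units that will be played out before the next timestep.
needed_in_timestep = playout_speed * delta_t;

% Ask the playoutbuffer for a new media-unit if the player would otherwise run dry.
send_out_message = needed_in_timestep > calculated_number_of_mu_in_player;
```

Here 0.5 media-units are needed but only 0.3 are available, so a send-out message is issued. With a player level of, say, 0.7, no message would be sent in this timestep.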

At the end of each timestep, the send-out-time calculation module is called to update its internal variables. If the playoutbuffer has sent a media-unit to the player, the send-out-time calculation module receives a message, and increases the calculated number of media-units in the player. The information from the controller about the playout speed is used by the following code:

Code sample 7.5 Update of internal variables for the send-out-time calculation module.

```matlab
calculated_number_of_mu_in_player = calculated_number_of_mu_in_player - (playout_speed * delta_t);
if calculated_number_of_mu_in_player < 0
    calculated_number_of_mu_in_player = 0;
end
```

This section describes the implementation of the controller algorithms deduced in this thesis. The implementation of the optimal controller is described in Section 7.3.1, and the implementation of the anti-run-dry algorithm together with the optimal controller is described in Section 7.3.2 (p. 150). Sections 7.3.3 to 7.3.7 describe the different modules inside the controller algorithm module. The existing algorithms that are implemented for comparison are explained in Chapter 8, in Sections 8.1.1 (p. 165) and 8.1.2 (p. 165).

7.3.1 Controller algorithm for optimal controller

The controller algorithm for optimal control of playoutbuffers is shown in Figure 7.3.

When a media-unit arrives from the transport segment, the transport segment rate is measured by the rate measurement module (as was shown in Figure 7.2 (p. 140)), and sent to the transport segment state estimation module in the controller algorithm module.

Even though the transport segment state estimation module (see Section 7.3.3 (p. 152)) in the controller algorithm receives a measurement of $r_{TRS}$ (together with the measurement's timestamp) only at timesteps when at least one media-unit has arrived, it calculates estimates of the transport segment state vector at every timestep.

Every timestep, the virtual buffer estimation module (see Section 7.3.4 (p. 155)) receives estimates of the transport segment state vector $x_{TRS}$ and its covariance matrix $X_{TRS}$ for the current timestep and for the middle time between the previous and the current timestep. For timesteps where at least one media-unit has arrived since the last timestep, it also receives estimates of $x_{TRS}$ and $X_{TRS}$ for the point in time of the timestamp of the media-unit.

As shown in Figure 7.3, the optimal controller (see Section 7.3.7 (p. 160)) receives the following inputs when used alone (i.e. without the anti-run-dry algorithm):

• Once initially: The three weight factors $w_1$, $w_2$ and $w_3$ from the user or application programmer.
• Once initially: The desired buffer size $M_{RCV,d}$ from the user or application programmer.
• Each timestep: An estimate of the transport segment state vector $x_{TRS}$ from the transport segment state estimation module.
• Each timestep: The number of media-units in the playoutbuffer.
• Each timestep: The number of media-units in the player buffer.
• Each timestep: An estimate of the number of media-units in the virtual buffer.

Figure 7.3: Controller algorithm for optimal control
[Block diagram; recoverable labels. Blocks: Transport segment state estimation (see Section 7.3.3), Virtual buffer estimation (see Section 7.3.4), Optimal controller (see Section 7.3.7). Signals: rate measurement $\tilde{r}_{TRS}(t_{r_{TRS},m})$ and its timestamp $t_{r_{TRS},m}$; $x_{TRS}$ and $X_{TRS}$ at $t_k$, $t_{k-1/2}$ and $t_{r_{TRS},m}$; $M_{VB,k}$ and $\mathrm{var}(M_{VB,k})$; size of playoutbuffer ($M_{PB,k}$); size of player buffer ($M_{PLR,k}$); desired buffer size ($M_{RCV,d}$); weight factors $w_1$, $w_2$ and $w_3$; output $r_{PLR,k}$.]

7.3.2 Controller algorithm for optimal control with anti-run-dry algorithm

In Section 7.3.1, the optimal controller (explained in Section 7.3.7 (p. 160)) received the desired buffer level ($M_{RCV,d}$) from the user or application programmer. In this section, the target buffer level ($M_{RCV,d}$) is determined by the anti-run-dry algorithm (see Section 7.3.6 (p. 157)).

Each time a new media-unit (or a collection of media-units) has arrived from the transport segment, the anti-run-dry algorithm calculates a new value of $M_{RCV,d}$ and sends it to the optimal controller, as shown in Figure 7.4. To be able to calculate the minimum value of $M_{RCV,d}$, the anti-run-dry algorithm needs to know the optimal control gain vector, and it also needs the following input at each timestep where at least one media-unit has arrived from the transport segment:

• The number of media-units in the playoutbuffer at the previous timestep
• The number of media-units in the player buffer at the previous timestep
• An estimate of the number of media-units in the virtual buffer at the previous timestep, and the variance of this estimate
• An estimate of the transport segment state vector and its covariance matrix, for both the current and the previous timestep
• Predictions of the transport segment state vector for the next n timesteps
• The playout speed determined by the optimal controller at the previous timestep

The transport segment state prediction module (see Section 7.3.5 (p. 157)) is used to calculate predictions of the transport segment state vector xTRS for the next n timesteps.

As in the previous section, a transport segment state estimation module (see Section 7.3.3 (p. 152)) is used to estimate the transport segment state vector needed by the optimal controller and by the virtual buffer estimation module (explained in Section 7.3.4 (p. 155)). In this section, these estimates are also needed by the transport segment state prediction module and by the anti-run-dry algorithm, as shown in Figure 7.4.

The anti-run-dry algorithm and the transport segment state prediction module are run only at timesteps when at least one media-unit has arrived from the transport segment, because the future predictions of the different buffer levels do not change unless we have a new measurement of the transport segment rate.

Figure 7.4: Control algorithm for optimal control with anti-run-dry algorithm
[Block diagram; recoverable labels. Blocks: Transport segment state estimation (see Section 7.3.3), Virtual buffer estimation (see Section 7.3.4), Transport segment state prediction (see Section 7.3.5), Anti-run-dry algorithm (see Section 7.3.6), Optimal controller (see Section 7.3.7). Signals: rate measurement $\tilde{r}_{TRS}(t_{r_{TRS},m})$ and its timestamp $t_{r_{TRS},m}$; $x_{TRS}$ and $X_{TRS}$ at $t_k$, $t_{k-1/2}$ and $t_{r_{TRS},m}$; predictions $[x_{TRS,k} \ldots x_{TRS,k+n}]$; $M_{VB,k}$ and $\mathrm{var}(M_{VB,k})$; $M_{PB,k}$; $M_{PLR,k}$; $r_{PLR,k-1}$; minimum value of $M_{RCV,d}$; $w_1$, $w_2$, $w_3$; output $r_{PLR,k}$.]

7.3.3 Transport segment state estimation

The task of the transport segment state estimation module is to find optimal estimates of the transport segment state vector at all timesteps $t_k$, when receiving measurements at the asynchronous times $t_{r_{TRS},m}$ for each media-unit m.

At each timestep, the transport segment state estimation module is used to calculate a prediction of the transport segment state vector $x_{TRS}$ and its covariance matrix $X_{TRS}$. The predicted transport segment state vector and its covariance matrix are used by the transport segment state prediction module, by the virtual buffer estimation module, by the anti-run-dry algorithm and by the optimal controller, as was shown in Figure 7.4.

At timesteps where at least one media-unit has arrived from the transport segment, the transport segment state estimation module receives a measurement of the transport segment rate $r_{TRS,m}$ and its timestamp $t_{r_{TRS},m}$ from the rate measurement module. As shown in Figure 7.5, the timestamp $t_{r_{TRS},m}$ is first used to calculate the prediction period between the previous rate measurement and the current rate measurement. By using this prediction period and the updated estimates of the transport segment state vector and its covariance matrix from the previous rate measurement, a Kalman filter prediction module (see Section 7.3.3.1 (p. 154)) calculates the predicted state vector and its covariance matrix for $t_{r_{TRS},m}$, the time of the current rate measurement.

These predictions are received by a Kalman filter update module (see Section 7.3.3.2 (p. 155)), which also receives the current rate measurement, and uses the rate measurement to update the transport segment state vector and its covariance matrix.

The Kalman filter update module gives us updated estimates at time $t_{r_{TRS},m}$. However, we would like to have predictions of the transport segment state vector and its covariance matrix at $t_k$, the time of the current timestep. As shown in Figure 7.5, the prediction period from $t_{r_{TRS},m}$ to $t_k$ is calculated and sent into a new Kalman filter prediction module, together with the updated estimates at time $t_{r_{TRS},m}$. (In the implementation, the same Kalman filter prediction function as above is called here, but with a different set of inputs and outputs.) The new Kalman filter prediction module calculates predictions of the transport segment state vector and its covariance matrix for time $t_k$.

The paragraphs above have explained the yellow arrows in Figure 7.5. However, at timesteps where no media-units have arrived from the transport segment, only the magenta arrows are used. First, the prediction period is calculated as the time period since the previous timestep. Then, as shown in Figure 7.5, the prediction period is sent into a Kalman filter prediction module together with the prediction of the transport segment state vector and its covariance matrix from the previous timestep. The Kalman filter prediction module uses the received information to calculate predictions of the transport segment state vector and its covariance matrix for the current timestep.
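For a timestep where a rate measurement has arrived, the predict, update, predict sequence described above can be sketched for a scalar state as follows. This is an illustration under stated assumptions, not the thesis implementation: the model values (A, B, C, V_continuous, D, W), the times and the measurement y are invented for the example, and the names x_upd, X_upd, x_pred, X_pred, x_k and X_k are hypothetical. The prediction and update formulas are the standard discretized Kalman filter steps in the form used by the prediction and update modules of this chapter.

```matlab
% Illustrative scalar model (all numbers are assumptions, not thesis values).
A = -0.5;  B = 0;  C = 1;  u = 0;     % state space model, no control input
V_continuous = 0.2;                    % spectral density of the process noise
D = 1;  W = 0.1;                       % measurement matrix and measurement noise variance

% Updated estimate from the previous rate measurement, valid at t_prev_meas.
x_upd = 1.0;  X_upd = 0.5;
t_prev_meas = 0.95;  t_meas = 1.02;  t_k = 1.05;   % times in seconds
y = 1.4;                                            % new rate measurement, timestamped t_meas

% 1) Predict from the previous measurement time to the new measurement time.
dt1 = t_meas - t_prev_meas;
PHI = 1 + dt1*A;
x_pred = PHI*x_upd + dt1*B*u;
X_pred = PHI*X_upd*PHI' + C*(dt1*V_continuous)*C';

% 2) Update the prediction with the new rate measurement.
K = X_pred*D'/(D*X_pred*D' + W);
x_upd = x_pred + K*(y - D*x_pred);
X_upd = (1 - K*D)*X_pred;

% 3) Predict from the measurement time to the time of the current timestep.
dt2 = t_k - t_meas;
PHI = 1 + dt2*A;
x_k = PHI*x_upd + dt2*B*u;
X_k = PHI*X_upd*PHI' + C*(dt2*V_continuous)*C';
```

At a timestep without a new measurement, only a single prediction step over t_k - t_{k-1} would be run, starting from the prediction of the previous timestep.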

Figure 7.5: Transport segment state estimation
[Block diagram; recoverable labels. Blocks: Kalman filter prediction (see Section 7.3.3.1), shown twice; Kalman filter update (see Section 7.3.3.2); several wait-boxes; a selector "or (use either updated or predicted values)". Inputs: $t_k$, the rate measurement $\tilde{r}_{TRS}(t_{r_{TRS},m})$ and its timestamp $t_{r_{TRS},m}$. Prediction periods: $t_{r_{TRS},m} - t_{r_{TRS},m-1}$, $t_k - t_{r_{TRS},m}$ and $t_k - t_{k-1}$. Internal signals: predicted and updated $x_{TRS}$ and $X_{TRS}$ at $t_{r_{TRS},m-1}$ and $t_{r_{TRS},m}$; $x_{TRS,k-1}$ and $X_{TRS,k-1}$. Outputs: $x_{TRS,k}$ and $X_{TRS,k}$. Legend: arrows used at all timesteps; arrows used only when a media-unit has been transferred.]

The wait-box in Figure 7.5 is used as a symbol to show that we save the input value until the next timestep (or until the arrival of the next media-unit).

The structure shown in Figure 7.5 gives optimal estimates of the transport segment state vector at all timesteps $t_k$, when receiving delayed measurements at the asynchronous times $t_{r_{TRS},m}$ for each media-unit m.

7.3.3.1 Kalman filter prediction

To be able to find a future prediction of a state vector and its covariance matrix, the Kalman filter prediction module needs to know the state space model of the system in question (not illustrated in the figures of this chapter), like Equation (3.63) on page 65, consisting of the matrixes A, B and C. It also needs to know the spectral density matrix V (called V_continuous in Code sample 7.6) of the continuous process noise.

In addition to a state space model, the Kalman filter prediction module needs the following inputs:

• The state vector x and its covariance matrix X.
• The prediction time period delta_t, which is the time between the timestamp of the input state vector (and its covariance matrix) and the timestamp of the output state vector (and its covariance matrix).

By using the general Kalman filter prediction Equations (5.5) and (5.6) on page 93, we get the following code for the Kalman filter prediction module (where delta_t is the length of the prediction time period):

Code sample 7.6 Kalman filter prediction

```matlab
PHI = eye(length(x_input)) + delta_t*A;   % discretized transition matrix
LAMBDA = delta_t*B;                       % discretized input matrix
V_discrete = delta_t*V_continuous;        % discretized process noise covariance
GAMMA = C;
x_output = PHI*x_input + LAMBDA*u;
X_output = PHI*X_input*PHI' + GAMMA*V_discrete*GAMMA';
```

When the Kalman filter prediction module is called from other parts of the code, the transport segment matrixes $A_{TRS}$ and $C_{TRS}$ and the transport segment state vector $x_{TRS}$ are sent as inputs, and hence, the calculations equal the ones in Equations (5.18) and (5.19) on page 95.

7.3.3.2 Kalman filter update

To be able to update the state vector with a new measurement, the Kalman filter update module needs to know the measurement equation, like Equation (5.2) on page 93, consisting of a measurement matrix D and a covariance matrix W of the measurement noise. In addition, the Kalman filter update module needs the following inputs:

• The state vector x and its covariance matrix X.
• The measurement vector y.

By using the general Equation (5.7) on page 93 to calculate the Kalman filter gain matrix K, and then using the general Kalman filter update Equations (5.8) and (5.9) on page 93, we get the following code for the Kalman filter update module:

Code sample 7.7 Kalman filter update

```matlab
K = X_input*D'*inv(D*X_input*D' + W);             % Kalman filter gain matrix
x_output = x_input + K*(y - D*x_input);
X_output = (eye(length(x_input)) - K*D)*X_input;
```

When the Kalman filter update module is called from the transport segment state estimation module, the measurement matrix D, given by Equation (5.15) on page 95, the rate measurement $y = r_{TRS}(t_{r_{TRS},m})$, the measurement noise covariance matrix W (which in our case is scalar and equal to the variance of the rate measurement error) and the transport segment state vector $x_{TRS}$ are sent as inputs. Thus, the calculations equal the ones in Equations (5.21) and (5.22) on page 95.

7.3.4 Virtual buffer estimation

As was shown in Figures 7.3 (p. 149) and 7.4 (p. 151), the virtual buffer estimation module receives predictions of the transport segment state vector and its covariance matrix from the transport segment state estimator at the points in time $t_k$ and $t_{k-1/2}$, and at timestamps of rate measurements, i.e. at $t_{r_{TRS},m}$. However, the virtual buffer uses only the first state of the transport segment state vectors and element (1,1) of the covariance matrixes.

At timesteps where at least one media-unit has arrived since the previous timestep, the estimates for time $t_{r_{TRS},m}$ and $t_k$ are used by Equations (3.49) on page 61 and (3.54) on page 62, by the following code:

Code sample 7.8 Virtual buffer estimation for timesteps where media-unit(s) have arrived since the last timestep

```matlab
r_TRS_k_prediction = x_TRS_k_prediction(1) + r_SNDR;
var_r_TRS_k_prediction = X_TRS_k_prediction(1,1);
if r_TRS_k_prediction < 0, % as security in case predicted rate is negative
    r_TRS_k_prediction = 0;
end
r_TRS_timestamp_estimate = x_TRS_timestamp_update(1) + r_SNDR;
var_r_TRS_timestamp_estimate = X_TRS_timestamp_update(1,1);
if r_TRS_timestamp_estimate < 0, % as security in case estimated rate is negative
    r_TRS_timestamp_estimate = 0;
end
M_VB_k_estimate = (t_k - t_timestamp)*...
    (r_TRS_k_prediction + r_TRS_timestamp_estimate)/2;
var_M_VB_k_estimate = (t_k - t_timestamp)^2 * ...
    max(var_r_TRS_k_prediction, var_r_TRS_timestamp_estimate);
```

At timesteps where no media-units have arrived since the previous timestep, the estimates for time $t_k$ and $t_{k-1/2}$ are used by Equations (3.57) on page 63 and (3.62) on page 65, by the following code, where delta_t is the time since the previous timestep:

Code sample 7.9 Virtual buffer estimation for timesteps where no media-units have arrived since the last timestep

```matlab
r_TRS_k_prediction = x_TRS_k_prediction(1) + r_SNDR;
r_TRS_k_minus_half_prediction = x_TRS_k_minus_half_prediction(1) + r_SNDR;
r_TRS_k_minus_1_prediction = x_TRS_k_minus_1_prediction(1) + r_SNDR;
M_VB_k_estimate = M_VB_k_minus_1_estimate + delta_t*(r_TRS_k_prediction +...
    r_TRS_k_minus_1_prediction + 2*r_TRS_k_minus_half_prediction)/4;
% The three variances are collected in a vector, since max takes at most
% two arrays to compare:
var_M_VB_k_estimate = var_M_VB_k_minus_1_estimate + max([X_TRS_k_prediction(1,1),...
    X_TRS_k_minus_half_prediction(1,1), X_TRS_k_minus_1_prediction(1,1)])*(delta_t)^2;
```

7.3.5 Transport segment state prediction

As was shown in Figure 7.4 (p. 151), the transport segment state prediction module receives predictions of the transport segment state vector and its covariance matrix at timesteps where at least one media-unit has arrived since the last timestep. As shown in Figure 7.6, the transport segment state prediction module calls the Kalman filter prediction module n consecutive times (as explained in Section 6.7.2 (p. 133)), and stores the output in a matrix containing state vector predictions for the next n timesteps.
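The repeated call of the prediction step can be sketched as follows. The two-state model, the numeric values and the column layout of the result matrix (column j+1 holds the prediction for timestep k+j) are assumptions chosen for illustration; only the structure, n consecutive prediction steps whose state vectors are stored side by side, follows the description above. The matrix name x_TRS_pred_matrix is borrowed from the code samples of this chapter, and the prediction step is written inline without a control input.

```matlab
% Illustrative two-state transport segment model (assumed values).
A_TRS = [-0.5 1; 0 -2];
C_TRS = [0; 1];
V_continuous = 0.3;        % spectral density of the process noise
h = 0.02;                  % length of one timestep in seconds
n = 5;                     % number of future timesteps to predict

x_k = [0.4; -0.1];         % current state vector estimate
X_k = 0.1*eye(2);          % and its covariance matrix

PHI = eye(2) + h*A_TRS;    % discretized transition matrix for one timestep
V_discrete = h*V_continuous;

% Column j+1 holds the prediction for timestep k+j (assumed layout).
x_TRS_pred_matrix = zeros(2, n+1);
x_TRS_pred_matrix(:,1) = x_k;
X_i = X_k;
for i = 1:n
    % One Kalman filter prediction step over a single timestep.
    x_TRS_pred_matrix(:,i+1) = PHI*x_TRS_pred_matrix(:,i);
    X_i = PHI*X_i*PHI' + C_TRS*V_discrete*C_TRS';
end
```

Because no measurement is used inside the loop, each column depends only on the previous one, so the last column equals PHI^n applied to the initial estimate.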

7.3.6 Anti-run-dry algorithm

As was shown in Figure 7.4 (p. 151), the anti-run-dry algorithm is called at timesteps where at least one media-unit has arrived since the previous timestep. It receives the current estimate and n future predictions of the transport segment state vector from the transport segment state prediction module. It also receives the number of media-units in the player and the playoutbuffer, and an estimate of the number of media-units in the virtual buffer, in addition to the playout speed determined by the optimal controller at the previous timestep.

Figure 7.6: Transport segment state prediction module where the Kalman filter prediction is run n times.
[Block diagram; recoverable labels. Initial estimate: $x_{TRS,k}$, $X_{TRS,k}$. Loop (i = i+1): Kalman filter prediction (see Section 7.3.3.1) takes $x_{TRS,k+i}$, $X_{TRS,k+i}$ and gives $x_{TRS,k+i+1}$, $X_{TRS,k+i+1}$. Collection of all n predictions: $[x_{TRS,k},\ x_{TRS,k+1},\ \ldots,\ x_{TRS,k+n}]$. Legend: arrows used in all iterations of the loop.]

To be able to calculate the minimum value of $M_{RCV,d}$, the anti-run-dry algorithm also needs to know the optimal control gain vector G.

The anti-run-dry algorithm runs Equation (6.78) on page 130 in a loop, as shown in Figure 7.7.

Figure 7.7: Anti-run-dry algorithm
[Block diagram; recoverable labels. Inputs: $[x_{TRS,k} \ldots x_{TRS,k+n}]$, $M_{PLR,k}$, $M_{PB,k}$, $M_{VB,k}$, $\mathrm{var}(M_{VB,k})$, $r_{PLR,k-1}$. Blocks: "Find $M_{RCV,d}$ for n = i (Equation (6.81))" inside a loop (i = i+1), followed by "Find maximum of the input $M_{RCV,d}$". Output: $M_{RCV,d}$. Legend: arrows used n times during the anti-run-dry algorithm; arrows used only when a media-unit has been transferred.]

The first thing the anti-run-dry algorithm must do is to calculate the variables used by the later equations. Equations (4.49) on page 87, (6.39), (6.34), (6.35) on page 117, (6.36), (6.42) on page 118 and (6.49) on page 120 are used to make the following code:

Code sample 7.10 Anti-run-dry algorithm: making variables for later use

```matlab
r_TRS_k_minus_1_prediction = x_TRS_k_minus_1_prediction(1) + r_SNDR;
var_r_TRS_k_minus_1_prediction = X_TRS_k_minus_1_prediction(1,1);
g_1 = G(1); g_2 = G(2); g_TRS = G(3:end);
g_SUM = g_TRS + [g_2 zeros(1,length(g_TRS)-1)];
a = (g_2 + sqrt(g_2^2 - 4*g_1))/2;
b = (g_2 - sqrt(g_2^2 - 4*g_1))/2;
l_h_n_plus_1 = (exp(b*h*(n+1))-exp(a*h*(n+1)))/(b-a);
q1_h_n_plus_1 = (a*(exp(b*h*(n+1))-1) - b*(exp(a*h*(n+1))-1))/(a*b);
q2_hn = (a*exp(b*h*n)*(exp(b*h)-1) - b*exp(a*h*n)*(exp(a*h)-1))/(a*b);
```

Next, Equations (6.76) and (6.77) on page 128 are used to calculate the vector $q_5(n,h)$ and the scalar $q_6(n,h)$ by the code:

Code sample 7.11 Calculating variables for the first term of Equation (6.78)

```matlab
q5_nh = 0; q6_nh = 0;
for i = 0:n,
    q2_hi = (a*exp(b*h*i)*(exp(b*h)-1) - b*exp(a*h*i)*(exp(a*h)-1))/(a*b);
    q3_hi = exp(b*h*i)*(exp(b*h)-1) - exp(a*h*i)*(exp(a*h)-1);
    q4_hi = q3_hi*[1 zeros(1,size(A_TRS,2)-1)] - q2_hi*g_SUM;
    q5_nh = q5_nh + (1/(b-a))*q4_hi*(I+A_TRS*h)^(n-i);
    temp = 0;
    for j=1:n-i,
        temp = temp + (I+A_TRS*h)^(n-i-j)*C_TRS*V_TRS*((I+A_TRS*h)^(n-i-j)*C_TRS)';
    end
    q6_nh = q6_nh + (1/(b-a))^2 * q4_hi* temp * q4_hi';
end
```

The anti-run-dry algorithm will find the minimum value of $M_{RCV,d}$ by using Equation (6.78) on page 130, which consists of several terms. The first term is calculated by the following code, by using variables from Code samples 7.10 and 7.11 and Equation (6.75) on page 128:

Code sample 7.12 Calculating the first term of Equation (6.78)

```matlab
temp = (1-q1_h_n_plus_1*g_1/(b-a))^2*var_M_VB_k_minus_1_prediction;
% q5_nh and q6_nh are the variables calculated in Code sample 7.11. Since q5_nh
% is a row vector there, the quadratic form is written as q5_nh*X*q5_nh'.
var_M_PBPLR_k_plus_n = 3*temp + 3*(q5_nh*X_TRS_k_prediction*q5_nh') + 3*q6_nh + 3;
term1 = c*sqrt(var_M_PBPLR_k_plus_n)*(b-a)/(g_1*q1_h_n_plus_1);
```

The rest of the terms of Equation (6.78) are calculated by using variables from Code sample 7.10 by the code:

Code sample 7.13 Calculating terms 2, 3, 4, 5 and 6 of Equation (6.78)

```matlab
term2 = -((b-a)/(g_1*q1_h_n_plus_1) - 1) * ...
    (M_PBPLR_k_minus_1 + M_VB_k_minus_1_estimate);
term3 = r_PLR_k_minus_1*(exp(b*h*(n+1)) - exp(a*h*(n+1)))/(g_1*q1_h_n_plus_1);
term4 = 0;
term5 = 0;
% NOTE: as printed, the column index n-i below (and the index i in temp_sum)
% is 0 for one iteration, which Matlab rejects; the correct offset depends on
% how the columns of x_TRS_pred_matrix are stored.
for i = 0:n,
    q2_hi = (a*exp(b*h*i)*(exp(b*h)-1) - b*exp(a*h*i)*(exp(a*h)-1))/(a*b);
    q3_hi = exp(b*h*i)*(exp(b*h)-1) - exp(a*h*i)*(exp(a*h)-1);
    q4_hi = q3_hi*[1 zeros(1,size(A_TRS,2)-1)] - q2_hi*g_SUM;
    term4 = term4 - (r_SNDR/(g_1*q1_h_n_plus_1)) * q3_hi;
    term5 = term5 - q4_hi*x_TRS_pred_matrix(:,n-i)/(g_1*q1_h_n_plus_1);
end
temp_sum = 0;
sum_r_TRS_p_i = 0;
for i = 0:n-1,
    temp_sum = temp_sum + h*(r_SNDR + x_TRS_pred_matrix(1,i));
    sum_r_TRS_p_i = sum_r_TRS_p_i + (r_SNDR + x_TRS_pred_matrix(1,i+1));
end
term6 = mod(M_VB_k_prediction + h* sum_r_TRS_p_i,1) * (b-a)/(g_1*q1_h_n_plus_1);
```

The minimum value of $M_{RCV,d}$ can now be found by using the calculated terms from Code samples 7.12 and 7.13, by the code:

Code sample 7.14 Calculating the minimum value of $M_{RCV,d}$ from Equation (6.78)

```matlab
M_RCV_desired = term1 + term2 + term3 + term4 + term5 + term6;
```

7.3.7 Optimal controller

The optimal controller runs at each timestep, and as was shown in Figures 7.3 (p. 149) and 7.4 (p. 151), it receives the following inputs:

• Once initially: The three weight factors $w_1$, $w_2$ and $w_3$
• Once initially, or (if the anti-run-dry algorithm is used) at timesteps when at least one media-unit has arrived: The desired receiver buffer level $M_{RCV,d}$
• Each timestep: An estimate of the transport segment state vector $x_{TRS}$
• Each timestep: The number of media-units in the playoutbuffer.
• Each timestep: The number of media-units in the player buffer.
• Each timestep: An estimate of the number of media-units in the virtual buffer.

By using some of the input values, the system state vector x can be calculated by the code:

Code sample 7.15 Finding the system state vector x

```matlab
M_RCV_estimate = M_VB_estimate + M_PB + M_PLR;
x1_estimate = M_RCV_estimate - M_RCV_d;
x2 = r_PLR - r_SNDR;
% x is kept as a column vector, so that G*x in Code sample 7.17 is a scalar:
x = [x1_estimate; x2; x_TRS_estimate];
```

In our program, the weight factors are determined before the media stream starts. However, it is easy to change the weight factors during the media stream, by re-calculating the gain matrix G of the optimal controller. Equations (4.40), (4.41), (4.42) on page 86, (4.47) on page 87 and (4.49) on page 87 are used to calculate G by the following code (where n_TRS is the number of states in the transport segment state vector):

Code sample 7.16 Calculation of the gain vector for the optimal controller

```matlab
r12 = -sqrt(w_3*w_1);
r22 = sqrt(w_3*(w_2 - 2*r12));
r11 = -r12*r22/w_3;
r2B = w_3*([r11 zeros(1,n_TRS-1)]+[r12 zeros(1,n_TRS-1)]*A_TRS)*...
    inv(eye(n_TRS)*r12 - w_3*A_TRS^2 + A_TRS*r22);
G = -(1/w_3)*[r12 r22 r2B];
```

If the buffer has not run dry, the optimal controller equation is implemented by the following code (where x is from Code sample 7.15 and G is from Code sample 7.16):

Code sample 7.17 Implementation of optimal controller equation

```matlab
u = G * x;
r_PLR_k = r_PLR_k_minus_1 + delta_t*u;
```

If the buffer has just been empty (either because the media stream has not yet started, or because the buffer has run dry), we stop the playout until the buffer level has reached the desired buffer level. When the target level is reached, we start the player with a speed equal to the correct media speed, as shown in the following code:

Code sample 7.18 Initialization procedure to be used at start-up and when the buffer has run dry.

```matlab
if buffer_was_empty
    if x(1) < 0 % i.e. if M_RCV < M_RCV_d
        r_PLR = 0;
    else
        r_PLR = r_SNDR; % re-starting the player
        buffer_was_empty = 0;
    end
end
```

When the buffer runs dry, the model of the two first states of the total state space model (Equation (3.29) on page 50) is not valid, but the model of the last states ($x_{TRS}$) is still valid and estimated by the Kalman filter also during the run-dry period. Since the total state space model has had a period where it was not valid, its first two states need to be re-initialized, and therefore, we cannot continue to use the value of r_PLR that the optimal controller made before the run-dry period. This is the reason why the player speed is set to r_SNDR when restarting the player.

8 Results and discussion

The Matlab implementation described in Chapter 7, of the receiver system environment simulator and the transport segment simulator, is used in this chapter to evaluate and compare the optimal control algorithm deduced in Chapter 4 and the anti-run-dry algorithm deduced in Chapter 6 with existing playout control algorithms.

As shown in Figure 8.1, Sections 8.1 and 8.9 are essential for understanding this chapter, since both the different algorithms described in Section 8.1 and the different quality metrics described in Section 8.9 are used in the sections describing the essential results.

Sections 8.10 and 8.11 present the essential results by using standardized quality metrics. In Section 8.10, the optimal control algorithm is run on the same UDP transport segment traces as Ranganathan and Kilmartin used in their paper [42]. Thus, we have been able to directly compare the results of the optimal control algorithm to their results, by using PESQ (Perceptual Evaluation of Speech Quality), an objective voice quality measurement algorithm. Section 8.11 presents a subjective listening test that has been performed for both a simulated transport segment and a real transport segment, together with the corresponding PESQ results.

The simulated transport segment is presented in Section 8.2, and is used in Sections 8.4 to 8.7 and Section 8.11.

An explanation of the parameters used in this chapter is given in Section 8.3, which should therefore be read before the sections where these parameters are used (Sections 8.5 - 8.8).

Sections 8.4 to 8.8 contain many graphs and details to give a thorough understanding of how the different algorithms and different tunings of the optimal controller affect the buffer level, the total end-to-end delay and the playout speed. These sections can be skipped by readers who only want to know the resulting quality of the different algorithms. Sections 8.4 to 8.7 show results on a simulated transport segment. First, two existing playout algorithms are run in Section 8.4. (In Section 2.4, we claimed that the optimal controller is general, so that it can be tuned to act almost identically to existing controllers. This is shown in Sections 8.5.1.3 and 8.5.4.3 for these two existing algorithms.) Next, the results of different tuning parameters of the optimal controller and the anti-run-dry algorithm are shown in Sections 8.5 - 8.7. Section 8.8 shows the results of running the optimal controller on a real TCP transport segment trace.


This chapter places each simulation or experiment, or closely related group of simulations or experiments, in a separate section, where the parameters and results are located in the first subsection(s), and the discussion of the results is located in the last subsection. The discussions are placed in separate subsections to avoid mixing results and discussion.


[Figure 8.1: Overview of the sections in this chapter: 8.1 Algorithms used; 8.2 Simulated transport segment; 8.3 Parameters used in this chapter; 8.4 Results when using existing playout algorithms; 8.5 Results when using the optimal controller; 8.6 Optimal control with erroneous transport segment model; 8.7 Anti-run-dry algorithm; 8.8 Results on real TCP transport segment trace; 8.9 Quality metrics; 8.10 Results on real UDP transport segment traces; 8.11 DMOS, PESQ and Arentz tests; 8.12 Execution time; 8.13 Comparison with newer algorithms. Diagram not reproduced.]

8.1 Algorithms used

In this chapter, two existing playoutbuffer algorithms are compared with the optimal control algorithm (see Section 8.1.3) and the anti-run-dry algorithm (see Section 8.1.4). The two existing algorithms are the original version of Adaptive playout delay, described in Section 8.1.1, and one of the few published playout speed adjusting algorithms, described in Section 8.1.2.

8.1.1 Algorithm 1

Algorithm 1 is the original Adaptive Playout Delay algorithm explained in Section 1.3.2 (p. 11).

The original Adaptive Playout Delay was chosen to demonstrate the principle of using a constant playout speed. Since all versions of Adaptive Playout Delay use a constant playout speed within a talkspurt, the original version is representative for demonstrating this principle.

Since we use only one talkspurt in our simulations, the results from the use of algorithm 1 are also valid for the Fixed Playout Delay algorithm. As long as there is only one talkspurt, changing K (and thus changing qj) for the Adaptive Playout Delay algorithm is the same as changing d for the Fixed Playout Delay algorithm.

8.1.2 Algorithm 2

Algorithm 2 is the first algorithm used by the researchers at Stanford University. It is described in Section 1.3.3.1.

The simulations in this thesis use the suggested values s = 1.25 and f = 0.75, i.e. a +/- 25% change of the time used to play each packet. This translates to a slowdown of 20% (normal_play_time/(1.25*normal_play_time) = 0.8) and a speed-up of 33% (normal_play_time/(0.75*normal_play_time) = 1.33) of the playout speed.
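The conversion from play-time scaling to playout speed can be checked with a short sketch. The function names below are illustrative and not taken from the thesis, and the threshold rule is only a simplified reading of how algorithm 2 is described in Section 8.4.2.2 (slow playout below the target buffer level, fast playout above it). What happens exactly at the target level is an assumption made here.

```python
# Illustrative sketch only; names and the simple threshold rule are assumptions.

NORMAL_SPEED = 50.0  # correct media speed in media-units/s (Section 8.2)
S = 1.25  # play-time scale factor when slowing down
F = 0.75  # play-time scale factor when speeding up


def speed_from_time_scale(normal_speed: float, time_scale: float) -> float:
    """Playing each packet for time_scale * normal time divides the speed by time_scale."""
    return normal_speed / time_scale


def algorithm2_speed(buffer_level: float, target_level: float) -> float:
    """Pick the slow speed below the target buffer level and the fast speed above it."""
    if buffer_level < target_level:
        return speed_from_time_scale(NORMAL_SPEED, S)
    if buffer_level > target_level:
        return speed_from_time_scale(NORMAL_SPEED, F)
    return NORMAL_SPEED  # assumption: play at normal speed exactly at the target


slow = speed_from_time_scale(NORMAL_SPEED, S)  # 20% slower than normal
fast = speed_from_time_scale(NORMAL_SPEED, F)  # about 33% faster than normal
print(f"slow = {slow:.1f}, fast = {fast:.1f}")
```

With the unrounded factor 1/0.75 the fast speed is about 66.7 media-units/s; with the rounded factor 1.33 it becomes 66.5 media-units/s, which is the value quoted later in this chapter.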



8.1.3 Algorithm 3

Algorithm 3 is the optimal control of playout speed, explained in Chapter 4 (p. 71). It is simulated with a range of different parameter settings for the desired buffer level and the weight factors.

8.1.4 Algorithm 4

Algorithm 4 is the anti-run-dry algorithm (explained in Chapter 6 (p. 103)) together with the optimal control of playout speed. This algorithm is simulated for different values of run-dry-probability and for different numbers of prediction steps.


8.2 Simulated transport segment

This section describes the simulated transport segment used in Sections 8.4 - 8.12.

The simulated transport segment has a state space model like the one in Equation (3.26) on page 49, where:

$$A_{TRS} = \begin{bmatrix} -1/T_1 & 1 \\ 0 & -1/T_2 \end{bmatrix} \tag{8.1}$$

and

$$C_{TRS} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \tag{8.2}$$

where T1 = 0.3 s, T2 = 0.1 s, and the autoeffectspecter (power spectral density) of the white noise is chosen such that the standard deviation of rTRS is 3 media-units/s. Thus, rTRS - rSNDR behaves like a Markov process that is driven by another Markov process instead of by white noise. The correct media speed rSNDR is set to 50 media-units/s.

The system variance of a second order Markov model is deduced in Appendix C.1 (p. 265). Equation (C.18) on page 268 gives the relation between the variance of x1,TRS (the first state in xTRS) and the autoeffectspecter V of the system noise v(t):

$$V = \frac{2(T_2 + T_1)}{T_1^2 T_2^2}\,\mathrm{var}(x_{1,TRS}) \tag{8.3}$$

Thus, when our simulations used a standard deviation of 3 media-units/s for rTRS, the autoeffectspecter of the system noise was set to:

$$V = \frac{2(T_2 + T_1)}{T_1^2 T_2^2}\,\mathrm{var}(x_1) = \frac{2(0.1\,\mathrm{s} + 0.3\,\mathrm{s})}{(0.3\,\mathrm{s})^2 (0.1\,\mathrm{s})^2} \cdot 9\,(\text{media-units/s})^2 = 888.9\,\mathrm{s}^{-3} \cdot 9\,(\text{media-units/s})^2 = 8000\ \text{media-units}^2/\mathrm{s}^5 \tag{8.4}$$

The simulated media stream consists of 400 media-units, i.e. 8 seconds, since this is sufficient to demonstrate and compare the different algorithms and their different parameter settings. Figure 8.2 shows the transport segment delay of each media-unit of the media stream.

[Figure 8.2 plot: transport segment delay in seconds above avg(λTRS), versus media-unit number (0 to 400).]

Figure 8.2: Transport segment delay for the simulated transport segment

8.3 Parameters used in this chapter

For all results shown in this chapter, the controller was run every 10 ms, i.e. the period between each timestep was 10 ms, since at the time of the first simulations (year 2002), this was the lowest available timing resolution on standard PCs.

In this chapter, algorithms 3 and 4 use the start-up/initialization procedure explained in Section 4.4.1 (p. 79).
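As a numerical cross-check of the noise intensity used for the simulated transport segment in Section 8.2, the relation in Equation (8.3) can be compared with the stationary covariance of the second order Markov model. The sketch below assumes the model matrices as read from Equations (8.1) and (8.2), i.e. A = [[-1/T1, 1], [0, -1/T2]] with the white noise entering the second state only, and uses the standard steady-state Lyapunov equation A P + P A^T + C V C^T = 0. The function names are illustrative and not from the thesis.

```python
# Cross-check of Equations (8.3) and (8.4); function names are illustrative.

def noise_intensity(t1: float, t2: float, var_x1: float) -> float:
    """Equation (8.3): V = 2 (T1 + T2) / (T1^2 T2^2) * var(x1)."""
    return 2.0 * (t1 + t2) / (t1 ** 2 * t2 ** 2) * var_x1


def stationary_covariance(t1: float, t2: float, v: float):
    """Solve A P + P A^T + C V C^T = 0 for A = [[-1/T1, 1], [0, -1/T2]], C = [0, 1]^T.

    Returns (p11, p12, p22), the distinct entries of the symmetric matrix P.
    """
    p22 = v * t2 / 2.0               # (2,2) entry: -2 p22 / T2 + V = 0
    p12 = p22 * t1 * t2 / (t1 + t2)  # (1,2) entry: -(1/T1 + 1/T2) p12 + p22 = 0
    p11 = t1 * p12                   # (1,1) entry: -2 p11 / T1 + 2 p12 = 0
    return p11, p12, p22


T1, T2 = 0.3, 0.1   # time constants in seconds
VAR_X1 = 3.0 ** 2   # (3 media-units/s)^2

V = noise_intensity(T1, T2, VAR_X1)
p11, _, _ = stationary_covariance(T1, T2, V)
print(f"V = {V:.1f} media-units^2/s^5, var(x1) = {p11:.3f} (media-units/s)^2")
```

The prefactor 2(T1 + T2)/(T1^2 T2^2) alone evaluates to about 888.9 s^-3. Multiplied by the variance of 9 (media-units/s)^2, the sketch gives V of about 8000 media-units^2/s^5, and feeding this V back into the Lyapunov solution returns the variance of 9 (media-units/s)^2 for the first state.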

8.3.1 Use of weight factors

8.3.1.1 Equations for weight factors

In Section 4.2 (p. 75), it was explained that Equation (4.7) can be used to find the value of each weight factor. For our system, the equivalent equations to Equation (4.7) are:

$$w_1 = \frac{1}{(\Delta x_1)^2} \tag{8.5}$$

where Δx1 is the acceptable value of MRCV - MRCV,d,

$$w_2 = \frac{1}{(\Delta x_2)^2}, \tag{8.6}$$

where Δx2 is the acceptable value of rPLR - rSNDR, and

$$w_3 = \frac{1}{(\Delta u)^2} \tag{8.7}$$

where Δu is the acceptable value of the time derivative of rPLR.

For all simulations in this chapter, the acceptable errors are first chosen, and then inserted into the above three equations to calculate the weight factors.

8.3.1.2 Numerical values of acceptable errors

In situations where we want the buffer level to be kept close to the desired buffer level, we want to use a small value for Δx1. Since it is difficult to control the buffer level at a granularity that is smaller than a media-unit, we choose 10 ms as the lowest value for Δx1.

If the buffer level (and hence the total delay) is not important for the user, she/he will be satisfied even with a delay of several seconds. We therefore choose 10 s as the highest value for Δx1.

If the user wants a constant playout speed, she/he may be very sensitive to playout speed changes. According to [11], two tones must differ in frequency by 0.3 percent for the average person to perceive the difference (according to [38] this is true in the range of 1 to 4 kHz, where the ear is most sensitive to slight changes). Thus, if WSOLA (or other algorithms that can change the playoutspeed without changing the pitch) is not used, a change of playoutspeed of 0.3% can be perceived. For a playoutspeed of 50 media-units/s, this corresponds to a change of 0.15 media-units/s (and for a playoutspeed of 33.33 media-units/s, this corresponds to a change of 0.1 media-units/s). We therefore choose 0.1 media-units/s as the lowest value for Δx2, i.e. the acceptable deviation from the correct media speed is set to 0.1 media-units/s.

The playout speed will always have a certain importance for the user, hence the difference between the minimum and maximum value of Δx2 cannot be as large as for Δx1. In situations where the playout speed is not important for the user, she/he still needs the playout speed to be close enough to the correct media speed to comprehend the information in the media. We thus choose 25 media-units/s as the highest value for Δx2.

If the user wants the playout speed to be smooth, its derivative should be small. As stated above, a change of playoutspeed down to 0.3%, corresponding to a speed change of 0.1 media-units/s (for a playoutspeed of 33.33 media-units/s), may be perceived. By allowing one such change per second, we choose 0.1 media-units/s2 as the smallest value of Δu. This corresponds to a change in playout speed of only 0.001 media-units/s per timestep.

For situations where the change of playout speed has no importance to the user's perceived quality, we choose 25 media-units/s per timestep, i.e. 2500 media-units/s2, as the largest value of Δu. For more normal situations, we choose 10 media-units/s2 (or 0.1 media-units/s per timestep) as the largest value of Δu.

8.3.2 Use of graphs

Notice that the figures/graphs of buffer level in this chapter show MPBPLR, and not MRCV, and therefore usually show a lower level than MRCV,d. The reason for this is that MVB cannot be measured, and therefore MRCV cannot be calculated or shown.

Notice also that the "total" latency shown in the figures is λTRS + λPBPLR, and not λTRS + λRCV. The reason for this is that only whole media-units can be timestamped, and thus each media-unit is timestamped at the time it leaves the playoutbuffer and enters the player buffer. The graph for the "total" latency is therefore the time from when the media-unit is sent from the sender application in the sender machine until it is sent to the player in the receiver machine, i.e. λTRS + λPBPLR.

8.4 Results when using existing playout algorithms

8.4.1 Algorithm 1: Adaptive playout delay

8.4.1.1 Parameters and results

Algorithm 1 is described in Section 8.1.1. From equation (1.1) on page 12, we find the following equation for talkspurt j for the Adaptive Playout Delay:

$$q_j = d_j + K v_j \tag{8.8}$$

where K is a positive constant, dj is the average delay, vj is the average deviation of the delay and qj is the total delay of talkspurt j, so that each packet in talkspurt number j is played out exactly qj ms after it was sent from the sender.

The Adaptive Playout Delay algorithm in this section has been simulated with K=1. The average deviation of the delay was measured as vj = 0.05s and hence K*vj = 0.05s.
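The playout deadline of Equation (8.8) and its effect on late media-units can be sketched as follows. The average delay and the transport delay samples are hypothetical numbers chosen for the example (only vj = 0.05 s and K = 1 come from the text), and the function names are illustrative.

```python
# Illustrative sketch; the delay samples below are made up for the example.

def playout_delay(d_j: float, v_j: float, k: float) -> float:
    """Equation (8.8): total delay q_j = d_j + K * v_j (all times in seconds)."""
    return d_j + k * v_j


def late_units(transport_delays, q_j):
    """Indices of media-units whose transport delay exceeds the playout deadline q_j."""
    return [i for i, delay in enumerate(transport_delays) if delay > q_j]


# Hypothetical average delay of 0.10 s; v_j = 0.05 s and K = 1 as in this section.
q = playout_delay(d_j=0.10, v_j=0.05, k=1.0)
delays = [0.08, 0.12, 0.16, 0.14, 0.20]  # hypothetical transport delays [s]
print(round(q, 3), late_units(delays, q))
```

In this toy example the media-units with transport delays of 0.16 s and 0.20 s miss the deadline and would be discarded by algorithm 1. Raising K to 2 moves the deadline to 0.20 s and no unit is late, at the cost of a higher total delay.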

We have used the simulated transport segment explained in Section 8.2 (p. 166).

[Figure 8.3 panels: a) λTRS (magenta) and λTRS + λPBPLR (blue/bold), in seconds above avg(λTRS), versus media-unit number; b) MPBPLR, level [s] versus time [s]; c) rPLR [media-units/s] versus time [s].]

Figure 8.3: Fixed/adaptive playout delay on simulated transport segment

8.4.1.2 Discussion

As can be seen from Figure 8.3 a), the total delay (λTRS + λPBPLR) of the packets is kept at a constant level, which means that each media-unit in the talkspurt is played out exactly qj ms after it was sent from the sender.

Figure 8.3 a) shows that λTRS is below λTRS + λPBPLR for the first 276 media-units. This means that the first 276 media-units arrive before their deadline, and thus are played out correctly. This can be seen from Figure 8.3 c), where the media is played at the correct media speed during the first 5.6 seconds. During this period, the media content of the playoutbuffer and player, MPBPLR, corresponds to the difference between λTRS + λPBPLR and λTRS, as can be seen by comparing Figure 8.3 a) with Figure 8.3 b).

In the very beginning of the media transfer period, MPBPLR is increasing fast, as can be seen from Figure 8.3 b). This is due to the adaptive playout delay algorithm, where the first packet is played out at K*vj (which in this simulation is 0.05 s) + avg(λTRS) after the media-unit was sent from the sender. During the first few milliseconds of the transfer period, MPBPLR is increasing because new packets are arriving from the transport segment, while the playout of the first packet has not yet started.

The input to the playoutbuffer consists of whole media-units from the transport segment, while the output from the player is a continuous media stream. This is the reason for the frequent changes of low amplitude that can be seen in Figure 8.3 b). A new media-unit from the transport segment corresponds to a 20 ms increase of MPBPLR (since rSNDR is 50 media-units/s). Since MPBPLR is sampled every 10 ms, the player has played out approximately 0.5 media-unit (i.e. 10 ms) between each sample. This is the reason for the amplitude of 10 ms (not 20 ms) of the frequent changes of Figure 8.3 b).
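The 10 ms amplitude can be reproduced with a small sketch. It assumes idealised, perfectly regular arrivals (one media-unit every 20 ms) and continuous playout at the correct media speed, which is a simplification of the simulated transport segment. The function name and the start level are illustrative.

```python
# Illustrative sketch with idealised, perfectly regular arrivals (an assumption).

UNIT_MS = 20    # playout time of one media-unit at 50 media-units/s
SAMPLE_MS = 10  # sampling period of the buffer level


def sampled_levels(start_level_ms: int, duration_ms: int):
    """Sample the buffer level [ms] every SAMPLE_MS while one unit arrives every UNIT_MS."""
    levels = []
    level = start_level_ms
    for t in range(0, duration_ms, SAMPLE_MS):
        if t % UNIT_MS == 0:
            level += UNIT_MS  # a whole media-unit arrives from the transport segment
        levels.append(level)  # sample taken just after any arrival at time t
        level -= SAMPLE_MS    # continuous playout drains 10 ms until the next sample
    return levels


levels = sampled_levels(start_level_ms=40, duration_ms=100)
print(levels)
print("peak-to-peak:", max(levels) - min(levels), "ms")
```

The sampled level alternates between two values that are 10 ms apart, even though each arrival adds 20 ms, because half of each media-unit has already been played out when the next sample is taken.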

Figure 8.3 a) shows that λTRS is larger than λTRS + λPBPLR (the constant total-delay curve) for media-units number 277 to 316 (40 media-units). These media-units are discarded, since they arrive after their deadline. This can be seen from Figure 8.3 c), where the playout speed drops to zero after 5.6 s. Media-units number 317 to 320 arrive in time, and Figure 8.3 c) shows that they are played at the correct media speed. The next 29 media-units are delayed, and are thus discarded. Since the adaptive playout delay algorithm discards all media-units arriving after their deadline, the playoutbuffer and the player buffer are empty during the two periods where media-units are delayed. A total of 69 media-units are discarded due to late arrivals.

The last 50 media-units arrive in time, and are played out at the correct media speed.

Choosing a larger value of K (giving a larger value of qj) would result in a lower number of lost media-units, at the cost of a higher total delay.

8.4.2 Algorithm 2

8.4.2.1 Parameters and results

Algorithm 2 is described in Section 8.1.2. In this section, the target buffer level has been set to 2 media-units, which corresponds to 40 ms of playout time (since, as explained in Section 8.2, rSNDR = 50 media-units/s). Notice that the target buffer level for algorithm 2 is the desired value of MPBPLR, while the target buffer level for the optimal control algorithm is the desired value of MRCV (where MRCV = MVB + MPBPLR).

[Figure 8.4 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of rPLR.]

Figure 8.4: Algorithm 2 on simulated transport segment

8.4.2.2 Discussion

Figure 8.4 c) shows that during the first half second, the playout speed is set to the low value 40 media-units/s. The reason for this can be seen from Figure 8.4 b), where the first half second is used to raise the buffer level to the desired value. After this first period, Figure 8.4 a) shows an added delay of about 40 ms, which is equal to the target buffer level. As long as the buffer level is below the target buffer level, the low value of playout speed is used. After the first half second, the amount in the playoutbuffer and player is kept very close to the target buffer level, as can be seen in Figure 8.4 b). The drawback of this algorithm is the extra introduced jitter, as can be seen from Figure 8.4 c), and the zoomed version in Figure 8.4 d). The playout speed switches very often between the low value 0.8*50 media-units/s = 40 media-units/s (used when the buffer level is below the target buffer level), and the high value 1.33*50 media-units/s = 66.5 media-units/s (used when the buffer level is above the target buffer level).

8.5 Results when using the optimal controller

The optimal controller can be tuned by the use of three weight factors and the desired buffer level.

To demonstrate the effect of the weight factors, the first part (the first three sub-sections) of this section uses the weight factors to tune the optimal controller to fulfil only one of the user requirements described in Section 1.2. In Sections 8.5.1, 8.5.2 and 8.5.3, the optimal controller is tuned to fulfil requirements A, B and C, respectively.

To further demonstrate the effect of the weight factors, the second part of this section (the next three sub-sections) tunes the optimal controller to fulfil two requirements simultaneously. In Section 8.5.4, the optimal controller is tuned to fulfil requirements B and C, in Section 8.5.5 to fulfil requirements A and C, and in Section 8.5.6 to fulfil requirements A and B.

Section 8.5.7 tunes the optimal controller to fulfil all three requirements simultaneously.

Results from simulations with different settings of the desired buffer level are shown in Section 8.5.8.

8.5.1 Fulfilling requirement A only

Requirement A is to minimize the total delay (between the sender application and the playout part of the receiver application) for each part of the media. In the optimal controller, this is fulfilled by minimizing the difference between the buffer level and a desired buffer level, MRCV,d. Thus, we use the smallest value (from the numerical values given in Section 8.3.1.2) for Δx1, and the largest values for Δx2 and Δu, as shown in Table 8.1. Since the total delay is to be kept small, we have chosen a low value for MRCV,d, as shown in Table 8.1.

Fulfilling requirement A only corresponds to algorithm 2, as described in Section 8.5.1.3.

In this section (8.5.1), we have used limits for the playout speed, at 20% above and below the correct media speed. This is not a part of the optimal control algorithm, but is used to make the simulation results more similar to those of algorithm 2, so that the two algorithms can be compared more directly.

8.5.1.1 Parameters and results

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 ms = 0.01 s | w1 = 10000 s^-2 |
| Acceptable error x2 | 25 media-units/s | w2 = 0.0016 s^2/media-units^2 |
| Acceptable error u | 25 media-units/s per timestep, i.e. 2500 media-units/s^2 | w3 = 1.6*10^-7 s^4/media-units^2 |
| MRCV,d | 2 media-units = 40 ms | |

Table 8.1: Parameters for trying to fulfil requirement A

[Figure 8.5 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of MPBPLR, level [s] versus time [s].]

Figure 8.5: Results for trying to fulfil requirement A

8.5.1.2 Discussion

As can be seen from Figure 8.5 a), b) and d), the amount of media in the playoutbuffer and player, MPBPLR, is kept almost constant, at about half a media-unit below the desired buffer level MRCV,d (the difference of half a media-unit between MPBPLR and MRCV,d is due to the amount of media in the virtual buffer).

Figure 8.5 b) and d) show that MPBPLR has two short periods with an increased value; the first 0.2 seconds, and between 1 and 1.7 seconds. This is due to the fact that the controller is run only every 10 ms, and that it tries to keep the total receiver buffer level equal to MRCV,d, which for this simulation is set to 40 ms. If the estimate of the virtual buffer level is slightly less than one media-unit, at the same time as the actual virtual buffer level is slightly above one media-unit, the playoutbuffer will receive one media-unit that was not expected to arrive yet. As can be seen from Figure 8.6, the playoutbuffer level switches between 0 and 1 media-unit most of the time, while during the two periods mentioned above, it switches between 1 and 2 media-units. This happens at periods when the actual rate from the transport segment is slightly higher than the predicted rate. Figure 8.5 a) shows that the transport segment latency decreases, and therefore the transport segment rate increases, during the two periods where the playoutbuffer level is increased. Figure 8.5 a) also shows that there are two short periods where the transport segment latency decreases without giving an increase in the playoutbuffer level. For these periods, the prediction of the rate from the transport segment, and thus of the amount of media in the virtual buffer, is more correct than for the previous two periods.

[Figure 8.6 plot: level [s] versus time [s].]

Figure 8.6: The playoutbuffer level MPB

Since the receiver buffer level is kept close to the desired buffer level, requirement A is fulfilled. The high weight of requirement A, together with a positive desired buffer level, makes sure that the buffer does not run dry.

The drawback of the weight factor setting in Table 8.1 is that a large amount of jitter is introduced, as can be seen from Figure 8.5 c). Since the playout speed jumps very often between 40 and 60 media-units/s, requirements B and C are very poorly fulfilled.

8.5.1.3 Comparison to algorithm 2

The aim of algorithm 2 is to keep the buffer level close to the target buffer level. Thus, requirement A is the most important for algorithm 2. Requirements B and C are not important to algorithm 2.

When comparing the graphs of algorithm 2 (Figure 8.4 (p. 173)) with the graphs of the optimal controller when tuned to behave like algorithm 2 (Figure 8.5 (p. 175)), we find the following similarities and differences:

• Similarities:
  • Both algorithms keep the buffer level close to the target buffer level.
  • For both algorithms, the player speed has frequent jumps between a value above the correct media speed (66.5 media-units/s for algorithm 2, and 60 media-units/s for the optimal control algorithm) and a value below the correct media speed (40 media-units/s for both algorithms). The reason for the difference in playout speed is that the limits we made for our algorithm were set to 20% above and below the correct media speed. As explained in Section 8.1.2 (p. 165), algorithm 2 sets the playout time of each media-unit to 25% shorter or longer than the correct playout time. This corresponds to a slowdown of 20% and a speed-up of 33% of the playout speed.

• Differences:
  • For algorithm 2, the playout speed is kept within the limits 66.5 and 40 media-units/s. The original optimal controller does not use such limits. However, in this section 8.5.1, such limits have been used also for the optimal controller.

8.5.2 Fulfilling requirement B only

Requirement B is to minimize the difference between the actual playout speed and the correct media speed. To fulfil requirement B, we use the smallest value (from the numerical values given in Section 8.3.1.2) for Δx2, and the largest values for Δx1 and Δu, as shown in Table 8.2.


8.5.2.1 Parameters and results

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 s | w1 = 0.01 s^-2 |
| Acceptable error x2 | 0.1 media-units/s | w2 = 100 s^2/media-units^2 |
| Acceptable error u | 0.1 media-units/s per timestep, i.e. 10 media-units/s^2 | w3 = 0.01 s^4/media-units^2 |
| MRCV,d | | |

Table 8.2: Parameters for trying to fulfil requirement B

[Figure 8.7 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of rPLR.]

Figure 8.7: Results for trying to fulfil requirement B

8.5.2.2 Discussion

As long as the buffer does not run dry, the playout speed is very close to the correct media speed (50 media-units/s), as can be seen from Figure 8.7 c), and the zoomed-in version in Figure 8.7 d), thus requirement B is fulfilled. However, as can be seen from Figure 8.7 d), the playout speed is not smooth (it has frequent low amplitude changes), thus requirement C is not fulfilled. Figure 8.7 b) shows that the buffer level has large variations, hence requirement A is not fulfilled. Figure 8.7 a), b) and c) also show that the receiver buffer runs dry, and that the playout speed drops to zero when this happens. The playout is stopped until the buffer level reaches the desired buffer level, and then the playout starts with a playout speed equal to the correct media speed.

8.5.3 Fulfilling requirement C only

Requirement C is to change the playout speed as slowly as possible (i.e. the playout speed should be as smooth as possible). Thus, we use the smallest value (from the numerical values given in Section 8.3.1.2) for Δu and the largest values for Δx1 and Δx2, as shown in Table 8.3.

8.5.3.1 Parameters and results

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 s | w1 = 0.01 s^-2 |
| Acceptable error x2 | 25 media-units/s | w2 = 0.0016 s^2/media-units^2 |
| Acceptable error u | 0.001 media-units/s per timestep, i.e. 0.1 media-units/s^2 | w3 = 100 s^4/media-units^2 |
| MRCV,d | | |

Table 8.3: Parameters for trying to fulfil requirement C
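The weight factors listed with Tables 8.1 to 8.3 follow from Equations (8.5) to (8.7) and the 10 ms timestep. The sketch below recomputes them. The function names are illustrative and not from the thesis, and the acceptable errors for x1 and x2 under requirement C are inferred here from the listed weight factors w1 = 0.01 and w2 = 0.0016.

```python
# Illustrative sketch; function names are not from the thesis.

TIMESTEP_S = 0.01  # controller period of 10 ms (Section 8.3)


def weight(acceptable_error: float) -> float:
    """Equations (8.5)-(8.7): weight factor = 1 / (acceptable error)^2."""
    return 1.0 / acceptable_error ** 2


def per_timestep_to_per_second(change_per_timestep: float) -> float:
    """Convert a speed change per 10 ms timestep to media-units/s^2."""
    return change_per_timestep / TIMESTEP_S


# Acceptable errors (delta_x1 [s], delta_x2 [media-units/s], delta_u [media-units/s^2]).
settings = {
    "requirement A (Table 8.1)": (0.01, 25.0, per_timestep_to_per_second(25.0)),
    "requirement B (Table 8.2)": (10.0, 0.1, per_timestep_to_per_second(0.1)),
    # delta_x1 and delta_x2 below are inferred from the listed weights (assumption).
    "requirement C (Table 8.3)": (10.0, 25.0, per_timestep_to_per_second(0.001)),
}

for name, errors in settings.items():
    w1, w2, w3 = (weight(e) for e in errors)
    print(f"{name}: w1 = {w1:.4g}, w2 = {w2:.4g}, w3 = {w3:.4g}")
```

Because each weight is the inverse square of an acceptable error, a factor of 1000 between the smallest and largest acceptable buffer-level error (10 ms versus 10 s) becomes a factor of one million between the corresponding weights.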

[Figure 8.8 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of rPLR; e) time derivative of rPLR (zoomed) [(media-units/s)/s] versus time [s].]

Figure 8.8: Results for trying to fulfil requirement C

8.5.3.2 Discussion

As can be seen from Figure 8.8 d) (the zoomed-in version of Figure 8.8 c)), the playout speed is very smooth, with only slow changes of low amplitude, thus requirement C is fulfilled. This can be compared to Figure 8.7 d), where the playout speed is not as smooth, since it has frequent changes with a low amplitude, in addition to slow changes with a comparable amplitude to the one in Figure 8.8 d). Figure 8.8 e) shows that the rate of change of rPLR (which is the same as the control variable u, to be minimized in this section) is kept below its acceptable error. Figure 8.8 c) and d) show that the playout speed is very close to the correct media speed (50 media-units/s), because it starts off correct, and since the derivative is close to zero it will not move away fast. Since w2 is small, it could drift slowly away from the correct media speed for a longer trace. The buffer level has large variations, as can be seen from Figure 8.8 b), thus requirement A is not fulfilled.

8.5.4 Trying to fulfil requirements B and C

For non-interactive streaming of media, such as streaming of video or high-quality music, the users may be satisfied with the quality of service, even if they have to wait several seconds before the playout starts. In this scenario, requirement A is not important. However, the users may expect the playout speed to be kept smooth and equal to the correct media speed.

As is shown in Table 8.4, we use the smallest values (from the numerical values given in Section 8.3.1.2) for Δx2 and Δu, to fulfil requirements B and C, and the largest value for Δx1, since requirement A is not important here.


8.5.4.1 Parameters and results

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 s | w1 = 0.01 s^-2 |
| Acceptable error x2 | 0.1 media-units/s | w2 = 100 s^2/media-units^2 |
| Acceptable error u | 0.001 media-units/s per timestep, i.e. 0.1 media-units/s^2 | w3 = 100 s^4/media-units^2 |
| MRCV,d | | |

Table 8.4: Parameters for trying to fulfil requirements B and C

[Figure 8.9 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of rPLR; e) time derivative of rPLR [(media-units/s)/s] versus time [s].]

Figure 8.9: Results for trying to fulfil requirements B and C

8.5.4.2 Discussion

As can be seen from Figure 8.9 c) and d), the playout speed is very close to the correct media speed (50 media-units/s), and at the same time, the playout speed is smooth. This can be compared to Figure 8.7 d), where the weight factors were tuned to fulfil requirement B only, or to Figure 8.8 d), where the weight factors were tuned to fulfil requirement C only. Figure 8.9 e) shows that the rate of change of rPLR (which is the same as the control variable u, one of the factors to be minimized in this section) is kept below its acceptable error. Figure 8.9 d) shows a much smoother playout speed than Figure 8.7 d) (since it does not have the frequent changes shown in Figure 8.7 d)), and a playout speed closer to the correct media speed than Figure 8.8 d). Figure 8.9 b) shows a large variation of the buffer level, hence requirement A is not fulfilled.

8.5.4.3 Comparison to algorithm 1

By comparing the graphs of algorithm 1 (Figure 8.3 (p. 171)) with the graphs of the optimal controller when tuned to behave like algorithm 1 (Figure 8.9 (p. 182)), we find the following similarities and differences:

• Similarities:
  • The playout speed is constant, or close to constant, as long as the buffer does not run dry.
  • If the buffer runs dry, the media quality gets low for both algorithms. With algorithm 1, the playout speed goes to zero for all media-units that arrive after their deadline, and for the optimal control algorithm, the playout speed drops to zero for a short period (until the buffer level reaches the desired buffer level), and then jumps back to the correct media speed.

• Differences:
  • Algorithm 1 discards all packets that arrive after their deadline, but the optimal control algorithm does not discard any packets.
  • With algorithm 1, all media-units have the same delay between the sender and the receiver.

8.5.5 Trying to fulfil requirements A and C

For interactive media, low latency (requirement A) is important. For some types of interactive media, it may be more important to keep the playout speed smooth than to keep it close to the correct media speed. Examples could be interactive jam sessions for music or interactive singing.


We use the smallest values (from the numerical values given in Section 8.3.1.2) for Δx1 and Δu, to fulfil requirements A and C. Since requirement B is not important here, we use the largest value for Δx2, as shown in Table 8.5.

8.5.5.1 Parameters and results

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 ms = 0.01 s | w1 = 10000 s^-2 |
| Acceptable error x2 | 25 media-units/s | w2 = 0.0016 s^2/media-units^2 |
| Acceptable error u | 0.001 media-units/s per timestep, i.e. 0.1 media-units/s^2 | w3 = 100 s^4/media-units^2 |
| MRCV,d | | |

Table 8.5: Parameters for trying to fulfil requirements A and C

[Figure 8.10 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) rPLR [media-units/s] versus time [s]; d) zoomed version of rPLR; e) time derivative of rPLR [(media-units/s)/s] versus time [s].]

Figure 8.10: Results for trying to fulfil requirements A and C

8.5.5.2 Discussion

Figure 8.10 b) shows a quite stable buffer level, not too far from the desired buffer level, thus requirement A is fairly well fulfilled. Figure 8.10 c) and d) show that the playout speed is quite smooth, without very frequent changes, thus requirement C is also fairly well fulfilled, even though Figure 8.10 e) shows that the time derivative of rPLR is not kept within the acceptable error we used.

Thus, requirements A and C are both relatively well fulfilled, but not as well as when the weight factors were tuned to fulfil only one of them.

Figure 8.10 d) shows that the playout speed is often quite far from the correct media speed (compared with simulations in previous sections), thus requirement B is not very well fulfilled.

8.5.6 Trying to fulfil requirements A and B

We use the smallest values (from the numerical values given in Section 8.3.1.2) for Δx1 and Δx2, to fulfil requirements A and B. Since requirement C is not important here, we use the largest value for Δu, as shown in Table 8.6.


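8.5.6.1 Parameters and results

Two simulations are used in this subsection.

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 10 ms = 0.01 s | w1 = 10000 s^-2 |
| Acceptable error x2 | 0.1 media-units/s | w2 = 100 s^2/media-units^2 |
| Acceptable error u | 0.1 media-units/s per timestep, i.e. 10 media-units/s^2 | w3 = 0.01 s^4/media-units^2 |
| MRCV,d | | |

Table 8.6: Parameters for simulation 1 in trying to fulfil requirements A and B

[Figure 8.11 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) playout speed [media-units/s] versus time [s]; d) playout speed [media-units/s], zoomed to the first second.]

Figure 8.11: Results for simulation 1 in trying to fulfil requirements A and B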

| Parameter | Value | Weight factor |
|---|---|---|
| Acceptable error x1 | 100 ms = 0.1 s | w1 = 100 s^-2 |
| Acceptable error x2 | 0.1 media-units/s | w2 = 100 s^2/media-units^2 |
| Acceptable error u | 0.1 media-units/s per timestep, i.e. 10 media-units/s^2 | w3 = 0.01 s^4/media-units^2 |
| MRCV,d | | |

Table 8.7: Parameters for simulation 2 in trying to fulfil requirements A and B

[Figure 8.12 panels: a) latency in seconds above avg(λTRS) versus media-unit number; b) buffer level [s] versus time [s]; c) playout speed [media-units/s] versus time [s]; d) zoomed version of the playout speed [media-units/s] versus time [s].]

Figure 8.12: Results for simulation 2 in trying to fulfil requirements A and B

8.5.6.2 Discussion

Requirements A and B are conflicting, and therefore difficult to fulfil simultaneously. Figure 8.11 and Figure 8.12 show two simulations with different weight factor settings. The first simulation (Figure 8.11) has a very large weight for requirement A (i.e. a small acceptable error for x1). Figure 8.11 b) shows a quite constant buffer level that is close to the desired buffer level, thus requirement A is fulfilled, but the playout speed varies a lot, and although its short term mean value is close to the correct media speed, many of the peaks are at 40 and at 60 media-units/s, thus requirement B is not well fulfilled.

The next simulation (Figure 8.12) has a lower weight for requirement A (i.e. a larger acceptable error for x1). Figure 8.12 b) shows that the buffer level is not as constant, and not as close to the desired buffer level, as in the first simulation. However, a comparison of Figure 8.11 d) and Figure 8.12 d) shows that the second simulation has a much lower amplitude of the frequent playout speed changes than the first simulation. Thus, in the last simulation, we have found a relatively good compromise between requirements A and C. Figure 8.12 b) shows that the error of x1 is kept within its acceptable level, but Figure 8.12 d) shows that the error of x2 is far out of its acceptable level. If we had used a lower weight for requirement A (as in Figure 8.8, where the weight factors were tuned to fulfil requirement C), requirement C would have been better fulfilled.

8.5.7 Trying to fulfil all three requirements

In this section, we use normally acceptable errors, as shown in Table 8.8. The reason for not using very small acceptable errors for all of x1, x2, and u at the same time, is that the acceptable errors are used to weigh the importance of each of these requirements compared to the other requirements. Only the relative magnitudes of the weight factors matter.
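The weight factors tabulated in this chapter are numerically consistent with taking each weight as the inverse square of the corresponding acceptable error (for example, an acceptable error of 0.02 s for x1 corresponds to w1 = 2500 s^-2). The following is a minimal sketch under that assumption. The function name is hypothetical, and the acceptable errors for x2 and u are back-calculated from the listed weights rather than quoted from the text.

```python
def weight_from_acceptable_error(acceptable_error: float) -> float:
    """Return a weight factor as 1 / (acceptable error)^2.

    Assumption: this inverse-square relation is inferred from the
    tabulated values, not quoted from the thesis.
    """
    if acceptable_error <= 0:
        raise ValueError("acceptable error must be positive")
    return 1.0 / acceptable_error ** 2


# Acceptable errors that reproduce the weights listed in Table 8.8.
w1 = weight_from_acceptable_error(0.02)  # x1: 1 media-unit = 0.02 s
w2 = weight_from_acceptable_error(5.0)   # x2: back-calculated from w2 = 0.04
w3 = weight_from_acceptable_error(1.0)   # u: back-calculated from w3 = 1

# Only ratios matter: scaling all three weights by the same
# constant leaves their relative importance unchanged.
relative = (w1 / w3, w2 / w3, 1.0)
print(w1, w2, w3, relative)
```

With this convention, an error exactly equal to its acceptable error contributes weight × error² = 1, which is one way to read the statement that only the relative magnitudes of the weights matter.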




8.5.7.2 Discussion

Figure 8.13 b) shows that the buffer level is kept relatively constant and close to the desired buffer level, thus requirement A is fairly well fulfilled, although not as well as in Figure 8.5 b), where all the weight factors were tuned to fulfil requirement A only.

Figure 8.13 c) shows that the playout speed is kept relatively close to the correct sender speed, so that requirement B is relatively well fulfilled. Figure 8.13 d) shows that the playout speed is not completely smooth, but much smoother than for simulations with low weight for requirement C, such as Figures 8.11 and 8.12. Thus, requirement C is relatively well fulfilled.


Acceptable error x1: 1 media-unit = 20 ms = 0.02 s; weight factor w1 = 2500 s^-2
Weight factor w2 = 0.04 s^2/media-units^2
Weight factor w3 = 1 s^4/media-units^2

Table 8.8: Parameters for trying to fulfil requirements A, B and C

Figure 8.13: Results for trying to fulfil requirements A, B and C. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

Figure 8.13 shows that x1, x2 and u are all kept within their acceptable error limits, and that the weight factor setting gives a total result that is a compromise between requirements A, B and C.

8.5.8 Varying the desired receiver buffer level

When using the optimal control algorithm without the anti-run-dry algorithm, the user needs to decide the desired buffer level MRCV,d in addition to the weight factors. The desired buffer level should not be chosen too large, since this will introduce unnecessary delay, or too small, since this will increase the chance that the buffer will run dry.


In the simulations in this section, the weight factors are set to the same values as in Section 8.5.7.

In the following simulation, MRCV,d is set to 2 media-units = 40 ms.

Figure 8.14: Results for simulation 1 in setting the desired buffer level. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

In the next simulation, MRCV,d is set to 1.5 media-units = 30 ms.

8.5.8.2 Discussion

Figures 8.13, 8.14 and 8.15 use the same weight factor settings, but different desired buffer levels. Figure 8.13 a) and b) show that with MRCV,d = 80 ms, a quite large amount of delay is introduced. Figure 8.14 b) shows that with MRCV,d = 40 ms, the buffer level is kept low, but the buffer does not run dry. By lowering MRCV,d by only 10 ms, to 30 ms, the buffer runs dry several times, as shown in Figures 8.15 b) and c). Since the best choice of MRCV,d can be difficult to find, we can see the need for an anti-run-dry mechanism that finds the level of MRCV,d needed to prevent the buffer from running dry. Section 8.7 will show that the anti-run-dry algorithm reduces the number of run-dry incidents.


The previous sections have used the correct transport segment model. In this section, we will use an erroneous transport segment model, to examine the robustness of the optimal controller to errors in the transport segment model.



Figure 8.15: Results for simulation 2 in setting the desired buffer level. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

This section uses the same transport segment trace file as the previous sections (the transport segment is explained in Section 8.2 (p. 166)).

8.6.1 The model

The true transport segment model is a second order model. The erroneous model used here is simpler, since it is a first order model. We have chosen to use the first order Markov model, with $A_{TRS} = -\frac{1}{T_1}$ and $C_{TRS} = 1$. We have also used a wrong order of magnitude for the time constant, and chosen T1 = 3. The variance of a first order Markov process has been deduced several places, e.g. in appendix A.2 of [13], as $\sigma_x^2 = \frac{T}{2} V(t)$, and thus $V(t) = \frac{2}{T_1} \sigma_{r_{TRS}}^2$. The power spectral density of the white noise is chosen such that the standard deviation of rTRS is 1 media-unit/s.

8.6.2 Parameters and results

The simulations in this section use the same weight factor settings as Sections 8.5.7 and 8.5.8. The desired buffer level is set to 2 media-units = 40 ms.

Figure 8.16: Results for using a wrong transport segment model. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]
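As a worked check of the first order Markov model of Section 8.6.1, the white noise intensity V can be computed from the chosen time constant and the target standard deviation of rTRS, using the variance relation σ² = (T/2)·V quoted there. This is a minimal sketch; the function names are hypothetical and the time constant is assumed to be in seconds.

```python
def white_noise_intensity(sigma: float, time_constant: float) -> float:
    """Intensity V that gives a first order Markov process the
    stationary standard deviation sigma: V = 2 * sigma^2 / T."""
    if time_constant <= 0:
        raise ValueError("time constant must be positive")
    return 2.0 * sigma ** 2 / time_constant


def stationary_variance(intensity: float, time_constant: float) -> float:
    """Stationary variance of the same process: sigma^2 = T * V / 2."""
    return time_constant * intensity / 2.0


# Values of Section 8.6.1: T1 = 3 and a standard deviation of
# 1 media-unit/s for rTRS.
V = white_noise_intensity(sigma=1.0, time_constant=3.0)

# Round trip: the intensity reproduces the requested variance.
variance = stationary_variance(V, time_constant=3.0)
print(V, variance)
```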

8.6.3 Discussion

The same parameters are used for the simulations in Figure 8.14 and Figure 8.16. The only difference between these simulations is that the first one uses a correct transport segment model, while the second one uses the wrong model given in Section 8.6.1. Figure 8.16 shows that the delay, the buffer level and the playout speed all behave relatively similarly to the simulation with the correct model. However, a comparison of Figure 8.16 d) with Figure 8.14 d) shows the drawback of using the wrong transport segment model: the playout speed has peaks that are further away from the correct media speed than when the correct model was used.

In addition to the test described above, we have used the following transport segment models to test the robustness of the optimal control algorithm:

• 2nd order Markov model, i.e. a correct model, but with wrong time constants. The following time constants were used:
  • T1 = 0.01 and T2 = 0.03
  • T1 = 0.01 and T2 = 0.01
  • T1 = 0.01 and T2 = 0.01
  • T1 = 1 and T2 = 1
  • T1 = 1 and T2 = 3
• Oscillator model (equal to the one described in Section 8.8.4.2 (p. 208))

All these tests gave a lower performance than with the correct model, but with a quite stable playout speed and a maximum of one run-dry incident. The tests show that the algorithm is robust and performs well even when it uses a transport segment model far from the correct model.


The anti-run-dry algorithm deduced in Chapter 6 (p. 103) lets the user specify the maximum probability p that the buffer will run dry. The run-dry-probability can also be given by the number of standard deviations (sigma) c, which is what we do in this section. The relation between p and c is given by equation (6.5) on page 110.

As can be seen from equation (6.78) on page 130, the anti-run-dry algorithm calculates a minimum value for MRCV,d. By giving MRCV,d a value that is equal to or larger than the value from the anti-run-dry algorithm, the run-dry-probability will be kept at or below p. At times when the run-dry-probability is very low, the anti-run-dry algorithm may give a zero or negative value for the minimum MRCV,d. At these times, the anti-run-dry algorithm is not really necessary, since the run-dry-probability is low without increasing MRCV,d. In this section, we do not want to use negative or zero values for MRCV,d, thus the value of MRCV,d from the anti-run-dry algorithm is used only if it is above a small minimum value of MRCV,d. The anti-run-dry algorithm result is thus used only if there is a risk that the run-dry-probability will get higher than p.

In this section, we have used a limit for the playout speed, at 20% below the correct media speed.
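The two limits described above can be sketched as simple clamping operations. This is an illustrative sketch only: the function and parameter names are hypothetical, the anti-run-dry value itself (equation (6.78)) is taken as a given input, and treating the 20% limit as a lower bound on the playout speed is an assumption based on the wording above.

```python
def desired_buffer_level(anti_run_dry_level: float, minimum_level: float) -> float:
    """Use the anti-run-dry result only when it exceeds the small
    minimum value; otherwise fall back to the minimum (both in s)."""
    return max(anti_run_dry_level, minimum_level)


def limit_playout_speed(requested_speed: float, media_speed: float,
                        max_reduction: float = 0.20) -> float:
    """Never let the playout speed fall more than max_reduction
    (20% by default) below the correct media speed."""
    lower_limit = (1.0 - max_reduction) * media_speed
    return max(requested_speed, lower_limit)


# A negative anti-run-dry result is replaced by the 10 ms minimum.
level_low_risk = desired_buffer_level(-0.005, minimum_level=0.010)

# A larger anti-run-dry result is used as it is.
level_high_risk = desired_buffer_level(0.035, minimum_level=0.010)

# With a media speed of 50 media-units/s, a request for 35 is raised
# to the lower limit of 40 media-units/s.
speed = limit_playout_speed(35.0, media_speed=50.0)
print(level_low_risk, level_high_risk, speed)
```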

8.7.1 Parameters and results

Since we would like to have a good compromise between the requirements A, B and C, we use the same weight factor settings as in Section 8.5.7.

The simulations use different values of c (the relation between c and the run-dry-probability p is given by Equation (6.5) on page 110, where c is the number of σ (standard deviations) above the run-dry limit), of the number of prediction steps (explained in Section 6.7.2 (p. 133)) and of the minimum value of MRCV,d.

Figure 8.17 shows the results of varying the parameter c for a low buffer level. In Figure 8.17, the number of prediction steps = 10, and the minimum value of MRCV,d is set to 0.5 media-units = 10 ms. The value of c ranges from 0.1 to 1.

Acceptable error x1: 1 media-unit = 20 ms = 0.02 s; weight factor w1 = 2500 s^-2
Weight factor w2 = 0.04 s^2/media-units^2
Weight factor w3 = 1 s^4/media-units^2

Table 8.9: Parameters for anti-run-dry

Figure 8.17: Results for anti-run-dry for a range of values of c, and low buffer level. [Panels: a) rPLR [media-units/s] against time [s] and c; b) MPBPLR [s] against time [s] and c.]

Figure 8.18 shows the result of varying the parameter c for a higher buffer level. In Figure 8.18, the number of prediction steps = 10 and the value of c ranges from 0.1 to 1. The minimum value of MRCV,d is set to 1.5 media-units = 30 ms. Please note that in Figure 8.18, the horizontal axes run the opposite way of Figure 8.17, to place the playoutspeed with the smallest amplitude variations in front of the figure.

Figure 8.18: Results for anti-run-dry for a range of values of c and higher buffer level. [Panels: a) rPLR [media-units/s] against time [s] and c; b) MPBPLR [s] against time [s] and c.]

In Figure 8.19, the minimum value of MRCV,d is set to 0.5 media-units = 10 ms, the value of c is 0.3 and the number of prediction steps ranges from 1 to 13.

Figure 8.19: Results for anti-run-dry for number of prediction steps between 1 and 13. [Panels: a) rPLR [media-units/s] against time [s] and number of prediction steps; b) MPBPLR [s] against time [s] and number of prediction steps.]

In Figure 8.20, the value of c is 0.3, the minimum value of MRCV,d is set to 0.5 media-units = 10 ms and the number of prediction steps ranges from 13 to 80.

Figure 8.20: Results for anti-run-dry for number of prediction steps between 13 and 80. [Panels: a) rPLR [media-units/s] against time [s] and number of prediction steps; b) MPBPLR [s] against time [s] and number of prediction steps.]

In Figure 8.21, the value of c is 0.3, the number of prediction steps is 5 and the minimum value of MRCV,d ranges from 0.75 media-units = 15 ms to 2 media-units = 40 ms.

Figure 8.21: Results for anti-run-dry for a range of minimum values of MRCV,desired. [Panels: a) rPLR [media-units/s] against time [s] and minimum of MRCV,desired; b) MPBPLR [s] against time [s] and minimum of MRCV,desired.]

8.7.2 Discussion

To guarantee that the run-dry-probability does not exceed p during the prediction period, the anti-run-dry algorithm gives a quite conservative value for MRCV,d, thus the actual run-dry-probability will be lower than p.

Figure 8.17 a) shows that for low values of c (with a high run-dry probability), the buffer runs dry several times, but the playoutspeed is relatively smooth between each run-dry event. For higher values of c, the buffer does not run dry, but the playoutspeed has several drops where it falls to a low level and jumps back again shortly after. For each such drop, the anti-run-dry algorithm decides that the run-dry-probability may become too high if we continue to play with the current speed, thus the playout speed is lowered (by increasing MRCV,d). Shortly after, when the risk of getting a too high run-dry-probability decreases, the playout speed is allowed to go back to its previous value.

Figure 8.17 b) shows that the buffer level increases when the value of c increases. Lower values of c give lower buffer levels and thus a higher run-dry probability.

Figure 8.18 a) shows that when the minimum value of MRCV,d is increased to 1.5 media-units = 30 ms, the buffer does not run dry when the value of c is in the range 0.1 to 1. Please notice that the horizontal axes in Figure 8.18 a) and Figure 8.17 a) have opposite directions, and that the vertical axes in these figures have different scaling. For low values of c, the playoutspeed is equal to the playoutspeed for the optimal controller without the anti-run-dry algorithm, but for higher values of c, the playoutspeed has larger variation due to several situations where the anti-run-dry algorithm reduces the playoutspeed to prevent the buffer from running dry. Like Figure 8.17 b), Figure 8.18 b) shows that higher values of c give higher buffer levels.

Thus, low values of c (i.e. high values of the allowed run-dry-probability p) give more run-dry incidents (as shown in Figure 8.17 a)) but also a smoother playoutspeed between the run-dry incidents. Very low values of c result in a playoutspeed behaviour that is close to the playoutspeed from the optimal controller without the anti-run-dry algorithm. Higher values of c give a better protection against run-dry incidents, but (as shown in Figures 8.17 a) and 8.18 a)) also increase the receiver buffer level and the number of incidents where the playoutspeed drops to a low level for a short time period.

Figure 8.19 a) shows that by increasing the number of prediction steps, the number of run-dry events decreases. Figure 8.20 a) shows that when the number of prediction steps is further increased, the playoutspeed is smoother, without the drops where it, for lower numbers of prediction steps, was reduced to avoid a run-dry event. Figures 8.19 b) and 8.20 b) show that the buffer level is not very dependent upon the number of prediction steps. Thus, for a correct transport segment model, a high number of prediction steps gives the best result. This is because the run-dry incidents can be predicted a long time in advance, and thus prevented without having to abruptly reduce the playoutspeed.

Figure 8.21 a) shows that by increasing the minimum value of MRCV,d, the playoutspeed becomes smoother, since there are fewer events where the anti-run-dry algorithm needs to reduce the playoutspeed to avoid a future run-dry event. Figure 8.21 b) shows, as expected, that increasing the minimum value of MRCV,d also increases the buffer level.

The best parameters to use for the anti-run-dry algorithm are dependent upon the application and user requirements. A high value of c gives the best protection against run-dry incidents, but for a low number of prediction steps, this also gives a higher receiver buffer level and a less smooth playoutspeed. The playoutspeed gets smoother if the minimum value of MRCV,d is increased, but this also increases the receiver buffer level. However, if the transport segment model is good, a high number of prediction steps results in a smoother playoutspeed even with low values of c and of the minimum value of MRCV,d, without increasing the receiver buffer level. Thus, ideally, a very high number of prediction steps should be used, but this increases the number of calculations and therefore the execution time. Section 8.12 shows execution times for a range of numbers of prediction steps.


This section runs the optimal control algorithm on a real TCP transport segment trace. Section 8.8.1 presents the trace. Algorithms 1 and 2 are run on this trace in Sections 8.8.2 and 8.8.3, and the optimal control algorithm is run on the trace in Section 8.8.4. The anti-run-dry algorithm is run in Section 8.8.5.

8.8.1 Real TCP transport segment trace

We have sent a stream of media-units between Kjeller, Norway and Oregon, USA, by using TCP over IP. Since the sender application sent a new media-unit every 30 ms, the correct media speed of the stream is 33.33 media-units/s.



Since interactive data is usually sent with UDP over the internet, we first tried to use UDP for our real transport segment trace. The network connection was very good, and thus, the measured UDP trace had very little jitter. To get a trace with more jitter, and hence more need for a good playoutbuffer algorithm, we chose to use TCP. Section 8.10 presents results for UDP traces with more jitter than what we measured for UDP.

Figure 8.22 shows the transport segment delay of each media-unit.

Figure 8.22: Transport segment delay for TCP transport segment trace. [Delay in s above average against media-unit number (λTRS).]

8.8.2 Algorithm 1


Algorithm 1, the Adaptive Delay algorithm, is described in Section 8.1.1. In this section, it has been simulated with K*vj = 0.05 s. In this first experiment, 20 media-units were lost because they missed their deadline.

Figure 8.23: Results for experiment 1 of adaptive playout delay. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s].]

In the next experiment, we have used K*vj = 0.04 s. In this experiment, 71 media-units were lost because they arrived after their deadline.

8.8.2.2 Discussion

As can be seen from Figures 8.23 a) and 8.24 a), the delay of the packets is kept at a constant level. Figure 8.23 a) and c) show that during the first second, most of the packets are lost because they arrive later than their scheduled playout-time. Figure 8.24 a), c) and d) show that the reduced value of K*vj in the second experiment leads to many lost packets, and we can also see that the playout speed drops to zero each time a packet is lost.

8.8.3 Algorithm 2


Figure 8.24: Results for experiment 2 of adaptive playout delay. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of a).]

Algorithm 2 is described in Section 8.1.2. In the first experiment, the target buffer level has been set to 4 media-units, which equals 120 ms (since, as explained in Section 8.8.1, the size of each media-unit is 30 ms for the real transport segment trace). Notice that the target buffer level for algorithm 2 is the desired value of MPBPLR (see Section 8.1.2 (p. 165)), while the target buffer level for the optimal control algorithm is the desired value of MRCV.

In the next experiment, the target buffer level is set to 1.5 media-units = 45 ms.

8.8.3.2 Discussion

Figure 8.25 a), b) and c) show that the buffer runs dry twice during the first half second, and by looking at Figure 8.25 d), we can see that the playout quality is very low during the first half second. For the rest of the experiment period, the playout speed is shown to behave in the same manner as with the simulated transport segment (see Figure 8.4 (p. 173)).

Figure 8.25: Results for experiment 1 of algorithm 2. [Panels: a) delay in s above average against media-unit number (λTRS); b) MPBPLR [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

Figure 8.26: Results for experiment 2 of algorithm 2. [Panels: a) delay in s above average against media-unit number (λTRS); b) MPBPLR [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

Figure 8.26 a), b) and c) show that with a too low value of the target buffer level, the playoutbuffer runs dry very often, giving a drop to zero in the playout speed each time the buffer runs dry. Because of its frequent drops to zero, the playout speed shown in Figure 8.26 c) and d) has a very low quality for the entire duration of the talkspurt.

8.8.4 Algorithm 3: Optimal control of playout speed

Since we use a real transport segment trace, the true state space model for the transport segment is not known. This section shows the results of running the optimal control algorithm with two different transport segment models: a second order Markov model (in Section 8.8.4.1) and an oscillator model (in Section 8.8.4.2).

8.8.4.1 Parameters and results for second order Markov model

The transport segment model used in this section is the same as was used for the simulated transport segment trace file (given by Equations (8.1) and (8.2) on page 166). The parameters T1 and T2 have been determined by use of Matlab's System Identification Toolbox as T1 = 0.03 s, T2 = 0.01 s.

The acceptable error for x1 is chosen as a larger value than for the simulated transport segment, since the transport segment speed from the real transport segment trace has large variations that in turn will give large variations in the playoutbuffer level. Increasing Δx1 gives us the chance to decrease Δx2 (compared to what was chosen in Section 8.5.7). Thus, the acceptable error for x2 is chosen as 1 media-unit/s. The acceptable error for u is kept equal to what was chosen for the simulated transport segment in Section 8.5.7, where we tried to fulfil all three requirements.


Acceptable error x1: 3 media-units = 90 ms = 0.09 s; weight factor w1 = 123.5 s^-2
Weight factor w2 = 1 s^2/media-units^2
Weight factor w3 = 1 s^4/media-units^2

Table 8.10: Parameters for optimal control (with second order Markov model) of real transport segment

Figure 8.27: Results for experiment 1 of optimal control with second order Markov model. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

In the next experiment, MRCV,d has been changed to 4 media-units.

Figure 8.28: Results for experiment 2 of optimal control with second order Markov model. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

8.8.4.2 Parameters and results for oscillator model

By looking at the real transport segment trace, we have found an oscillatory behaviour, and thus, we have chosen to use an oscillatory model for the transport segment. This subsection uses a state space model like the one in Equation (3.26) on page 49, where:

$A_{TRS} = \begin{bmatrix} 0 & 1 \\ -\omega_{r_{TRS}}^2 & -damp \end{bmatrix}$, (8.9)

and $C_{TRS} = \begin{bmatrix} 0 & 1 \end{bmatrix}$. (8.10)

The real transport segment trace showed 5 oscillations of rTRS per second, thus we use $\omega_{r_{TRS}} = 5 \cdot 2\pi$ rad/s. The parameter damp was chosen by the use of Matlab's System Identification Toolbox as damp = 0, which means that the system is not damped (since the oscillations do not decrease over time). By use of Matlab, the standard deviation of rTRS was measured to be 15 media-units/s, and therefore, the power spectral density of the white noise is chosen such that the standard deviation of rTRS is 15 media-units/s.
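The oscillator model can be checked numerically: a system matrix with first row (0, 1) and second row (-ω², -damp) has purely imaginary eigenvalues ±jω when damp = 0, i.e. an undamped oscillation at ω/(2π) Hz. The sketch below, using only the Python standard library, verifies this for the 5 oscillations per second read from the trace. The helper names are hypothetical, and the matrix layout is taken from the reconstruction of Equation (8.9).

```python
import cmath
import math


def oscillator_matrix(omega: float, damp: float):
    """System matrix of the oscillator model as nested lists."""
    return [[0.0, 1.0], [-omega ** 2, -damp]]


def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace ** 2 / 4.0 - det)
    return trace / 2.0 + root, trace / 2.0 - root


omega = 5 * 2 * math.pi              # 5 oscillations per second, in rad/s
A_TRS = oscillator_matrix(omega, damp=0.0)
lam1, lam2 = eigenvalues_2x2(A_TRS)

# Zero real part means no damping; the imaginary part gives the frequency.
frequency_hz = abs(lam1.imag) / (2 * math.pi)
print(lam1, lam2, frequency_hz)
```

A positive value of damp would move both eigenvalues into the left half-plane, giving oscillations that decay over time, which is not what the trace showed.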

The weight factors are equal to the ones used for the second order Markov model in Section 8.8.4.1. Since the oscillator model describes the transport segment better than the second order Markov model, we can choose a lower desired receiver buffer level for the oscillator model. Thus, MRCV,d is chosen to be 2 media-units.

Weight factor w1 = 123.5 s^-2
Acceptable error x2: 1 media-unit/s; weight factor w2 = 1 s^2/media-units^2
Weight factor w3 = 1 s^4/media-units^2

Table 8.11: Parameters for optimal control (with oscillator model) of real transport segment

Figure 8.29: Results for optimal control with oscillator model. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

8.8.4.3 Discussion

Figures 8.27 d), 8.28 d) and 8.29 d) show that the behaviour of the playout speed is not very dependent on the choice of transport segment model. The playout speed has a larger variation for the oscillator model, but after the first large oscillations, its error is not larger than the acceptable error that was set to 1 media-unit/s. For the second order Markov model, Figures 8.27 d) and 8.28 d) show that increasing the desired buffer level from 3 to 4 media-units avoids the run-dry incident at approximately 0.5 seconds after the start of the trace. Since the two very different models both result in an algorithm with satisfying behaviour, we can conclude that the optimal controller is quite robust to the choice of model.

8.8.5 Algorithm 4: optimal control with anti-run-dry

This section shows the results of running the optimal control with anti-run-dry algorithm, with the same oscillator model as in Section 8.8.4.2.

The previous section showed that the optimal control algorithm is very robust with regard to the transport segment model. The anti-run-dry algorithm needs a model that has some resemblance with the real transport segment. This is the reason why only the oscillator model is used in this section.


The weight factor settings from Section 8.8.4.2 are used. The relation between c in the table below and the run-dry-probability p is given by Equation (6.5) on page 110.





Weight factor w1 = 123.5 s^-2
Weight factor w2 = 1 s^2/media-units^2
Weight factor w3 = 1 s^4/media-units^2
Minimum MRCV,d: 2 media-units = 60 ms
anti_run_dry_C: 0.3 sigma
n_of_prediction_steps: 25

Table 8.12: Parameters for anti-run-dry (with oscillator model) of real transport segment

8.8.5.2 Discussion

As shown in Figure 8.30 a) and b), the delay from the playoutbuffer and player is kept relatively low. Figure 8.30 c) and d) show that the playout speed is relatively stable most of the time, but has some 20% drops where the anti-run-dry algorithm decreases the playoutspeed to prevent the buffer from running dry.

8.9 Quality metrics

This section describes the quality metrics used in Section 8.10 and Section 8.11.

Figure 8.30: Results for anti-run-dry with oscillator model. [Panels: a) delay in s above average against media-unit number (λTRS); b) buffer level [s] against time [s]; c) rPLR [media-units/s] against time [s]; d) zoomed version of rPLR.]

Most published playout buffer algorithms use a deadline, and discard packets that arrive after the deadline. The ratio of discarded packets to the total number of packets is called the late packet loss rate. The most commonly used quality metrics for playout buffer algorithms are this late packet loss rate and the additional buffering delay. The difference between late packet loss rate and run-dry rate is described in Section 8.9.3. We use mean(MPB) as the additional buffering delay metric.
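The two common metrics can be sketched as follows. This is a minimal illustration, not the evaluation code used for the results: the function names and the example numbers are hypothetical, and the buffering delay is assumed to be available as one sample (in seconds) per played media-unit.

```python
from statistics import mean


def late_packet_loss_rate(arrival_times, deadlines):
    """Fraction of packets that arrive after their playout deadline."""
    if len(arrival_times) != len(deadlines):
        raise ValueError("need one deadline per packet")
    if not arrival_times:
        raise ValueError("need at least one packet")
    late = sum(1 for arrival, deadline in zip(arrival_times, deadlines)
               if arrival > deadline)
    return late / len(arrival_times)


def mean_buffering_delay(buffering_delays):
    """Additional buffering delay metric: the mean of the samples."""
    return mean(buffering_delays)


# Hypothetical example: the third packet misses its deadline.
arrivals = [0.10, 0.13, 0.19, 0.17]
deadlines = [0.12, 0.14, 0.16, 0.18]
loss_rate = late_packet_loss_rate(arrivals, deadlines)

delay = mean_buffering_delay([0.04, 0.02, 0.06])
print(loss_rate, delay)
```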

Some papers also use MOS (Mean Opinion Score) [15] or DMOS (Degradation MOS), which are subjective listening tests described in Section 8.9.1, or PESQ [16], described in Section 8.9.2, which is a more objective technique with limitations regarding the playout speed.

Section 8.9.4 describes a dissimilarity measure for Query-by-Humming systems developed by Arentz et al. [3] which in this thesis will be used and compared to the scores from DMOS and PESQ.

8.9.1 MOS and DMOS

ITU-T Recommendation P.800 [15] describes various methods for subjective determination of transmission quality. One of the most commonly used scores of this recommendation is the MOS score (described in Annex B, section B.4.5 a) in [15]), where subjects are asked to rate the voice quality according to Table 8.13.

Quality of the speech Score

Excellent 5

Good 4

Fair 3

Poor 2

Bad 1

Table 8.13: MOS quality scale, defined by Annex B of [15]

The MOS score tends to lead to low sensitivity in distinguishing among good quality circuits. A modified version, called the DCR (Degradation Category Rating), described in Annex D of [15], affords higher sensitivity. The DCR procedure, which uses an annoyance scale and a quality reference before each configuration is to be evaluated, is described in [15] as being suitable for evaluating good quality speech. The quantity evaluated from the scores is represented by the symbol DMOS (Degradation Mean Opinion Score). In this test, the test persons first hear the correct sound, followed by a short period of silence, and then the output sound from the system to be tested. The subjects are asked to rate the difference between the two sound samples according to the five point degradation category scale in Table 8.14.

In order to perform DMOS and PESQ (see Section 8.9.2) tests, we have made sound files that are scaled according to the playout speed output of different algorithms, and run both a subjective (DMOS) and an objective (PESQ) quality test on these files. The results are presented in Section 8.11 (p. 228).

The following sound samples were collected:

• 4 voice samples: two male and two female voices, each saying the same sentences. These were collected from a database of Norwegian dialect samples [37], consisting of sound samples of persons with different Norwegian dialects reading the same short story.

• 2 music samples: a sample of pop music (from David Byrne’s “Like humans do”) and a sample of classical music (from Beethoven’s 9th symphony)

WSOLA was used to scale the sound samples according to the playout speed of different algorithms (we programmed WSOLA in MATLAB [29], based on the C++ implementation used by Ranganathan and Kilmartin [42]).

Incidents of packet loss or run-dry (where the playout speed is zero) are replaced by low-amplitude white noise.

Quality of the speech Score

Degradation is inaudible 5

Degradation is audible but not annoying 4

Degradation is slightly annoying 3

Degradation is annoying 2

Degradation is very annoying 1

Table 8.14: DMOS quality scale, defined by Annex D of [15]

For each combination of the 6 different sound samples, 3 different algorithms and 2 different transport segments, we made a testing sample consisting of the correct sound, followed by one second of silence, and then the output sound from the algorithm and transport segment to be tested. For each of the 6 sound samples, we also made a “perfect” testing sample, where the sound samples before and after the one second silence period were equal. We thus had a total of 42 testing samples, placed in a random order to ensure that the test persons did not know which files contained the “perfect” testing samples or which algorithms were used in the different files.

The test was sent to 30 employees at FFI, and 14 persons performed the test. These test persons were instructed to rate the difference between the first and the second sound according to the five point degradation category scale in Table 8.14. The test was performed according to the DMOS standard described in Annex D of ITU-T Recommendation P.800 [15].
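The count of 42 testing samples follows from 6 × 3 × 2 = 36 algorithm outputs plus 6 “perfect” samples. The sketch below enumerates and shuffles them. The identifier strings and the fixed seed are illustrative assumptions; the thesis does not state how the random order was generated.

```python
import itertools
import random

SOUND_SAMPLES = ["male_a", "male_b", "female_a", "female_b", "pop", "classical"]
ALGORITHMS = ["algorithm_1", "algorithm_2", "algorithm_3"]
TRANSPORT_SEGMENTS = ["simulated", "udp"]

def build_test_plan(seed=0):
    """Return all testing samples as (sound, algorithm, segment) in random order."""
    plan = list(itertools.product(SOUND_SAMPLES, ALGORITHMS, TRANSPORT_SEGMENTS))
    # One "perfect" sample per sound: reference and test sound are identical.
    plan += [(sound, "perfect", None) for sound in SOUND_SAMPLES]
    random.Random(seed).shuffle(plan)
    return plan

plan = build_test_plan()
print(len(plan))  # 6 * 3 * 2 + 6 = 42
```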

All sound files used 16-bit mono PCM encoding, sampled at 44100 Hz. For comparison, CD quality is 16-bit stereo PCM encoding sampled at 44100 Hz, and regular telephone quality is 8-bit mono PCM encoding sampled at 8000 Hz. We used this high sound quality because we would like our algorithms to work for all quality levels, and because VoIP and other sound transmitted over networks may have higher quality in the future.

8.9.2 PESQ

ITU-T Recommendation P.862 [16] describes PESQ (Perceptual Evaluation of Speech Quality) as an objective alternative to MOS for measuring voice quality. PESQ is a computer program that compares an original signal X(t) with a degraded signal Y(t) that is the result of passing X(t) through a communications system (including the playout algorithm). The output of PESQ is a prediction of the perceived quality that would be given to Y(t) in a subjective listening test, where the quality of the speech is rated according to the MOS scale in Table 8.13.

The reference implementation of PESQ, which is included as part of the standard [16], is used in this thesis.

According to [27], PESQ has a limitation in that it does not give a good measurement of the stretching elasticity and dynamic of the speech signal. They used WSOLA to change the playout speed without changing the pitch. For a speech signal with much stretching and compression, where the subjective listening tests showed very good hearing results, [27] reports that PESQ gave an average score of 3.2.

The PESQ program code from ITU-T recommendation P.862 [16] works for only two sampling frequencies: 8000 and 16000 Hz, thus the PESQ scores in this thesis were obtained by using 16000 Hz sound files. Since PESQ is defined only for voice, this algorithm was not used for music samples in this thesis.

8.9.3 Packet loss and run-dry incidents

Many of the published playout buffer algorithms use a deadline, and discard packets that arrive after the deadline. For such algorithms, the late packet loss rate is used as a quality metric.

The optimal control algorithm normally does not lose packets, but may experience incidents where the buffer runs dry. The corresponding run-dry rate is used as a quality metric in this thesis.

A run-dry incident will probably affect the sound quality less than the loss of a packet, since no information is lost during a run-dry incident. Thus, with otherwise equal quality, a speech signal with x% run-dry rate will probably have a higher quality than a speech signal with x% late packet loss.
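The two metrics can be sketched as simple ratios. The late packet loss rate below is the fraction of packets that arrive after their deadline. For the run-dry rate, this section does not give the exact definition, so the sketch assumes it is the fraction of timesteps in which the playout speed is zero. Function names and sample values are hypothetical.

```python
def late_loss_rate(arrival_times, deadlines):
    """Fraction of packets that arrive after their playout deadline."""
    if len(arrival_times) != len(deadlines):
        raise ValueError("arrival_times and deadlines must have equal length")
    if not arrival_times:
        return 0.0
    late = sum(1 for a, d in zip(arrival_times, deadlines) if a > d)
    return late / len(arrival_times)

def run_dry_rate(playout_speeds):
    """Fraction of timesteps with zero playout speed (assumed definition)."""
    if not playout_speeds:
        return 0.0
    dry = sum(1 for r in playout_speeds if r == 0.0)
    return dry / len(playout_speeds)

# Usage with hypothetical values (seconds, and speed relative to sender rate).
loss = late_loss_rate([0.10, 0.13, 0.19, 0.16], [0.12, 0.14, 0.16, 0.18])
dry = run_dry_rate([1.0, 1.0, 0.0, 0.95])
```

In this example one packet of four misses its deadline and one timestep of four is dry, so both rates are 0.25.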

8.9.4 Arentz dissimilarity measure

Content-based retrieval is an active research area, where methods are developed for searching for contents contained in digital text, sound, music, image and video, etc. This is a big step forward from traditional database search which is largely based on simple attributes. One of the research areas within content-based musical retrieval is Query-by-Humming systems. A Query-by-Humming system allows the user to find a song by humming part of the tune. The user hums into the microphone, the computer records the hum and extracts certain features corresponding to the melody and rhythm characteristics, and it then compares the features to the features of the songs in a database. Finally it returns a ranked list of the songs or song segments most similar to the humming.

Arentz et al. [3] have developed a dissimilarity measure for Query-by-Humming systems, where the dissimilarity between two pieces of music (one hummed tune and one music tune) is calculated. This same dissimilarity measure is used in this thesis to calculate the dissimilarity between the original sound (e.g. music or speech) sent from the sender at the speed r_SNDR, and the resulting sound from the player, played at the speed r_PLR(t).

The following dissimilarity measure between two tunes a and b was developed by Arentz et al. [3]:

$$d(a, b) = \sum_{j=1}^{i} \omega(a_{j-1}, a_j, b_{j-1}, b_j)^2 \qquad (8.11)$$

where i is the number of notes in the tune and $\omega(a_k, a_l, b_m, b_n)$ represents the cost of pairing up the note pair $(a_k, a_l)$ in tune a with the note pair $(b_m, b_n)$ in tune b. The cost function is defined as¹:

$$\omega(a_k, a_l, b_m, b_n) = \bigl(t(a_l) - t(a_k)\bigr) - \bigl(t(b_n) - t(b_m)\bigr), \qquad (8.12)$$

where $t(s_i)$ is the timestamp for the given note $s_i \in s$.

In this thesis, the cost function in Equation (8.12) is calculated for an integer (x) number of media-units. We have used periods of x = 1, x = 3, x = 12, x = 30 and x = 60 media-units, for 20 ms media-units (in Sections 8.11.1.2 (p. 230) and 8.11.2 (p. 231)). This corresponds to 3000, 1000, 250, 100 and 50 cost calculation periods per minute. The cost is calculated as the time difference between the correct playout time period and the actual time period used to play the x media-units.

Equation (8.11) is used to calculate the total dissimilarity measure for the playout period as the sum of the above mentioned costs. The dissimilarity measure given by Equation (8.11) is dependent upon the length of the two tunes a and b. Therefore, in this thesis, the dissimilarity per second will be used as the quality measure.

1. Arentz et al. used a constant scaling factor in their equation, to compensate for tempo differences between the two tunes. Since we will compare two traces with identical long term tempo, the scaling factor is not used (i.e. set equal to 1) in this thesis.

Mohan Krishna Ranganathan and Liam Kilmartin have been so kind to let me use their measurement traces, which they used in their paper [42]. They measured the Internet packet delays by transmitting packet streams from a host located at National University of Ireland, Galway (NUIG), Ireland to two other hosts, the first located at University of New South Wales (UNSW), Sydney, Australia, and the other at Dublin City University (DCU), Ireland. The sender and receiver clocks were not synchronised during the measurements, and hence the measured delays are relative. However, the measured delays reflect the true variation in network delay. The trace details are given in Table 8.15. In all runs in this section, the media-unit size was chosen equal to the inter packet interval.

For each of these four traces, Ranganathan and Kilmartin [42] evaluated their algorithm with PESQ.

We have run the optimal control algorithm (algorithm 3) on the same four traces, and evaluated the resulting voice files with PESQ (as described in Section 8.9.2 (p. 214)), and in the following subsection, this is compared to the PESQ results in [42].

As explained in Section 1.3.3.4 (p. 19), Ranganathan and Kilmartin [42] let the user or application choose the history size to be used by the fuzzy network and the sensitivity parameter λ used to control the responsiveness of the system for decreasing network delays. Their results (shown in b), d) and f) in the following figures) are shown in 3D graphs with the algorithm parameters (history size and λ) along the horizontal axes and the parameters that we want to compare our algorithm to, on the vertical axis.

Since, as explained in Section 8.9.2, PESQ is very sensitive to stretching and compression of the sound signal (and thus also to the changes made by WSOLA), we can think of PESQ as a user and application that requires the player rate to be close to the correct media speed. In this section, we have therefore used a large acceptable error for x1 and smaller acceptable errors for x2 and u.

| Trace no. | Internet path | Inter packet interval | Trace date |
|---|---|---|---|
| Trace 1 | NUIG - DCU | 20 ms | 28. April 2003 |
| Trace 2 | NUIG - UNSW | 20 ms | 30. April 2003 |
| Trace 3 | NUIG - DCU | 40 ms | 7. May 2003 |
| Trace 4 | NUIG - UNSW | 40 ms | 28. April 2003 |

Table 8.15: Internet delay traces from Ranganathan and Kilmartin

8.10.1 Trace 1: NUIG-DCU trace with 20 ms packetization interval

For trace 1, we used 14722 packets, each containing 20 ms of media, i.e. a total of 294.44 seconds, or 4.91 minutes. Figure 8.31 shows the transport segment delay of each media-unit.
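Because the sender and receiver clocks were not synchronised, only delay variation is meaningful, and the delay plots use the unit "s above avg". One way to obtain such values is sketched below, assuming that the trace mean is subtracted from each raw delay; the thesis does not spell out this step, and the timestamps are hypothetical.

```python
def delay_above_average(send_times, receive_times):
    """Return per-packet delay minus the trace mean, in seconds.

    A constant clock offset between sender and receiver shifts every raw
    delay by the same amount, so it cancels when the mean is subtracted.
    """
    if len(send_times) != len(receive_times) or not send_times:
        raise ValueError("need two non-empty sequences of equal length")
    raw = [rx - tx for tx, rx in zip(send_times, receive_times)]
    mean_delay = sum(raw) / len(raw)
    return [d - mean_delay for d in raw]

# Hypothetical timestamps: 20 ms send interval, receiver clock about 5 s ahead.
send = [0.00, 0.02, 0.04, 0.06]
recv = [5.03, 5.06, 5.07, 5.12]
relative = delay_above_average(send, recv)

# Shifting the receiver clock by another 100 s leaves the result unchanged.
shifted = delay_above_average(send, [t + 100.0 for t in recv])
```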

Figure 8.31: Transport segment delay for UDP trace 1
Panels: a) All media-units; b) The first 200 media-units. x-axis: media-unit number; y-axis: λ_TRS [s above avg].

8.10.1.1 Results

The results shown in Figure 8.32 a), c) and e) are obtained by running the optimal controller with Acceptable error x1 = 1, Acceptable error x2 = 5 and Acceptable error x3 = 0.1.

Figure 8.32: Results from the optimal controller and from [42] for the NUIG-DCU trace with 20 ms packetization interval
Panels: a) Arentz dissimilarity cost for optimal controller; b) Additional buffering delay from [42]; c) PESQ for the optimal controller; d) PESQ from [42]; e) Run dry rate for the optimal controller; f) Late packet loss rate from [42].
Axes for a), c) and e): x-axis mean(MPB) [s]; y-axes Arentz dissimilarity/s, PESQ MOS and run dry rate. Legend in a): 60 ms, 240 ms, 600 ms and 1200 ms cost period.

8.10.1.2 Discussion

Figure 8.32 a) shows that the Arentz dissimilarity measure for the optimal control algorithm is not very dependent upon the cost period. For all cost periods, the dissimilarity measure decreases (i.e. shows improved quality of the algorithm) for increased buffer levels. Except for different scaling, Figure 8.32 a) and e) are very similar, thus it seems that Arentz dissimilarity measure is quite dependent upon the run dry rate.

The results from the optimal control algorithm are presented in Figure 8.32 c) with the additional buffering delay along the x-axis and the PESQ score along the y-axis. This can be compared to Figure 8.32 b), where the additional buffering delay is on the vertical axis, and Figure 8.32 d), where the PESQ score is on the vertical axis. This comparison shows that the optimal control algorithm has a higher PESQ score for most receiver buffer levels. At the highest buffer level (at approximately 100 ms), the PESQ scores are close to equal, and at the lowest buffer level (at approximately 50 ms) the optimal control algorithm has a PESQ score one point higher than [42].

Figure 8.32 e) shows the additional buffering delay on the x-axis and the run-dry rate on the y-axis. This can be compared to Figure 8.32 b), where the additional buffering delay is on the vertical axis, and Figure 8.32 f), where the late packet loss rate is on the vertical axis.

Figure 8.32 e) shows that the maximum run-dry rate of the optimal control algorithm is less than 0.002. Figure 8.32 b) and f) show a late packet loss rate with a minimum value of 0.005, which is more than double the run-dry rate of the optimal control algorithm.

Thus, the optimal control algorithm had a higher PESQ score (close to one point better for most buffer levels) and a much lower run-dry rate than the corresponding numbers from [42], and can thus be said to be a considerably better playout algorithm for Trace 1.


8.10.2 Trace 2: NUIG-UNSW trace with 20 ms packetization interval


[Figure: a) All media-units; a second panel covers media-units 0 to 1000. x-axis: media-unit number; y-axis: λ_TRS [s above avg].]

8.10.2.1 Results


Figure 8.34: Results from the optimal controller and from [42] for the NUIG-UNSW trace with 20 ms packetization interval
Panels: a) Arentz dissimilarity cost for optimal controller; b) Additional buffering delay from [42]; d) PESQ from [42].
Axes for the optimal controller plots: x-axis mean(MPB) [s]; y-axes Arentz dissimilarity/s, PESQ MOS and run dry rate.

8.10.2.2 Discussion

Figure 8.34 a) shows that the Arentz dissimilarity measure for the optimal control algorithm is dependent upon the cost period, but not very dependent upon the buffer levels. The similarity between Arentz dissimilarity measure (Figure 8.34 a)) and the run-dry rate (Figure 8.34 e)) that was found for Figure 8.32 is not very visible here, which may be due to the narrow range of buffer levels shown in Figure 8.34.

A comparison of Figure 8.34 c) with Figure 8.34 b) and d) shows that the optimal control algorithm has a PESQ score that, across the buffer levels shown, is equal to or higher than the PESQ score from [42].

Figure 8.34 e) shows that the run-dry rate of the optimal control algorithm is lower than 0.01 for all buffer levels. Figure 8.34 f) shows a late packet loss rate where the minimum seems close to 0.01, and the maximum is above 0.1.

Thus, the optimal control algorithm is clearly a better playout algorithm for Trace 2.

8.10.3 Trace 3: NUIG-DCU trace with 40 ms packetization interval


[Figure: a) All media-units; a second panel covers media-units 0 to 200. x-axis: media-unit number; y-axis: λ_TRS [s above avg].]

8.10.3.1 Results


Figure 8.36: Results from the optimal controller and from [42] for the NUIG-DCU trace with 40 ms packetization interval
Panels: a) Arentz dissimilarity cost for the optimal controller; b) Additional buffering delay from [42]; d) PESQ from [42].
Axes for the optimal controller plots: x-axis mean(MPB) [s]; y-axes Arentz dissimilarity/s, PESQ MOS and run dry rate.

8.10.3.2 Discussion

Figure 8.36 a) shows that the Arentz dissimilarity measure for the optimal control algorithm is most dependent upon the buffer level for the 60 ms cost period. Increased buffer levels give decreased dissimilarity measures (i.e. better sound quality). For all cost periods, there is a similarity between Arentz dissimilarity measure (Figure 8.36 a)) and the run-dry rate (Figure 8.36 e)), but with different scaling.

A comparison between Figure 8.36 c) and Figure 8.36 b) and d) shows that the optimal control algorithm has a PESQ score that is on average one point higher than the PESQ score from [42] for equal buffer levels.

A comparison between Figure 8.36 e) and Figure 8.36 b) and f) shows that the run-dry rate of the optimal control algorithm is slightly below the late packet loss rate of [42] for equal buffer levels. As explained in Section 8.9.3, a run-dry rate will probably have a lower impact than an equal late packet loss rate.

Thus, the optimal control algorithm is a considerably better algorithm for Trace 3.

8.10.4 Trace 4: NUIG-UNSW trace with 40 ms packetization interval


[Figure: a) All media-units; a second panel covers media-units 0 to 1000. x-axis: media-unit number; y-axis: λ_TRS [s above avg].]

8.10.4.1 Results

The optimal controller is run with Acceptable error x1 = 10, Acceptable error x2 = 1 and Acceptable error x3 = 0.1 for the red lines in Figure 8.38 c) and e), and with Acceptable error x1 = 100, Acceptable error x2 = 1 and Acceptable error x3 = 0.1 for the green lines in Figure 8.38 c) and e) and for all lines in Figure 8.38 a).


Figure 8.38: Results from the optimal controller and from [42] for the NUIG-UNSW trace with 40 ms packetization interval
Panels: a) Arentz dissimilarity test for the optimal controller; b) Additional buffering delay from [42]; c) PESQ for the optimal controller (red line: Acceptable error x1 = 10; green line: Acceptable error x1 = 100); d) PESQ from [42]; e) Run dry rate for the optimal controller (red line: Acceptable error x1 = 10; green line: Acceptable error x1 = 100); f) Late packet loss rate from [42].
Axes for a), c) and e): x-axis mean(MPB) [s]; y-axes Arentz dissimilarity/s, PESQ MOS and run dry rate.

8.10.4.2 Discussion

Figure 8.38 a) shows that the Arentz dissimilarity measure for the optimal control algorithm is dependent upon both the cost period and the buffer level, with a similarity to the run-dry rate (shown in Figure 8.38 e)) for all cost periods.

A comparison of Figure 8.38 c) with Figure 8.38 b) and d) shows that the PESQ score is on the same level. A comparison of Figure 8.38 e) with Figure 8.38 b) and f) shows that the run-dry rate is comparable to the late packet loss rate from [42] and will thus (as explained in Section 8.9.3) have a lower impact than the late packet loss rate.

Thus, the optimal control algorithm is slightly better than [42] for Trace 4.


In this section, we have run both a subjective (DMOS) quality test, as described in Section 8.9.1 (p. 212) and an objective (PESQ) quality test as described in Section 8.9.2 (p. 214). PESQ was not used for the music samples, since it is defined only for voice.

8.11.1 DMOS and PESQ results

The upper part of Figure 8.39 shows the DMOS and PESQ results for algorithms 1, 2 and 3 for the simulated transport segment described in Section 8.2. The lower part of Figure 8.39 shows the DMOS and PESQ results for algorithms 1, 2 and 3 for a real UDP transport segment, where all three algorithms have the same mean receiver buffer level.

The real UDP transport segment used is the first of the four traces that Ranganathan and Kilmartin used in their paper [42]. This trace has 20 ms packetization interval and is obtained by measuring a UDP stream between National University of Ireland, Galway (NUIG), Ireland and Dublin City University (DCU), Ireland. It is described in Section 8.10.1.

Since the rating of the perfect sound is not dependent upon the transport segment used, the results shown for perfect sound in the lower part of Figure 8.39 are equal to the corresponding results in the upper part of Figure 8.39.

The playout speed graph for algorithm 1 is shown in Figure 8.3 c), the same graph for algorithm 2 is shown in Figure 8.4 c) and d), and for algorithm 3 in Figure 8.13 c). Please note that DMOS and PESQ use different quality scales, found in Table 8.14 (p. 213) and Table 8.13 (p. 212), respectively.

For each transport segment, each algorithm and each sound sample, Figure 8.39 shows markers, connected by lines, at the minimum value, the mean value and the maximum value of the DMOS rating given by the test persons. For each algorithm, the PESQ score is calculated for each of the 4 voice samples, and markers are shown for the minimum, mean and maximum values of these PESQ scores.
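The three marker values per group can be computed as in the following sketch. The ratings listed are hypothetical placeholders, not data from the listening test.

```python
from statistics import mean

def summarize(scores):
    """Return (minimum, mean, maximum) of a non-empty list of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return min(scores), mean(scores), max(scores)

# Hypothetical DMOS ratings from five test persons for one testing sample.
ratings = [4, 5, 3, 4, 4]
low, avg, high = summarize(ratings)
```

The same helper applies to the four per-voice PESQ scores of one algorithm.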

8.11.1.1 Discussion for simulated transport segment

The upper part of Figure 8.39 shows that algorithm 1 received low DMOS scores for both speech and music. The reason for this can be seen from Figure 8.3 (p. 171) c), which shows that the playout speed was zero for more than one second. Since algorithm 1 discarded all packets during this period, more than one second of the speech or music was lost.

Figure 8.39: DMOS and PESQ results
Panels: Simulated transport segment; UDP transport segment. x-axis: Algorithm 1, Algorithm 2, Algorithm 3, perfect; y-axis: Score (0 to 5). Legend: DMOS pop music, DMOS classical music, DMOS Male a, DMOS Male b, DMOS Female a, DMOS Female b, DMOS mean voices, PESQ for voices.

PESQ rated algorithm 1 between “poor” (2) and “fair” (3), thus it seems that PESQ is less sensitive to loss of sound or information than DMOS.

For algorithm 2, the playout speed had frequent changes with high amplitude, as shown in Figure 8.4 (p. 173) c) and d). The DMOS scores show that for pop music, the test persons react in different ways to these frequent speed changes, since they give scores ranging from ‘degradation is very annoying’ (1) to ‘degradation is inaudible’ (5), with an average of ‘slightly annoying’. For classical music, the scores had a lower variation, with an average of ‘audible but not annoying’. For speech, however, the DMOS scores are lower than for the music, with an average of 2.3.

As expected (since PESQ is sensitive to stretching and compression, as explained in Section 8.9.2), PESQ gave a lower score than DMOS for algorithm 2, with an average of 1.5, between “bad” (1) and “poor” (2).

The optimal control algorithm (algorithm 3) received high DMOS scores for both voice and music. The average score of the voice samples was 0.65 points below the corresponding score of the perfect sound, and with a larger variation than the perfect sound. Algorithm 3 uses stretching and compression, but without the frequent high amplitude changes of algorithm 2. Thus, as expected, PESQ gave a lower score than DMOS also for algorithm 3.

8.11.1.2 Discussion for real UDP transport segment

The lower part of Figure 8.39 shows that algorithm 1 received high DMOS scores for 3 of the voice samples and lower DMOS scores for one voice sample and for the music samples. The reason is that the periods where information was lost (because algorithm 1 discarded late packets) happened during silent periods for 3 of the voice samples. The music samples did not contain any silent periods, and thus the information loss was easily heard. The PESQ score is comparable to the average DMOS score for algorithm 1.

As with the results from the simulated transport segment, the lower part of Figure 8.39 shows that algorithm 2 receives low DMOS scores for the voice samples. On average, the degradation of the voice samples was rated between “annoying” and “very annoying”. The PESQ scores were also low, due to the sensitivity to stretching and compression.

The optimal control algorithm (algorithm 3) received high DMOS scores, comparable to the scores of the perfect sound (the average score of the voice samples is also equal to the corresponding score of the perfect sound). The PESQ score was one point lower, probably due to its sensitivity to stretching and compression.

Algorithm 1 received a slightly higher PESQ score than algorithm 3. Algorithm 1 received a high PESQ score because the late packet loss happened during silent periods in the voice samples, and algorithm 3 received a lower PESQ score because the PESQ algorithm is very sensitive to the stretching and compression of algorithm 3. In the DMOS test, however, algorithm 3 received a 0.6 point higher score than algorithm 1.

8.11.2 Results for Arentz dissimilarity measure

In this thesis, Arentz dissimilarity measure costs are calculated based on the output results from running the different algorithms, as described in Section 8.9.4, and not based on sound files. Thus, the Arentz dissimilarity measure gives only one cost for each combination of transport segment and algorithm.
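A minimal sketch of this calculation, following Equations (8.11) and (8.12) with the scaling factor set to 1, is given below. The function name, the representation of the algorithm output as a list of playout times at media-unit boundaries, and the normalisation by the nominal (sender-side) duration are assumptions made for illustration.

```python
def arentz_dissimilarity_per_second(boundaries, unit_duration=0.020,
                                    units_per_period=3):
    """Sum of squared period-length differences, per second of media.

    boundaries: actual playout times (s) at media-unit boundaries, so
                n media-units give n + 1 entries.
    unit_duration: nominal length of one media-unit (20 ms in the thesis).
    units_per_period: x, the number of media-units per cost period.
    """
    x = units_per_period
    if x < 1:
        raise ValueError("units_per_period must be at least 1")
    marks = boundaries[::x]          # boundary times every x media-units
    periods = len(marks) - 1
    if periods < 1:
        raise ValueError("need at least one complete cost period")
    correct = x * unit_duration      # correct playout time of one period
    total = 0.0
    for j in range(1, len(marks)):
        # Equation (8.12): correct period minus actual period.
        omega = correct - (marks[j] - marks[j - 1])
        total += omega ** 2          # Equation (8.11): sum of squares
    return total / (periods * correct)

# Usage: 60 media-units played at the correct speed and 10 % too slowly.
perfect = [k * 0.020 for k in range(61)]
slow = [k * 0.022 for k in range(61)]
d_perfect = arentz_dissimilarity_per_second(perfect)
d_slow = arentz_dissimilarity_per_second(slow)
```

With x = 3 the cost period is 60 ms. Perfect playout gives a dissimilarity of essentially zero, while a constant 10 % slowdown gives about 6e-4 per second.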

Figure 8.40 shows the Arentz dissimilarity measure per second for:

• Algorithm 1, 2 and 3 for the simulated transport segment described in Section 8.2. This is the same combination of algorithms and transport segment as used in Section 8.11.1.

• Algorithm 1, 2 and 3 for the same UDP transport segment described in Section 8.10.1. This is the same combination of algorithms and transport segment as used in Section 8.11.1.2.

Figure 8.40 shows that for algorithms 1 and 3, Arentz dissimilarity measure is not very dependent upon the cost period. For algorithm 2, different cost periods give very different dissimilarity measures.

To be able to roughly compare the scores from DMOS, PESQ and Arentz dissimilarity measure, we have used a common scale from 0 to 1, where 1 represents the best quality. The PESQ and DMOS scores are divided by 5, and the following equation is used for the Arentz dissimilarity measure:

$$\text{new score} = 1 - \frac{\text{dissimilarity measure}}{\text{max dissimilarity measure}} \qquad (8.13)$$

In this section, we have chosen ‘max dissimilarity measure’ = 0.2, since the maximum score in Figure 8.40 is close to, but below 0.2.

Figure 8.40: Arentz dissimilarity cost
x-axis: Sim,Alg1, Sim,Alg2, Sim,Alg3, UDP,Alg1, UDP,Alg2, UDP,Alg3; y-axis: Arentz dissimilarity/s (0 to 0.2). Legend: 20 ms, 60 ms, 240 ms, 600 ms and 1200 ms cost period.
Arentz dissimilarity cost for:
• Sim,Alg1: Algorithm 1 on the simulated transport segment
• Sim,Alg2: Algorithm 2 on the simulated transport segment
• Sim,Alg3: Algorithm 3 on the simulated transport segment
• UDP,Alg1: Algorithm 1 on the UDP transport segment
• UDP,Alg2: Algorithm 2 on the UDP transport segment
• UDP,Alg3: Algorithm 3 on the UDP transport segment

Figure 8.41 shows scaled DMOS and PESQ scores for the mean of the voice samples (equal to the mean values shown in Figure 8.39) and Arentz dissimilarity measure (calculated by equation (8.13)) for the same combination of algorithms and transport segments as in Figure 8.40.
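The scaling to the common 0 to 1 range can be sketched as two small helpers. The function names are illustrative; the divisor 5 and the maximum dissimilarity of 0.2 are the values stated in this section, and the dissimilarity value 0.05 in the usage lines is hypothetical.

```python
def normalize_mos(score, scale_max=5.0):
    """Scale a DMOS or PESQ score to the range 0..1 by dividing by 5."""
    return score / scale_max

def normalize_arentz(dissimilarity, max_dissimilarity=0.2):
    """Equation (8.13): new score = 1 - dissimilarity / max dissimilarity."""
    return 1.0 - dissimilarity / max_dissimilarity

# Usage: the average speech DMOS of 2.3 reported for algorithm 2 on the
# simulated transport segment, and a hypothetical dissimilarity of 0.05.
dmos_scaled = normalize_mos(2.3)         # approximately 0.46
arentz_scaled = normalize_arentz(0.05)   # approximately 0.75
```

A dissimilarity of zero maps to the best score 1, and a dissimilarity equal to the chosen maximum maps to 0.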

Figure 8.41 shows that Arentz dissimilarity measure is relatively close to the DMOS score for algorithms 1 and 3, but for algorithm 2, the closeness between DMOS and Arentz dissimilarity measure is very dependent upon the cost period.

For the 60 ms cost period, Arentz dissimilarity measure is relatively close to the DMOS score for all algorithms, even closer than the PESQ score. Thus it seems that Arentz dissimilarity cost with a 60 ms cost period may be used as a good prediction for the DMOS score. Figure 8.41 shows only 6 different combinations of algorithms and networks, thus to draw a better conclusion regarding the use of Arentz dissimilarity measure to predict the DMOS score, more algorithms and transport segments need to be tested with both DMOS and Arentz dissimilarity.

Figure 8.41: Comparison of DMOS, PESQ and Arentz dissimilarity measure
Panels: a) Sim,Alg1, Sim,Alg2, Sim,Alg3; b) UDP,Alg1, UDP,Alg2, UDP,Alg3. y-axis: Normalized quality (0 to 1). Legend: Arentz diss. 20, 60, 240, 600 and 1200 ms cost period; DMOS; PESQ; Arentz diss. 60 ms cost period.

8.12 Execution time

The execution time for the whole media receiver simulation system has been measured on two different machines: Machine 1, a Pentium III with a 650 MHz CPU, and Machine 2, a Pentium M with a 1.6 GHz CPU.

To be able to measure the execution time of the algorithm, we have measured the entire receiver simulation system first with no algorithm, and then with the optimal control algorithm and the anti-run-dry algorithm.

Notice that the execution is performed in Matlab, which does not give a very fast execution compared with other programming languages.

As can be seen from Table 8.16, the optimal control algorithm uses only an insignificant amount of time. The anti-run-dry algorithm however, can use much time when run on the slow machine 1. With the faster machine 2, the anti-run-dry algorithm runs 4 times faster than real-time when the number of prediction steps is 20, and close to real-time when the number of prediction steps is 50. With a different programming language or with a faster machine, the anti-run-dry algorithm can run at real-time also for a higher number of prediction steps.


| Algorithm | Machine 1: total execution time | Machine 1: difference to no algorithm | Machine 2: total execution time | Machine 2: difference to no algorithm |
|---|---|---|---|---|
| No algorithm | 17.6150 s | | 0.5810 s | |
| Optimal control | 18.0960 s | 0.4810 s | 0.5010 s | -0.0800 s |
| Anti-run-dry, n = 2 | 19.5780 s | 1.9630 s | 0.8120 s | 0.2310 s |
| Anti-run-dry, n = 20 | 32.8360 s | 15.2210 s | 3.1550 s | 2.5740 s |
| Anti-run-dry, n = 50 | 74.8160 s | 57.2010 s | 10.8250 s | 10.2440 s |
| Anti-run-dry, n = 70 | - | - | 18.7070 s | 18.1260 s |

Table 8.16: Execution time for 8 seconds of media, with 10 ms between timesteps

The existing algorithms that were compared to the algorithms from this thesis in Sections 8.4 - 8.8 and 8.11 were developed before the algorithms in this thesis were developed in 2002. As explained in Section 1.3.3 (p. 16), two new research groups have published work on adaptation of playout speed during the time period when I had sick leave and maternity leave.

Liang et al. [25] exchanged the algorithm used as algorithm 2 in Chapter 8 with a new algorithm that uses network predictions based on the last w received packets, and that lets the user set the acceptable loss rate. Subjective listening tests with the acceptable loss rate set to zero are reported with a DMOS¹ (Degradation Mean Opinion Score) of 4.5 - 4.7. In these tests, no packets are lost, but between 17.8% and 24.1% of the packets are scaled. This shows the good quality of the WSOLA-scaled speech. In addition to 3 DMOS tests, the results are given as loss rate vs. average buffering delay. No DMOS score is presented for loss rates above zero. Thus, the results are not easy to compare to the results from other algorithms. The algorithms developed in this thesis are more general, in that they give the user or application the ability to control the playout quality by setting three different weight factors.

As discussed in Section 1.3.3.2 (p. 19), Laoutaris and Stavrakakis [21] use two different methods to control the buffering latency, and both methods are shown in the paper to have buffering delays above 0.8 seconds. Therefore, neither of them is suited for interactive voice communication. In the simulations and experiments in Chapter 8, the algorithms presented in this thesis had a buffer occupancy that seldom exceeded 0.2 seconds, and they are thus better suited for interactive voice communication.

Liu et al. [26] (see Section 1.3.3.3 (p. 19)) use an adaptation of an Adaptive Playout Delay algorithm, to enable it to handle delay spikes that happen within a talkspurt. Their paper shows a plot of packet loss rate versus average delay, where the average delay ranges from 80 ms to 240 ms, and the late packet loss rate ranges from 0.02 to 0.25. The results from the optimal control algorithm have shown a much lower run-dry rate versus buffering delay than the corresponding late packet loss rate from Liu et al.

As explained in Section 1.3.3.4 (p. 19), Ranganathan and Kilmartin [42] use a fuzzy network to do a network delay trend analysis, and use the output of this analysis as an input to the decision process for the playout speed. The scaled packet length is chosen to be a linear function of the network delay trend if the trend is decreasing, and a nonlinear function of the network delay trend if the trend is increasing. Since the buffering delay showed a tendency to undergo uncontrolled increases by the chosen solution, the authors gave the receiver buffer delay as an extra input to the fuzzy network to stabilize the buffering delay.

1. The listeners are asked to rate the speech quality on a 5-point scale with grades corresponding to 5-degradation inaudible, 4-audible but not annoying, 3-slightly annoying, 2-annoying, 1-very annoying.

Ranganathan and Kilmartin [42] let the user or application choose the history size to be used by the fuzzy network and the sensitivity parameter λ used to control the responsiveness of the system for decreasing network delays. The algorithms developed in this thesis let the user or application control the playout quality by setting three different weight factors, which is probably easier and more intuitive than finding the right history size and sensitivity parameter.

Section 8.10 (p. 216) compared the optimal control algorithm and the results from [42] for four different traces, with the result that the optimal control algorithm had a clearly better perform-ance for three of the traces, and a slightly better performance for the fourth trace.

This thesis uses an exact model of the receiver buffer system. The complexity of the transport segment model can be chosen by the application programmer. In this chapter, a simple model of the transport segment (a 2nd order Markov model) has led to very good results (e.g. when the optimal control algorithm was compared to the results from Ranganathan and Kilmartin [42] and showed better results). This shows that a simple model of the transport segment is often sufficient. More complex transport segment models could give even better results, but only if they give a more correct representation of the transport segment.

A more complex model of a real system will not always lead to better results. In their paper “Optimal brain damage”, LeCun et al. [23] showed that removing unimportant parameters from a neural network improved both its speed and its accuracy. Their conclusion is that a “simple” network whose description needs a small number of bits is more likely to generalize correctly than a more complex network because it presumably has extracted the essence of the data and removed the redundancy from it.


9 Conclusion

The main developments from this thesis are:
• A strict notation and mathematical models that can be used as a basis for developing any kind of playoutbuffer algorithms.
• A receiver system environment simulator that can also be used for developing playoutbuffer algorithms.
• An optimal control algorithm that has shown very good results when compared to other algorithms in both objective and subjective quality tests for both simulated and real network traces.

A summary of the thesis can be found in Section 9.1. Section 9.2 contains an assessment of the claims and Section 9.3 describes future work and open problems.

9.1 Summary of the thesis

We have developed a stringent notation and stringent mathematical models of the media receiver system, to be able to deduce the statistically optimal control of the playout speed. Based on the mathematical models, an analysis tool, called the receiver system environment simulator, has been implemented for experiments and comparison of playoutbuffer algorithms. The notation, the mathematical models and the receiver system environment simulator are all network and protocol independent, and can also be used as a basis for developing any kind of playoutbuffer algorithms.

We have identified three deviations from perfect playout: 1) Buffering delay 2) A playout rate different from the sender rate and 3) A change of playout rate. Our approach is statistically optimal by minimizing the three deviations from the perfect playout, based on their relative importance. The importance will vary for different user and application requirements, and is thus freely tunable by means of weight factors. The optimal controller is also independent of the networks and protocols used.

The optimal control algorithm has demonstrated, by different weight factor settings, low jitter, low delay and a playout rate close to the sender rate. Given the correct transport segment model, it is the statistically optimal controller. Without the correct model, it is sub-optimal, but it has been shown to be very robust with regard to modelling errors. With the use of the optimal control algorithm with standard tuning, the amount of media in the playoutbuffer is not kept as constant as with some other algorithms, but this is not perceived by the user. The user only notices the playout rate, and with the right weight factors the playout rate ensures that he/she will experience a high level of QoS.

An anti-run-dry algorithm for the optimal controller has been developed, that gives the user the ability to specify the maximum run-dry-probability. By using the anti-run-dry algorithm together with the optimal control algorithm, the run-dry-probability can be controlled by the user. The run-dry-probability has been verified by simulations and experiments.

The optimal control algorithm has shown very good results when compared to other algorithms in an objective technique for measuring voice quality (PESQ - Perceptual Evaluation of Speech Quality) and in a subjective listening test (DMOS - Degradation Mean Opinion Score), for both simulated and real network measurement traces.

9.2 Assessment of claims

This section will review the 5 claims from Section 2.4 (p. 29).

9.2.1 Claim 1: Development of a stringent notation

Based on the definitions and descriptions in Sections 3.1 to 3.3, a stringent notation for the receiver buffer system was developed in Section 3.4 (p. 41). Thus claim 1 is fulfilled.

9.2.2 Claim 2: Mathematical modelling

Claim 2 is the development of mathematical descriptions and transport segment independent state space models of the receiver buffer systems and the transport segment by use of the stringent notation.

Mathematical descriptions were developed in Section 3.5.1 (p. 45). The state space models developed in Section 3.5.2 (p. 49) were all transport segment independent. Thus claim 2 is fulfilled.


9.2.3 Claim 3: The optimal controller

9.2.3.1 Claim 3a: Transport segment independence

The optimal controller developed in Chapter 4 (p. 71) is transport segment independent. The user can insert the state space model of her/his particular transport segment into the state space model used in the optimal controller. Thus claim 3a is fulfilled.

9.2.3.2 Claim 3b: Tunable and general

Claim 3b is the development of an optimal controller that:
• is general, so that it can be tuned to any application or user requirements with respect to playout speed and latency.
• can be tuned to act close to identical to many existing controllers.

The tuning is done by letting the user put his/her own individual weights on the importance of:
• Keeping the buffer level close to a target level (requirement A)
• Keeping the playout speed close to the sender speed (requirement B)
• Keeping the playout speed smooth (i.e. keeping a low rate of change of playout speed) (requirement C)

The optimal controller has three weight factors that the user can tune. Section 8.5 (p. 173) showed that by using these weight factors, the optimal controller can be tuned to different application or user requirements with respect to playout speed and latency. Hence, we regard the first part of claim 3b to be fulfilled.
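To illustrate how three weights can trade off requirements A, B and C against each other, the following Python sketch evaluates a weighted quadratic penalty over a recorded trace. It is an assumption made purely for illustration: the function name, the parameter names, the quadratic form and the sample values are hypothetical, the mapping to the weight factors of Section 4.5 is not implied, and the optimization criterion actually used by the optimal controller is the one defined by equation (4.2).

```python
def weighted_deviation_cost(buffer_levels, playout_rates, target_level,
                            sender_rate, w_level, w_rate, w_smooth):
    """Hypothetical weighted quadratic penalty (not the thesis's criterion J).

    buffer_levels and playout_rates are equally long sequences sampled at
    the same timesteps.
    """
    if len(buffer_levels) != len(playout_rates):
        raise ValueError("traces must have the same length")
    cost = 0.0
    previous_rate = playout_rates[0] if playout_rates else 0.0
    for level, rate in zip(buffer_levels, playout_rates):
        cost += w_level * (level - target_level) ** 2   # requirement A
        cost += w_rate * (rate - sender_rate) ** 2      # requirement B
        cost += w_smooth * (rate - previous_rate) ** 2  # requirement C
        previous_rate = rate
    return cost


# Example trace: buffer level in media-units, rates in media-units/s.
levels = [10.0, 12.0, 11.0]
rates = [50.0, 52.0, 50.0]
print(weighted_deviation_cost(levels, rates, 10.0, 50.0, 1.0, 1.0, 1.0))
```

Setting two of the three weights to zero isolates one requirement, which makes it easy to see how a given trace scores on buffer level, rate offset or rate smoothness alone.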

In Sections 8.5.1.3 (p. 176) and 8.5.4.3 (p. 183), the optimal controller was tuned to act almost identically to two different existing controllers. Thus, we regard the second part of claim 3b to be fulfilled.

Since the optimal control algorithm is general enough to fit different application and user requirements with regard to playout speed and latency, and also to act close to identical to different existing algorithms, by change of tuning parameters, we regard claim 3 to be fulfilled.

9.2.4 Claim 4: The anti-run-dry algorithm

Claim 4 is the development of an anti-run-dry algorithm for the optimal control of playout speed, by the use of claims 1-3. As an input to the anti-run-dry algorithm, the user can give a maximum limit for the run dry probability of the buffer.



An anti-run-dry algorithm was developed in Chapter 6 (p. 103). This algorithm lets the user set a maximum limit for the run dry probability. The experiments in Section 8.7 (p. 193) showed an actual run-dry-probability that was always lower than the upper limit set by the user. Hence, we regard claim 4 to be fulfilled.
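A check of this kind can be sketched in Python as follows. The sketch assumes, for illustration only, that the run-dry-probability is estimated as the fraction of timesteps at which the playoutbuffer is empty. The function names and the sample trace are hypothetical, and the measurement procedure used in Section 8.7 may differ.

```python
def empirical_run_dry_probability(buffer_levels):
    """Fraction of timesteps at which the buffer level is zero or below."""
    if not buffer_levels:
        raise ValueError("buffer_levels must not be empty")
    dry_steps = sum(1 for level in buffer_levels if level <= 0)
    return dry_steps / len(buffer_levels)


def stays_within_limit(buffer_levels, max_probability):
    """True if the observed run-dry fraction does not exceed the user limit."""
    return empirical_run_dry_probability(buffer_levels) <= max_probability


# Hypothetical trace of the number of media-units in the playoutbuffer.
trace = [3, 2, 0, 1, 4, 0, 2, 5, 1, 3]
print(empirical_run_dry_probability(trace))
print(stays_within_limit(trace, 0.25))
```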

9.2.5 Claim 5: The receiver system environment simulator

Claim 5 is that, when simulated, the algorithms from claims 3 and 4:
• perform better than (or, for specific situations, equal to) existing algorithms
• are more general and adaptable to transport segment types and user requirements regarding latency and playout speed than existing algorithms.

In Chapter 8, the same simulated transport segment trace was used to first test two existing algorithms (in Section 8.4 (p. 170)) and then to test the optimal controller with a range of different weight factor settings, in Section 8.5 (p. 173). For some weight factor settings (Sections 8.5.1.3 (p. 176) and 8.5.4.3 (p. 183)), the optimal controller gave almost identical results to the existing controllers, but for other weight factor settings, the results were better (depending on the aim of the weight factor tuning). Section 8.11 (p. 228) showed that in a subjective listening test, the optimal controller gave better results than the above-mentioned existing algorithms. Section 8.10 (p. 216) showed that the optimal controller gave better results than another existing algorithm in an objective test for measuring voice quality. Thus the first part of claim 5 is fulfilled.

Since any transport segment state space model can be put into the optimal controller (and also the anti-run-dry algorithm), it can be said to be adaptable to different transport segment types. The tunability of the weight factors makes the optimal controller (and the anti-run-dry algorithm) adaptable to user requirements. As far as we know, existing algorithms do not have this degree of flexibility. Thus, we regard claim 5 to be fulfilled.

9.3 Future work and open problems

9.3.1 Future work

This section suggests some extensions to the algorithms developed in this thesis.


The algorithms developed in this thesis have been tested by simulations and experiments in Matlab. For use in real-time, the implementation could be done in hardware, e.g. on a sound card, on a video card or in a mobile phone.

When a real-time implementation of the algorithm is available, it would be natural for the application developers to tune the weight factors to find a good default setting for their specific application(s). Since the three weight factors are connected to intuitive properties, they are easier for the user to tune correctly than the parameters of some of the other published playoutbuffer control algorithms. Therefore, interested users could be given the choice to tune the weight factors differently from the default setting, by using e.g. logarithmic slide bars in an advanced menu of the application.
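A logarithmic slide bar of this kind could map its position to a weight factor as in the following Python sketch. The function name, the normalized slider range and the weight bounds are assumptions chosen for illustration; suitable bounds would have to come from tuning for the specific application.

```python
import math


def slider_to_weight(position, w_min=1e-3, w_max=1e3):
    """Map a slider position in [0, 1] to a weight on a logarithmic scale.

    w_min and w_max are hypothetical bounds, not values from the thesis.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if w_min <= 0 or w_max <= w_min:
        raise ValueError("require 0 < w_min < w_max")
    log_min = math.log10(w_min)
    log_max = math.log10(w_max)
    # Equal slider steps multiply the weight by a constant factor.
    return 10.0 ** (log_min + position * (log_max - log_min))


for position in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(position, slider_to_weight(position))
```

With a logarithmic mapping, each equal step of the slider changes the weight by the same ratio, which suits parameters whose useful range spans several orders of magnitude.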

Combining solutions to the packet loss problem (such as Forward Error Control) with optimal control of playout speed may be a fruitful research area that could give a total solution for interactive media transfer, and is left as an open area to future researchers.

9.3.2 Open problems

The optimal control algorithm works very well even when it uses a wrong transport segment model, i.e. it is very robust. Thus a general model of the transport segment may be sufficient. However, for optimal performance, the algorithms presented in this thesis are dependent upon the user to find a state space model of her/his transport segment. Finding a good transport segment state space model is outside the limits of this thesis, and is thus an open problem. The networks and protocols of the future may not be invented yet, and we let the problem of finding the state space models of these networks and protocols be open for future researchers.

Thus, the area of automatic real-time identification or detection of the transport segment state space model (to combine it with the optimal control algorithm or the anti-run-dry algorithm together with the optimal control algorithm) is an open problem.

One way to solve this problem could be to make a real-time system with parts similar to a subset of Matlab’s System Identification Toolbox [29]. This system could find the state space model of the transport segment, and insert it into the optimal controller.

Another way to solve the problem could be to have a pool of state space models for different transport segments, and make a system that picks the right state space model for the specific transport segment in use. Such a system could also have real-time dynamical predictors to tune these state space models to network behaviour changes due to increased network traffic at different time periods (where the average network delay may be higher during the day than during the night).

The total algorithm could use a default transport segment model during the first few seconds of the real-time media transfer, and switch to the correct model when it is found. For long-lasting transfer sessions, the automatic detection algorithm could be run regularly to adapt the transport segment model to changing network conditions (e.g. due to increased network traffic at certain time periods).
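The idea of starting from a default model and switching once enough measurements are available can be sketched in Python as follows. Everything in the sketch is a hypothetical illustration: the candidate models are reduced to one-step delay predictors rather than state space models, the selection rule (lowest mean squared one-step prediction error) is only one possible choice, and the names and the sample threshold are assumptions.

```python
def select_model(candidates, observed_delays, default_name, min_samples=50):
    """Return the name of the candidate that best predicts the observed delays.

    candidates maps a model name to a function that predicts the next delay
    from the previous one. Until min_samples delays have been observed,
    the default model name is returned.
    """
    if default_name not in candidates:
        raise ValueError("default_name must be one of the candidates")
    if len(observed_delays) < max(min_samples, 2):
        return default_name

    def mean_squared_error(predict):
        pairs = zip(observed_delays[:-1], observed_delays[1:])
        errors = [(actual - predict(previous)) ** 2 for previous, actual in pairs]
        return sum(errors) / len(errors)

    return min(candidates, key=lambda name: mean_squared_error(candidates[name]))


# Two toy predictors standing in for a pool of transport segment models.
pool = {
    "constant": lambda previous: 0.1,       # always predicts 100 ms
    "persistence": lambda previous: previous,  # predicts the last delay again
}
delays = [0.1, 0.2, 0.3, 0.4]
print(select_model(pool, delays, "constant", min_samples=3))
```

Running the selection again at regular intervals on a sliding window of recent delays would correspond to the periodic re-detection described above.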


10 References

[1] Abramowitz, M. and Stegun, I. A., editors, “Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables”, Applied Mathematics Series No. 55, National Bureau of Standards, 7th printing, May 1968, pp. 297-299.

[2] Anderson, B. D. O. and Moore, J. B., Optimal Control - Linear Quadratic Methods, Prentice-Hall Inc, 1989.

[3] Arentz, W. A., Hetland, M. L. and Olstad, B., “Retrieving musical information based on rhythm and pitch correlations“, Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence. Printed in. W. A. Arentz, “Searching and Classifying Non-Textual Information“, Doctoral Theses at NTNU 2004:51, ISBN 82-471-6312-8 (printed version), ISBN 82-471-6310-1 (electronic version).

[4] Atzori, L. and Lobina, M. L., “Speech Playout Buffering Based on a Simplified Version of the ITU-T E-model“, IEEE signal processing letters, Vol. 11, No. 3, March 2004, pp. 382-385.

[5] Atzori, L., Lobina, M. L. and Isola, M., “Playout Buffering in IP Telephony: a Quality Maximization Approach“, 1st International Conference on Multimedia Services Access Networks, 2005, MSAN’05, 13-15 June 2005, pp. 49-53.

[6] Balchen, J. G. and Mummé, K. I., Process Control - Structures and Applications, Van Nostrand Reinhold, New York, 1988.

[7] Barnett, S. and Cronin, T. M., Mathematical Formulae - for engineering and science students, 4th edition, Longman Scientific and Technical, Essex, England, 1986.

[8] Boutremans, C. and LeBoudec, J.-Y., “Adaptive joint playout buffer and FEC adjustment for Internet telephony“, 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2003), 30 March - 3 April 2003, Vol. 1, pp 652-662.

[9] Cisco’s homepage: www.cisco.com


[10] DeLeon, P. and Sreenan, C. J., “An Adaptive Predictor for Media Playout Buffering“, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 6, March 1999, pp. 3097-3100.

[11] Fulton, J. T., Processes in Biological Hearing {online} {Corona Del Mar, CA. USA} Hearing Concepts, {cited august 2006}, section 8.3.4.1.1. Available at: http://www.hear-ingresearch.net

[12] Gelb, A., Applied Optimal Estimation, the MIT Press, Cambridge, Massachusetts and London, England, sixteenth printing, 2001.

[13] Gade, K., (in Norwegian) “Integrering av treghetsnavigasjon i en autonom undervannsfarkost”, FFI/Rapport-97/03179, Norwegian Defence Research Establishment, Kjeller, Norway, 1997.

[14] ITU-T Recommendation G.114, One-Way Transmission Time, in Series G: Transmission Systems and Media, Digital Systems and Networks, Telecommunication Standardization Sector of ITU, May 2000.

[15] ITU-T Recommendation P.800, Methods for subjective determination of transmission quality, in series P: Telephone transmission quality, Methods for objective and subjective assessment of quality, 1996.

[16] ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, in series P: Telephone transmission quality, telephone installations, local line networks, Methods for objective and subjective assessment of quality.

[17] Jung, Y. and Atwood, J. W., “β-Adaptive Playout Scheme for Voice over IP Applications”, IEICE Trans. Commun., Vol. E88-B, No. 5, May 2005, pp. 2189-2192.

[18] Kalman, M., Steinbach, E. and Girod, B., “Adaptive media playout for low-delay video streaming over error-prone channels”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, Issue 6, June 2004, pp. 841-851.

[19] Kurose, J. F. and Ross, K. W., “Computer Networking, A Top-Down Approach Featuring the Internet”, second edition, Pearson Addison Wesley, 2002.

[20] Kreyszig, E., Advanced Engineering Mathematics, 6th edition, John Wiley & Sons, Inc., 1988.

[21] Laoutaris, N. and Stavrakakis, I., “An analytical design of optimal playout schedulers for packet video receivers“, Computer Communications (www.elsevier.com/locate/comcom), Vol. 26, No. 4, March 2003, pp. 294-303.

[22] Laoutaris, N., Van Houdt, B. and Stavrakakis, I., “Optimization of a packet video receiver under different levels of delay jitter: an analytical approach“, Performance Evaluation (www.elsevier.com/locate/peva), Vol. 55, No. 3-4, February 2004, pp. 251-275.

[23] LeCun, Y., Denker, J. S., Solla, S., Howard, R. E., and Jackel, L. D., "Optimal Brain Damage," in Advances in Neural Information Processing Systems 2 (NIPS*89), (David Touretzky, ed.), 1990, ISBN 1-55860-100-7, pp. 598-605.

[24] Liang, Y. L., Färber N. and Girod, B., “Adaptive playout scheduling using time-scale modification in packet voice communications”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, Salt Lake City, UT, May 2001, pp. 1445-1448.

[25] Liang, Y. L., Färber, N. and Girod, B., “Adaptive playout scheduling and loss concealment for voice communication over IP networks”, IEEE Transactions on Multimedia, Vol. 5, No. 4, December 2003.

[26] Liu, F., Kim, J. and Kuo, C.-C. J., “Adaptive delay concealment for Internet voice applications with packet-based time-scale modification”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, UT, May 2001, pp. 1461-1464.

[27] Liu, F., Kim, J. and Kuo, C.-C. J., “Quality enhancement of packet audio with time-scale modification“, Proceedings of SPIE Vol. 4861: ITCOM 2002: Multimedia Systems and Applications V, Boston, MA, July 2002, pp. 163-173.

[28] Maciejowski, J. M., Predictive Control with Constraints, Harlow : Prentice Hall, 2002, ISBN 0201398230

[29] Matlab’s homepage: www.mathworks.com



[30] Moldeklev, K., “Performance analyses and issues of end systems attached to high-speed networks”, PhD thesis, volume number 1996:42, NTH (now: NTNU - Norwegian University of Science and Technology), Trondheim, Norway, 1996, ISBN: 82-7119-928-5

[31] Narbutt, M. and Murphy, L., “Adaptive playout buffering for audio/video transmission over the internet“, Proc. of the IEE 17th UK Teletraffic Symposium, Dublin, Ireland, May 2001, pp. 27/1 -27/6.

[32] Narbutt, M. and Murphy, L., “VoIP Playout Buffer Adjustment using Adaptive Estimation of Network Delays”, Proc. 18th Int. Teletraffic Congress - ITC-18, Berlin, Germany, Sept. 2003, pp. 1171-1180.

[33] Narbutt, M. and Murphy, L., “A new VoIP adaptive playout algorithm”, Telecommunications Quality of Services: The Business of Success, QoS 2004, IEE, March 2004, pp. 99-103.

[34] Narbutt, M. and Murphy, L., “Improving Voice Over IP Subjective Call Quality“, IEEE Communications Letters, Vol. 8, No. 5, May 2004, pp. 308-310.

[35] Narbutt, M. and Davis, M., “An Assessment of the Audio Codec Performance in Voice over WLAN (VoWLAN) Systems”, Proceedings of the Second Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous’05), Vol. 00, pp. 461-470.

[36] Nokia‘s homepage: www.nokia.com

[37] Nordavinden og sola. A database of Norwegian dialect samples. Available at: http://www.ling.hf.ntnu.no/nos

[38] Olson, H. F., “Music, physics and engineering”, 2d ed, Dover Publications Inc, New York, 1967, p. 248, Fig. 7.4.

[39] Pinto, J. and Christensen, K. J., “An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods“, Proceedings of IEEE Conference on Local Computer Networks, October 1999, pp. 224-231.

[40] Qin, S. J. and Badgwell, T. J., “An overview of industrial model predictive control technology”, Chemical Process Control-V, Tahoe, California, 1997, pp. 232-256.


[41] Ramjee, R., Kurose, J., Towsley, D. and Schulzrinne, H., “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks“, 13th IEEE Proceedings of INFOCOM '94, Networking for Global Communications, 12-16 June 1994, pp. 680 - 688 vol.2.

[42] Ranganathan, M. K. and Kilmartin, L., “Neural and fuzzy computation techniques for playout delay adaptation in VoIP networks“, IEEE Transactions on Neural Networks, Vol. 16, Issue 5, September 2005, pp. 1174-1194.

[43] Rawlings, J. B., “Tutorial overview of model predictive control”, IEEE Control Systems Magazine, 2000, volume 20, issue 3, pp. 38-52.

[44] Rosenberg, J., “Internet Telephony: A Research Agenda”, 1997. Available on Rosenberg’s homepage: www.djrosen.net

[45] Rosenberg, J., “Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet“, Proceedings of IEEE Infocom 2000, March 2000, Tel Aviv, Israel, pp. 1705-1714.

[46] Rottmann, K., “Matematisk formelsamling“ (in Norwegian), Spektrum forlag, 2003, ISBN 82-7822-005-0.

[47] Ryvarden, E., (in Norwegian) “Spår mørke skyer for alle mobilselskapene“, article published 16. march 2003 at www.digi.no

[48] Sreenan, C. J., Chen, J.-C., Agrawal, P., and Narendran, B., “Delay Reduction Techniques for Playout Buffering“, IEEE transactions on multimedia, Vol. 2, No. 2, June 2000, pp. 88-100.

[49] Steinmetz, R. and Nahrstedt, K., Multimedia: Computing, communications and applications, Prentice Hall P T R, New Jersey, 1995. ISBN 0-13-324435-0.

[50] Voss, R. F. and Clarke, J., “1/f noise in music and speech“, Nature 258, November 1975, pp. 317-318.

[51] Background notes for Questacon, National Science and Technology Centre, Australian Government, URL: http://www.questacon.edu.au/html/assets/rtf/strike_a_chord_sounds_strange.rtf



Appendix A Notation

A.1 Definition of terms

correct media speed: the constant speed of the media when played at the correct speed. This is defined for all types of continuous media, e.g. as a constant number of samples per second for sound or a constant number of pictures per second for video.

media-unit: The amount of media corresponding to a constant amount of time when playing at the correct media speed.

A.2 Time and timesteps

For readability in this thesis, a general variable x that is dependent upon time, is often written without the time dependency, i.e. x = x(t).

At the discrete timesteps k, k+1 and k+n, the time is denoted $t_k$, $t_{k+1}$ and $t_{k+n}$. The values of x at these specific timesteps are written as $x(t_k)$, $x(t_{k+1})$ and $x(t_{k+n})$. In this thesis, we also use the following shorter notation: $x_k = x(t_k)$, and hence we have $x_{k+1} = x(t_{k+1})$ and $x_{k+n} = x(t_{k+n})$.

A.3 General notation rules

$\{x\}$ means the fractional part of x

$\lfloor x \rfloor$ means the greatest integer $\le x$

All variables are represented by italic letters, and all variables throughout the thesis comply with the following rules:

| Notation | Example | Meaning |
|---|---|---|
| Non-bold letter | $x$, $M$ | Scalar |
| Lowercase bold letter | $\mathbf{x}$ | Vector |
| Uppercase bold letter | $\mathbf{A}$ | Matrix |
| Lower right subscript number | $x_k$ | Subscript denotes timestep number. |
| A dot on top of a variable | $\dot{x} = \frac{d}{dt}x$ | Time derivative of the variable |
| A hat on top of a variable | $\hat{x}_k$ | Updated Kalman filter estimate of the variable |
| A bar on top of a variable | $\bar{x}_k$ | Time prediction of the variable, from the prediction part of a Kalman filter |
| A tilde on top of a variable | $\tilde{x}_k$ | Measured value of the variable |
| A delta in front of a variable | $\delta x$, $\delta z$, $\delta y$ | Measurement error, estimation error or calculation error of a variable |
| A capital T at the upper right corner | $\mathbf{x}^T$ | The transpose of a vector or matrix |

Table A.1: Notation rules

A.4 Specific notation symbols

This section contains an alphabetical list of all specific symbols used in the thesis.

| Symbol | Description | Unit |
|---|---|---|
| $a$ | Symbol used to substitute an expression in the deduction of the anti-run-dry algorithm. Defined by equation (6.34) on page 117. | |
| $\mathbf{A}$ | System matrix of the continuous state space equation (first introduced in equation (3.29) on page 50). Given by equation (3.33) on page 51. | |

Table A.2: Main symbols used in the thesis

| Symbol | Description | Unit |
|---|---|---|
| $\mathbf{A}_{1,RCV}$ | Part of A. Defined by equation (3.34) on page 51. | |
| $\mathbf{A}_{2,RCV}$ | Part of A. Defined by equation (3.35) on page 52. | |
| $\mathbf{A}_{TRS}$ | System matrix for the transport segment state space model. | |
| $b$ | Symbol used to substitute an expression in the deduction of the anti-run-dry algorithm. Defined by equation (6.35) on page 117. | |
| $\mathbf{B}$ | Control matrix for the continuous total state space model. Given by equation (3.31) on page 51. | |
| $\mathbf{B}_1$ | Part of the control matrix B. Defined by equation (4.30) on page 84. | |
| $c$ | Symbol used as the number of sigma in the deduction of the anti-run-dry algorithm. Defined by equation (6.5) on page 110: $P(x \le c\mu) = p\%$. | |
| $\mathbf{C}$ | Noise matrix for the continuous total state space model. Given by equation (3.32) on page 51. | |
| $\mathbf{C}_{TRS}$ | Noise matrix for the transport segment state space model. | |
| $\mathbf{D}$ | Measurement matrix | |
| $\mathbf{D}_{TRS}$ | Measurement matrix for the transport segment state space model. Its value is given in equation (5.15) on page 95. | |
| $E$ | Expectation operator | |
| $g_1$, $g_2$, $g_{RCV}$ | Elements of G for the optimal controller. Defined by equation (4.49) on page 87. | |
| $g_{SUM}$ | Symbol used to substitute an expression in the deduction of the anti-run-dry algorithm. Defined by equation (6.39) on page 118. | |
| $\mathbf{G}$ | Gain vector for a controller. Defined by equations (4.3) on page 75 and (4.49) on page 87. | |

| Symbol | Description | Unit |
|---|---|---|
| $h$ | The ideal length of a time period between two consecutive timesteps. | s |
| $h_k$ | The actual time period between timesteps k and k+1, $h_k = t_{k+1} - t_k$ | s |
| $i$ | Counting variable (mostly used for summation) | |
| $\mathbf{I}$ | Identity matrix | |
| $J$ | Optimization criterion. Defined by equation (4.2) on page 75. | |
| $k$ | Timestep counter, k = the current timestep, k+1 = next timestep, etc. | |
| $\mathbf{K}_k$ | Kalman filter gain matrix at timestep k. Defined by equation (5.7) on page 93. | |
| $\mathbf{K}_{TRS,k}$ | Kalman filter gain matrix at timestep k for transport segment state space model. Given by equation (5.20) on page 95. | |
| $l$ | Symbol used to substitute an expression in the deduction of the anti-run-dry algorithm. Defined by equation (6.28) on page 115 for the Laplace plane ($l(s)$), and equation (6.36) on page 117 for the time domain ($l(t)$). | |
| $m$ | Media-unit number. The first media-unit in a stream has m=1. $m \ge 1$ | |
| $M$ | Amount of data in buffers or segments | media-units |
| $M_{PB}(t)$ | The number of media-units in the playoutbuffer at time t, $M_{PB}(t) \ge 0, \forall t$ | media-units |
| $M_{PBPLR}(t)$ | The number of media-units in the playoutbuffer and player at time t, $M_{PBPLR}(t) = M_{PB}(t) + M_{PLR}(t)$, $M_{PBPLR}(t) \ge 0, \forall t$ | media-units |
| $M_{PLR}(t)$ | The number of media-units in the player at time t, $M_{PLR}(t) \ge 0, \forall t$ | media-units |

| Symbol | Description | Unit |
|---|---|---|
| $M_{RCV}(t)$ | The number of media-units at the receiver (i.e. in the virtual buffer, playoutbuffer and player) at time t, $M_{RCV}(t) = M_{VB}(t) + M_{PB}(t) + M_{PLR}(t)$, $M_{RCV}(t) \ge 0, \forall t$ | media-units |
| $M_{RCV,d}$ | Desired level of $M_{RCV}(t)$ | media-units |
| $M_{TRS}(t)$ | The number of media-units in the transport segment at time t, $M_{TRS}(t) \ge 0, \forall t$ | media-units |
| $M_{VB}(t)$ | The number of media-units in the virtual buffer at time t, $M_{VB}(t) \ge 0, \forall t$ | media-units |
| $n_{TRS}$ | Number of states in the transport segment state space equation. | |
| $p$ | Symbol used in the deduction of the anti-run-dry algorithm. Defined by equation (6.5) on page 110: $P(x \le c\mu) = p\%$. | |
| $p$ | Scalar version of P (see below). P will be one-dimensional if the control signal u is one-dimensional. | |
| $\mathbf{P}$ | Riccati weighting matrix for control signals. | |
| $P$ | Probability operator. | |
| $q_1$ | Symbol used to substitute an expression in the deduction of the anti-run-dry algorithm. Defined by equation (6.42) on page 118. | |



| Symbol | Description | Unit |
|---|---|---|
| $\mathbf{Q}$ | Riccati weighting matrix for the state vector elements. | |
| $\mathbf{Q}_{RCV}$ | Part of the Riccati weight matrix Q. Defined by equation (4.16) on page 81. | |
| $r$ | Media rate. Letter subscripts are used to identify which rate this is. | media-units/s |
| $r$ | Element of the Riccati matrix. Number subscripts are used to identify the specific element. | |
| $r_{PB}(t)$ | The media-unit rate out of the playoutbuffer, $r_{PB}(t) \ge 0, \forall t$ | media-units/s |
| $r_{PLR}(t)$ | The media-unit rate out of the player, $r_{PLR}(t) \ge 0, \forall t$ | media-units/s |
| $r_{SNDR}$ | The constant media-unit rate out of the sender, $r_{SNDR} \ge 0$ | media-units/s |
| $r_{TRS}(t)$ | The media-unit rate out of the transport segment, $r_{TRS}(t) \ge 0, \forall t$ | media-units/s |
| $\tilde{r}_{TRS}(t)$ | Measurement of $r_{TRS}(t)$. | media-units/s |
| $r_{TRS,m}$ | The mean rate of media-unit m from the transport segment. Defined by equation (3.38) on page 54. | media-units/s |
| $\tilde{r}_{TRS,m}$ | Measurement of $r_{TRS,m}$ | media-units/s |
| $r_{VB}(t)$ | The media-unit rate out of the virtual buffer, $r_{VB}(t) \ge 0, \forall t$. | media-units/s |
| $r_{11}$ | Element in row 1, column 1 of R. | |
| $r_{12}$ | Element in row 1, column 2 of R, or in row 2, column 1 of R, since R is symmetric. | |

| Symbol | Description | Unit |
|---|---|---|
| $\mathbf{r}_{1B}$ | First row, except the first two elements, of R. Defined by equation (4.19) on page 82. | |
| $r_{22}$ | Element in row 2, column 2 of R. | |
| $\mathbf{r}_{2B}$ | Second row, except the first two elements, of R. Defined by equation (4.20) on page 82. | |
| $\mathbf{R}$ | Riccati matrix. | |
| $\mathbf{R}_1$, $\mathbf{R}_2$, $\mathbf{R}_4$, $\mathbf{R}_{BB}$ | Parts of the Riccati matrix R. Defined by equations (4.25) and (4.26) on page 83. | |
| $s$ | The Laplace transformation of the time variable. | |
| $t$ | Time variable. | s |
| $t_0$ | The moment the sender starts sending the first media-unit. | s |
| $t_{PB,m}$ | The time that media-unit m leaves the playoutbuffer. | s |
| $t_{PLR,m}$ | The time that media-unit m leaves the player. | s |
| $t_{SNDR,m}$ | The time that media-unit m leaves the sender. | s |
| $t_{TRS,m}$ | The time that media-unit m leaves the transport segment. | s |
| $t_{VB,m}$ | The time that media-unit m leaves the virtual buffer. | s |
| $t_{PLR,last}$ | The time that the last media-unit in the stream leaves the player. | s |
| $t_{\tilde{r}_{TRS,m}}$ | The time of the measurement $\tilde{r}_{TRS,m}$ | s |
| $u$ | Scalar control signal. | |
| $\mathbf{u}$ | Vector control signal for the continuous total state space model. | |
| $\mathbf{u}_k$ | Control vector for the discrete total state space model. | |

| Symbol | Description | Unit |
|---|---|---|
| $\mathbf{v}$ or $\mathbf{v}(t)$ | System noise vector for the continuous total state space model. | |
| $\mathbf{v}_k$ | System noise vector for the discrete total state space model. | |
| $\mathbf{v}_{TRS}$ | System noise vector for the transport segment state space model. | |
| $\mathbf{v}_{TRS,k}$ | System noise vector at timestep k, for the discrete transport segment state space model. | |
| $\mathbf{V}_k$ | Covariance matrix for $\mathbf{v}_k$ | |
| $\mathbf{V}(t)$ | Spectral density matrix for $\mathbf{v}(t)$. | |
| $\mathbf{V}_{TRS,k}$ | Covariance of $\mathbf{v}_{TRS,k}$. | |
| $w_1$, $w_2$, $w_3$ | Weight factors. Defined in Section 4.5 (p. 80). | |
| $\mathbf{w}_k$ | Measurement noise vector at timestep k. | |
| $w_{TRS,k}$ | Measurement noise at timestep k, for the transport segment measurement equation. | |
| $\mathbf{W}_k$ | Covariance matrix of $\mathbf{w}_k$. | |
| $W_{TRS,k}$ | Covariance matrix of $w_{TRS,k}$. | |
| $x$ | Used to denote a general variable | |
| $\mathbf{x}$ or $\mathbf{x}(t)$ | State vector for the continuous total state space model. Given by equations (3.29) and (3.31) on page 51. | |
| $\mathbf{x}_k$ | State vector for the discrete total state space model. Given by equation (3.64) on page 65. | |
| $\mathbf{x}_0$ | The expected value of the initial state vector x. | |
| $\bar{\mathbf{x}}_k$ | Kalman filter prediction of state vector x at timestep k | |
| $\mathbf{x}_{TRS}$ | State vector for the transport segment state space model. Defined by equation (3.25) on page 49. | |
| $\bar{\mathbf{x}}_{TRS,k}$ | Kalman filter prediction of transport segment state vector for timestep k. | |
| $\hat{\mathbf{x}}_{TRS,k}$ | Measurement updated transport segment state vector at timestep k. | |

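| Symbol | Description | Unit |
|---|---|---|
| $\mathbf{x}_{TRS,0}$ | Initial value of the transport segment state vector. | |
| $\hat{\mathbf{x}}_{TRS,0}$ | Estimate of the initial value of the transport segment state vector. Defined by equation (5.16) on page 95. | |
| $\mathbf{X}$ | Covariance matrix of the estimation error of the state vector x | |
| $\mathbf{X}_0$ | Covariance matrix for $\mathbf{x}_0$. | |
| $\mathbf{X}_k$ | Covariance matrix for $\mathbf{x}_k$. | |
| $\bar{\mathbf{X}}_{TRS,k}$ | Covariance matrix for $\bar{\mathbf{x}}_{TRS,k}$. | |
| $\hat{\mathbf{X}}_{TRS,k}$ | Covariance matrix for $\hat{\mathbf{x}}_{TRS,k}$. | |
| $\mathbf{X}_{TRS,0}$ | Covariance matrix for $\mathbf{x}_{TRS,0}$. Defined by equation (5.17) on page 95. | |
| $y$ | Scalar measurement | |
| $\mathbf{y}$ | Measurement vector | |
| $\beta$ | Time operator. Used only for integrals in Chapter 6 (p. 103). | s |
| $\Gamma$ | System noise matrix for the discrete total state space model. | |
| $\Gamma_{TRS}$ | Noise matrix for the discrete transport segment state space model. | |
| $\delta$ or $\delta(t)$ | Dirac delta function. Defined by equation (3.68) on page 66. | |
| $\Delta$ | Operator used in the deduction of the anti-run-dry algorithm. Defined by equation (6.20) on page 114. Used only in Chapter 6. | |
| $\lambda$ | Latency | s |
| $\lambda_{PB,m}$ | The latency that media-unit m experiences in the playoutbuffer. $\lambda_{PB,m} \ge 0, \forall m$. | s |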

$\lambda_{PLR,m}$ The latency that media-unit m experiences in the player. $\lambda_{PLR,m} \ge 0 \;\forall m$. Unit: s

$\lambda_{RCV,m}$ The total latency that media-unit m experiences at the receiver, $\lambda_{RCV,m} = \lambda_{VB,m} + \lambda_{PB,m} + \lambda_{PLR,m}$. Unit: s

$\lambda_{total,m}$ The total end-to-end latency that media-unit m experiences between the sender and the receiver. Unit: s

$\lambda_{TRS,m}$ The latency that media-unit m experiences in the transport segment. $\lambda_{TRS,m} > 0 \;\forall m$. Unit: s

$\lambda_{VB,m}$ The latency that media-unit m experiences in the virtual buffer. $\lambda_{VB,m} > 0 \;\forall m$. Unit: s

$\Lambda$ Control matrix for discrete state space models.

$\mu$ Mean of a stochastic variable.

$\sigma$ Standard deviation of a stochastic variable.

$\tau$ Time operator. Used mostly for integrations. Used in Chapter 6 as: $\tau = t_{k+n} - t_k$. Unit: s

$\Phi$ System matrix for the discrete total state space model.

$\Phi_{TRS,k}$ System matrix at timestep k, for the discrete transport segment state space model.

A.5 Variable names for Matlab code samples

Table A.3 contains the variable names used in the code samples in Chapter 7, and the corresponding mathematical symbols.

Variable name    Mathematical symbol

A_TRS    $A_{TRS}$
C_TRS    $C_{TRS}$
calculated_number_of_mu_in_player    $M_{PLR}$
g_1    $g_1$
g_2    $g_2$
g_TRS    $g_{TRS}$
g_SUM    $g_{SUM}$
GAMMA    $\Gamma$
l_h_n_plus_1    $l(h \cdot (n+1))$
LAMBDA    $\Lambda$
M_PB    $M_{PB}$
M_PBPLR_k_minus_1    $M_{PBPLR,k-1}$
M_PLR    $M_{PLR}$
M_RCV_d    $M_{RCV,d}$
M_RCV_desired    $M_{RCV,d}$
M_RCV_estimate    $\hat{M}_{RCV}$
M_VB_estimate    $\hat{M}_{VB}$
M_VB_k_estimate    $\hat{M}_{VB,k}$
M_VB_k_minus_1_estimate    $\hat{M}_{VB,k-1}$
media_unit_m.t_TRS    $t_{TRS,m}$
media_unit_m_minus_n.t_TRS    $t_{TRS,m-n}$
number_of_mu_in_player    $M_{PLR}$
n_TRS    $n_{TRS}$
playout speed    $r_{PLR}$
PHI    $\Phi$
PHI_TRS    $\Phi_{TRS}$
q1_h_n_plus_1    $q_1(h \cdot (n+1))$
q2_hi    $q_2(h \cdot i)$
q2_hn    $q_2(h \cdot n)$
q3_hi    $q_3(h \cdot i)$
q4_hi    $q_4(h \cdot i)$
q5_nh    $q_5(n, h)$
q6_nh    $q_6(n, h)$
r11    $r_{11}$
r12    $r_{12}$
r22    $r_{22}$
r2B    $r_{2B}$
r_PLR    $r_{PLR}$
r_PLR_k    $r_{PLR,k}$
r_PLR_k_minus_1    $r_{PLR,k-1}$
r_SNDR    $r_{SNDR}$
r_TRS_k_prediction    $\bar{r}_{TRS,k}$
r_TRS_k_minus_half_prediction    $\bar{r}_{TRS,k-1/2}$
r_TRS_k_minus_1_prediction    $\bar{r}_{TRS,k-1}$
r_TRS_timestamp_estimate    $\hat{r}_{TRS}(t_{TRS,m})$
std_dev_v_TRS    $\sigma_{v_{TRS}}$
t0    $t_0$
t_k    $t_k$
t_timestamp    $t_{TRS,m}$
V_continuous    $V(t)$
V_discrete    $V_k$
V_TRS    $V_{TRS}$
v_TRS_discrete    $v_{TRS,k}$
var_M_PBPLR_k_plus_n    $\sigma^2_{M_{PBPLR,k+n}}$
var_M_VB_k_estimate    $\sigma^2_{\hat{M}_{VB,k}}$
var_M_VB_k_minus_1_estimate    $\sigma^2_{\hat{M}_{VB,k-1}}$
var_r_TRS_k_prediction    $\sigma^2_{\bar{r}_{TRS,k}}$
var_r_TRS_k_minus_1_prediction    $\sigma^2_{\bar{r}_{TRS,k-1}}$
var_r_TRS_timestamp_estimate    $\sigma^2_{\hat{r}_{TRS}(t_{TRS,m})}$
w_1    $w_1$
w_2    $w_2$
w_3    $w_3$
x_TRS_estimate    $\hat{x}_{TRS}$
x_TRS_k    $x_{TRS,k}$
x_TRS_k_prediction    $\bar{x}_{TRS,k}$
X_TRS_k_prediction    $\bar{X}_{TRS,k}$
x_TRS_k_minus_half_prediction    $\bar{x}_{TRS,k-1/2}$
X_TRS_k_minus_half_prediction    $\bar{X}_{TRS,k-1/2}$
x_TRS_k_minus_1    $x_{TRS,k-1}$
x_TRS_k_minus_1_prediction    $\bar{x}_{TRS,k-1}$
X_TRS_k_minus_1_prediction    $\bar{X}_{TRS,k-1}$
x_TRS_pred_matrix    $\bar{x}_{TRS,k} \ldots \bar{x}_{TRS,k+n}$
x_TRS_timestamp_update    $\hat{x}_{TRS}(t_{TRS,m})$
X_TRS_timestamp_update    $\hat{X}_{TRS}(t_{TRS,m})$
x1_estimate    $\hat{x}_1$
x2    $x_2$

Table A.3: Variable names for Matlab code samples

Appendix B Acronyms

Acronym Explanation

avg average

BPM Beats Per Minute

CPU Central Processing Unit

DMOS Degradation Mean Opinion Score

FIFO First-In-First-Out

GHz Gigahertz

IP Internet Protocol

KF Kalman Filter

max maximum

MHz Megahertz

min minimum

MOS Mean Opinion Score

PESQ Perceptual Evaluation of Speech Quality

QoS Quality-of-Service

TCP Transmission Control Protocol

UDP User Datagram Protocol

Table B.1: Acronyms

Appendix C System variance

This appendix deduces equations for the variance of two different transport segment models, as functions of the variance of $x_1$. Since the variance of $x_1$ can be found from measurements from the transport segment, the equations can be used to find numerical values for the variance of the transport segments.

As described in the thesis, different transport segments can have different state space models. This appendix calculates the variance of the two models used in the simulations in this thesis.

Section C.1 deduces the system variance of a 2nd order Markov model, and Section C.2 deduces the system variance of an oscillator model.

C.1 System variance of 2nd order Markov model

In this section, we will find the system variance of a second order Markov model (i.e. a Markov process where the white noise input is replaced by the output of a first order Markov process). This model will be used as a transport segment state space model in some of the simulations in this thesis.

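We have the following equation:

$$\dot{x} = Ax + Cv \tag{C.1}$$

where:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad x_1 = r_{TRS} - r_{SNDR}, \quad A = \begin{bmatrix} -\frac{1}{T_1} & 1 \\ 0 & -\frac{1}{T_2} \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

We want to find the variance of $x_1 = r_{TRS} - r_{SNDR}$ in this process.

If we write the equations explicitly, we get:

$$\dot{x}_1 = -\frac{1}{T_1}x_1 + x_2 \tag{C.2}$$

and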

$$\dot{x}_2 = -\frac{1}{T_2}x_2 + v. \tag{C.3}$$

We transform this to the Laplace domain:

$$s \cdot x_1(s) = -\frac{1}{T_1}x_1(s) + x_2(s) \tag{C.4}$$

and

$$s \cdot x_2(s) = -\frac{1}{T_2}x_2(s) + v(s). \tag{C.5}$$

Equation (C.5) can be written as:

$$x_2(s) = \frac{T_2}{sT_2 + 1}v(s). \tag{C.6}$$

We insert this into the equation for $x_1$, (C.4), and get:

$$s \cdot x_1(s) = -\frac{1}{T_1}x_1(s) + \frac{T_2}{sT_2 + 1}v(s), \tag{C.7}$$

which leads to:

$$x_1(s) = \frac{T_1T_2}{T_1T_2s^2 + (T_1 + T_2)s + 1}v(s). \tag{C.8}$$

We now introduce two temporary variables a and b:

$$\frac{T_1T_2}{T_1T_2s^2 + (T_1 + T_2)s + 1} = \frac{1}{(s - a)(s - b)} = l(s), \tag{C.9}$$

where:

$$a = \frac{1}{2T_1T_2}\left(-(T_1 + T_2) + \sqrt{(T_1 + T_2)^2 - 4T_1T_2}\right) = -\frac{1}{T_1} \tag{C.10}$$

and a similar calculation gives:

$$b = -\frac{1}{T_2}. \tag{C.11}$$

Since $x_1(s) = l(s)v(s)$, we get:

$$x_1(t) = \int_0^t l(\tau)v(t - \tau)\,d\tau = \int_0^t \frac{e^{b\tau} - e^{a\tau}}{b - a}v(t - \tau)\,d\tau. \tag{C.12}$$

The variance of $x_1$ is:

$$\begin{aligned}
E\left((x - \bar{x})^2\right) &= E\left(\left(\int_0^t \frac{e^{b\tau} - e^{a\tau}}{b - a}v(t - \tau)\,d\tau\right)^2\right) \\
&= E\left(\int_0^t\!\!\int_0^t \frac{e^{b\tau_1} - e^{a\tau_1}}{b - a}\,\frac{e^{b\tau_2} - e^{a\tau_2}}{b - a}\,v(t - \tau_1)v(t - \tau_2)\,d\tau_1\,d\tau_2\right) \\
&= \int_0^t\!\!\int_0^t \frac{e^{b\tau_1} - e^{a\tau_1}}{b - a}\,\frac{e^{b\tau_2} - e^{a\tau_2}}{b - a}\,E\left(v(t - \tau_1)v(t - \tau_2)\right)d\tau_1\,d\tau_2
\end{aligned} \tag{C.13}$$

Since we assume that the power spectral density V of v is constant, we get:

$$\operatorname{var}(x_1) = \int_0^t\!\!\int_0^t \frac{e^{b\tau_1} - e^{a\tau_1}}{b - a}\,\frac{e^{b\tau_2} - e^{a\tau_2}}{b - a}\,V\delta(\tau_2 - \tau_1)\,d\tau_1\,d\tau_2 \tag{C.14}$$

where $\delta(\tau)$ is Dirac's delta function, shown in equation (3.68) on page 66. By inserting equation (3.68), we get:

$$\begin{aligned}
\operatorname{var}(x_1) &= V\int_0^t \frac{e^{b\tau_2} - e^{a\tau_2}}{b - a}\,\frac{e^{b\tau_2} - e^{a\tau_2}}{b - a}\,d\tau_2 \\
&= \frac{V}{(b - a)^2}\int_0^t \left(e^{2b\tau_2} - 2e^{(a + b)\tau_2} + e^{2a\tau_2}\right)d\tau_2 \\
&= \frac{V}{(b - a)^2}\left(\frac{e^{2bt} - 1}{2b} - \frac{2\left(e^{(a + b)t} - 1\right)}{a + b} + \frac{e^{2at} - 1}{2a}\right)
\end{aligned} \tag{C.15}$$

Since a and b are both negative, the exponential terms vanish as $t \to \infty$. The "stable" version of this is therefore:

$$\operatorname{var}(x_1) = \frac{V}{(b - a)^2}\left(\frac{-1}{2b} + \frac{2}{a + b} - \frac{1}{2a}\right) = \frac{-V}{2ab(a + b)} \tag{C.16}$$

By inserting the values of a and b, we get:

$$\operatorname{var}(x_1) = \frac{VT_1^2T_2^2}{2(T_2 + T_1)} \tag{C.17}$$

Therefore, we also have the following relation between the variance of $x_1$ and the power spectral density of $v(t)$:

$$V = \frac{2(T_2 + T_1)}{T_1^2T_2^2}\operatorname{var}(x_1) \tag{C.18}$$

C.2 System variance of an oscillator model

In this section, we will find the system variance of an oscillator model. This model will be used as a transport segment state space model in some of the simulations in this thesis.

We have the following equation:

$$\dot{x} = Ax + Cv \tag{C.19}$$

where:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad x_1 = r_{TRS} - r_{SNDR}, \quad A = A_{TRS} = \begin{bmatrix} 0 & 1 \\ -\omega_{r_{TRS}}^2 & -\mathit{damp} \end{bmatrix}, \quad \text{and} \quad C = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

We want to find the variance of $x_1 = r_{TRS} - r_{SNDR}$ in this process.

If we write the equations explicitly, we get:

$$\dot{x}_1 = x_2 \tag{C.20}$$

and

$$\dot{x}_2 = -\omega_{r_{TRS}}^2 x_1 - (\mathit{damp} \cdot x_2) + v. \tag{C.21}$$

We transform this to the Laplace domain:

$$s \cdot x_1(s) = x_2(s) \tag{C.22}$$

and:

$$s \cdot x_2(s) = -\omega_{r_{TRS}}^2 x_1(s) - (\mathit{damp} \cdot x_2(s)) + v(s). \tag{C.23}$$

Equation (C.23) can be written as:

$$x_2(s) = \frac{1}{(s + \mathit{damp})}\left(-\omega_{r_{TRS}}^2 x_1(s) + v(s)\right) \tag{C.24}$$

By inserting the above equation into Equation (C.22), we get:

$$s \cdot x_1(s) = \frac{1}{(s + \mathit{damp})}\left(-\omega_{r_{TRS}}^2 x_1(s) + v(s)\right) \tag{C.25}$$

The above equation can be written as:

$$x_1(s) = \frac{1}{s^2 + s \cdot \mathit{damp} + \omega_{r_{TRS}}^2}v(s) \tag{C.26}$$

We now introduce two temporary variables c and g:

$$\frac{1}{s^2 + s \cdot \mathit{damp} + \omega_{r_{TRS}}^2} = \frac{1}{(s - c)(s - g)} = l(s), \tag{C.27}$$

where:

$$c = \frac{1}{2}\left(-\mathit{damp} + \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right) \tag{C.28}$$

and:

$$g = \frac{1}{2}\left(-\mathit{damp} - \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right) \tag{C.29}$$

Since $x_1(s) = l(s)v(s)$, we get:

$$x_1(t) = \int_0^t l(\tau)v(t - \tau)\,d\tau = \int_0^t \frac{e^{g\tau} - e^{c\tau}}{g - c}v(t - \tau)\,d\tau. \tag{C.30}$$

The variance of $x_1$ is:

$$\begin{aligned}
E\left((x - \bar{x})^2\right) &= E\left(\left(\int_0^t \frac{e^{g\tau} - e^{c\tau}}{g - c}v(t - \tau)\,d\tau\right)^2\right) \\
&= E\left(\int_0^t\!\!\int_0^t \frac{e^{g\tau_1} - e^{c\tau_1}}{g - c}\,\frac{e^{g\tau_2} - e^{c\tau_2}}{g - c}\,v(t - \tau_1)v(t - \tau_2)\,d\tau_1\,d\tau_2\right) \\
&= \int_0^t\!\!\int_0^t \frac{e^{g\tau_1} - e^{c\tau_1}}{g - c}\,\frac{e^{g\tau_2} - e^{c\tau_2}}{g - c}\,E\left(v(t - \tau_1)v(t - \tau_2)\right)d\tau_1\,d\tau_2
\end{aligned} \tag{C.31}$$

Since we assume that the power spectral density V of v is constant, we get:

$$\operatorname{var}(x_1) = \int_0^t\!\!\int_0^t \frac{e^{g\tau_1} - e^{c\tau_1}}{g - c}\,\frac{e^{g\tau_2} - e^{c\tau_2}}{g - c}\,V\delta(\tau_2 - \tau_1)\,d\tau_1\,d\tau_2 \tag{C.32}$$

where $\delta(\tau)$ is Dirac's delta function, shown in equation (3.68) on page 66. By inserting equation (3.68), we get:

$$\begin{aligned}
\operatorname{var}(x_1) &= V\int_0^t \frac{e^{g\tau_2} - e^{c\tau_2}}{g - c}\,\frac{e^{g\tau_2} - e^{c\tau_2}}{g - c}\,d\tau_2 \\
&= \frac{V}{(g - c)^2}\int_0^t \left(e^{2g\tau_2} - 2e^{(c + g)\tau_2} + e^{2c\tau_2}\right)d\tau_2 \\
&= \frac{V}{(g - c)^2}\left(\frac{e^{2gt} - 1}{2g} - \frac{2\left(e^{(c + g)t} - 1\right)}{c + g} + \frac{e^{2ct} - 1}{2c}\right)
\end{aligned} \tag{C.33}$$

For a damped oscillator the real parts of c and g are negative, so the exponential terms vanish as $t \to \infty$. The "stable" version of this is therefore:

$$\operatorname{var}(x_1) = \frac{V}{(g - c)^2}\left(\frac{-1}{2g} + \frac{2}{c + g} - \frac{1}{2c}\right) = \frac{-V}{2cg(c + g)} \tag{C.34}$$

By inserting the values of c and g, we get:

$$\begin{aligned}
\operatorname{var}(x_1) &= \frac{-V}{2 \cdot \frac{1}{2}\left(-\mathit{damp} + \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right) \cdot \frac{1}{2}\left(-\mathit{damp} - \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right)} \\
&\qquad \cdot \frac{1}{\frac{1}{2}\left(-\mathit{damp} + \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right) + \frac{1}{2}\left(-\mathit{damp} - \sqrt{\mathit{damp}^2 - 4\omega_{r_{TRS}}^2}\right)} \\
&= \frac{V}{\mathit{damp} \cdot 2\omega_{r_{TRS}}^2}
\end{aligned} \tag{C.35}$$

Therefore, we also have the following relation between the variance of $x_1$ and the power spectral density of $v(t)$:

$$V = \mathit{damp} \cdot 2\omega_{r_{TRS}}^2 \cdot \operatorname{var}(x_1) \tag{C.36}$$
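The two stationary variances deduced in this appendix, (C.17) and (C.35), can be cross-checked numerically. The sketch below is not part of the thesis code: it assumes NumPy is available, uses illustrative parameter values, and relies on the standard result that the stationary covariance P of $\dot{x} = Ax + Cv$ with constant spectral density V satisfies $AP + PA^T + CVC^T = 0$. The function names are hypothetical.

```python
import numpy as np


def stationary_covariance(A, C, V):
    """Solve A P + P A^T + C V C^T = 0 for the stationary state covariance P."""
    n = A.shape[0]
    Q = V * (C @ C.T)
    # With row-major flattening, vec(A P) = kron(A, I) vec(P) and
    # vec(P A^T) = kron(I, A) vec(P), so the equation becomes linear in vec(P).
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)


def markov2_variance(T1, T2, V):
    """Closed-form stationary variance of x1 from equation (C.17)."""
    return V * T1**2 * T2**2 / (2.0 * (T1 + T2))


def oscillator_variance(omega, damp, V):
    """Closed-form stationary variance of x1 from equation (C.35)."""
    return V / (damp * 2.0 * omega**2)


C = np.array([[0.0], [1.0]])
V = 3.0  # illustrative spectral density value (assumption)

T1, T2 = 0.5, 2.0  # illustrative time constants (assumption)
A_markov = np.array([[-1.0 / T1, 1.0], [0.0, -1.0 / T2]])
P_markov = stationary_covariance(A_markov, C, V)

omega, damp = 2.0, 0.4  # illustrative oscillator parameters (assumption)
A_osc = np.array([[0.0, 1.0], [-omega**2, -damp]])
P_osc = stationary_covariance(A_osc, C, V)

# Element [0, 0] of each covariance is var(x1); compare with the closed forms.
print(P_markov[0, 0], markov2_variance(T1, T2, V))
print(P_osc[0, 0], oscillator_variance(omega, damp, V))
```

In practice the relations are used in the opposite direction, as in (C.18) and (C.36): var(x1) is measured on the transport segment and V is computed from it.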

Appendix D Deductions

D.1 Finding $\Delta r_{PLR}(t_k, s)$

This appendix section is meant to be read as a part of Chapter 6 (Section 6.2 (p. 113)), and the notation used here is defined in Chapter 6.

The optimal controller gives the time-derivative of $r_{PLR}$ as its output. The controller equation is:

$$\dot{r}_{PLR}(t) = \begin{bmatrix} g_1 & g_2 & g_{TRS} \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_{TRS}(t) \end{bmatrix} \tag{D.1}$$

where the constants $g_1$ and $g_2$, and the vector $g_{TRS}$ are defined by Equations (4.40), (4.41) on page 86, (4.47) on page 87 and (4.49) on page 87.

Since $x_1(t) = M_{RCV}(t) - M_{RCV,d}(t)$ and $x_2(t) = r_{PLR}(t) - r_{SNDR}(t)$, Equation (D.1) can be written as:

$$\dot{r}_{PLR}(t) = g_1\left(M_{RCV}(t) - M_{RCV,d}(t)\right) + g_2\left(r_{PLR}(t) - r_{SNDR}\right) + g_{TRS}x_{TRS}(t) \tag{D.2}$$

The equation for $r_{PLR}(t_k + \tau)$ is thus:

$$\begin{aligned}
r_{PLR}(t_k + \tau) = r_{PLR}(t_k) &+ g_1\int_{t_k}^{t_k + \tau}\left(M_{RCV}(\beta) - M_{RCV,d}(\beta)\right)d\beta \\
&+ g_2\int_{t_k}^{t_k + \tau}\left(r_{PLR}(\beta) - r_{SNDR}\right)d\beta + g_{TRS}\int_{t_k}^{t_k + \tau}x_{TRS}(\beta)\,d\beta
\end{aligned} \tag{D.3}$$

By changing the variables of integration, and assuming that $M_{RCV,d}$ is set to a constant value for the interval $[t_k, t_{k+n}] = [t_k, t_k + \tau]$, we get:

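$$\begin{aligned}
r_{PLR}(t_k + \tau) - r_{PLR}(t_k) = {} & g_1\int_0^{\tau}M_{RCV}(t_k + \beta)\,d\beta - \tau g_1M_{RCV,d} \\
&+ g_2\int_0^{\tau}r_{PLR}(t_k + \beta)\,d\beta - \tau g_2r_{SNDR} + g_{TRS}\int_0^{\tau}x_{TRS}(t_k + \beta)\,d\beta
\end{aligned} \tag{D.4}$$

We introduce a new variable using the delta operator from Equation (6.20) on page 114:

$$\Delta x_{TRS}(t, \tau) = x_{TRS}(t + \tau) - x_{TRS}(t) \tag{D.5}$$

By inserting the variables from (6.21), (6.22) on page 114 and (D.5) into Equation (D.4), we get:

$$\begin{aligned}
\Delta r_{PLR}(t_k, \tau) = {} & g_1\int_0^{\tau}\left(\Delta M_{RCV}(t_k, \beta) + M_{RCV}(t_k)\right)d\beta - \tau g_1M_{RCV,d} - \tau g_2r_{SNDR} \\
&+ g_2\int_0^{\tau}\left(\Delta r_{PLR}(t_k, \beta) + r_{PLR}(t_k)\right)d\beta + g_{TRS}\int_0^{\tau}\left(\Delta x_{TRS}(t_k, \beta) + x_{TRS}(t_k)\right)d\beta \\
= {} & g_1\int_0^{\tau}\Delta M_{RCV}(t_k, \beta)\,d\beta + \tau g_1M_{RCV}(t_k) - \tau g_1M_{RCV,d} - \tau g_2r_{SNDR} \\
&+ g_2\int_0^{\tau}\Delta r_{PLR}(t_k, \beta)\,d\beta + \tau g_2r_{PLR}(t_k) + g_{TRS}\int_0^{\tau}\Delta x_{TRS}(t_k, \beta)\,d\beta + \tau g_{TRS}x_{TRS}(t_k)
\end{aligned} \tag{D.6}$$

We transform this to the Laplace domain, with $\tau$ (not $t_k$) as the time operator:

$$\begin{aligned}
\Delta r_{PLR}(t_k, s) = {} & \frac{g_1}{s}\Delta M_{RCV}(t_k, s) + \frac{g_1}{s^2}M_{RCV}(t_k) - \frac{g_1}{s^2}M_{RCV,d} - \frac{g_2}{s^2}r_{SNDR} \\
&+ \frac{g_2}{s}\Delta r_{PLR}(t_k, s) + \frac{g_2}{s^2}r_{PLR}(t_k) + \frac{1}{s}g_{TRS}\Delta x_{TRS}(t_k, s) + \frac{1}{s^2}g_{TRS}x_{TRS}(t_k)
\end{aligned} \tag{D.7}$$

We collect the terms with $\Delta r_{PLR}(t_k, s)$ on the left hand side, and get:

$$\begin{aligned}
\Delta r_{PLR}(t_k, s)\left(1 - \frac{g_2}{s}\right) = {} & \frac{g_1}{s}\Delta M_{RCV}(t_k, s) + \frac{g_1}{s^2}M_{RCV}(t_k) - \frac{g_1}{s^2}M_{RCV,d} - \frac{g_2}{s^2}r_{SNDR} \\
&+ \frac{g_2}{s^2}r_{PLR}(t_k) + \frac{1}{s}g_{TRS}\Delta x_{TRS}(t_k, s) + \frac{1}{s^2}g_{TRS}x_{TRS}(t_k)
\end{aligned} \tag{D.8}$$

Since: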

$$1 - \frac{g_2}{s} = \frac{s - g_2}{s}, \tag{D.9}$$

we get:

$$\begin{aligned}
\Delta r_{PLR}(t_k, s) = \frac{1}{s - g_2}\Big( & g_1\Delta M_{RCV}(t_k, s) + \frac{g_1}{s}M_{RCV}(t_k) - \frac{g_1}{s}M_{RCV,d} - \frac{g_2}{s}r_{SNDR} \\
&+ \frac{g_2}{s}r_{PLR}(t_k) + g_{TRS}\Delta x_{TRS}(t_k, s) + \frac{1}{s}g_{TRS}x_{TRS}(t_k)\Big)
\end{aligned} \tag{D.10}$$

D.2 Why the distribution curve cannot be integrated

This section explains why we cannot find a symbolic expression for the area under the distribution curve.

The integral of a probability density distribution cannot be solved symbolically since it has an integrand of the form $e^{x^2}$.

To explain why we get an integrand of the form $e^{x^2}$, we first need to state why we want to integrate the distribution curve. In our case, the reason is to use the integration to calculate $P(M_{PBPLR}(t) < 0)$, as shown in Figure 6.5 (p. 109). Based on our knowledge of the optimal controller and of the transport segment model, we are able to find future values of $M_{PBPLR}$ as a function of the white noise of the transport segment state space model, $v_{TRS}$, of the form:

$$M_{PBPLR}(t + \tau) = f_1\left(M_{PBPLR}(t), \tau\right) + f_2\left(M_{PBPLR}(t), \tau\right) \cdot v_{TRS}(t), \tag{D.11}$$

where $f_1(M_{PBPLR}(t), \tau)$ and $f_2(M_{PBPLR}(t), \tau)$ are the two functions that can be used to describe $M_{PBPLR}(t + \tau)$.

The probability that the buffer runs dry can then be written as:

$$\begin{aligned}
P\left(M_{PBPLR}(t + \tau) < 0\right) &= P\left(\left[f_1\left(M_{PBPLR}(t), \tau\right) + f_2\left(M_{PBPLR}(t), \tau\right) \cdot v_{TRS}(t)\right] < 0\right) \\
&= P\left(f_2\left(M_{PBPLR}(t), \tau\right) \cdot v_{TRS}(t) < -f_1\left(M_{PBPLR}(t), \tau\right)\right) \\
&= P\left(v_{TRS}(t) < \frac{-f_1\left(M_{PBPLR}(t), \tau\right)}{f_2\left(M_{PBPLR}(t), \tau\right)}\right)
\end{aligned} \tag{D.12}$$

The next step will then be to calculate the probability that the white noise $v_{TRS}(t)$ will have a lower value than the expression $\left(-f_1(M_{PBPLR}(t), \tau)\right) / \left(f_2(M_{PBPLR}(t), \tau)\right)$. Since $v_{TRS}(t)$ is white noise with standard deviation $\sigma_{v_{TRS}(t)}$, this probability can be written as:

$$P\left(v_{TRS}(t) < -\frac{f_1\left(M_{PBPLR}(t), \tau\right)}{f_2\left(M_{PBPLR}(t), \tau\right)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{v_{TRS}(t)}}\int_{-\infty}^{-\frac{f_1(M_{PBPLR}(t), \tau)}{f_2(M_{PBPLR}(t), \tau)}} e^{-\frac{1}{2}\frac{x^2}{\sigma_{v_{TRS}(t)}^2}}\,dx \tag{D.13}$$

Because the integrand is of the form $e^{x^2}$, the integral in Equation (D.13) cannot be solved symbolically. Thus, we cannot find a general symbolic expression for P(empty buffer) as a function of our system models, and therefore we cannot find an expression from which to solve for $M_{RCV,d}$. Thus we cannot find a symbolic function of the form:

$$M_{RCV,d}(t) = f\left(t, v_{TRS}(t), P\left(M_{PBPLR} < 0\right)\right) \tag{D.14}$$

D.3 Equations for the fraction operator

This appendix section will deduce some equations for the fraction operator $\{\cdot\}$. Equation (D.19) is used in Section 6.4, on page 127.

If x is positive, the fraction operator is defined by:

$$\{x\} = x - \lfloor x \rfloor \tag{D.15}$$
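As a small illustration of the fraction operator, the sketch below implements it for positive x as x minus its integer part, so that the value always lies in the interval [0, 1). The function name `frac` is hypothetical and is not taken from the thesis code.

```python
import math


def frac(x: float) -> float:
    """Fraction operator {x} = x - floor(x), for positive x only."""
    if x <= 0:
        raise ValueError("the definition used here assumes x > 0")
    return x - math.floor(x)


# Example values: a non-integer and an integer number of media-units.
samples = [2.75, 3.0, 0.5, 10.125]
for value in samples:
    print(value, frac(value))
```

Because every value of `frac` lies in [0, 1), the difference between two such values always lies strictly between -1 and 1.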

By using Equation (D.15), we find that:

$$0 \le \{x\} < 1, \tag{D.16}$$

and thus:

$$\left(\{x\}\right)^2 \le \{x\}. \tag{D.17}$$

By using Equation (D.16) for two positive numbers x and y, we get:

$$-1 < \left(\{x\} - \{y\}\right) < 1 \tag{D.18}$$

and thus:

$$\left(\{x\} - \{y\}\right)^2 < 1 \tag{D.19}$$

D.4 Calculating term 2 of Equation (6.70)

This appendix section will calculate term 2 of Equation (6.70) on page 127:

$$E\left(\left(\frac{1}{(b - a)}\sum_{i=0}^{n}q_4(h, i)\left(\bar{x}_{TRS,k+n-i} - x_{TRS,k+n-i}\right)\right)^2\right) \tag{D.20}$$

Since:

$$x_{TRS,k+1} = (I + Ah)x_{TRS,k} + Cv_{TRS,k} \tag{D.21}$$

and:

$$\bar{x}_{TRS,k+1} = (I + Ah)\bar{x}_{TRS,k}, \tag{D.22}$$

we have the relation:

$$\bar{x}_{TRS,k+1} - x_{TRS,k+1} = (I + Ah)\left(\bar{x}_{TRS,k} - x_{TRS,k}\right) - Cv_{TRS,k} \tag{D.23}$$

And thus:

$$\bar{x}_{TRS,k+n} - x_{TRS,k+n} = (I + Ah)^n\left(\bar{x}_{TRS,k} - x_{TRS,k}\right) - \sum_{j=1}^{n}(I + Ah)^{n-j}Cv_{TRS,k+j-1} \tag{D.24}$$

The above equation can also be written as:

$$\bar{x}_{TRS,k+n-i} - x_{TRS,k+n-i} = (I + Ah)^{n-i}\left(\bar{x}_{TRS,k} - x_{TRS,k}\right) - \sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1} \tag{D.25}$$

By inserting the above equation into equation (D.20), we get:

$$\begin{aligned}
&E\left(\left(\frac{1}{(b - a)}\sum_{i=0}^{n}q_4(h, i)\left(\bar{x}_{TRS,k+n-i} - x_{TRS,k+n-i}\right)\right)^2\right) \\
&\quad= E\left(\left(\frac{1}{(b - a)}\sum_{i=0}^{n}q_4(h, i)\left((I + Ah)^{n-i}\left(\bar{x}_{TRS,k} - x_{TRS,k}\right) - \sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\left(\bar{x}_{TRS,k} - x_{TRS,k}\right) - \sum_{i=0}^{n}q_4(h, i)\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)^2\right)
\end{aligned} \tag{D.26}$$

Since $\left(\bar{x}_{TRS,k} - x_{TRS,k}\right)$ is not correlated with $v_{TRS,k}$ or with $v_{TRS}$ at timesteps later than k, the above equation can be written as:

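$$\begin{aligned}
&E\left(\left(\frac{1}{(b - a)}\sum_{i=0}^{n}q_4(h, i)\left(\bar{x}_{TRS,k+n-i} - x_{TRS,k+n-i}\right)\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\left(\bar{x}_{TRS,k} - x_{TRS,k}\right)\right)^2\right) \\
&\qquad+ \frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)^2\right)
\end{aligned} \tag{D.27}$$

We first calculate the first of the terms in the above expression:

$$\begin{aligned}
&\frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\left(\bar{x}_{TRS,k} - x_{TRS,k}\right)\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)\left(\bar{x}_{TRS,k} - x_{TRS,k}\right)\left(\bar{x}_{TRS,k} - x_{TRS,k}\right)^T\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)^T\right) \\
&\quad= \frac{1}{(b - a)^2}\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)\bar{X}_{TRS,k}\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)^T
\end{aligned} \tag{D.28}$$

Since white noise at different timesteps is not correlated, the second term on the right hand side of Equation (D.27) can be written as:

$$\begin{aligned}
&\frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}\sum_{i=0}^{n}E\left(\left(q_4(h, i)\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}\sum_{i=0}^{n}q_4(h, i)\,E\left(\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\sum_{j=1}^{n-i}v_{TRS,k+j-1}^T\left((I + Ah)^{n-i-j}C\right)^T\right)q_4(h, i)^T
\end{aligned} \tag{D.29}$$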
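The error propagation that this derivation builds on, the one-step relation (D.23) and its n-step closed form (D.24), can be checked numerically. The sketch below is an illustration only: it assumes NumPy, uses an arbitrary two-state model and arbitrary noise samples, and the function names are hypothetical.

```python
import numpy as np


def propagate_error(A, C, h, e0, noise):
    """Iterate e_{k+1} = (I + A h) e_k - C v_k, cf. (D.23)."""
    F = np.eye(A.shape[0]) + A * h
    e = e0.copy()
    for v in noise:
        e = F @ e - C @ v
    return e


def closed_form_error(A, C, h, e0, noise):
    """Evaluate (I + A h)^n e_0 - sum_{j=1}^{n} (I + A h)^(n-j) C v_{j-1}, cf. (D.24)."""
    F = np.eye(A.shape[0]) + A * h
    n = len(noise)
    e = np.linalg.matrix_power(F, n) @ e0
    for j in range(1, n + 1):
        e = e - np.linalg.matrix_power(F, n - j) @ C @ noise[j - 1]
    return e


# Illustrative values (assumptions): a stable two-state model and six noise samples.
rng = np.random.default_rng(0)
A = np.array([[-2.0, 1.0], [0.0, -0.5]])
C = np.array([[0.0], [1.0]])
h = 0.01
e0 = np.array([0.3, -0.1])  # initial prediction error
noise = [rng.normal(size=1) for _ in range(6)]

print(propagate_error(A, C, h, e0, noise))
print(closed_form_error(A, C, h, e0, noise))
```

Both functions return the same error vector up to rounding, which is the property used when the prediction error at timestep k+n-i is expanded in terms of the error at timestep k.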

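By using once again the fact that white noise at different timesteps is not correlated, we get:

$$\begin{aligned}
&\frac{1}{(b - a)^2}E\left(\left(\sum_{i=0}^{n}q_4(h, i)\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}Cv_{TRS,k+j-1}\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}\sum_{i=0}^{n}q_4(h, i)\left(\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}C\,V_{TRS,k+j-1}\left((I + Ah)^{n-i-j}C\right)^T\right)q_4(h, i)^T
\end{aligned} \tag{D.30}$$

By combining Equations (D.27), (D.28) and (D.30), we get:

$$\begin{aligned}
&E\left(\left(\frac{1}{(b - a)}\sum_{i=0}^{n}q_4(h, i)\left(\bar{x}_{TRS,k+n-i} - x_{TRS,k+n-i}\right)\right)^2\right) \\
&\quad= \frac{1}{(b - a)^2}\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)\bar{X}_{TRS,k}\left(\sum_{i=0}^{n}q_4(h, i)(I + Ah)^{n-i}\right)^T \\
&\qquad+ \frac{1}{(b - a)^2}\sum_{i=0}^{n}q_4(h, i)\left(\sum_{j=1}^{n-i}(I + Ah)^{n-i-j}C\,V_{TRS,k+j-1}\left((I + Ah)^{n-i-j}C\right)^T\right)q_4(h, i)^T
\end{aligned} \tag{D.31}$$

D.5 The square of the sum of three general scalars

The task of this section is to find an expression for the square of the sum of three general scalars $c_1$, $c_2$ and $c_3$, to be used in Section 6.4, on page 126.

$$(c_1 + c_2 + c_3)^2 = c_1^2 + c_2^2 + c_3^2 + 2c_1c_2 + 2c_1c_3 + 2c_2c_3 \tag{D.32}$$

By writing $c_2$ and $c_3$ as $c_2 = c_1 + k_1$ and $c_3 = c_1 + k_2$, where $k_1$ and $k_2$ are two scalars, the sum of the first three terms on the right hand side of the above equation can be written as:

$$\begin{aligned}
c_1^2 + c_2^2 + c_3^2 &= c_1^2 + (c_1 + k_1)^2 + (c_1 + k_2)^2 \\
&= 3c_1^2 + 2c_1k_1 + 2c_1k_2 + k_1^2 + k_2^2
\end{aligned} \tag{D.33}$$

The sum of the three last terms on the right hand side of Equation (D.32) can be written as: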

$$\begin{aligned}
2c_1c_2 + 2c_1c_3 + 2c_2c_3 &= 2c_1(c_1 + k_1) + 2c_1(c_1 + k_2) + 2(c_1 + k_1)(c_1 + k_2) \\
&= 2\left(3c_1^2 + 2c_1k_1 + 2c_1k_2 + k_1k_2\right)
\end{aligned} \tag{D.34}$$

By using the fact that $k_1k_2 \le |k_1k_2| \le \max(k_1^2, k_2^2) \le k_1^2 + k_2^2$ and thus $k_1k_2 \le k_1^2 + k_2^2$ for all values of $k_1$ and $k_2$, the above two equations give the relation:

$$2\left(c_1^2 + c_2^2 + c_3^2\right) \ge 2c_1c_2 + 2c_1c_3 + 2c_2c_3, \tag{D.35}$$

By using the above equation together with Equation (D.32), we find the relation:

$$(c_1 + c_2 + c_3)^2 \le 3\left(c_1^2 + c_2^2 + c_3^2\right) \tag{D.36}$$
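The bound on the square of a sum of three scalars can be spot-checked numerically. The sketch below is an illustration, not part of the thesis: it compares the two sides for a few integer triples and also evaluates the sum of squared pairwise differences, which is algebraically equal to the gap between the two sides. All function names are hypothetical.

```python
def square_of_sum(c1, c2, c3):
    """Left hand side of the bound: (c1 + c2 + c3)^2."""
    return (c1 + c2 + c3) ** 2


def three_times_sum_of_squares(c1, c2, c3):
    """Right hand side of the bound: 3 (c1^2 + c2^2 + c3^2)."""
    return 3 * (c1 * c1 + c2 * c2 + c3 * c3)


def pairwise_gap(c1, c2, c3):
    """(c1 - c2)^2 + (c1 - c3)^2 + (c2 - c3)^2, equal to right side minus left side."""
    return (c1 - c2) ** 2 + (c1 - c3) ** 2 + (c2 - c3) ** 2


triples = [(1, 2, 3), (2, 2, 2), (-1, 4, 0), (0, 0, 5)]
for c in triples:
    lhs = square_of_sum(*c)
    rhs = three_times_sum_of_squares(*c)
    print(c, lhs, rhs, rhs - lhs, pairwise_gap(*c))
```

The gap is a sum of squares and therefore never negative; it is zero exactly when the three scalars are equal, which is the case where the bound is tight.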

Appendix E Paper presented at CCCT-03

The full reference for the paper is:

Hafskjold B, “Optimal Control of Playoutbuffers”, Proceedings from International Conference

on Computer, Communication and Control Technologies: CCCT '03, Orlando, Florida, USA,

31. July - 2. August 2003, Volume VI, pages 175-181. (Also published in: ACM digital library)

ABSTRACT

Receiver playoutbuffers are required to smooth network delay variations for multimedia streams. The two most commonly used playoutbuffer algorithms are called Fixed Playout Delay, where all packets have the same end-to-end delay, and Adaptive Playout Delay, which uses between-talk-spurt-adjustment, where a new value of end-to-end delay is calculated for each new talkspurt. For long talkspurts and for streaming of multimedia, within-talk-spurt-adjustment, where the playout speed is controlled, can give a lower mean end-to-end delay and fewer packets that are lost due to late arrivals. By the use of optimal control theory, we have calculated the statistically optimal control of the playout speed, based on three weight factors. The user or the application programmer uses the weight factors to state the importance of: 1) Low delay 2) Playout rate close to sender rate and 3) Slow change of playout rate. This optimal control solution should eliminate the need for further work on ad-hoc methods in the field of playoutbuffer algorithms. We demonstrate the performance of the optimal control algorithm by using a simulated network and a real Internet packet trace. This is compared to the performance of Adaptive Playout Delay and of one of the ad-hoc within-talk-spurt-adjustment algorithms.

Keywords: Voice over IP, Video conferences, Playout scheduling, Jitter absorption, Mathematical modelling, Optimal control, Kalman filter

1. INTRODUCTION

For both interactive media (such as IP telephony and videophones) and streaming media (such as video-on-demand), a receiver playoutbuffer is needed to smooth the jitter introduced by the network. The two solutions to this problem that are commonly used in the Internet today are Fixed Playout Delay and Adaptive Playout Delay. Fixed Playout Delay gives every packet a constant end-to-end delay, d. All packets that arrive at the receiver in time are played out d seconds after they were sent from the sender. Packets arriving after their deadline are considered lost. The drawback of this algorithm is that it does not take into consideration the delay change that most networks experience. If the delay constant d is set to a value close to the mean network delay in a network with varying delay, a conversation can be impossible to hold, since almost half of the packets will arrive too late, and therefore be considered lost (i.e. thrown away). If the delay on the other hand is set much larger than the mean network delay, unnecessary delay is introduced. Adaptive Playout Delay is an improvement, where each talkspurt, numbered i, gets its own end-to-end delay $d_i$. The original Adaptive Playout Delay algorithm measures the mean and the variance of the network delay, and sets $d_i$ to the mean of the network delay plus $K \cdot v_i$, where $v_i$ is the measured variance of the network delay, and K is a constant, often set to a value between 1 and 4. Much research has been done on between-talk-spurt-adjustment to find a good value of $d_i$, i.e. to find an improved version of the original Adaptive Playout Delay, see [2], [3], [9], [11] and [13]. Although these versions of between-talk-spurt-adjustment are improvements compared to the original Adaptive Playout Delay, they all have the same problems as the Fixed Playout Delay for long talkspurts. One example of a long talkspurt is an IP telephone conversation where the sender application sends voice with background music or background noise. Since the sound level will never be low enough to determine the end of the talkspurt, the media stream from the sender application will be considered to be one long talkspurt. Another problem of Adaptive Playout Delay is that it does not take into consideration the dynamics of the network, and thus it cannot predict the future behaviour of the network.
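The Adaptive Playout Delay rule described above, a per-talkspurt delay equal to the mean network delay plus K times a measure of its variation, can be sketched as follows. This is a minimal illustration and not the algorithm evaluated in the paper: the exponential smoothing, the smoothing factor `alpha`, the use of a smoothed absolute deviation as the variation measure $v_i$, and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePlayoutDelay:
    k: float = 4.0        # the constant K from the text (often between 1 and 4)
    alpha: float = 0.9    # smoothing factor: an assumption, not given in the text
    mean: float = 0.0     # running estimate of the mean network delay [s]
    spread: float = 0.0   # running estimate of the delay variation v_i [s]
    started: bool = False

    def observe(self, network_delay: float) -> None:
        """Update the running estimates with one measured packet delay."""
        if not self.started:
            self.mean = network_delay
            self.started = True
            return
        self.mean = self.alpha * self.mean + (1.0 - self.alpha) * network_delay
        deviation = abs(network_delay - self.mean)
        self.spread = self.alpha * self.spread + (1.0 - self.alpha) * deviation

    def delay_for_next_talkspurt(self) -> float:
        """Return d_i = mean + K * v_i, held fixed for the whole talkspurt i."""
        return self.mean + self.k * self.spread


# Illustrative delays in seconds (assumed values).
steady = AdaptivePlayoutDelay()
for delay in [0.1, 0.1, 0.1, 0.1, 0.1]:
    steady.observe(delay)

jittery = AdaptivePlayoutDelay()
for delay in [0.1, 0.2, 0.1, 0.3]:
    jittery.observe(delay)

print(steady.delay_for_next_talkspurt(), jittery.delay_for_next_talkspurt())
```

Because the delay is only recomputed at talkspurt boundaries, a stream that never pauses keeps its first value indefinitely, which is the long-talkspurt weakness discussed above.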

To solve the problems related to long talkspurts, some newer work, [5], [6], [7] and [12], has been done on the topics of playout speed adjustment and within-talk-spurt adjustment.

In 1997, Rosenberg [10] wrote about playout algorithms: "To date, these algorithms have all been ad-hoc. Attempts have been made to develop some kind of theory or bounds on the performance of such algorithms, but with limited success. We believe that there is room for additional theoretical and practical work in this area. In particular, we believe that adaptation of playoutbuffers need not occur only at the beginning of talkspurts." and "We also believe that application of traditional estimation techniques may prove fruitful in helping to design better algorithms". This seems to describe the situation also in 2003.

As far as we have seen, there is still no theoretical foundation in the field of playoutbuffer control. We have found no stringent mathematical notation or models. All presented solutions have been ad-hoc methods, and new ad-hoc methods keep showing up. Since the optimal performance is not known, there exists no reference for evaluating these ad-hoc methods.

The main goal of the work described in this paper is to develop a theoretical foundation for multimedia receiver buffer systems, which is network independent, and hence valid for all future networks. By developing a stringent mathematical notation to describe the system, we can also develop mathematical models that are network independent. We also hope to eliminate the need for further ad-hoc methods by deducing the network independent optimal controller.

The optimal control algorithm deduced in this paper can be used for both interactive applications (such as IP telephony and video conferences) and for streaming of media (e.g. video-on-demand). Since the controller developed is network independent, we have assumed that the user/programmer provides the state space model of his/her particular network behaviour. Today, one way to find this model is by the use of Matlab's System Identification Toolbox [8]. In the future, however, we hope to include an automatic detection/identification of the network model in our algorithm. The network state space model is used by the algorithm to find the optimal playout speed from the playoutbuffer. Even though the controller is optimal only with a correct state space model, the algorithm has proven to be very robust, thus an approximate model is sufficient (see also section 5.2).

We assume that the playout speed can be set to any value, at any time. For video, this is done by changing the holding time of each picture. For sound, one of the best ways to change the playout speed of for instance voice and music, is WSOLA [7], where the playout speed can be changed without changing the pitch ([7] indicates good voice quality with a stretch or compression of 25% of the inter-packet-time).

2. NOTATION

We need a new term to express the amount of media in a flow. We cannot use the terms bits or bytes to express this, since two equal time intervals in a flow can contain a very different amount of bytes (for instance, due to a different level of compression). Another opportunity could be to use the term packet, but then we would have to discuss at which protocol level to choose this packet. Furthermore, since most packets contain a constant number of bytes, we get the same problem as above.

Optimal Control of Playoutbuffers
Brita H. Hafskjold
Norwegian Defence Research Establishment
P.O. Box 25, N-2027 Kjeller, Norway

Therefore, we introduce the term media unit to define the amount of media corresponding to a constant amount of time from the sender. A media unit is the smallest amount of media that is used by the player. One example is a 50 pictures/second video, where the most intuitive would be to define a media unit as 20 ms of media. When played at sender speed, all media units of the same media type in a flow have the same playout-time. For readability, the term media unit is abbreviated to unit for the rest of the paper.

Table 1 summarizes the notation used in this document.

symbol | unit | description | constraint
$r_{SNDR}$ | units/s | constant unit rate from sender | $r_{SNDR} > 0$
$r_{NW}(t)$ | units/s | unit rate out of the network | $r_{NW}(t) \ge 0 \;\forall t$
$r_{PB}(t)$ | units/s | unit rate out of playoutbuffer | $r_{PB}(t) \ge 0 \;\forall t$
$r_{PLR}(t)$ | units/s | unit rate out of the player | $r_{PLR}(t) \ge 0 \;\forall t$
$M_{NW}(t)$ | units | number of units (integer) in the network at time t | $M_{NW}(t) \ge 0 \;\forall t$
$M_{PB}(t)$ | units | number of units (integer) in the playoutbuffer at time t | $M_{PB}(t) \ge 0 \;\forall t$
$M_{PLR}(t)$ | units | number of units (non-integer) in the player at time t | $M_{PLR}(t) \ge 0 \;\forall t$
$M_{PBPLR}(t)$ | units | | $M_{PBPLR}(t) = M_{PB}(t) + M_{PLR}(t) \ge 0 \;\forall t$
$m$ | | unit number, m=1 for first unit in stream | $m \ge 1$
$\lambda_{m,NW}$ | s | network latency for unit m | $\lambda_{m,NW} > 0 \;\forall m$
$\lambda_{m,PB}$ | s | playoutbuffer latency for unit m | $\lambda_{m,PB} \ge 0 \;\forall m$
$\lambda_{m,PLR}$ | s | player latency for unit m | $\lambda_{m,PLR} \ge 0 \;\forall m$
$\lambda_{m,PBPLR}$ | s | | $\lambda_{m,PBPLR} = \lambda_{m,PB} + \lambda_{m,PLR} \ge 0 \;\forall m$
$t_{m,SNDR}$ | s | time when unit m leaves the sender |
$t_{m,NW}$ | s | time when unit m leaves the network |
$t_{m,PB}$ | s | time when unit m leaves the playoutbuffer |
$t_{m,PLR}$ | s | time when unit m leaves the player |
$t_0$ | s | time when the sender starts sending |

Table 1: Notation

As illustrated in Figure 1, a unit is first sent from the sender to the network (where the term network in the rest of this paper corresponds to the network itself plus all protocols, at both the sender and receiver side, between the sender application and the receiver playoutbuffer) with the constant unit rate $r_{SNDR}$. The network introduces both latency ($\lambda_{m,NW}$) and jitter, and delivers the units to the playoutbuffer with the rate $r_{NW}(t)$. The function of the playoutbuffer is to smooth the jitter without introducing too much delay ($\lambda_{m,PB}$), and then feed the player at the rate $r_{PB}(t)$. The playoutbuffer receives and delivers media in whole units, and therefore contains an integer number of units, $M_{PB}(t)$. The player receives one unit at a time from the playoutbuffer, and plays them at the rate $r_{PLR}(t)$. The player usually has only a fraction of a unit in its buffer, $M_{PLR}(t)$, (for the mathematical correctness in this paper, we think of the player buffer content as the amount of media that the player has received, but not yet played out). It receives a new unit just before finishing the playout of the one in its buffer. The playoutbuffer keeps track of the player's buffer content, to make sure that it gets the new unit in time. The total buffer content at the receiver (the amount of media in the playoutbuffer plus the amount in the player), $M_{PBPLR}(t)$, will be an integer value plus a fraction value, and can therefore be treated as a floating-point value.

Figure 1: Overview of the total system (Sender, Network and protocols, Playoutbuffer, Player; unit rates $r_{SNDR}$, $r_{NW}(t)$, $r_{PB}(t)$, $r_{PLR}(t)$; latencies $\lambda_{m,NW}$, $\lambda_{m,PB}$)

3. SYSTEM MODELLING

3.1 Mathematical relations

This section describes the relations between some of the quantities from Table 1.

The total latency is:

$$\lambda_{m,total} = \lambda_{m,NW} + \lambda_{m,PB} + \lambda_{m,PLR} \tag{1}$$

The number of units in the playoutbuffer and player is:

$$M_{PBPLR}(t) = \int_{t_0}^{t}r_{NW}(\tau)\,d\tau - \int_{t_0}^{t}r_{PLR}(\tau)\,d\tau \tag{2}$$

which means that:

$$\dot{M}_{PBPLR}(t) = r_{NW}(t) - r_{PLR}(t) \tag{3}$$

In section 4.1, we will need an expression for $\lambda_{m,PBPLR}$. At time $t_{m,PLR}$, unit m has spent $\lambda_{m,PBPLR}$ seconds in the playoutbuffer and player. During this time period, new data (arriving after unit m) has filled up the playoutbuffer and the player, and at $t_{m,PLR}$ the receiver buffers contain $M_{PBPLR}(t_{m,PLR})$ units of data. The time interval the new units were arriving from the network is therefore equal to the interval that unit m spent in the playoutbuffer and player:

$$\lambda_{m,PBPLR} = \int_{m}^{m + M_{PBPLR}(t_{m,PLR})}\frac{1}{r_{NW}(t_{\beta,NW})}\,d\beta \tag{4}$$

3.2 State space model

To be able to deduce the optimal controller, we need to express our system knowledge in a state space model. To find this model, we will need Eq. (3). Further, we select our control variable u as:

$$u = \dot{r}_{PLR} \tag{5}$$

since this is the most convenient variable to control. To deduce the total state space model, we also need a state space model for the network. This is given by the user or by the application programmer as:

$$\dot{x}_{NW} = A_{NW}x_{NW} + C_{NW}v_{NW} \tag{6}$$

where $x_{NW}$ is the state vector, $A_{NW}$ is the system matrix and $C_{NW}v_{NW}$ expresses the system noise, where $v_{NW}$ is a vector of uncorrelated Gaussian white noise with zero mean and a variance of 1. The first state of the state vector is $(r_{NW} - r_{SNDR})$, and the rest of the states are given by the specific model used:

$$x_{NW} = \begin{bmatrix} r_{NW} - r_{SNDR} \\ \vdots \end{bmatrix} \tag{7}$$

One can usually obtain a good network model with a low number of states ($n_{NW}$) in $x_{NW}$. In the simulations of this paper, we have used $n_{NW} = 2$. By using Eqs. (3), (5), (6) and (7), we can find the total state space model for our system:

$$\begin{bmatrix} \dot{M}_{PBPLR} \\ \dot{r}_{PLR} \\ \dot{x}_{NW} \end{bmatrix} = \begin{bmatrix} 0 & -1 & \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} \\ 0 & 0 & \begin{bmatrix} 0 & 0_{1\times(n_{NW}-1)} \end{bmatrix} \\ 0_{n_{NW}\times 1} & 0_{n_{NW}\times 1} & A_{NW} \end{bmatrix} \begin{bmatrix} M_{PBPLR} \\ r_{PLR} - r_{SNDR} \\ x_{NW} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0_{n_{NW}\times 1} \end{bmatrix} u + \begin{bmatrix} 0 \\ 0 \\ C_{NW} \end{bmatrix} v_{NW} \tag{8}$$

where $0_{a\times b}$ is a zero matrix with dimension a times b. On the left-hand side, rSNDR is absent for state number 2, since it is constant ($\dot{r}_{SNDR} = 0$).

We introduce MPBPLR,d as the desired level of the sum of the playoutbuffer and player. Since $\dot{M}_{PBPLR,d} = 0$, we can write the system equation as:

$$\dot{x} = Ax + Bu + Cv_{NW} \tag{9}$$

where:

$$x = \begin{bmatrix} M_{PBPLR} - M_{PBPLR,d} \\ r_{PLR} - r_{SNDR} \\ x_{NW} \end{bmatrix}, \tag{10}$$

$$A = \begin{bmatrix} 0 & -1 & \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} \\ 0 & 0 & \begin{bmatrix} 0 & 0_{1\times(n_{NW}-1)} \end{bmatrix} \\ 0_{n_{NW}\times 1} & 0_{n_{NW}\times 1} & A_{NW} \end{bmatrix}, \tag{11}$$

$$B = \begin{bmatrix} 0 \\ 1 \\ 0_{n_{NW}\times 1} \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 \\ 0 \\ C_{NW} \end{bmatrix} \tag{12}$$

4. OPTIMAL CONTROL OF THE PLAYOUTBUFFER

4.1 The goal of the controller

A user of the system typically has three main demands:
1. The total latency (λm,NW + λm,PBPLR) of each unit m should be as small as possible
2. The playout rate from the player should be as close as possible to the send-out rate from the sender
3. The change in playout speed should be as slow as possible

Mathematically, these three demands can be written as:
1. $\text{Minimize}\ (\lambda_{m,NW} + \lambda_{m,PBPLR}),\ \forall m$
2. $\text{Minimize}\ (r_{PLR}(t) - r_{SNDR}),\ \forall (t \in [t_{1,PLR},\, t_{last,PLR}])$
3. $\text{Minimize}\ (\dot{r}_{PLR}(t)),\ \forall (t \in [t_{1,PLR},\, t_{last,PLR}])$

We need to elaborate a bit on demand 1. We have no control over λm,NW, hence to minimize the total latency of each unit, we need to minimize λm,PBPLR. From Eq. (4), we can see that this sum can be minimized by minimizing MPBPLR(tm,PLR) for all tm,PLR, which minimizes the length of the integration interval (since we cannot control rNW(t), we cannot minimize the integrand).

Since these three demands are conflicting, we introduce the following three weight factors:

• w1: the importance of minimizing MPBPLR(t)
• w2: the importance of minimizing |rPLR(t) - rSNDR|
• w3: the importance of minimizing $\dot{r}_{PLR}(t)$

The user of the system, or the application programmer, will feed these weight factors to the optimal controller to get the desired QoS.

4.2 Deduction of the optimal controller

For an introduction to optimal control theory and the Riccati equation, see [1]. The Riccati equation states that the optimal control of a state space model is:

$$u = -R^{-1}B^{T}Px \tag{13}$$

where u is the vector of control variables, R is the weight matrix corresponding to the control variables, B is given by Eq. (12), P is the Riccati matrix (to be deduced below), and x is the state vector given in Eq. (10). The first state of the state vector x can be measured, and the second state is known. The rest (the network part, xNW) is estimated by using a Kalman filter. This Kalman filter uses a measurement of rNW to estimate the network state vector xNW, as illustrated in Figure 2.

Figure 2: System structure. [Block diagram: Sender, Network and protocols, Playoutbuffer, Player, connected by the rates rSNDR, rNW(t), rPB(t) and rPLR(t). A Kalman filter uses the measurement of rNW(t) to produce an estimate of xNW(t). The optimal controller uses this estimate and the measurement of MPBPLR(t) to set rPLR(t).]

To deduce P, we will need two weight matrices: the matrix R, introduced above, and Q, which corresponds to the state vector x. By using the three weight factors from section 4.1, the Riccati weight matrices become:

$$R = w_3 \quad\text{and}\quad Q = \begin{bmatrix} w_1 & 0 & 0_{1\times n_{NW}} \\ 0 & w_2 & 0_{1\times n_{NW}} \\ 0_{n_{NW}\times 1} & 0_{n_{NW}\times 1} & 0_{n_{NW}\times n_{NW}} \end{bmatrix} \tag{14}$$

By combining Eqs. (13), (14), (10) and (12), we can find an expression for the optimal control:

$$u = -\frac{1}{w_3} \begin{bmatrix} 0 & 1 & 0_{1\times n_{NW}} \end{bmatrix} \begin{bmatrix} p_{11} & p_{21} & p_{1B} \\ p_{21} & p_{22} & p_{2B} \\ p_{1B}^{T} & p_{2B}^{T} & p_{BB} \end{bmatrix} \begin{bmatrix} M_{PBPLR} - M_{PBPLR,d} \\ r_{PLR} - r_{SNDR} \\ x_{NW} \end{bmatrix} \tag{15}$$

where:

$$p_{1B} = \begin{bmatrix} p_{13} & \dots & p_{1(n_{NW}+2)} \end{bmatrix}, \quad p_{2B} = \begin{bmatrix} p_{23} & \dots & p_{2(n_{NW}+2)} \end{bmatrix} \tag{16}$$

and

$$p_{BB} = \begin{bmatrix} p_{33} & \dots & p_{3(n_{NW}+2)} \\ \vdots & \ddots & \vdots \\ p_{3(n_{NW}+2)} & \dots & p_{(n_{NW}+2)(n_{NW}+2)} \end{bmatrix} \tag{17}$$

From Eq. (15), we get:

$$u = \left(-\frac{1}{w_3}\right)\left[\, p_{21}\,(M_{PBPLR} - M_{PBPLR,d}) + p_{22}\,(r_{PLR} - r_{SNDR}) + p_{2B}\,x_{NW} \right] \tag{18}$$

Thus, we do not have to calculate the entire P-matrix; we only need row number 2, consisting of p21, p22 and p2B.

The Riccati solution to the time-invariant regulator problem is [1]:

$$PA + A^{T}P - PBR^{-1}B^{T}P + Q = 0_{(n_{NW}+2)\times(n_{NW}+2)} \tag{19}$$

where A and B are given in Eqs. (11) and (12), and R and Q can be found from Eq. (14). By the use of Eq. (19), we can calculate p21, p22 and p2B. Only the results are presented here; for detailed calculations, see [4].

$$p_{21}^{2} = w_3 w_1, \qquad p_{22}^{2} = w_3\,(w_2 - 2p_{21}) \tag{20}$$

$$p_{2B} = \left( p_{11} \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} + p_{21} \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} A_{NW} \right) \cdot \left( \frac{1}{w_3}\,p_{21}\,I_{n_{NW}\times n_{NW}} - A_{NW}^{2} + \frac{1}{w_3}\,p_{22}\,A_{NW} \right)^{-1} \tag{21}$$

where p11 is given by:

$$p_{11} = -\frac{1}{w_3}\,p_{21}\,p_{22} \tag{22}$$

The signs of p21 and p22 can be found by the use of Eq. (18).

5. PERFORMANCE COMPARISON AND RESULTS

This chapter compares the performance of three different algorithms by first testing their performance on a simulated network (sections 5.1 and 5.2), and then on a real network trace (section 5.3).

Algorithm 1 is the original Adaptive Playout Delay algorithm explained in section 1.

Algorithm 2 is one of the few playout speed adjusting algorithms. It is found in [5] and [12], and is described in the following. If the playoutbuffer content is above a target buffer level, the inter_packet_time is set to f*normal_inter_packet_time, where f < 1, and if the playoutbuffer content is below the target buffer level, the inter_packet_time is set to s*normal_inter_packet_time, where s > 1. In our simulations, we use the suggested values s = 1.25 and f = 0.75, i.e. a +/- 25% change of the inter_packet_time. This translates to a slowdown of 20% (1/1.25 = 0.8) and a speed-up of 33% (1/0.75 = 1.33) of the playout rate.

Algorithm 3 is the optimal control of playout speed. To compare this fairly with algorithm 2, we have used the same target buffer level (which in our notation is MPBPLR,d) for algorithms 2 and 3. To illustrate the use of weight factors, we have used two different configurations, where 3a has an emphasis on keeping the playout speed constant (high weight for low jitter), while 3b has a high weight for keeping the buffer level close to the target buffer level. The weight factors for 3a are set to w1 = 0.5, w2 = 0.5 and w3 = 1, and the weight factors for 3b are set to w1 = 20, w2 = 0.5 and w3 = 0.1.

To be able to run and compare different playoutbuffer algorithms, we have developed a network simulator and an implementation of the total system, shown in Figure 2, in Matlab [8]. The total system implementation can be run on both simulated and real network data.

5.1 Simulated network, correct model

The simulated network has a state space model like the one in Eq. (6), where:

$$A_{NW} = \begin{bmatrix} -1/T_1 & 1 \\ 0 & -1/T_2 \end{bmatrix} \quad\text{and}\quad C_{NW} = \begin{bmatrix} 0 \\ \sigma_{v_{NW}} \end{bmatrix},$$

where T1 = 0.3 s, T2 = 0.1 s, and $\sigma_{v_{NW}}$ is chosen such that the standard deviation of rNW is 3 units/s. Thus, (rNW - rSNDR) behaves like a Markov process that is driven by another Markov process instead of by white noise. The sender rate is set to 50 units/s. The rest of this section discusses the performance of each of the algorithms with this simulated network.

Algorithm 1: The Adaptive Delay algorithm in this section has been simulated with K = 24. The network delay variance vi was measured to 0.0025 s², and hence K*vi = 0.06 s. As can be seen from Figure 4 (a), the delay of the packets is kept at a constant level. Figure 5 (a) shows that the buffer runs dry, and that many packets are lost because they arrive later than their scheduled deadline. Figure 3 (a) shows a jump to zero in the playout speed when we get a silence period due to lost packets. If K had been set to a lower value, more packets would have been lost.

Algorithm 2: In this section, the target buffer level has been set to 100 ms. Figure 4 (b) shows an added delay of about 100 ms, which is equal to the target buffer level. The amount in the playoutbuffer is kept very close to the target buffer level, as can be seen in Figure 5 (b). The drawback of this algorithm is the extra introduced jitter, as can be seen from Figure 3 (b).

Algorithm 3: In this section, the optimal control algorithm has been supplied with the correct network state space model. As with algorithm 2, the target buffer level is set to 100 ms. As can be seen from Figure 4 (c) and (d) and Figure 5 (c) and (d), algorithm 3b performs better than 3a with regard to keeping the buffer level close to the target level. In Figure 3 (c) and (d), we can see this algorithm's real improvement compared to algorithm 2: the jitter is much lower, the playout speed is much closer to the sender speed, and the change of playout speed is much slower. Here we can also see that 3a performs better than 3b with regard to jitter.

Figure 3: Playout speed [units/s] versus time [seconds]. Panels: (a) Algorithm 1, (b) Algorithm 2, (c) Algorithm 3a, (d) Algorithm 3b.
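The simulated network of this section can be sketched in Python as a rough aid for reproducing the setup. The sketch integrates the two-state model with the Euler-Maruyama scheme and closes the loop with only the two feedback terms on buffer level and rate deviation; the network term p2B xNW and the Kalman filter are left out. Everything that is not in the text above is an assumption of this sketch and not part of the Matlab implementation: the function names, the step size, the reading of vNW as unit-intensity white noise (which gives the closed-form noise gain), the buffer level being counted in units (100 ms at 50 units/s is 5 units), and the gain expressions obtained from the first two states of Eq. (19).

```python
import math
import random
import statistics


def noise_gain_for_std(target_std, t1, t2):
    """Noise gain sigma that gives std(r_NW - r_SNDR) = target_std in steady state.

    Assumption: v_NW is continuous white noise of unit intensity, so the
    stationary variance of the first state is
    sigma^2 * T1^2 * T2^2 / (2 * (T1 + T2)).
    """
    return target_std * math.sqrt(2.0 * (t1 + t2)) / (t1 * t2)


def simulate(duration, sigma=0.0, dt=1e-3, t1=0.3, t2=0.1, r_sndr=50.0,
             target=5.0, w1=0.5, w2=0.5, w3=1.0, seed=1):
    """Return (buffer levels [units], network rates [units/s]) for one run."""
    # Feedback gains on buffer deviation and rate deviation (assumed form).
    g1 = math.sqrt(w1 / w3)
    g2 = -math.sqrt((w2 + 2.0 * math.sqrt(w1 * w3)) / w3)
    rng = random.Random(seed)
    x1 = x2 = 0.0                  # network states, x1 = r_NW - r_SNDR
    level, r_plr = 0.0, r_sndr     # start with an empty receiver buffer
    levels, rates = [], []
    for _ in range(int(round(duration / dt))):
        r_nw = r_sndr + x1
        u = g1 * (level - target) + g2 * (r_plr - r_sndr)   # u = d(r_PLR)/dt
        level += (r_nw - r_plr) * dt                         # Eq. (3)
        r_plr += u * dt                                      # Eq. (5)
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x1, x2 = x1 + (-x1 / t1 + x2) * dt, x2 - (x2 / t2) * dt + noise
        levels.append(level)
        rates.append(r_nw)
    return levels, rates


def rate_std(rates):
    """Sample standard deviation of the simulated network rate."""
    return statistics.pstdev(rates)
```

With `sigma=0.0` the network is noise-free and the buffer level settles at the target, which is a quick sanity check of the sign conventions. With noise switched on, the level in this simplified loop can become negative, since neither clamping nor packet granularity is modelled.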

Figure 4: Network delay (bold) and end-to-end delay [s above average λNW] versus packet number. Panels: (a) Algorithm 1, (b) Algorithm 2, (c) Algorithm 3a, (d) Algorithm 3b.

Figure 5: Total buffer level [s] versus time [seconds]. Panels: (a) Algorithm 1, (b) Algorithm 2, (c) Algorithm 3a, (d) Algorithm 3b.

5.2 Simulated network, wrong model

We have used the same simulated network as in section 5.1, but this time the optimal control algorithm receives a wrong network state space model. In this model, the parameters have been set ten times higher than the correct values (to T1 = 3 s and T2 = 1 s). The performance of algorithms 1 and 2 is not affected by this, hence the graphs in figures 3 (a), 3 (b), 4 (a), 4 (b), 5 (a) and 5 (b) are still valid. The performance of algorithms 3a and 3b is shown in figures 6, 7 and 8.

As can be seen from figures 7 and 8, the ability to keep the buffer level close to the target level is not much altered compared to the simulations with a correct network state space model. Although Figure 6 shows that the algorithm's ability to keep the playout speed close to the sender speed is not as good as with the correct network model, it still performs far better than algorithm 2.

Figures 6, 7 and 8: Results with the wrong network model, each with panels (a) Algorithm 3a and (b) Algorithm 3b. [Plots of playout speed [units/s] versus seconds, delay [s above average λNW] versus packet number, and buffer level [s] versus seconds.]

5.3 Performance with a real network trace

We collected packet delay traces over the Internet, by transmitting TCP streams between Kjeller, Norway and Oregon, USA.

Each packet contained one unit (which contained 30 ms of media), and the packets were sent with a time-distance of 30 ms (which corresponds to a sender rate of 33.3 units/s). The performance of each of the algorithms on one such Internet network trace is discussed in the rest of this section.

Algorithm 1: The Adaptive Delay algorithm in this section has been simulated with K = 2. The network delay variance vi was measured to 0.03 s², and hence K*vi = 0.06 s. As can be seen from Figure 11 (a), many packets arrive too late, and are therefore considered lost. As shown by the playout rate in Figure 10 (a), this would make the speech almost unintelligible during the first second.

Algorithm 2: In this section, the target buffer level is set to 100 ms. Figure 11 (b) shows an added delay of about the size of the target buffer level, which corresponds to the buffer level, as can be seen in Figure 9 (b). We can see from these figures that few packets are lost, and that the end-to-end delay is kept fairly constant compared to the network delay. But when looking at Figure 10 (b), we can see that this algorithm introduces a lot of jitter compared to other algorithms.

Algorithm 3: The network state space model used here is:

$$A = \begin{bmatrix} -1/T_1 & 1 \\ 0 & -1/T_2 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 \\ \sigma_{v_{NW}} \end{bmatrix},$$

where T1 = 0.03 s, T2 = 0.01 s, and $\sigma_{v_{NW}}$ is chosen such that the standard deviation of rNW is 0.55 units/s. As with algorithm 2, the target buffer level is set to 100 ms. Figure 10 (c) and (d) show that this algorithm introduces much less jitter than the other two algorithms. As expected, these figures also show that 3a gives less jitter than 3b. From Figure 11 (c) and (d) and Figure 9 (c) and (d), it is clear that the introduced delay is around the target buffer level, without losing any of the packets.

Figures 9, 10 and 11: Results for the real network trace, each with panels (a) Algorithm 1, (b) Algorithm 2, (c) Algorithm 3a, (d) Algorithm 3b. [Figure 9: buffer level [s] versus seconds. Figure 10: playout rate [units/s] versus seconds. Figure 11: delay [s above average λNW] versus packet number.]

6. CONCLUSION

A stringent mathematical model of the receiver buffer system has been found, and used to develop an optimal controller. An analysis tool has also been implemented for simulation and comparison of playoutbuffer algorithms. The optimal control algorithm has demonstrated low jitter, low delay and a playout rate close to the sender rate (with a deviation of less than 5% for the simulated network with correct model). It does not use deadlines for the packets, and hence does not throw away any packets, as long as they arrive in sequence. Therefore, the risk from Fixed Playout Delay and from long talkspurts of Adaptive Playout Delay, of losing most of the packets during severe network jitter, is eliminated. Given the correct network model, it is the statistically optimal controller. Without the correct model, it is sub-optimal, but it has been shown, through many test runs, to be very robust with regard to modelling errors. With the use of the optimal control algorithm, the amount of media in the playoutbuffer is not kept as constant as with some other algorithms, but this is not perceived by the user. The user only notices the playout rate, which is changed slowly and kept close to the sender rate, and he/she will therefore experience a high level of QoS.

7. FUTURE WORK

We intend to develop an algorithm (preferably a real-time algorithm) for automatic detection or identification of the network state space model (similar to a subset of Matlab's System Identification Toolbox [8]), and to combine this with the optimal control algorithm. This total algorithm will then use a default network model during the first few seconds of the conversation, and switch to the correct model when it has been found. The automatic detection algorithm can be run regularly for long conversations to adapt the network model to changing conditions in the network (e.g. due to increased network traffic at certain time periods of the day).

8. REFERENCES

[1] B. D. O. Anderson and J. B. Moore, Optimal control, Prentice-Hall International Inc., 1989.

[2] R. Ansari and A. R. Kaye, "Compressed Voice in Integrated Services Frame Relay Networks: Voice Synchronization", Proceedings of Canadian Conference on Electrical and Computer Engineering, Montreal, September 1995.

[3] P. DeLeon and C. J. Sreenan, "An Adaptive Predictor for Media Playout Buffering", Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 6, March 1999, pp. 3097-3100.

[4] B. Hafskjold, Doctoral thesis, Institute of Informatics, University of Oslo, in progress, to be published in 2003.

[5] M. Kalman, E. Steinbach and B. Girod, "Adaptive Media Playout for Low Delay Video Streaming over Error-Prone Channels", IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Wireless Video, submitted August 2001.

[6] Y. L. Liang, N. Färber and B. Girod, "Adaptive playout scheduling and loss concealment for voice communication over IP networks", IEEE Transactions on Multimedia, April 2001.

[7] Y. L. Liang, N. Färber and B. Girod, "Adaptive playout scheduling using time-scale modification in packet voice communications", Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, Salt Lake City, UT, May 2001, pp. 1445-1448.

[8] Matlab's homepage: www.mathworks.com

[9] J. Pinto and K. J. Christensen, "An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods", Proceedings of IEEE Conference on Local Computer Networks, October 1999, pp. 224-231.

[10] J. Rosenberg, "Internet Telephony: A Research Agenda", 1997. Available on Rosenberg's homepage: www.djrosen.net

[11] C. J. Sreenan, J.-C. Chen, P. Agrawal, and B. Narendran, "Delay Reduction Techniques for Playout Buffering", IEEE Transactions on Multimedia, Vol. 02, No. 02, June 2000.

[12] E. Steinbach, N. Färber, and B. Girod, "Adaptive Playout for Low Latency Video Streaming", Proceedings of International Conference on Image Processing (ICIP-2001), Thessaloniki, Greece, October 2001, pp. 962-965.

[13] P. L. Tien and M. C. Yuang, "Intelligent Voice Smoother for VBR Voice over ATM Networks", Proceedings of IEEE Conference on Computer Communications (IEEE Infocom), San Francisco, California, March/April 1998, pp. 841.


Appendix F Paper presented at ISICT-03

The full reference for the paper is:

Hafskjold, B., Anti-Run-Dry Algorithm for Optimal Control of Playoutbuffers, Proceedings from International Symposium on Information and Communication Technologies (ISICT03), Dublin, Ireland, 24. - 26. September, 2003, pages 410-417.

Anti-Run-Dry Algorithm for Optimal Control of Playoutbuffers

Brita H. Hafskjold

Norwegian Defence Research Establishment

ABSTRACT

Receiver playoutbuffers are required to smooth network delay variations for multimedia streams. The two most commonly used playoutbuffer algorithms are called Fixed Playout Delay, where all packets have the same end-to-end delay, and Adaptive Playout Delay, which uses between-talk-spurt-adjustment, where a new value of end-to-end delay is calculated for each new talkspurt. For long talkspurts and for streaming of multimedia, within-talk-spurt-adjustment, where the playout speed is controlled, can give a lower mean end-to-end delay and fewer packets that are lost due to late arrivals. By the use of optimal control theory, a statistically optimal control of the playout speed can be calculated. Like other buffer control algorithms, the optimal control solution has no guarantee that the buffer will not run dry. This paper introduces an anti-run-dry algorithm that gives a controllable run dry probability. We demonstrate the performance of the anti-run-dry algorithm together with the optimal control algorithm by using a simulated network and a real Internet packet trace. This is compared to the performance of Adaptive Playout Delay and of one of the existing within-talkspurt-adjustment algorithms.

Keywords: Voice over IP, Video conferences, Playout scheduling, Jitter absorption, Kalman filter

1. INTRODUCTION

For both interactive media (such as IP telephony and video phones) and streaming media (such as video-on-demand), a receiver playoutbuffer is needed to smooth the jitter introduced by the network. The two commonly used solutions to this problem in the Internet today are Fixed Playout Delay, where all packets get an equal end-to-end delay d, and Adaptive Playout Delay, where each talkspurt, numbered i, gets its own end-to-end delay di. Much research has been done on between-talk-spurt-adjustment, see [2], [4], [12], [13] and [15]. For both algorithms, packets arriving after their deadline d are considered lost, thus a small d will lead to many lost packets, while a large d will give extra delay. For fixed playout delay and long talkspurts of adaptive playout delay, d can thus be too large at some time periods and too small at others. To solve this problem, some newer work, [7], [9], [10] and [14], has been done on playout speed adjustment.

In [5], a theoretical foundation of the field of playoutbuffer control is given, with the introduction of a stringent mathematical notation and the development of network independent mathematical models. If the playoutbuffer does not run dry, the optimal control algorithm developed in [5] gives an optimal performance based upon the user's quality preferences. Since this algorithm, like all other playoutbuffer algorithms, can run dry, we aim at finding an anti-run-dry algorithm for the optimal control algorithm, to give the user a controllable run-dry-probability.

Since the anti-run-dry algorithm developed here, just like the optimal controller from [5], is network independent, we have assumed that the user or programmer provides the state space model of his/her particular network behaviour. Today, one way to find this model is by the use of Matlab's System Identification Toolbox [11]. In the future, however, we hope to include an automatic detection/identification of the network model in our algorithm. The network state space model is used by both the anti-run-dry algorithm and the optimal control algorithm. Both algorithms have proven to be very robust, thus an approximate model is sufficient.

We assume that the playout speed can be controlled. This is done for video by adjusting the holding time of pictures and for sound by for instance WSOLA [10], where playout speed is changed with no change of pitch ([10] indicates good voice quality for a 25% inter-packet-time stretch/compression).

2. NOTATION

This paper uses the notation introduced in [5], where the term media unit defines an amount of media corresponding to a constant amount of time from the sender. One example is a 50 pictures/second video, where a media unit can be defined as 20 ms of media. For readability, the term media unit is abbreviated to unit for the rest of the paper.

Table 1 summarizes most of the notation used in this document. As illustrated in Fig. 1, a unit is first sent from the sender to the network (1) with the constant rate rSNDR. The media stream through the network is modelled as a continuous stream. We therefore introduce a virtual buffer as a mathematical converter from the continuous stream out of the network (with the rate rNW(t)) to the whole packets into the playoutbuffer (with the rate rVB(t)). The function of the playoutbuffer is to smooth the jitter, and feed the player at the rate rPB(t). The player plays the media at the rate rPLR(t), and receives a new unit just before finishing the playout of the one in its buffer.

(1) The term network in the rest of this paper corresponds to the network itself plus all protocols, at both the sender and receiver side, between the sender application and the receiver playoutbuffer.

Figure 1: Overview of the total system. [Block diagram: Sender, Network and protocols, Virtual buffer, Playoutbuffer, Player, connected by the rates rSNDR, rNW(t), rVB(t), rPB(t) and rPLR(t).]

Table 1: Notation

| Symbol | Unit | Description | Range |
|---|---|---|---|
| rSNDR | units/s | rate from sender (constant) | rSNDR > 0 |
| rNW(t) | units/s | rate out of network | rNW(t) ≥ 0, ∀t |
| rVB(t) | units/s | rate out of virtual buffer | rVB(t) ≥ 0, ∀t |
| rPB(t) | units/s | rate out of playoutbuffer | rPB(t) ≥ 0, ∀t |
| rPLR(t) | units/s | rate out of player | rPLR(t) ≥ 0, ∀t |
| MNW(t) | units | units in network | MNW(t) ≥ 0, ∀t |
| MVB(t) | units | units in virtual buffer | MVB(t) ≥ 0, ∀t |
| MPB(t) | units | units in playoutbuffer | MPB(t) ≥ 0, ∀t |
| MPLR(t) | units | units in player | MPLR(t) ≥ 0, ∀t |
| MPBPLR(t) | units | MPBPLR(t) = MPB(t) + MPLR(t) | MPBPLR(t) ≥ 0, ∀t |
| MRCV(t) | units | MRCV(t) = MPBPLR(t) + MVB(t) | MRCV(t) ≥ 0, ∀t |
| h | s | length of a discrete timestep, h = tk+1 - tk | |
| λNW(t) | s | latency in the network | λNW(t) ≥ 0, ∀t |

Three accented forms of a quantity x are also used. They denote, respectively, the Kalman filter measurement update of x, the Kalman filter prediction of x, and the measurement of x.

3. CONTROL OF PLAYOUT-SPEED

In [5], the optimal controller was deduced as:

$$u = \dot{r}_{PLR} = \begin{bmatrix} g_1 & g_2 & g_{NW} \end{bmatrix} \begin{bmatrix} M_{RCV} - M_{RCV,d} \\ r_{PLR} - r_{SNDR} \\ x_{NW} \end{bmatrix}, \tag{1}$$

where MRCV,d is the desired level of the receiver buffers,

$$g_1 = \sqrt{\frac{w_1}{w_3}}, \qquad g_2 = -\sqrt{\frac{1}{w_3}\left(w_2 + 2\sqrt{w_1 w_3}\right)},$$

$$g_{NW} = -g_1 \left( g_2 \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} + \begin{bmatrix} 1 & 0_{1\times(n_{NW}-1)} \end{bmatrix} A_{NW} \right) \left( g_1 I_{n_{NW}\times n_{NW}} + A_{NW}^{2} + g_2 A_{NW} \right)^{-1}.$$

ANW is given by Eq. (2), and the weight factors w1 (weight of minimizing MRCV - MRCV,d), w2 (weight of minimizing rPLR - rSNDR) and w3 (weight of minimizing $\dot{r}_{PLR}$) state the user's quality preference. The state space model for the network is given by the user or programmer as:

$$\dot{x}_{NW} = A_{NW}\,x_{NW} + C_{NW}\,v_{NW} \tag{2}$$

where xNW is the state vector, ANW is the system matrix and CNW vNW expresses the system noise, where vNW is a vector of uncorrelated Gaussian white noise with zero mean and a variance of 1. The first state of xNW is (rNW - rSNDR), and the other states are given by the specific model used. By receiving measurements of rNW(t), a Kalman filter can estimate xNW(t), as illustrated in Fig. 2.

Figure 2: System structure with anti-run-dry mechanism. The area with grey background is from [5], while the rest is the anti-run-dry algorithm added in this paper. [Block diagram: Sender, Network and Receiver, connected by rSNDR, rNW(t) and rPLR(t). The Kalman filter uses the measurement of rNW(t) to estimate xNW(t) for the optimal controller, which sets rPLR(t). The network state predictor supplies n future predictions of xNW and their covariance matrices to the anti-run-dry algorithm, which also receives MPBPLR(t) and the degree of run-dry-probability from the user/application programmer, and outputs MRCV,d(t) to the optimal controller.]
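The two scalar gains of the controller in Eq. (1) can be checked numerically with a few lines of Python. The sketch assumes the readings g1 = sqrt(w1/w3) and g2 = -sqrt((w2 + 2 sqrt(w1 w3))/w3), and verifies them against the algebraic Riccati equation of [5] restricted to the buffer-level and rate states, with R = w3 and Q = diag(w1, w2). The function names are hypothetical, and the network part gNW is not covered.

```python
import cmath
import math


def feedback_gains(w1, w2, w3):
    """Gains on (M_RCV - M_RCV,d) and (r_PLR - r_SNDR), assumed closed form."""
    g1 = math.sqrt(w1 / w3)
    g2 = -math.sqrt((w2 + 2.0 * math.sqrt(w1 * w3)) / w3)
    return g1, g2


def closed_loop_poles(g1, g2):
    """Roots of s^2 - g2*s + g1 = 0, i.e. the quantities a and b of Section 4."""
    root = cmath.sqrt(g2 * g2 - 4.0 * g1)
    return (g2 + root) / 2.0, (g2 - root) / 2.0


def riccati_residuals(w1, w2, w3):
    """Residuals of P A + A^T P - P B B^T P / w3 + Q = 0 for the first two states."""
    g1, g2 = feedback_gains(w1, w2, w3)
    p21, p22 = -w3 * g1, -w3 * g2          # since g = -p / w3
    p11 = -p21 * p22 / w3
    r11 = w1 - p21 ** 2 / w3
    r12 = -p11 - p21 * p22 / w3
    r22 = w2 - 2.0 * p21 - p22 ** 2 / w3
    return r11, r12, r22
```

For the weight set w1 = 0.5, w2 = 0.5, w3 = 1 this gives g1 of about 0.707 and g2 of about -1.384, with a complex pole pair whose real part is negative, so the buffer level returns to MRCV,d with a lightly damped response.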

4. ANTI-RUN-DRY ALGORITHM

The optimal controller deduced in [5] was dependent on the value MRCV,d given by the user. This paper aims to find the dynamical minimum value of MRCV,d, given that the run-dry-probability of the buffer is to be kept below a specified level. This anti-run-dry algorithm will work outside the system found in [5], as shown in Fig. 2. Since we know the optimal controller and can predict future values of xNW(t), we can also predict future values of MPBPLR(t) for specified values of MRCV,d. This is illustrated in Fig. 3 for MRCV,d = 3 units and MRCV,d = 10 units, along with the distribution of the prediction error. The growth of the prediction error variance over time is illustrated at two timesteps; at the first timestep, the distribution is quite narrow compared to the next timestep. The probability that the buffer will run dry at a particular point in time is equal to the area under the illustrated distribution curve that lies below the line given by MPBPLR = 0.

Figure 3: Prediction of MPBPLR and prediction error distribution. [Plot of MPBPLR [units] versus time, starting at the present time: the prediction of MPBPLR when MRCV,d = 10 units and when MRCV,d = 3 units, with the prediction error distribution drawn at two future timesteps. The area of the distribution below MPBPLR = 0 equals P(empty buffer).]

The integration of the probability density distribution cannot be solved symbolically (since the integrand has the form $e^{-x^2}$), thus we cannot find an exact function of the form $M_{RCV,d} = f(t, v_{NW}(t), \text{run-dry-probability})$. However, as illustrated in Fig. 4, the area of parts of the total area can be calculated by iterations, and can be found in tables, for instance in [8], thus we can find the parameters c (number-of-sigma) and p (probability) of the equation $P(x < \mu - c\sigma) = p$, where $\mu = E(x)$ and σ is the standard deviation of x.

Figure 4: Density f(x) of the normal distribution. μ = E(x), σ = standard deviation of x. [The areas between μ and μ - σ, between μ - σ and μ - 2σ, between μ - 2σ and μ - 3σ, and below μ - 3σ are marked as 34%, 13.75%, 2.1% and 0.15% of the total area.]

| c = number of σ | p |
|---|---|
| 3.29 | 0.05% |
| 3 | 0.15% |
| 2 | 2.3% |
| 1 | 15.9% |
| 0.5 | 30.8% |
| 0 | 50% |

The probability that the buffer runs dry can be written as $P(M_{PBPLR} \le 0)$. Thus, to keep the run-dry-probability n timesteps ahead below p, we must make sure that:

$$M_{PBPLR,k+n} = M_{RCV,k+n} - M_{VB,k+n} \ge c\,\sqrt{\mathrm{var}(M_{PBPLR,k+n})} \tag{3}$$

Thus, we need expressions for $M_{RCV,k+n}$, $M_{VB,k+n}$ and $\mathrm{var}(M_{RCV,k+n} - M_{VB,k+n})$, calculated in Sections 4.1, 4.2 and 4.3, respectively. Section 4.4 combines the results to find the lower limit for MRCV,d. Only the most important equations are shown here; for detailed calculations, see [6].

4.1 Finding $M_{RCV,k+n}$

By looking at Fig. 1, we can deduce the receiver buffer equation:

$$M_{RCV}(t_k + \tau) - M_{RCV}(t_k) = \int_{0}^{\tau} r_{NW}(t_k + \beta)\,d\beta - \int_{0}^{\tau} r_{PLR}(t_k + \beta)\,d\beta \tag{4}$$
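As a small numerical illustration of Eq. (4), the change in receiver buffer content over an interval can be approximated from sampled rates. This is a minimal sketch with a left Riemann sum; the function name and the sampling step h are assumptions made for the example.

```python
def buffer_change(r_nw, r_plr, h):
    """Approximate M_RCV(t_k + tau) - M_RCV(t_k) from rate samples spaced h seconds apart.

    r_nw and r_plr hold the network and player rates [units/s] at the same
    sample times; the result is in units.
    """
    if len(r_nw) != len(r_plr):
        raise ValueError("rate sequences must have equal length")
    return h * sum(nw - plr for nw, plr in zip(r_nw, r_plr))
```

For example, if the network delivers 52 units/s for one second while the player consumes 50 units/s, `buffer_change([52.0] * 100, [50.0] * 100, 0.01)` gives a growth of about 2 units.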

By defining a delta-operator as , we can use as the Laplace time op-erator and use calculations in the s-plane to find an expression for . By using

, a first-order approximation gives $M_{RCV}(t_k+\tau)$, where $\Delta x(t_k,\tau) = x(t_k+\tau) - x(t_k)$ and $t_{k+n} = t_k + \tau$. We use updated Kalman filter estimates of $r_{NW,k-1}$ and $\mathbf{x}_{NW,k}$, and our knowledge of $r_{PLR}$ and $M_{PBPLR}$ at time k-1 ($r_{PLR,k}$ has not yet been decided at time k, since the controller has not yet been run) to find $M_{RCV,k+n}$:

$$M_{RCV,k+n} = \frac{1}{b-a}\Big[\big(b-a-q_1(hn+h)\,g_1\big)\big(M_{PBPLR,k-1}+M_{VB,k-1}\big) - \mathbf{g}_{SUM}\sum_{i=0}^{n-1} q_2(h,i)\,\mathbf{x}_{NW,k+n-i} - \mathbf{g}_{SUM}\,q_2(h,n)\,\mathbf{x}_{NW,k} + q_1(hn+h)\,g_1\,M_{RCV,d} + \big(e^{b(hn+h)}-e^{a(hn+h)}\big)\big(r_{NW,k-1}-r_{PLR,k-1}\big)\Big] \tag{5}$$

where

$$q_1(\theta) = \frac{a\,(e^{b\theta}-1) - b\,(e^{a\theta}-1)}{ab}, \qquad q_2(h,i) = \frac{a\,e^{bhi}(e^{bh}-1) - b\,e^{ahi}(e^{ah}-1)}{ab},$$

$$a = \frac{g_2 + \sqrt{g_2^2 - 4g_1}}{2}, \qquad b = \frac{g_2 - \sqrt{g_2^2 - 4g_1}}{2}, \qquad \mathbf{g}_{SUM} = \mathbf{g}_{NW} + \begin{bmatrix} g_2 & 0 & \cdots & 0 \end{bmatrix},$$

and $g_1$, $g_2$ and $\mathbf{g}_{NW}$ are given in chapter 3.

4.2 Finding $M_{VB,k+n}$
If h is small enough, the virtual buffer will always have less than two packets inside, and at any given timestep k (after a packet may have been removed from the virtual buffer), it will have less than one packet inside. Thus, the first-order approximation of the predicted value of $M_{VB,k+n}$ can be written as:

$$M_{VB,k+n} = \Big\{ M_{VB,k-1} + h\,r_{NW,k-1} + h\sum_{i=k}^{k+n-1} r_{NW,i} \Big\} \tag{6}$$

where $\{x\}$ is the fractional part of x.

4.3 Finding $\operatorname{var}(M_{PBPLR,k+n})$
We use Eqs. (5) and (6), and assume no correlation between $M_{VB,k-1}$, $r_{NW,k-1}$ and future predictions of $\mathbf{x}_{NW}$, between the $\{x\}$-term and the other terms, and between the prediction errors of future predictions of $\mathbf{x}_{NW}$. By using the relation $(\{x\}-\{y\})^2 < 1$ for positive x and y, we get:

$$\operatorname{var}(M_{PBPLR,k+n}) < \frac{1}{(b-a)^2}\Big[\big(b-a-q_1(hn+h)\,g_1\big)^2\sigma^2_{M_{VB,k-1}} + \mathbf{g}_{SUM}\,q_2(h,n)^2\,\mathbf{P}_{NW,k}\,\mathbf{g}_{SUM}^T + \sum_{i=0}^{n-1}\mathbf{g}_{SUM}\,q_2(h,i)^2\,\mathbf{P}_{NW,k+n-i}\,\mathbf{g}_{SUM}^T + \big(e^{bh(n+1)}-e^{ah(n+1)}\big)^2\sigma^2_{r_{NW,k-1}}\Big] + 1 \tag{7}$$

4.4 Finding the minimum value of $M_{RCV,d}$
By inserting Eqs. (5) and (6) into Eq. (3), the limit of $M_{RCV,d}$ can be found:

$$M_{RCV,d} \ge \frac{1}{q_1(hn+h)\,g_1}\Big[(b-a)\,c\,\sqrt{\operatorname{var}(M_{PBPLR,k+n})} - \big(b-a-q_1(hn+h)\,g_1\big)\big(M_{PBPLR,k-1}+M_{VB,k-1}\big) - \big(e^{bh(n+1)}-e^{ah(n+1)}\big)\big(r_{NW,k-1}-r_{PLR,k-1}\big) + \mathbf{g}_{SUM}\sum_{i=0}^{n-1} q_2(h,i)\,\mathbf{x}_{NW,k+n-i} + \mathbf{g}_{SUM}\,q_2(h,n)\,\mathbf{x}_{NW,k} + (b-a)\Big\{M_{VB,k-1} + h\,r_{NW,k-1} + h\sum_{i=k}^{k+n-1} r_{NW,i}\Big\}\Big] \tag{8}$$

where $\operatorname{var}(M_{PBPLR,k+n})$ is given by Eq. (7). Note that this is the minimum value of $M_{RCV,d}$. If this minimum value is negative, we use $M_{RCV,d} = 0$.

4.5 Optimal predictor for the anti-run-dry mechanism
Eq. (8) shows that we need a network state predictor (illustrated in Fig. 2) that provides:

$$\operatorname{pred}(\mathbf{x}_{NW}) = \begin{bmatrix}\mathbf{x}_{NW,k+1} & \cdots & \mathbf{x}_{NW,k+n}\end{bmatrix} \quad\text{and}\quad \operatorname{covar}(\mathbf{x}_{NW}) = \begin{bmatrix}\mathbf{P}_{NW,k+1} & \cdots & \mathbf{P}_{NW,k+n}\end{bmatrix} \tag{9}$$

Since the Kalman filter is also an optimal predictor, it is used in the network state predictor.

4.6 Anti-run-dry for each timestep
The minimum value of $M_{RCV,d}$ given by Eq. (8) ensures a maximum run-dry-probability p in exactly n timesteps. However, if half the time period of the network delay oscillations is smaller than $\tau = h \cdot n$, the run-dry-probability at $t_{k+m}$, 1<m<n, could be higher than p. To avoid this problem, Eq. (8) is calculated for each timestep $[t_{k+1} \ldots t_{k+n}]$, and the resulting $M_{RCV,d}$ is set to the maximum $M_{RCV,d}$ of these timesteps.
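The auxiliary quantities of Eq. (5) and the per-timestep rule of section 4.6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `a` and `b` are assumed to be real, distinct and non-zero, and `bound_for_horizon` stands in for an evaluation of the right-hand side of Eq. (8) that is not implemented here.

```python
import math


def roots_a_b(g1, g2):
    """Return (a, b) = (g2 +/- sqrt(g2^2 - 4*g1)) / 2."""
    disc = g2 * g2 - 4.0 * g1
    if disc <= 0.0 or g1 == 0.0:
        # Assumption of this sketch: distinct, non-zero real roots.
        raise ValueError("sketch assumes distinct, non-zero real a and b")
    root = math.sqrt(disc)
    return (g2 + root) / 2.0, (g2 - root) / 2.0


def q1(theta, a, b):
    """q1(theta) = (a(e^{b theta} - 1) - b(e^{a theta} - 1)) / (ab)."""
    return (a * (math.exp(b * theta) - 1.0)
            - b * (math.exp(a * theta) - 1.0)) / (a * b)


def q2(h, i, a, b):
    """q2(h, i) = (a e^{bhi}(e^{bh} - 1) - b e^{ahi}(e^{ah} - 1)) / (ab)."""
    return (a * math.exp(b * h * i) * (math.exp(b * h) - 1.0)
            - b * math.exp(a * h * i) * (math.exp(a * h) - 1.0)) / (a * b)


def desired_level_over_horizon(bound_for_horizon, n):
    """Largest bound over horizons 1..n, replaced by 0 if negative.

    bound_for_horizon(m) is a hypothetical callable returning the
    right-hand side of Eq. (8) evaluated for an m-step horizon.
    """
    return max(0.0, max(bound_for_horizon(m) for m in range(1, n + 1)))
```

A useful sanity check is that the $q_2$ terms telescope: summing $q_2(h,i)$ over i = 0, ..., n gives $q_1(hn+h)$, which is why the same argument $hn+h$ appears throughout Eqs. (5), (7) and (8).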

5. PERFORMANCE COMPARISON AND RESULTS
This chapter compares the performance of four different algorithms, first on a simulated network (sections 5.1 and 5.2), and then on a real network trace (section 5.3).

Algorithm 1 is the Adaptive Playout Delay algorithm explained in chapter 1. Algorithm 2, found in [7] and [14], is one of the few existing playout speed adjusting algorithms. If the buffer level is above a target buffer level d, Alg. 2 sets inter_packet_time = f · normal_inter_packet_time, where f < 1, and if the buffer level is below d, it sets inter_packet_time = s · normal_inter_packet_time, where s > 1. Our simulations use the suggested values s = 1.25 and f = 0.75, and a target buffer level of 100 ms. Algorithm 3 is the optimal control algorithm, found in [5]. The weight factors are set to w1 = 0.5, w2 = 0.5 and w3 = 1. The target buffer level is $M_{RCV,d}$ = 100 ms for Alg. 3a and $M_{RCV,d}$ = 50 ms for Alg. 3b. Algorithm 4 is the anti-run-dry algorithm together with the optimal control algorithm.

To be able to run and compare different playoutbuffer algorithms, we have developed a network simulator and an implementation of the total system, shown in Figs. 1 and 2, in Matlab [11]. The total system implementation can be run on both simulated and real network data.

5.1 Simulated network, correct model
The network simulator uses the model in Eq. (2), with

$$\mathbf{A}_{NW} = \begin{bmatrix} -\frac{1}{T_1} & 1 \\ 0 & -\frac{1}{T_2} \end{bmatrix} \quad\text{and}\quad \mathbf{C}_{NW} = \begin{bmatrix} 0 \\ \sigma_{v_{NW}} \end{bmatrix},$$

where T1 = 0.3 s, T2 = 0.1 s, and $\sigma_{v_{NW}}$ is chosen to give rNW a standard deviation of 3 units/s. Thus, (rNW - rSNDR) behaves like a Markov process that is driven by another Markov process instead of by white noise. The sender rate is 50 units/s. In this section, Algs. 3 and 4 have been supplied with the correct network state space model.
Alg. 1: As Fig. 6(a) shows, Alg. 1 in this section uses the end-to-end delay di = mean(λNW) + 0.06 s. As Fig. 7(a) shows, a low di-value can make the buffer run dry, and give a period of silence, as shown in Fig. 5(a).
Alg. 2: Fig. 6(b) and Fig. 7(b) show an added delay of about the target buffer level. The drawback of this algorithm is the extra jitter it introduces, as can be seen from Fig. 5(b).
Alg. 3: Fig. 5(c) and (d) show that this algorithm has very little jitter compared to Alg. 2. However, as shown in Fig. 6(d) and Fig. 7(d), a low value of $M_{RCV,d}$ can make the buffer run dry, leading to short silence periods, as shown in Fig. 5(d).
Alg. 4: In this section, we have used n = 30 and c = 0.5. Fig. 7(e) shows that this algorithm prevents the buffer from running dry, at the cost of a slightly more turbulent playout speed, as shown in Fig. 5(e).

5.2 Simulated network, wrong model
We have used the same simulated network as in section 5.1, but this time the optimal control algorithm and the anti-run-dry algorithm receive a network state space model where the parameters have been set ten times higher than the correct values (to T1 = 3 s and T2 = 1 s). Since Algs. 1 and 2 are not affected by this, only the performance of Algs. 3 and 4 is shown in Figs. 8 and 9.
Alg. 3: As can be seen from Fig. 9(a), the ability to keep the buffer level close to the target level is not much altered by using the wrong network model. Fig. 8(a) shows that the playout speed is more turbulent than with the correct model, but it still has far less jitter than Alg. 2. Figs. 8(b) and 9(b) show that, as with the correct model, a low value of $M_{RCV,d}$ can make the buffer run dry.
Alg. 4: We have used c = 0.5 and n = 30, as in section 5.1. Figs. 9(c) and 8(c) show that this algorithm still prevents the buffer from running dry, but the playout speed is slightly more turbulent than with the correct model.
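The simulated network of section 5.1 is a two-state linear model driven by white noise. A minimal Python sketch of such a simulation is given below, assuming a simple Euler-Maruyama discretisation with timestep `dt`. The discretisation, the function name and the parameter names are assumptions of this sketch and are not taken from the Matlab implementation; the value of `sigma_v` has to be tuned separately to obtain the stated standard deviation of rNW.

```python
import math
import random


def simulate_rate_deviation(t1, t2, sigma_v, dt, steps, x0=(0.0, 0.0), seed=0):
    """Simulate x' = A x + C v with A = [[-1/t1, 1], [0, -1/t2]], C = [0, sigma_v]^T.

    Returns the samples of the first state, (r_NW - r_SNDR).
    """
    rng = random.Random(seed)
    x1, x2 = x0
    samples = []
    for _ in range(steps):
        # White-noise increment with variance dt (Euler-Maruyama step).
        noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        dx1 = (-x1 / t1 + x2) * dt
        dx2 = (-x2 / t2) * dt + sigma_v * noise
        x1, x2 = x1 + dx1, x2 + dx2
        samples.append(x1)
    return samples
```

The second state is a first-order Markov process driven by white noise, and it in turn drives the first state, which matches the description of (rNW - rSNDR) as a Markov process driven by another Markov process.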

5.3 Performance with a real network trace
We collected packet delay traces over the Internet by transmitting TCP streams between Kjeller, Norway and Oregon, USA. Each packet contained one unit, and the packets were sent at 30 ms intervals (which corresponds to a 30 ms unit size and a sender rate of 33.3 units/s).

[Figures 5 to 7, simulated network with correct model. Panels: (a) Alg. 1, (b) Alg. 2, (c) Alg. 3a, (d) Alg. 3b, (e) Alg. 4.]
Figure 5: Playout speed (units/s versus seconds)
Figure 6: End-to-end delay (bold) and network delay (seconds above average λNW versus packet number)
Figure 7: Total buffer level (level [s] versus seconds)

[Figures 8 and 9, simulated network with wrong model. Panels: (a) Alg. 3a, (b) Alg. 3b, (c) Alg. 4.]
Figure 8: Playout speed (units/s versus seconds)
Figure 9: Total buffer level (level [s] versus seconds)

Alg. 1: As Fig. 11(a) shows, Alg. 1 in this section uses the end-to-end delay di = mean(λNW) + 0.06 s. Many packets therefore arrive too late and are considered lost. Fig. 10(a) shows that this would make the speech almost unintelligible during the first second.
Alg. 2: Figs. 11(b) and 12(b) show a fairly constant delay of about the target buffer level. Few packets are lost, but as shown in Fig. 10(b), a lot of jitter is introduced.

The network state space model used for Algs. 3 and 4 has the same form as before, where T1 = 0.03 s, T2 = 0.01 s, and $\sigma_{v_{NW}}$ is chosen to give rNW a standard deviation of 0.55 units/s.
Alg. 3: Figs. 10(c) and (d) show that this algorithm introduces much less jitter than Alg. 2. Figs. 11(c) and 12(c) show a total delay of about the target buffer level. However, as shown in Figs. 10(d) and 12(d), a low value of $M_{RCV,d}$ can make the buffer run dry, leading to short silence periods.
Alg. 4: This section uses n = 30 and c = 1.5. Figs. 11(e) and 12(e) show that this algorithm prevents the buffer from running dry, but as shown in Fig. 10(e), the playout speed is slightly more turbulent than with Algs. 3a and 3b.

[Figures 10 to 12, real network trace. Panels: (a) Alg. 1, (b) Alg. 2, (c) Alg. 3a, (d) Alg. 3b, (e) Alg. 4.]
Figure 10: Playout speed (units/s versus seconds)
Figure 11: End-to-end delay (bold) and network delay (seconds above average λNW versus packet number)
Figure 12: Total buffer level (level [s] versus seconds)

6. CONCLUSION
An anti-run-dry algorithm for the optimal control algorithm has been developed. An analysis tool has also been implemented for simulation and comparison of playoutbuffer algorithms. By using the anti-run-dry algorithm together with the optimal control algorithm, the run-dry-probability can be controlled by the user. The run-dry-probability has been verified by simulations. The total algorithm has demonstrated low jitter, low delay and a playout rate close to the sender rate. It does not use deadlines for the packets, and hence does not throw away any packets, as long as they arrive in sequence. Therefore, the risk from Fixed Playout Delay and from long talkspurts of Adaptive Playout Delay, of losing most of the packets during severe network jitter, is eliminated. It has been shown, through many test runs, to be very robust with regard to modelling errors. With the use of the anti-run-dry algorithm, the amount of media in the playoutbuffer is not kept as constant as with some other algorithms, but this is not perceived by the user. The user only notices the playout rate, which is changed slowly and kept close to the sender rate, and he/she will therefore experience a high level of QoS.

7. FUTURE WORK
We intend to develop an algorithm (preferably a real-time algorithm) for automatic detection or identification of the network state space model (similar to a subset of Matlab's System Identification Toolbox [11]), and combine this with the anti-run-dry algorithm together with the optimal control algorithm. This total algorithm will use a default network model during the first few seconds of the conversation, and switch to the correct model when it is found. The automatic detection algorithm can be run regularly for long conversations to adapt the network model to changing network conditions (e.g. due to increased network traffic at certain time periods).

8. REFERENCES
[1] B. D. O. Anderson and J. B. Moore, Optimal control, Prentice-Hall International Inc., 1989.
[2] R. Ansari and A. R. Kaye, “Compressed Voice in Integrated Services Frame Relay Networks: Voice Synchronization”, Proceedings of Canadian Conference on Electrical and Computer Engineering, Montreal, September 1995.
[3] S. Barnett and T. M. Cronin, Mathematical Formulae - for engineering and science students, 4th edition, Longman Scientific and Technical, Essex, England, 1986.
[4] P. DeLeon and C. J. Sreenan, “An Adaptive Predictor for Media Playout Buffering”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 6, March 1999, pp. 3097-3100.
[5] B. Hafskjold, “Optimal Control of Playoutbuffers”, Proceedings of International Conference on Computer, Communication and Control Technologies, Orlando, Florida, July/Aug. 2003.
[6] B. Hafskjold, Doctoral thesis, Institute of Informatics, University of Oslo, in progress, to be published in 2003.
[7] M. Kalman, E. Steinbach and B. Girod, “Adaptive Media Playout for Low Delay Video Streaming over Error-Prone Channels”, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Wireless Video, submitted August 2001.
[8] E. Kreyszig, Advanced Engineering Mathematics, 6th edition, John Wiley & Sons, Inc., 1988.
[9] Y. L. Liang, N. Färber and B. Girod, “Adaptive playout scheduling and loss concealment for voice communication over IP networks”, IEEE Transactions on Multimedia, April 2001.
[10] Y. L. Liang, N. Färber and B. Girod, “Adaptive playout scheduling using time-scale modification in packet voice communications”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, Salt Lake City, UT, May 2001.
[11] Matlab's homepage: www.mathworks.com
[12] J. Pinto and K. J. Christensen, “An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods”, Proceedings of IEEE Conference on Local Computer Networks, October 1999, pp. 224-231.
[13] C. J. Sreenan, J.-C. Chen, P. Agrawal, and B. Narendran, “Delay Reduction Techniques for Playout Buffering”, IEEE Transactions on Multimedia, Vol. 02, No. 02, June 2000.
[14] E. Steinbach, N. Färber and B. Girod, “Adaptive Playout for Low Latency Video Streaming”, Proceedings of International Conference on Image Processing, Thessaloniki, Greece, Oct. 2001.
[15] P. L. Tien and M. C. Yuang, “Intelligent Voice Smoother for VBR Voice over ATM Networks”, Proceedings of IEEE Conference on Computer Communications (IEEE Infocom), San Francisco, California, March/April 1998, pp. 841.

Appendix G Paper accepted by IET Communications

Gade, B. H. H.: “Results for a Statistically Optimal Algorithm for Multimedia Receiver Buffers”, IET Communications, Volume 1, Issue 6, Dec 2007, pages 1095-1103.

Results for a Statistically Optimal Algorithm for Multimedia Receiver Buffers

B. H. Hafskjold Gade

Abstract For interactive multimedia and multimedia streams, receiver playout buffers are required to smooth network delay variations. Instead of using a constant playout speed, newer receiver buffer algorithms control the playout speed, which can give a lower end-to-end delay and fewer packets that are lost due to late arrivals. This paper presents a statistically optimal algorithm to control playout speed. The most significant difference to other published playout speed adjusting algorithms is the thorough mathematical approach that this work is based on. We have developed a stringent notation and stringent mathematical models of the media receiver system, which are generic and independent of the networks and protocols used. This has enabled us to deduce the statistically optimal controller for the playout speed, which is also independent of the networks and protocols used. We have identified three deviations from perfect playout: 1) Buffering delay 2) A playout rate different from the sender rate and 3) A change of playout rate. Our approach is statistically optimal by minimizing the three deviations, based on their relative importance. The importance will vary for different user and application requirements, and is thus freely tunable by means of three weight factors. The optimal control algorithm is easy to implement and has demonstrated very good results when evaluated by Perceptual Evaluation of Speech Quality (PESQ), an objective technique for measuring voice quality, and Degradation Mean Opinion Score (DMOS), a subjective listening test, for both simulated and real network measurement traces.

1 Introduction

For a user of a media stream, the perceived quality consists of the delay (which is especially important for interactivity) and the listening-only or viewing-only audio or video quality. When a stream of media is sent through a network, the packets in the stream will be individually delayed. Therefore, a reception buffer at the receiver machine is necessary to protect against playout interruptions due to variations in the data arrival rate. While the amount of protection offered grows with the size of the client's buffer, so does the extra delay that is introduced. A playout buffer algorithm is used to find a compromise between the delay and the listening-only or viewing-only media quality.

The most commonly used playout buffer algorithms for voice are Fixed Playout Delay and Adaptive Playout Delay. Fixed Playout Delay gives every packet a constant end-to-end delay d, and thus uses a constant playout speed. Packets arriving after their deadline are considered lost. This algorithm does not take into consideration the delay change that most networks experience. If d is set to a value close to the mean network delay in a network with varying delay, a conversation may be impossible, since almost half of the packets may arrive too late and are therefore considered lost. If d, on the other hand, is set much larger than the mean network delay, unnecessary delay is introduced. Adaptive Playout Delay is an improvement, valid for speech only, where each talkspurt, numbered i, gets its own end-to-end delay di. Much research has been done on between-talkspurt adjustment to find a good value of di, see [1] - [12]. All these versions of between-talkspurt adjustment have the same problems as Fixed Playout Delay for long talkspurts and other media without pauses, like music. By modifying SOLA (Synchronised overlap-and-add) to scale individual voice packets, [13] has adapted the Adaptive Playout Delay algorithm to enable it to handle delay spikes in the middle of a talkspurt.

To solve the problems related to long talkspurts and other media without pauses, the most recent playout buffer algorithms control the playout speed of the media to be able to find a better compromise between playout interruptions and added delay [14] - [17]. One of the best ways to change the playout speed of sound may be [18], where a time-domain interpolation (WSOLA - Waveform Similarity Overlap-Add) is modified to scale individual packets, and where the playout speed can be changed without changing the pitch ([18] indicates good voice quality with a stretch or compression of 25% of the inter-packet-time). For video, the playout speed can be controlled by changing the holding time of each picture. The algorithms presented in [14] and [15] (which are meant for packet video receivers) are both reported as having buffering delays above 0.8 seconds, and are thus not suited for interactive communication. The algorithm presented in [16], which uses fuzzy networks, is compared to the optimal control algorithm introduced in this paper, in Section 0. The algorithm presented in [17] calculates the scaling of each packet based on the network delay during the last w packets, where w is a parameter that is used as a trade-off between accuracy and responsiveness. The optimal control algorithm presented in this paper is more general, in that it gives the user or application the ability to control the playout quality by setting three different weight factors.

A perfect playout, i.e., a playout with no buffering delay and with a perfect listening-only or viewing-only quality (where the playout rate is equal to the sender rate at all times), cannot be obtained as long as the network introduces jitter. However, the deviations from the perfect playout can be minimized. The deviations from the perfect playout that may be experienced by a user are: 1) Buffering delay, and listening-only or viewing-only quality deviations, consisting of 2) A playout rate different from the sender rate and 3) A change of playout rate.

By using a thorough mathematical approach, we aim at finding the statistically optimal control of the playout speed that minimizes the three deviations from the perfect playout, based on their relative importance. There are two main steps towards the statistically optimal controller. The first step is the development of a strict notation and strict mathematical models, which are independent of the network and protocols, and general enough to fit any kind of playout buffer algorithm. The second step is to deduce the optimal controller.

Much work has been performed on packet loss concealment techniques; a few examples are [19] - [24]. However, the optimal control algorithm is not a packet loss concealment technique, but a statistically optimal control of buffering delay by controlling playout speed, and is normally without packet loss. This paper presents results for voice and music, but the mathematics presented is independent of the medium.

2 Mathematical modelling

We cannot use the terms bits or bytes to express the amount of media in a flow, since two equal time intervals in a flow can contain a very different number of bytes (for instance, due to a different level of compression). Therefore, we introduce the term media-unit to define the amount of media corresponding to a constant period of time when playing the media at the correct media speed. One example is a 50 pictures/s-video, where it would be most intuitive to define a media-unit as 20 ms of media. Fig. 1 contains an illustration of the stringent mathematical models of the media receiver system that were needed to deduce the statistically optimal control of playout speed. For a more thorough motivation and description of the model, see [25]. The notation used in this paper is summarized in the appendix, where Table 2 gives an overview of the notation rules and Table 3 shows the specific symbols used. As illustrated in Fig.1, a media-unit is first sent from the sender to the transport segment (consisting of all networks and protocols between the sender application and the receiver playout buffer) with the correct media speed rSNDR. The media stream through the transport segment is modelled as a continuous stream.

Fig. 1 Total system with optimal controller

We therefore introduce a virtual buffer as a mathematical converter from the continuous stream out of the transport segment (with the rate rTRS (t)) to the whole packets into the playout buffer. The function of the playout buffer is to smooth the jitter, and feed the player at the rate rPB (t). The player plays the media at the rate rPLR (t). The virtual buffer (which does not represent any physical entity) and the player work as counterparts; the virtual buffer converts the continuous rate from the transport segment to whole packets, and the player converts the whole packets from the playout buffer to a continuous playout rate. The number of media-units in the receiver buffers is MVB (t) for the virtual buffer, MPB (t) for the playout buffer and MPLR (t) for the player buffer. The total number of media-units in the receiver buffers is

$M_{RCV}(t) = M_{VB}(t) + M_{PB}(t) + M_{PLR}(t)$. The total state space model (footnote 1) for our system is (for a detailed derivation, see [25] or [30])

$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}u + \mathbf{C}\mathbf{v}_{TRS}, \quad\text{where}\quad \dot{\mathbf{x}} = \frac{d}{dt}(\mathbf{x}),$$

$$\mathbf{x} = \begin{bmatrix} M_{RCV} \\ r_{PLR} - r_{SNDR} \\ \mathbf{x}_{TRS} \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 0 \\ 1 \\ \mathbf{0}_{n_{TRS}\times 1} \end{bmatrix}, \quad \mathbf{C} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{C}_{TRS} \end{bmatrix} \quad\text{and}\quad \mathbf{A} = \begin{bmatrix} \mathbf{A}_{1,RCV} & \mathbf{A}_{2,RCV} \\ \mathbf{0}_{n_{TRS}\times 2} & \mathbf{A}_{TRS} \end{bmatrix},$$

$$\text{where}\quad \mathbf{A}_{1,RCV} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}, \quad \mathbf{A}_{2,RCV} = \begin{bmatrix} 1 & \mathbf{0}_{1\times(n_{TRS}-1)} \\ 0 & \mathbf{0}_{1\times(n_{TRS}-1)} \end{bmatrix}$$

and $n_{TRS}$ is the number of states in the state-space equation for the transport segment ($\mathbf{x}_{TRS}$), $\mathbf{0}_{a\times b}$ is a zero matrix with dimension a times b, and the control variable u is $u = \dot{r}_{PLR}$. The time derivative of the first state MRCV(t) (i.e., the time derivative of the number of media-units in the receiver buffers) is equal to the difference between rTRS(t) (the rate from the transport segment into the receiver buffers) and rPLR(t) (the rate out of the receiver buffers). This is equal to state 3, $(r_{TRS} - r_{SNDR})$ (explained below), minus state 2, $(r_{PLR} - r_{SNDR})$. Thus, the first row of the system matrix A is $\begin{bmatrix} 0 & -1 & 1 & \mathbf{0}_{1\times(n_{TRS}-1)} \end{bmatrix}$.

Since rSNDR is constant, the time derivative of state 2 is equal to the control variable $u = \dot{r}_{PLR}$. Thus, the second row of A contains zeros and the second row of B is 1. The state space model for the transport segment is $\dot{\mathbf{x}}_{TRS} = \mathbf{A}_{TRS}\mathbf{x}_{TRS} + \mathbf{C}_{TRS}\mathbf{v}_{TRS}$, where xTRS is the state vector, ATRS is the system matrix and CTRS vTRS expresses the system noise, where vTRS is a vector of uncorrelated Gaussian white noise with zero mean and unit variance. The first state of the state vector $\mathbf{x}_{TRS}$ is $(r_{TRS} - r_{SNDR})$, and the rest of the states are given by the specific model used: $\mathbf{x}_{TRS} = \begin{bmatrix} r_{TRS} - r_{SNDR} & \cdots \end{bmatrix}^T$. One can usually obtain a good transport segment model even with a low number of states in $\mathbf{x}_{TRS}$. The transport segment state space model can either be a general model (in this paper, we have used a simple model with $n_{TRS} = 2$), or given by the user of the optimal algorithm (e.g. the application programmer) who may use guidelines from [25] to find the model. As stated in Section 8, an automatic detection algorithm could be used to find the transport segment state space model.

[Fig. 1 block labels: Sender, Transport segment, Virtual buffer, Playout buffer, Player (rates rSNDR, rTRS, rVB, rPB, rPLR); Time when each media-unit enters playout buffer; Calculate measurement of rTRS; Measurement of rTRS; Kalman filter; Estimate of xTRS; Measurement of buffer level; Optimal controller; rPLR (t).]

1 A state space model is a mathematical model of a system as a set of input, output and state variables related by first-order differential equations. The state variables are expressed as vectors and the differential and algebraic equations are written in matrix form. The state space model is a convenient and compact way to model and analyze general systems with multiple inputs and outputs. A good textbook on vectors and matrices is [26]. Two textbooks on state space modeling are [27] and [28].
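To make the block structure of the total model concrete, the following Python sketch assembles A, B and C from a given transport segment model. The function name `build_system` and the use of nested lists are assumptions made for this illustration only; the matrices passed in the usage example are placeholders, not the model used in the paper.

```python
def build_system(a_trs, c_trs):
    """Assemble A, B, C of the total model from a transport segment model.

    a_trs: n x n system matrix of the transport segment (list of rows).
    c_trs: n x m noise matrix of the transport segment (list of rows).
    State order: [M_RCV, r_PLR - r_SNDR, x_TRS (n states)].
    """
    n = len(a_trs)
    m = len(c_trs[0])
    size = n + 2
    A = [[0.0] * size for _ in range(size)]
    # First row: dM_RCV/dt = (r_TRS - r_SNDR) - (r_PLR - r_SNDR).
    A[0][1] = -1.0
    A[0][2] = 1.0
    # Second row stays zero: d(r_PLR - r_SNDR)/dt is the control variable u.
    for i in range(n):
        for j in range(n):
            A[i + 2][j + 2] = float(a_trs[i][j])
    B = [0.0, 1.0] + [0.0] * n
    C = [[0.0] * m, [0.0] * m] + [[float(v) for v in row] for row in c_trs]
    return A, B, C
```

For example, `build_system([[-2.0, 1.0], [0.0, -5.0]], [[0.0], [1.0]])` returns a 4 x 4 matrix A whose first row is `[0.0, -1.0, 1.0, 0.0]`.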

3 Optimal control

Mathematically, the three deviations from perfect playout (mentioned in Section 1) can be minimized by minimizing $M_{RCV}(t) - M_{RCV,d}$ (where $M_{RCV,d}$ is the desired receiver buffer level), $r_{PLR}(t) - r_{SNDR}$ and $\dot{r}_{PLR}(t)$. Since these minimizations are conflicting, we introduce the weight factors:
w1: the importance of minimizing $M_{RCV}(t) - M_{RCV,d}$
w2: the importance of minimizing $r_{PLR}(t) - r_{SNDR}$
w3: the importance of minimizing $\dot{r}_{PLR}(t)$

The user of the system, or the application programmer, will feed these weight factors to the optimal controller to get the desired playout quality. The optimal control algorithm will find the optimal compromise between playout buffering delay (with the weight w1) and listening-only or viewing-only quality (with the weights w2 and w3). An optimal control guideline from [29] is to give the weight factors a magnitude relative to the expected (or nominally acceptable) value of the variable to be minimized. We have used

$$w_i = \frac{1}{(\Delta x_i)^2},$$

where $\Delta x_i$ is the nominally acceptable value of $M_{RCV}(t) - M_{RCV,d}$ for i = 1, of $r_{PLR}(t) - r_{SNDR}$ for i = 2 and of $\dot{r}_{PLR}(t)$ for i = 3. The desired buffer level can be set by the algorithm described in [31] or by the guidelines in [25].

Note that the words “statistically optimal” in the title do not refer to the results presented, but to the statistically optimal control algorithm presented in this section. Statistical optimality means that no other algorithm will have smaller deviations from the perfect playout, i.e., the output of the algorithm is the playout speed that will give the statistically optimal results based on the three weight factors given by the user. The statistically optimal controller is given by (for a detailed derivation, see [25] or [30])

$$\mathbf{u}(t) = \mathbf{G}\mathbf{x}(t), \quad\text{where}\quad \mathbf{G} = \begin{bmatrix} -\dfrac{r_{12}}{w_3} & -\dfrac{r_{22}}{w_3} & -\dfrac{\mathbf{r}_{2B}}{w_3} \end{bmatrix},$$

where

$$r_{12} = -\sqrt{w_3 w_1}, \qquad r_{22} = \sqrt{w_3\left(w_2 + 2\sqrt{w_3 w_1}\right)}$$

and

$$\mathbf{r}_{2B} = w_3\left(\begin{bmatrix} r_{11} & \mathbf{0}_{1\times(n_{TRS}-1)} \end{bmatrix} + \begin{bmatrix} r_{12} & \mathbf{0}_{1\times(n_{TRS}-1)} \end{bmatrix}\mathbf{A}_{TRS}\right)\cdot\left(r_{12}\,\mathbf{I}_{n_{TRS}\times n_{TRS}} - w_3\,\mathbf{A}_{TRS}\mathbf{A}_{TRS} + r_{22}\,\mathbf{A}_{TRS}\right)^{-1},$$

where $r_{11} = -\dfrac{r_{12}\,r_{22}}{w_3}$.

As shown in Fig. 1, a Kalman filter [32] is used to find the estimate of xTRS(t), needed by the optimal controller. The input to the Kalman filter is a calculated measurement of rTRS, obtained by dividing the number of media-units arriving during a short time interval by the length of the time interval. We have developed a network simulator and an implementation of the total system, shown in Fig. 1, in Matlab [33], to produce results for both simulated and real network data.
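The weight rule and the scalar parts of the controller gain can be sketched in Python as follows. This is a minimal illustration, not the Matlab implementation: the function names are hypothetical, and the last function assumes a one-state transport segment model (a scalar `a_trs`) so that the matrix inverse reduces to a division, whereas the paper uses $n_{TRS} = 2$.

```python
import math


def weight(nominal_deviation):
    """w_i = 1 / (delta_x_i)^2 for a nominally acceptable deviation."""
    return 1.0 / (nominal_deviation ** 2)


def riccati_scalars(w1, w2, w3):
    """Return (r11, r12, r22) as defined in this section."""
    r12 = -math.sqrt(w3 * w1)
    r22 = math.sqrt(w3 * (w2 + 2.0 * math.sqrt(w3 * w1)))
    r11 = -r12 * r22 / w3
    return r11, r12, r22


def gain_one_state_transport(w1, w2, w3, a_trs):
    """Gain row [G1, G2, G3] for a hypothetical one-state transport model."""
    r11, r12, r22 = riccati_scalars(w1, w2, w3)
    # Scalar version of the transport-segment block of the gain.
    r2b = w3 * (r11 + r12 * a_trs) / (r12 - w3 * a_trs ** 2 + r22 * a_trs)
    return [-r12 / w3, -r22 / w3, -r2b / w3]
```

With w1 = 0.5, w2 = 0.5 and w3 = 1, the first gain is positive (a buffer level above the desired level speeds up the playout) and the second is negative (a playout rate above the sender rate is pulled back), which is the qualitative behaviour one would expect from the three weighted deviations.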

4 Quality metrics

Subjective methods for measuring listening-only sound quality are the Mean Opinion Score (MOS) [34] and Degradation MOS (DMOS), described in Section 4.1. Objective methods include PESQ [35], described in Section 4.2 and late packet loss rate, described in Section 4.3. This paper also uses a dissimilarity measure, described in Section 4.4. Listening-only tests should be combined with the receiver buffer level when used to compare different algorithms, e.g. by assuring that the mean buffer levels of all algorithms are equal during the tests. For algorithms that use constant playout speed, existing quality metrics [36] [37] and performance bounds [38] that combine the effect of buffering delay and late packet loss rate, can be used.

4.1 MOS and DMOS

For MOS (defined by Annex B of [34]), subjects rate the voice quality as "excellent", "good", "fair", "poor", or "bad", on a scale from 5 to 1. The MOS score tends to lead to low sensitivity in distinguishing among good quality circuits. DMOS is a modified version, defined by Annex D of [34], which affords higher sensitivity. Here, the test persons hear the correct sound, followed by a short period of silence, and then the output sound from the system to be tested. The test subjects rate the degradation of the output sound as "inaudible", "audible but not annoying", "slightly annoying", "annoying", or "very annoying", on a scale from 5 to 1.

We collected two male and two female voice samples from [39], and two music samples: one from Beethoven's 9th symphony and one from David Byrne's "Like humans do". The samples were scaled (using WSOLA) according to the playout speed output of different algorithms. For incidents of packet loss or run-dry (where the playout speed is zero), silence was replaced by a low amplitude white noise. The test was performed by 14 test persons according to the DMOS standard described in Annex D of [34]. All sound files used 16-bit mono PCM encoding, sampled at 44100 Hz (footnote 2), because we would like our algorithms to work for all quality levels. VoIP and other sound transmitted over networks may also have higher quality in the future.

4.2 PESQ

ITU-T Recommendation P.862 [35] describes PESQ as an objective alternative to MOS for measuring voice quality. PESQ is a computer program that compares an original sound signal X(t) with a degraded signal Y(t). The output of PESQ is a prediction of the MOS score that test persons would give to Y(t). The PESQ scores in this paper were obtained by using 16000 Hz sound files, since we use the reference implementation of PESQ that works for 8000 and 16000 Hz sampling frequencies. The 16000 Hz sound files were obtained from the 44100 Hz files from [39] by using the Matlab [33] command “resample“. We did not use PESQ for music samples, since it is defined only for voice. According to [40], PESQ is very sensitive to stretching and compression of the sound signal. For a speech signal with much stretching and compression, where WSOLA was used to change the playout speed without changing the pitch, [40] reports that subjective listening tests showed very good hearing results, but that PESQ gave an average score of 3.2.

4.3 Packet loss and run-dry incidents

Many of the published playout buffer algorithms discard packets that arrive after a deadline. The ratio of discarded packets to the total number of packets is called the late packet loss rate. The optimal control algorithm normally does not lose packets, but may experience incidents where the buffer runs dry. The corresponding run-dry rate is used as a quality metric in this paper. A run-dry incident with the duration of one media-unit will affect the sound quality less than the loss of a packet containing one media-unit, since no information is lost during a run-dry incident. Thus, with otherwise equal quality, a speech signal with x% run-dry rate will probably have a higher quality than a speech signal with x% late packet loss.
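The two rates can be computed from a trace as in the following Python sketch. The function names are hypothetical. The late packet loss rate follows the definition above for a fixed deadline; the run-dry rate is assumed here to be the fraction of sampled timesteps at which the buffer is empty, which is one plausible reading since the text does not give an explicit formula.

```python
def late_packet_loss_rate(network_delays, playout_delay):
    """Fraction of packets whose network delay exceeds the playout deadline."""
    if not network_delays:
        raise ValueError("need at least one packet")
    late = sum(1 for d in network_delays if d > playout_delay)
    return late / len(network_delays)


def run_dry_rate(buffer_levels):
    """Fraction of sampled timesteps with an empty buffer (assumed definition)."""
    if not buffer_levels:
        raise ValueError("need at least one sample")
    dry = sum(1 for level in buffer_levels if level <= 0.0)
    return dry / len(buffer_levels)
```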

2 CD quality is 44100 Hz, 16-bit stereo PCM encoding and regular telephone quality is 8000 Hz 8-bit mono PCM encoding

4.4 Arentz dissimilarity measure

Content-based retrieval is an active research area, where methods are developed for searching for contents contained in digital text, sound, music, image and video, etc. One of the research areas within content based musical retrieval is Query-by-Humming systems. Arentz et al. [41] have developed the following dissimilarity measure (for Query-by-Humming systems) between two pieces of music a and b:

$$ d(a, b) = \sum_{j=1}^{i} \omega(a_{j-1}, a_j, b_{j-1}, b_j)^2 \qquad (1) $$

where i is the number of notes in the tune and $\omega(a_k, a_l, b_m, b_n)$ represents the cost of pairing up the note pair $(a_k, a_l)$ in tune a with the note pair $(b_m, b_n)$ in tune b. The cost function is defined as³

$$ \omega(a_k, a_l, b_m, b_n) = \big(t(a_l) - t(a_k)\big) - \big(t(b_n) - t(b_m)\big) $$

where $t(s_i)$ is the timestamp for the given note $s_i \in s$.

We use this measure to calculate the dissimilarity between the original sound played at the correct media speed $r_{SNDR}$, and the resulting sound with playout speed $r_{PLR}(t)$. We calculate the cost function for an integer (i) number of media-units, as the time difference between the correct playout time period and the actual time period used to play the i media-units. Equation (1) is used to calculate the total dissimilarity measure for the playout period as the sum of these costs. The dissimilarity measure given by Equation (1) is dependent upon the length of the two tunes a and b. Therefore, in this paper, the dissimilarity per second will be used as the quality measure.
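As a hedged illustration of how Equation (1) can be applied to a playout trace, the Python sketch below treats the boundaries of consecutive cost periods as the "notes" of the two tunes. The function names, the 20 ms media-unit, the period of three media-units, and the normalization by the correct playout duration are assumptions made for this example; the paper does not specify these implementation details.

```python
def arentz_dissimilarity(t_a, t_b):
    """Equation (1): sum of squared differences between the durations
    of corresponding intervals in the two timestamp sequences."""
    if len(t_a) != len(t_b):
        raise ValueError("both sequences need the same number of timestamps")
    total = 0.0
    for j in range(1, len(t_a)):
        omega = (t_a[j] - t_a[j - 1]) - (t_b[j] - t_b[j - 1])
        total += omega ** 2
    return total


def dissimilarity_per_second(playout_times, r_sndr, units_per_period):
    """Dissimilarity per second between perfect and actual playout.

    playout_times[k] is the time at which media-unit k was played.
    Assumption: the result is normalized by the correct playout duration.
    """
    indices = range(0, len(playout_times), units_per_period)
    t_correct = [k / r_sndr for k in indices]   # media-unit k is due at k / r_sndr
    t_actual = [playout_times[k] for k in indices]
    if len(t_correct) < 2:
        raise ValueError("need at least two cost-period boundaries")
    duration = t_correct[-1] - t_correct[0]
    return arentz_dissimilarity(t_correct, t_actual) / duration


r_sndr = 50.0                                   # 20 ms media-units (assumed)
perfect = [k / r_sndr for k in range(101)]      # played at the correct speed
slow = [k / 40.0 for k in range(101)]           # played 20% too slowly
d_perfect = dissimilarity_per_second(perfect, r_sndr, 3)
d_slow = dissimilarity_per_second(slow, r_sndr, 3)
print(d_perfect, d_slow)
```

With three 20 ms media-units per cost period, each period nominally lasts 60 ms; the perfect trace gives zero dissimilarity, while the uniformly slowed trace accumulates the same squared cost in every period.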

5 DMOS, PESQ and Arentz tests

In this section, we have run the listening-only tests DMOS, PESQ and Arentz on three different algorithms. To compare the three algorithms fairly, both the buffering delay and the listening-only quality must be taken into consideration. We have therefore adjusted the parameters of all three algorithms to make their mean buffer levels equal, so that the algorithms can be compared by comparing the results of the listening-only tests.

3 Arentz et al. used a constant scaling factor to compensate for tempo differences between the two tunes. Since we compare two traces with identical long term tempo, the scaling factor is not used (i.e., it is set equal to 1) in this paper.

Fig. 2 Transport segment delay for simulated transport segment

5.1 DMOS and PESQ tests

This section compares the results from three different algorithms. Algorithm 1 is one of the most commonly used algorithms (Fixed Playout Delay), with a constant playout speed, which may drop to zero if packets arrive after their deadline. Algorithm 2 was published in a “to be submitted” version of [17], and is chosen here because it is the only playout speed adjusting algorithm we have found that is documented well enough to be implemented. For playout buffer levels above a target level, the inter packet time (IPT) is set to f*normal_IPT, where f < 1, and for buffer levels below the target level, the IPT is set to s*normal_IPT, where s > 1. We use the suggested values s = 1.25 and f = 0.75. Algorithm 3 is the optimal control of playout speed.

One simulated and one real transport segment trace are used. The transport segment delay for the simulated transport segment is shown in Fig. 2. The real trace will be presented as trace 1 in Section 6. Fig. 3 shows the DMOS and PESQ results for algorithms 1, 2 and 3, where all algorithms have the same mean buffer level. DMOS results are presented for two music samples, four speech samples and the mean of the speech samples. PESQ results are calculated for each of the four voice samples. The DMOS results are presented using markers, connected by lines, at the minimum value, the mean value and the maximum value.
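The threshold rule of algorithm 2 can be sketched in a few lines of Python. This is only an illustration of the rule as summarized above, not the implementation from [17]; the function names are hypothetical, and using the normal IPT when the buffer level equals the target level exactly is an assumption, since that case is not described here.

```python
def algorithm2_ipt(buffer_level, target_level, normal_ipt, f=0.75, s=1.25):
    """Inter packet time (IPT) chosen by the threshold rule of algorithm 2."""
    if buffer_level > target_level:
        return f * normal_ipt      # buffer too full: play faster
    if buffer_level < target_level:
        return s * normal_ipt      # buffer too empty: play slower
    return normal_ipt              # assumption: normal speed at the target


def relative_speed(ipt, normal_ipt):
    """Playout speed relative to the correct media speed."""
    return normal_ipt / ipt


normal_ipt = 0.020  # seconds, assuming 20 ms packets
fast = relative_speed(algorithm2_ipt(8, 5, normal_ipt), normal_ipt)
slow = relative_speed(algorithm2_ipt(2, 5, normal_ipt), normal_ipt)
print(fast, slow)
```

With f = 0.75 the playout speed is 1/0.75, about 33% above the correct media speed, and with s = 1.25 it is 1/1.25, i.e. 20% below it. Because the rule has only these two speeds on either side of the target, the speed switches whenever the buffer level crosses the target level.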

Fig. 3 DMOS and PESQ results

The PESQ results include only four calculated scores, and are therefore presented by markers for each value. Since the rating of the perfect sound is not dependent upon the transport segment used, the results shown for perfect sound in the two graphs of Fig. 3 are equal. Note that DMOS and PESQ use different quality scales, see Sections 4.1 and 4.2.

Algorithm 1 discards packets that arrive after their deadline. For the simulated transport segment, one second of sound was lost due to late packet arrivals, but for the UDP transport segment, only short periods of sound were lost. These short periods happened during periods of no sound (between talkspurts) or low sound for three of the voice samples, which therefore received high DMOS scores. The music samples did not contain any low-sound periods, and thus the information loss was easily heard, giving lower DMOS scores. The one second period where algorithm 1 discarded all packets from the simulated transport segment resulted in low DMOS scores for both speech and music. PESQ gave a higher score than DMOS, thus it seems that PESQ is less sensitive to loss of sound or information than DMOS.

For algorithm 2, the rate change caused by the transport segment is small compared to the rate change caused by the algorithm, since algorithm 2 switches the playout speed very frequently between 20% below and 33% above the correct media speed. Thus, as shown by Fig. 3, algorithm 2 received low DMOS scores for the voice samples for both transport segments. The music samples received higher DMOS scores than the voice samples (with large variations), thus for most test subjects, the frequent speed changes were less disturbing for music than for voice. As expected (since PESQ is sensitive to stretching and compression, as explained in Section 4.2), PESQ gave a lower score than DMOS for algorithm 2 for both transport segments.

The optimal control algorithm (algorithm 3) received high DMOS scores for both voice and music, which for the UDP transport segment were comparable to the scores of the perfect sound. The average scores of the voice samples are also equal to the corresponding score of the perfect sound. Algorithm 3 uses stretching and compression, but without the frequent changes of playout speed that are present in algorithm 2. As expected, PESQ gave a lower score than DMOS also for algorithm 3, because changes in playout speed are still present. For the UDP transport segment, algorithm 1 received a slightly higher PESQ score than algorithm 3. Algorithm 1 received a high PESQ score because the late packet loss happened during periods of low sound or no sound for the voice samples, and algorithm 3 received a lower PESQ score because the PESQ algorithm is very sensitive to the stretching and compression of algorithm 3. In the DMOS test, however, algorithm 3 received a 0.6 point higher score than algorithm 1.

5.2 Results for Arentz dissimilarity measure

Since, as described in Section 4.4, Arentz dissimilarity measure costs are calculated based on the output results from running the different algorithms, and not based on sound files, only one cost is calculated for each combination of transport segment and algorithm.

To be able to roughly compare the scores from DMOS, PESQ and Arentz dissimilarity measure, we have used a common scale from 0 to 1, where 1 represents the best quality. The PESQ and DMOS scores are divided by 5, and the following equation is used for the Arentz dissimilarity measure:

$$ \text{new score} = 1 - \frac{\text{dissimilarity measure}}{\text{max dissimilarity measure}} \qquad (2) $$

The maximum dissimilarity measure was approximately 0.2. Fig. 4 shows scaled DMOS and PESQ scores for the mean of the voice samples (equal to the mean values shown in Fig. 3) and for Arentz dissimilarity measure. Fig. 4 shows that Arentz dissimilarity measure is relatively close to the DMOS score for algorithms 1 and 3, but for algorithm 2, the closeness between DMOS and Arentz dissimilarity measure is very dependent upon the cost period. This is because algorithm 2 changes the playout speed very frequently. Short cost periods lead to high dissimilarity values, since many such periods will have a shorter or longer duration than the perfect duration. Long cost periods lead to low dissimilarity values, since many shorter periods of stretching and compression occur within a long cost period, which will thus have a duration that is relatively close to the perfect duration. For the 60 ms cost period, the Arentz dissimilarity measure is relatively close to the DMOS score for all algorithms, even closer than the PESQ score. Thus it seems that Arentz dissimilarity cost with a 60 ms cost period may be a good prediction for the DMOS score. Fig. 4 shows only 6 different combinations of algorithms and networks, thus to draw a better conclusion regarding the use of Arentz dissimilarity measure to predict the DMOS score, more algorithms and transport segments need to be tested with both DMOS and Arentz dissimilarity.
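The mapping onto the common 0 to 1 scale can be written out as a short Python sketch. The function names and the sample input values are hypothetical; only the division by 5 and Equation (2) come from the text.

```python
def scale_mos(score, max_score=5.0):
    """Scale a DMOS or PESQ score to the common 0..1 range."""
    return score / max_score


def scale_dissimilarity(dissimilarity, max_dissimilarity):
    """Equation (2): map a dissimilarity to 0..1, where 1 is best."""
    if max_dissimilarity <= 0:
        raise ValueError("max_dissimilarity must be positive")
    return 1.0 - dissimilarity / max_dissimilarity


# Hypothetical inputs: a DMOS score of 4.0 and a dissimilarity of 0.05,
# scaled with the maximum dissimilarity of approximately 0.2.
dmos_scaled = scale_mos(4.0)
arentz_scaled = scale_dissimilarity(0.05, 0.2)
print(dmos_scaled, arentz_scaled)
```

Since a low dissimilarity means high quality, Equation (2) inverts the measure so that all three metrics point in the same direction on the shared scale.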

Fig. 4 Comparison of DMOS, PESQ and Arentz dissimilarity measure

6 Comparison with fuzzy network results

This section uses the same measurement traces as Ranganathan and Kilmartin [16]. They measured the Internet packet delays by transmitting packet streams from a host located at National University of Ireland, Galway (NUIG), Ireland to two other hosts, the first located at University of New South Wales (UNSW), Sydney, Australia, and the other at Dublin City University (DCU), Ireland. The trace details are given in Table 1. The media-unit size was chosen equal to the inter packet interval. For each of these four traces, Ranganathan and Kilmartin [16] evaluated their fuzzy network with PESQ. To compare results, we have run the optimal control algorithm on the same four traces, and evaluated the resulting voice files with PESQ. Ranganathan and Kilmartin [16] let the user or application choose a ‘history size’ to be used by the fuzzy network and a sensitivity parameter used to control the responsiveness of the system for decreasing network delays. Their results are reproduced in Figs. 5a-c. They consist of 3D graphs with the ‘history size’ and the sensitivity parameter along the horizontal axes and the results that we want to compare our algorithm to, on the vertical axis.

Table 1: Internet delay traces from Ranganathan and Kilmartin [16]

| Trace no. | Internet path | Inter packet interval | Trace date    |
|-----------|---------------|-----------------------|---------------|
| Trace 1   | NUIG - DCU    | 20 ms                 | 28 April 2003 |
| Trace 2   | NUIG - UNSW   | 20 ms                 | 30 April 2003 |
| Trace 3   | NUIG - DCU    | 40 ms                 | 7 May 2003    |
| Trace 4   | NUIG - UNSW   | 40 ms                 | 28 April 2003 |

Since PESQ is sensitive to stretching and compression of the sound signal (and thus also to the changes made by WSOLA), we can think of PESQ as a user and application that requires the player rate to be close to the correct media speed. In this section, we have therefore used a relatively high value for $\Delta x_1$ and lower values for $\Delta x_2$ and $\Delta x_3$ (see Section 3).
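One common way to turn such tolerance values into quadratic-cost weights is Bryson's rule, where each weight is the inverse square of the largest acceptable deviation. The Python sketch below applies that rule to the tolerance values used later in this section. That the weight factors w1, w2 and w3 are derived in exactly this way is an assumption made for illustration; the actual definition is the one given in Section 3, and the function name is hypothetical.

```python
def bryson_weights(dx1, dx2, dx3):
    """Map maximum acceptable deviations to quadratic-cost weights.

    Assumption: w_i = 1 / dx_i**2 (Bryson's rule); the relation used by
    the optimal controller itself is defined in Section 3.
    """
    deviations = (dx1, dx2, dx3)
    if any(dx <= 0 for dx in deviations):
        raise ValueError("all tolerances must be positive")
    return tuple(1.0 / dx ** 2 for dx in deviations)


# Tolerances used for traces 1 to 3 and for trace 4 in this section
# (media-units, media-units/s, media-units/s^2).
w_traces_1_to_3 = bryson_weights(1.0, 5.0, 0.1)
w_trace_4 = bryson_weights(10.0, 1.0, 0.1)
print(w_traces_1_to_3, w_trace_4)
```

Under this assumed rule, a larger tolerance on the buffer level gives a smaller weight on buffering delay, so the controller accepts more delay in exchange for a playout rate that stays closer to the correct media speed.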

6.1 Results for trace 1

The results shown in Figs. 5d and e are obtained by running the optimal controller with $\Delta x_1$ = 1 media-unit, $\Delta x_2$ = 5 media-units/s and $\Delta x_3$ = 0.1 media-units/s². Figs. 5a and b use the same range of sensitivity parameter and ‘history size’. Thus, for each combination of sensitivity parameter and ‘history size’, the PESQ score shown in Fig. 5a and the additional buffering delay (i.e., the delay introduced by receiver buffering) shown in Fig. 5b belong to the same run. Each such combination of PESQ score and buffering delay can be compared to Fig. 5d, which shows the results from the optimal control algorithm with the mean playout buffer level along the x-axis and the PESQ score along the y-axis. This comparison shows that the optimal control algorithm has a higher PESQ score for most receiver buffer levels. At the highest buffer level (at a sensitivity parameter of 10 and a ‘history size’ of 100 in Figs. 5a and b), the PESQ scores are close to equal, and at the lowest buffer level (at a sensitivity parameter of 50 and a ‘history size’ of 20 in Figs. 5a and b) the optimal control algorithm has a one point higher PESQ score than [16].

Fig. 5e shows the additional buffering delay on the x-axis and the run-dry rate on the y-axis. This can be compared to Fig. 5b, where the additional buffering delay is on the vertical axis, and Fig. 5c, where the late packet loss rate is on the vertical axis, in the same way as explained above. Fig. 5e shows that the maximum run-dry rate of the optimal control algorithm is less than 0.002. Figs. 5b and c show a late packet loss rate with a minimum value of 0.005, which is more than double the run-dry rate of the optimal control algorithm. The optimal control algorithm has a higher PESQ score (close to one point better for most buffer levels) and a much lower run-dry rate than the corresponding numbers from [16], and can thus be said to be a considerably better playout algorithm for Trace 1.

Fig. 5 Results from the optimal controller and from [16] for the NUIG-DCU trace with 20 ms packetization interval

6.2 Results for traces 2, 3 and 4

The results shown in Fig. 6 are obtained by running the optimal controller with $\Delta x_1$ = 1 media-unit, $\Delta x_2$ = 5 media-units/s and $\Delta x_3$ = 0.1 media-units/s² for traces 2 and 3 (equal to the weight factors used for trace 1), and $\Delta x_1$ = 10 media-units, $\Delta x_2$ = 1 media-unit/s and $\Delta x_3$ = 0.1 media-units/s² for trace 4. For trace 2, the optimal control algorithm has a PESQ score that is equal to or higher than the PESQ score from [16], while the run-dry rate of the optimal control algorithm is lower than the packet loss rate from [16]. Thus, the optimal control algorithm is a better playout algorithm for Trace 2. For trace 3, the optimal control algorithm has a PESQ score that is on average one point higher than the PESQ score from [16] for equal buffer levels, and a run-dry rate that is slightly below the late packet loss rate of [16] for equal buffer levels. Thus, the optimal control algorithm is a considerably better algorithm for Trace 3.

Fig. 6 PESQ and run dry rate for traces 2, 3 and 4

For trace 4, the PESQ score is on the same level as in [16], and the run-dry rate is comparable to the late packet loss rate from [16]. As explained in Section 4.3, a run-dry incident has a lower impact than a late packet loss of the same duration. Thus, the optimal control algorithm is slightly better than [16] for Trace 4.

7 Summary and Conclusion

The optimal controller is based on a stringent notation and stringent mathematical models of the media receiver system. The notation and mathematical models are network and protocol independent, and can also be used as a basis for developing any kind of playout buffer algorithm. Our approach is statistically optimal in that it minimizes three deviations from the perfect playout, based on their relative importance: 1) Buffering delay 2) A playout rate different from the sender rate and 3) A change of playout rate. The relative importance varies with user and application requirements, and is therefore freely tunable by means of weight factors. The optimal control algorithm has demonstrated very good results when compared to other algorithms in an objective technique for measuring voice quality (PESQ) and in a subjective listening test (DMOS), for both simulated and real network measurement traces. A comparison with an advanced fuzzy network algorithm [16] on real network data showed that the optimal control algorithm gave clearly better results.

8 Open problems

It is shown in [25] and [30] that the optimal control algorithm works very well even when it uses a wrong transport segment model, i.e., it is very robust. Section 6 demonstrated very good results for the optimal controller with a general model of the transport segment. However, an improved transport segment model could give even better results. An automatic real-time identification or detection of the transport segment state space model (to combine it with the optimal control algorithm) could lead to even better results than shown in this paper. This identification procedure could have parts similar to a subset of Matlab’s System Identification Toolbox [33]. The presented mathematics is independent of the medium, but this paper has validated only the audio case. The effects on video need to be investigated and validated.

A new quality metric for variable playout speed is needed, that combines the effects of buffering delay and listening-only or viewing-only media quality. Today, such metrics [36] [37] and performance bounds [38] exist only for constant playout speed.

9 Acknowledgements

I would like to thank M. K. Ranganathan at Sasken Communication Technologies Limited, Bangalore, India and L. Kilmartin at National University of Ireland, Galway, Ireland, for letting me use their measurement traces, used in [16], and for giving me permission to reproduce their results. I would also like to thank the anonymous reviewers for a number of critical and fruitful remarks that led to substantial improvements, and the editor for letting me revise the manuscript twice.

10 References

[1] Ramjee, R., Kurose, J., Towsley, D., and Schulzrinne, H.: “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, 13th IEEE Proc. INFOCOM '94, Networking for Global Communications, Vol.2, Toronto, Canada, June 1994, pp. 680 - 688

[2] DeLeon, P., and Sreenan, C. J.: “An Adaptive Predictor for Media Playout Buffering”, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Vol. 6, Phoenix, Arizona, March 1999, pp. 3097-3100

[3] Pinto, J., and Christensen, K. J.: “An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods”, Proc. IEEE Conf. Local Computer Networks, Lowell, Massachusetts, October 1999, pp. 224-231

[4] Atzori, L., and Lobina, M. L.: “Speech Playout Buffering Based on a Simplified Version of the ITU-T E-model”, IEEE signal processing letters, 2004, Vol. 11, No. 3, pp. 382-385

[5] Atzori, L., Lobina, M. L., and Isola, M.: “Playout Buffering in IP Telephony: a Quality Maximization Approach”, 1st Int. Conf. Multimedia Services Access Networks, Orlando, Florida, June 2005, pp. 49-53

[6] Jung, Y., and Atwood, J. W.: “β-Adaptive Playout Scheme for Voice over IP Applications”, IEICE Trans. Commun., May 2005, Vol. E88-B, No. 5, pp. 2189-2192

[7] Jung, Y., and Atwood, J. W.: “Dynamic adaptive playout algorithm using interarrival jitter and dual use of α”, IEE Proc. Commun., April 2006, Vol. 153, Issue 2, pp. 279-287

[8] Narbutt, M., and Murphy, L.: “Adaptive playout buffering for audio/video transmission over the internet”, Proc. IEE 17th UK Teletraffic Symposium, Dublin, Ireland, May 2001, pp. 27/1 -27/6

[9] Narbutt, M., and Murphy, L.: “VoIP Playout Buffer Adjustment using Adaptive Estimation of Network Delays”, Proc. 18th Int. Teletraffic Congress - ITC-18, Berlin, Germany, Sept. 2003, pp. 1171-1180

[10] Narbutt, M., and Murphy, L.: “A new VoIP adaptive playout algorithm”, IEE Telecommunications Quality of Services: The Business of Success (QoS 2004), London, March 2004, pp. 99-103

[11] Narbutt, M., and Murphy, L.: “Improving Voice Over IP Subjective Call Quality”, IEEE Commun. Letters, May 2004, Vol. 8, No. 5, pp. 308-310

[12] Narbutt, M., and Davis, M.: “An Assessment of the Audio Codec Performance in Voice over WLAN (VoWLAN) Systems”, Proc. 2nd Annual Int. Conf. Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous’05), Vol. 00, San Diego, California, July 2005, pp. 461-470

[13] Liu, F., Kim, J., and Kuo, C.-C. J.: “Adaptive delay concealment for Internet voice applications with packet-based time-scale modification”, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Salt Lake City, Utah, May 2001, pp. 1461-1464

[14] Laoutaris, N., and Stavrakakis, I.: “An analytical design of optimal playout schedulers for packet video receivers”, Computer Communications (www.elsevier.com/locate/comcom), March 2003, Vol. 26, No. 4, pp. 294-303

[15] Laoutaris, N., Van Houdt, B., and Stavrakakis, I.: “Optimization of a packet video receiver under different levels of delay jitter: an analytical approach”, Performance Evaluation (www.elsevier.com/locate/peva), February 2004, Vol. 55, No. 3-4, pp. 251-275

[16] Ranganathan, M. K., and Kilmartin, L.: “Neural and fuzzy computation techniques for playout delay adaptation in VoIP networks”, IEEE Trans. Neural Networks, September 2005, Vol. 16, Issue 5, pp. 1174-1194

[17] Liang, Y. L., Färber, N., and Girod, B.: “Adaptive playout scheduling and loss concealment for voice communication over IP networks”, IEEE Trans. Multimedia, December 2003, Vol. 5, No. 4, pp. 532-543

[18] Liang, Y. L., Färber, N., and Girod, B.: “Adaptive playout scheduling using time-scale modification in packet voice communications”, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, Salt Lake City, Utah, May 2001, pp. 1445-1448

[19] ITU-T Recommendation G.711 Appendix I, “A high quality low complexity algorithm for packet loss concealment with G.711”, 1999

[20] ANSI Recommendation T1.521a-2000 (Annex B), “Packet loss concealment for use with ITU-T recommendation G.711”, 2000

[21] Gündüzhan, E., and Momtahan, K.: “A linear prediction based packet loss concealment algorithm for PCM coded speech”, IEEE Trans. Speech Audio Process., Nov. 2001, Vol.9, No.8, pp.778–785

[22] Rodbro, C.A., Murthi, M.N., Andersen, S.V., and Jensen, S.H.: “Hidden Markov model-based packet loss concealment for voice over IP”, IEEE Trans. Audio, Speech and Language Processing, Sept. 2006, Vol. 14, No. 5, pp.1609 – 1623

[23] Rodbro, C.A., Christensen, M.G., Andersen, S.V., and Jensen, S.H.: “Compressed domain packet loss concealment of sinusoidally coded speech”, Proc. 2003 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 6-10 April 2003, Vol. 1, pp. 104-107

[24] Sanneck, H., Stenger, A., Younes, K., and Girod, B.: “A new technique for audio packet loss concealment”, IEEE Proc. Global Internet, Nov. 1996, pp 48-52

[25] Gade, B. H. H., PhD thesis, University of Oslo, to be printed in 2007.

[26] Strang, G., “Linear algebra and its applications” (Brooks/Cole Thomson learning, 3rd edn. 1988)

[27] Chen, C.-T.: “Linear System Theory and Design” (Oxford University Press, 3rd. edn., 1999)

[28] Hinrichsen, D. and Pritchard, A. J.: “Mathematical Systems Theory I, Modelling, State Space Analysis, Stability and Robustness” (Springer, 2005)

[29] Balchen, J. G., and Mummé, K. I.: “Process Control - Structures and Applications” (Van Nostrand Reinhold, New York, 1988), pp. 60-66

[30] Hafskjold, B.: “Optimal Control of Playoutbuffers”, Proc. Int. Conf. Computer, Communication and Control Technologies (CCCT '03), Orlando, Florida, USA, July/August 2003, Volume VI, pages 175-181

[31] Hafskjold, B.: “Anti-Run-Dry Algorithm for Optimal Control of Playoutbuffers”, Proc. Int. Symposium on Information and Communication Technologies (ISICT03), Dublin, Ireland, 24. - 26. September, 2003, pages 410-417

[32] Gelb, A.: “Applied Optimal Estimation” (The MIT Press, Cambridge, Massachusetts and London, England,1974, 16th printing, 2001)

[33] Matlab’s homepage: www.mathworks.com

[34] ITU-T Recommendation P.800: “Methods for subjective determination of transmission quality”, in series P: Telephone transmission quality, Methods for objective and subjective assessment of quality, 1996

[35] ITU-T Recommendation P.862: “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”, in series P: Telephone transmission quality, telephone installations, local line networks, Methods for objective and subjective assessment of quality, 2001

[36] Narbutt, M., Kelly, A., Murphy, L., and Perry, P.: “Adaptive VoIP Playout Scheduling: Assessing User Satisfaction”, IEEE Internet Computing, July-Aug. 2005, Vol. 9, Issue 4, pp. 28-34

[37] Cole, R. G., and Rosenbluth, J. H.: “Voice over IP performance monitoring”, ACM SIGCOMM Computer Communication Review, April 2001, Vol. 31, Issue 2, pp. 9 – 24

[38] Moon, S., Kurose, J., and Towsley, D.: “Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms”, Multimedia Systems, Feb. 1998, Vol. 6, No. 1, pp. 17-28

[39] Nordavinden og sola, Norwegian dialect samples. A database of identical text read by different people. Available at: http://www.ling.hf.ntnu.no/nos, accessed Jan. 2006

[40] Liu, F., Kim, J., and Kuo, C.-C. J.: “Quality enhancement of packet audio with time-scale modification”, Proc. SPIE Vol. 4861: ITCOM 2002: Multimedia Systems and Applications V, Boston, Massachusetts, July 2002, pp. 163-173

[41] Arentz, W. A., Hetland, M. L., and Olstad, B.: “Retrieving musical information based on rhythm and pitch correlations”, Submitted to IEEE Trans. Pattern Analysis and Machine Intelligence. Printed in Arentz, W. A.: “Searching and Classifying Non-Textual Information”, PhD Thesis, Norwegian University of Science and Technology, 2004

11 Appendix

Table 2 gives an overview of the notation rules used in the paper, and Table 3 gives an overview of the specific symbols used.

Table 2: Notation rules

| Symbol | Description | Example from paper |
|--------|-------------|--------------------|
| Lowercase letter | Scalar variable | |
| Right subscript | Specification of the value | $r_{SNDR}$ |
| Bold lowercase letter | Vector | $\mathbf{x}$ (state vector) |
| Bold uppercase letter | Matrix | $\mathbf{A}$ (System matrix) |
| Dot above a variable | The time derivative of the variable, $\dot{\mathbf{x}} = \frac{d}{dt}(\mathbf{x})$ | |
| Vertical lines on each side of a variable | $\lvert x \rvert = \sqrt{x^2}$, i.e. the absolute value of the variable x | $\lvert \dot{r}_{PLR}(t) \rvert$ |
| Right superscript T | Matrix transpose, $[a \;\; b]^T = \begin{bmatrix} a \\ b \end{bmatrix}$ | $\mathbf{x}_{TRS} = [r_{TRS} - r_{SNDR} \;\; �]^T$ |
| $\mathbf{0}_{a \times b}$ | A zero matrix with a rows and b columns, $\mathbf{0}_{1 \times 2} = [0 \;\; 0]$ | $\mathbf{0}_{n_{TRS} \times 1}$ |

Table 3: Specific symbols used

| Symbol | Description |
|--------|-------------|
| $r_{SNDR}$ | Constant media-unit rate out of the sender, equal to the correct media speed |
| $r_{TRS}(t)$ | Media-unit rate out of the transport segment |
| $r_{VB}(t)$ | Media-unit rate out of the virtual buffer |
| $r_{PB}(t)$ | Media-unit rate out of the playout buffer |
| $r_{PLR}(t)$ | Media-unit rate out of the player |
| $M_{VB}(t)$ | Number of media-units in the virtual buffer at time t |
| $M_{PB}(t)$ | Number of media-units in the playout buffer at time t |
| $M_{PLR}(t)$ | Number of media-units in the player at time t |
| $M_{RCV}(t)$ | Number of media-units in the receiver buffers at time t, $M_{RCV}(t) = M_{VB}(t) + M_{PB}(t) + M_{PLR}(t)$ |
| $M_{RCV,d}(t)$ | Desired number of media-units in the receiver buffers |
| $\mathbf{x}$ | State vector for the total state space model |
| $\mathbf{x}_{TRS}$ | State vector for the transport segment state space model |
| $n_{TRS}$ | Number of states in $\mathbf{x}_{TRS}$ |
| $\mathbf{A}$ | System matrix for the total state space model |
| $\mathbf{A}_{1,RCV}$ | A sub-matrix of $\mathbf{A}$ |
| $\mathbf{A}_{2,RCV}$ | A sub-matrix of $\mathbf{A}$ |
| $\mathbf{A}_{TRS}$ | System matrix for the transport segment state space model |
| $\mathbf{B}$ | Control matrix for the total state space model |
| $u$ | Control variable for the total state space model |
| $\mathbf{C}$ | Process noise matrix for the total state space model |
| $\mathbf{C}_{TRS}$ | Process noise matrix for the transport segment state space model |
| $\mathbf{v}_{TRS}$ | Noise vector for the transport segment state space model |
| $w_1$ | Weight factor 1: the importance of minimizing additional latency |
| $w_2$ | Weight factor 2: the importance of minimizing difference between playout speed and correct media speed |
| $w_3$ | Weight factor 3: the importance of minimizing the time derivative of the playout speed |
| $\mathbf{G}$ | Optimal control gain matrix |
| $r_{12}$, $r_{22}$, $r_{2B}$ | Variables used to simplify the presentation of $\mathbf{G}$ |
