evaluation of packet video quality in real time using random neural networks samir mohamed gerardo...

Evaluation of packet video quality in real time using Random Neural Networks

Samir Mohamed, Gerardo Rubino

IRISA, Rennes, FRANCE

Workshop RNN, ICANN’02, Madrid, 27/8/02

The problem

• How to automatically quantify the quality of the stream, as perceived by the receiver?

[Figure: Source → IP network → Receiver, carrying a stream of voice, music, video, multimedia, …]


OUTLINE

1. Subjective tests
2. Objective tests
3. Our approach in 5 steps
4. The obtained performances
5. ANN vs RNN
6. Application: analyzing parameters impact
7. A view of our demo tool
8. Related work in audio
9. Ongoing research


1: The “voie royale”

• To quantify the quality at the receiver side …

• just put a human there,

• give her/him a scale (say, from 1 to 10),

• and ask her/him to evaluate the quality of a (short) part of the stream (say, some seconds).


(1:) The “voie royale” (cont.)

• Better:
– put n humans at the receiver side,
– ask them to evaluate a (short) given sequence,
– take the average as the quantified quality.

• Still better:
– give the set of humans the sequence to evaluate plus several other sequences, so that each member can adjust her/his scale.


(1:) Subjective tests

• The previous procedure is in fact standardized: see, for instance, the norm ITU-R BT.500-10 (March 2000).

• Basically, the procedure is as previously described + a statistical filter of the results.

• Goal of the statistical filter: to eliminate bad observers.

• Bad observer: “one disagreeing with the majority”
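The averaging and filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the screening rule used here (drop observers whose scores correlate poorly with the panel average, with a hypothetical threshold `min_corr`) is a simple stand-in and is not the screening procedure specified in ITU-R BT.500. The scores are made-up values.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def mean_opinion_scores(scores, min_corr=0.5):
    """scores[o][s] is the score given by observer o to sequence s.

    Observers whose scores correlate poorly with the panel average are
    dropped (min_corr is a hypothetical threshold, not a standard value),
    then the per-sequence average is recomputed on the remaining observers.
    """
    n_seq = len(scores[0])
    panel = [mean(obs[s] for obs in scores) for s in range(n_seq)]
    kept = [obs for obs in scores if pearson(obs, panel) >= min_corr]
    if not kept:  # degenerate case: keep everyone rather than fail
        kept = scores
    return [mean(obs[s] for obs in kept) for s in range(n_seq)], len(kept)

# Made-up scores on a 1-to-10 scale: 4 observers, 4 sequences.
scores = [
    [8, 6, 3, 1],
    [9, 7, 4, 2],
    [7, 6, 2, 1],
    [2, 3, 8, 9],  # this observer disagrees with the majority
]
mos, n_kept = mean_opinion_scores(scores)
print(mos, n_kept)
```

In this toy panel the fourth observer ranks the sequences in the opposite order from everyone else, so the filter removes that observer before averaging.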


(1:) Drawbacks

• The previous procedure gives “the truth” but doesn’t do the job automatically (so, a fortiori, not in real time either).

• Moreover, performing subjective tests is costly
– in logistics (you need about 20 people with appropriate characteristics)
– in resources (you need an appropriate place or environment to put your set of human subjects)
– and in time to perform the tests.


(1:) Drawbacks (cont.)

• Suppose you decide to analyze the way some factor (the loss rate in the network, for instance) affects quality:
– you must then evaluate the function q = f(lr), where q is the quality and lr is the loss rate, in many points and for many sequences: VERY EXPENSIVE.

• Suppose you now want to study q = g(lr, br, d), where br is the source bit rate and d the mean end-to-end delay: TOO EXPENSIVE TO BE DONE
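A rough count shows why the cost explodes when factors are added. The figures below (10 evaluation points per factor, 5 sequences) are purely hypothetical and chosen only to make the arithmetic concrete; each evaluation stands for one full subjective test with a panel of observers.

```python
def subjective_test_count(points_per_factor, n_factors, n_sequences):
    """Number of subjective tests needed to sample a quality function
    on a full grid: one test per grid point and per sequence."""
    return points_per_factor ** n_factors * n_sequences

# q = f(lr): one factor
print(subjective_test_count(10, 1, 5))   # 50
# q = g(lr, br, d): three factors
print(subjective_test_count(10, 3, 5))   # 5000
```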


2: Another possibility

• The other way is obviously to look for an explicit formula (or algorithm) giving q as a function of the considered factors (lr, d, etc.) in, say, O(1) time.

• This appears to be a formidable task, because of two factors:
– no formal definition of what we want to quantify (quality) is available
– the “intuitive concept” of quality depends a priori on too many factors and in a complex way


(2:) Objective tests

• There is an area called “objective tests” where different procedures are proposed to evaluate the degradation of a sequence.

• They consist of specific metrics that compare the original and the degraded streams.

• Even if this is not the goal here, the results of these objective tests are not satisfactory so far.

• How do we know? By comparing the results obtained using these techniques with “the truth”, that is, with those coming from subjective tests.


(2:) An example

• The MNB metric:
– developed at the US Dep. of Commerce, 1997
– for voice (VoIP applications)
– based on a cognition model
– has been shown to behave correctly in some cases

• In the next slide, some results of a test against subjective evaluations (from T. A. Hall, Objective speech quality measures for Internet telephony, Proc. of SPIE 2001).


(2:) Performance of MNB 2


(2:) The only exception

• There is one trial of measuring quality without referring to the original sequence:

ITU “E-model” www.itu.int

• It has been proposed for the specific case of VoIP applications.

• However, the comparison with subjective evaluation says that the performance of this metric can be very poor (in the next slide, some results from T. A. Hall, Objective speech quality measures for Internet telephony, Proc. of SPIE 2001).


(2:) Performance of the E-model


(2:) An example in video: the ITS metric

Source: S. Voran and S. Wolf, The development and evaluation of an objective video quality assessment system that emulates human viewing panels, in IBC 1992.


3a: Our approach: first step

• Our goal is to take, in some sense, the best of subjective and objective approaches.

• As a first step, we select a set of factors or parameters that we think are important to the final perceived quality.

• Even if this is an a priori task, it is not a difficult one.

• For each parameter, we select a few representative or important values. Our goal here is to discretize the problem.


(3a:) Our approach: first step (cont.)

• In our case, we selected 5 parameters:
– BR: Bit Rate. The normalized rate of the encoder. In our environments, we considered 4 possible values (256, 512, 768 and 1024 KBps), normalized with respect to the maximal value.
– FR: Frame Rate, in fps (frames per sec). The rate at which the original stream is encoded. Four selected values: 6, 10, 15 and 30 fps.
– LR: loss rate, in % (loss probability). Selected values: 0, 1, 2, 4 and 8 %.
– CLP: number of consecutively lost packets. We consider packets dropped in bursts of size 1, 2, 3, 4 or 5.
– RA: ratio of intra macroblocks to inter macroblocks. It (indirectly) measures the redundancy in the sequence. We used for this parameter five values between 0.05 and 0.45.



• Observe that there can be interactions (in general, complex) between the parameters
– for instance, here, RA is used by the encoder to protect the stream against losses (measured by LR)

• It must also be underlined that our approach does not depend on the specific set of chosen parameters.

• Observe that BR, FR and RA are source parameters, and that LR and CLP are network parameters.
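As a small illustration of how one configuration could be turned into a numeric input vector, the sketch below normalizes the bit rate by the maximal listed value, as the slides describe. The function names and the ordering (BR, FR, LR, CLP, RA) are assumptions made for the example; the slides do not say whether the other four parameters are rescaled, so they are passed through unchanged here.

```python
# Bit rate values as listed on the slide (units as given there).
BR_VALUES = [256, 512, 768, 1024]

def normalize_bit_rate(br):
    """Normalize a bit rate with respect to the maximal listed value."""
    return br / max(BR_VALUES)

def input_vector(br, fr, lr, clp, ra):
    """Hypothetical layout of one configuration: source parameters
    (BR, FR, RA) and network parameters (LR, CLP) in a fixed order."""
    return [normalize_bit_rate(br), fr, lr, clp, ra]

print([normalize_bit_rate(b) for b in BR_VALUES])  # [0.25, 0.5, 0.75, 1.0]
print(input_vector(512, 15, 2, 3, 0.25))           # [0.5, 15, 2, 3, 0.25]
```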


3b: Our approach: second step

• We have a set of parameters

$\mathcal{P} = \{1, 2, \dots, P\}$

• For parameter $i$ we have a set of possible values:

$\{v_{i1}, v_{i2}, \dots\}$

• Any vector of the form

$(v_{1i}, v_{2j}, \dots, v_{Pk})$

is then called a configuration.


(3b:) Our approach: second step (cont.)

• In our experiments, we had P = 5 parameters

• The # of selected values for them were respectively 4, 4, 4, 5, 5, leading to

4 × 4 × 4 × 5 × 5 = 1600 configurations

• Call C the set of all possible configurations.

• The second step consists of selecting a reduced part of C, trying to have a “good coverage” of this set.
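The size of C can be checked by enumerating the Cartesian product of the value sets. The sketch below works only with the counts of values per parameter quoted on this slide (4, 4, 4, 5, 5) and represents each value by its index, since that is all the count requires; the variable names are illustrative.

```python
from itertools import product
from math import prod

# Number of selected values for each of the P = 5 parameters, as quoted above.
values_per_parameter = [4, 4, 4, 5, 5]

# Size of C: the product of the counts.
n_configurations = prod(values_per_parameter)

# Explicit enumeration: a configuration is one value index per parameter.
configurations = list(product(*(range(n) for n in values_per_parameter)))

print(n_configurations, len(configurations))  # 1600 1600
```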


(3b:) Our approach: second step (cont.)

• In our experiments, we selected about 100 configurations.

• The method we followed was not to use something like a low discrepancy sequence of points in a hypercube, but
– to take into account the characteristics of the parameters
– and the extreme values.

• Call SC the set of selected configurations
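One possible way to build a reduced set that covers the extreme values is sketched below. It is an assumption made for illustration, not the selection rule used in the experiments (which the slides do not spell out): all "corner" configurations, where every parameter sits at its smallest or largest value, are kept, and the rest of the budget is filled with a seeded random sample. Values are again represented by their indices.

```python
import random
from itertools import product

values_per_parameter = [4, 4, 4, 5, 5]

def select_configurations(counts, target, seed=0):
    """Return `target` distinct configurations: every corner of the grid
    plus a random sample of the remaining points (illustrative rule)."""
    all_configs = list(product(*(range(n) for n in counts)))
    # Corners: each parameter at its first (0) or last (n - 1) value.
    corners = list(product(*((0, n - 1) for n in counts)))
    corner_set = set(corners)
    others = [c for c in all_configs if c not in corner_set]
    rng = random.Random(seed)
    extra = rng.sample(others, target - len(corners))
    return corners + extra

selected = select_configurations(values_per_parameter, 100)
print(len(selected))  # 100, of which 2**5 = 32 are corners
```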


3c: Our approach: third step

• For the third step, we must be able to reproduce an environment (source + network) where the selected parameters can be put in any configuration we want.

• In our case, we achieved this using simulators and appropriately controlled (lab conditions) networks.

• The third step consists of reproducing each configuration from the set SC and sending a fixed original sequence from source to receiver.


(3c:) Our approach: third step (cont.)

[Figure: the original sequence is sent under each configuration of SC = { 1, 2, … }, giving one received sequence per configuration.]


3d: Our approach: fourth step

• The result is a set of versions of the original sequence, each having encountered different conditions at the source and in the network.

• We have now a set of sequences (in our tests, about 100 sequences), and a configuration associated with each one.

• The fourth step consists of performing a standard subjective test on each sequence, to build a value assumed to be, by definition, the quality of the sequence.


(3d:) Our approach: fourth step (cont.)

• In symbols, we have $S$ sequences $\{\sigma_1, \dots, \sigma_S\}$ and, associated with sequence $\sigma_i$, the configuration $(x_{i1}, x_{i2}, \dots, x_{iP})$.

• In other words, $x_{ik}$ is the value of the $k$th parameter which led to the sequence $\sigma_i$ at the receiver.

• Together with this, we have the quality value of each version, coming from the subjective test;

$q_i$ is the value of $\sigma_i$.


(3d:) Our approach: fourth step (cont.)

Sequence   1     2     3     …
LR         2%    0%    1%    …
FR         15    6     30    …
…
Quality    3.5   2.5   3.2   …


3e: Our approach: fifth (last) step

• The S sequences are now divided into two sets, randomly. For instance, our 100 sequences were divided into one set of about 80, and one set of about 20.

• To simplify, assume we renumber things such that the first set is $\{\sigma_1, \dots, \sigma_K\}$.

• The idea is then to train a Neural Network (NN) to learn “the” function with $P$ inputs and 1 output, which associates with the input $(x_{k1}, x_{k2}, \dots, x_{kP})$ the output $q_k$ (the subjective quality of sequence $k$), for $k = 1, \dots, K$.
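The random division into a training set and a validation set can be sketched as follows. The 80/20 proportion follows the figures quoted on this slide; the function name, the fixed seed and the use of plain integers as stand-ins for (configuration, quality) pairs are assumptions made for the example.

```python
import random

def split_sequences(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into (training, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Stand-ins for 100 (configuration, quality) pairs.
samples = list(range(100))
train, validation = split_sequences(samples)
print(len(train), len(validation))  # 80 20
```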


(3e:) Our approach: fifth (last) step (cont.)

• The second set of sequences is then used to validate the obtained NN (standard approach).

• The hope is the following: if we give to the function an input $(x_1, x_2, \dots, x_P)$ which is not in the data base (that is, not in SC), the output $y$ should be close to the subjective evaluation of any sequence degraded through a system (source + network) where the configuration was $(x_1, x_2, \dots, x_P)$.
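Validation amounts to comparing the outputs of the trained function with the subjective scores of the held-out sequences. The sketch below computes two common error measures; the predictor `toy_predict` is a hypothetical stand-in (quality decreasing linearly with loss rate), not a trained network, and the held-out data are made-up values.

```python
from math import sqrt

def validate(predict, configs, scores):
    """Return (RMSE, largest absolute error) of `predict` on held-out data."""
    errors = [predict(x) - q for x, q in zip(configs, scores)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, max(abs(e) for e in errors)

def toy_predict(config):
    """Hypothetical stand-in for a trained NN: uses only the loss rate."""
    br, fr, lr, clp, ra = config
    return 9.0 - 0.5 * lr

# Made-up held-out configurations (BR, FR, LR, CLP, RA) and subjective scores.
held_out = [(1.0, 30, 0, 1, 0.05), (0.5, 15, 2, 2, 0.25), (0.25, 6, 8, 5, 0.45)]
held_out_scores = [8.5, 8.0, 4.0]

rmse, worst = validate(toy_predict, held_out, held_out_scores)
print(rmse, worst)
```

Any callable that maps a configuration to a score can be passed as `predict`, so the same check applies to an ANN or an RNN once trained.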


(3e:) Our approach: key implicit assumption

• Our approach implies the following implicit assumption:

For any sequence (or for any sequence belonging to a given family or class), the subjective quality depends only on the values of the set of chosen characterizing parameters.


(3e:) Our approach: key implicit assumption (cont.)

• This means in particular that given, say, three video sequences perhaps with very different contents, if the system configuration is the same (source bit rate, loss rate, …) then the perceived quality will be roughly the same.


(3e:) Our approach at work

[Figure: Source → IP network → Receiver, carrying a stream of voice, music, video, multimedia, …; the NN at the receiver asks the source for BR, FR, RA and measures LR, CLP.]


(3e:) Our approach at work (cont.)

[Figure: at the receiver, a source parameters module provides BR, FR and RA, and a network parameters module provides LR and CLP; these feed the NN, which outputs the measure of quality.]
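The network parameters module has to derive LR and CLP from what the receiver observes. A minimal sketch, assuming the receiver can mark each expected packet as received or lost (for instance from sequence numbers) and assuming CLP is taken as the mean length of the loss bursts; both the function name and that reading of CLP are assumptions made for the example.

```python
def loss_statistics(received):
    """received[i] is True if packet i arrived, False if it was lost.

    Returns (loss rate in %, mean loss-burst length in packets).
    """
    bursts = []  # lengths of the runs of consecutive losses
    run = 0
    for ok in received:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:  # trace ends inside a burst
        bursts.append(run)
    loss_rate = 100.0 * sum(bursts) / len(received)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return loss_rate, mean_burst

# 50 packets with two bursts of 2 lost packets each.
trace = [True] * 20 + [False] * 2 + [True] * 26 + [False] * 2
lr, clp = loss_statistics(trace)
print(lr, clp)  # 8.0 2.0
```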


4: Our results in a nutshell

• Once trained, the NN is supposed to behave like an average human observer faced with a received sequence.

• Observe that the performance of the method depends on the selected parameters, but that there is no restriction in such a choice.

• Moreover, if a posteriori some new parameter appears to be important, it can be easily added following the same procedure.


(4:) Our results in a nutshell (cont.)

• The trained NN (classical and RNN) correlated remarkably well with human evaluations.

• They (obviously) run in negligible time.

• They allow us to study the behavior of quality as a function of several parameters.

• We did the same work for audio, with the same results.

• We also developed an example of application using our tool for control purposes (in audio).


(4:) Performances (RNN) during the training phase


(4:) Another view

[Figure: actual vs. predicted MOS scores (scale 1 to 9) against sample number (1 to 77), training phase.]


(4:) Performances (RNN) during the validation phase


(4:) Another view

[Figure: actual vs. predicted MOS scores (scale 1 to 9) against sample number (1 to 14), validation phase.]


5: ANN vs RNN

• We used the MATLAB toolbox implementing several standard Neural Network techniques (Artificial Neural Networks, ANN) and specific software for RNN.

• We compared their performances mainly in the respective learning abilities.

• We will present here some of the obtained results, which show that RNN behave better for our applications.
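For readers unfamiliar with the RNN model, the sketch below evaluates a three-layer feedforward Random Neural Network using the standard steady-state equations of Gelenbe's model: each neuron's activity is q = T+ / (r + T-), where T+ and T- are the arriving positive and negative signal rates and r is the neuron's firing rate, taken as the sum of its outgoing weights. The layer sizes, the weight values, the output firing rate `r_out` and the clipping of q to 1 are assumptions for the example; the slides do not give the architecture used, and training (learning the weights) is not shown.

```python
def rnn_forward(x, w_plus_ih, w_minus_ih, w_plus_ho, w_minus_ho, r_out=1.0):
    """Forward pass of a feedforward Random Neural Network with one output.

    x            : non-negative inputs, used as exogenous positive rates
    w_plus_ih    : excitatory weights, w_plus_ih[i][h] from input i to hidden h
    w_minus_ih   : inhibitory weights, same layout
    w_plus_ho    : excitatory weights from hidden h to the output neuron
    w_minus_ho   : inhibitory weights from hidden h to the output neuron
    r_out        : firing rate of the output neuron (hypothetical fixed value)
    All weights are assumed strictly positive.
    """
    n_in, n_hid = len(x), len(w_plus_ho)

    # Input layer: no incoming negative signals, so q_i = x_i / r_i.
    q_in = []
    for i in range(n_in):
        r_i = sum(w_plus_ih[i]) + sum(w_minus_ih[i])
        q_in.append(min(x[i] / r_i, 1.0))

    # Hidden layer: q_h = T+ / (r_h + T-).
    q_hid = []
    for h in range(n_hid):
        t_plus = sum(q_in[i] * w_plus_ih[i][h] for i in range(n_in))
        t_minus = sum(q_in[i] * w_minus_ih[i][h] for i in range(n_in))
        r_h = w_plus_ho[h] + w_minus_ho[h]
        q_hid.append(min(t_plus / (r_h + t_minus), 1.0))

    # Output neuron.
    t_plus = sum(q_hid[h] * w_plus_ho[h] for h in range(n_hid))
    t_minus = sum(q_hid[h] * w_minus_ho[h] for h in range(n_hid))
    return min(t_plus / (r_out + t_minus), 1.0)

# Tiny hand-checkable case: 1 input, 1 hidden neuron, all weights equal to 1.
y = rnn_forward([1.0], [[1.0]], [[1.0]], [1.0], [1.0])
print(y)
```

The output lies in [0, 1]; mapping it to a quality scale would require a rescaling that is not specified in the slides.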


(5:) An interesting example: left: RNN, right: ANN


(5:) RNN vs. ANN: # of hidden neurons


(5:) RNN vs. ANN: # of hidden neurons


6: Analyzing parameters impact


(6:) Analyzing parameters impact (cont.)


(6:) MPQM (well-known metric) when applying losses


(6:) ITS (well-known metric) when bit rate is varied


7: Our demo tool


8: Our work on control schemes for audio

[Figure: control architecture. Receiver side: codec, playback, ...; RTP/RTCP; a trained neural network producing the MOS from statistics. Sender side: codec, packetization, ...; RTP/RTCP; a hybrid controller fed by TFRC (suggested rate), the network state and the MOS. Payload, control and statistics flows cross the network between sender and receiver.]


(8:) Some simulation results

[Figure: two plots, each marking the BW needed by PCM and the BW needed by GSM.]


(8:) Illustrating our control application

[Figure: MOS (scale 0 to 5) against time (sec.), comparing MOS without CM and MOS with CM.]


9: Ongoing work

• Refining our initial models for quantifying the quality of audio and video transmission:
– better loss models
– exploration of new parameters (for instance, FEC)
– trying to characterize stream types


(9:) Ongoing work (cont.)

• Coupling our approach with traffic prediction (also using neural techniques)

• Exploration of other applications (diffserv

• Work on learning algorithms for RNN (numerical analysis)
