
Ann. Telecommun. DOI 10.1007/s12243-013-0355-x

Human perception-based distributed architecture for scalable video conferencing services: theoretical models and performance

Tien Anh Le · Hang Nguyen

Received: 17 July 2012 / Accepted: 9 January 2013 © Institut Mines-Telecom and Springer-Verlag France 2013

Abstract This research work proposes a human perception-based distributed architecture for multiparty video conferencing services. The new architecture can effectively reduce the unnecessary traffic of the multilayer video streams on the overlay network. Rich theoretical models of the three different architectures (the proposed perception-based distributed architecture, the conventional centralized architecture, and the perception-based centralized architecture) have been constructed using queuing theory to reflect the traffic generated, transmitted, and processed by the three architectures. The performance has been considered in different aspects, from the total waiting time to the required service rates. Together, the modeling tools, the analysis, and the numerical results help to answer the common concern about the advantages and disadvantages of the centralized and distributed architectures. Overall, the proposed human perception-based distributed architecture can maintain a smaller total waiting time with a much smaller required service rate in comparison with the conventional centralized architecture and the perception-based centralized architecture.

Keywords Waiting time · End-to-end delay · Architecture analysis · Video conference · Service architecture · Distributed architecture · Centralized architecture · Scalable video coding · Application layer multicast

T. A. Le (✉) · H. Nguyen
Department of Wireless Networks and Mobile Multimedia Services, Telecom Sud Paris, Evry, Ile-de-France, 91011, France
e-mail: [email protected]

H. Nguyen
e-mail: hang [email protected]

1 Introduction

Video conferencing services have long been used in business and everyday life. In general, centralized architectures are mainly used to build video conferencing services, while distributed architectures are gradually gaining recognition. Centralized and distributed architectures each have their own advantages, and questions are usually asked about which architecture is preferable under given network and usage conditions. The main players among centralized video conferencing architectures are often special equipment-based solutions [6, 12], web-based services [14], or IMS-based architectures [15]. When more participants want to join the conference (e.g., at big events), the cost of the centralized architecture increases sharply. Therefore, distributed architectures have been proposed in [3, 4]. Both IP multicast and application layer multicast have been applied to build distributed video conferencing architectures [5, 13]. In [8, 13], scalability has been considered and scalable video coding (SVC) has been applied to support various types of participants’ terminals. To the best of our knowledge, none of the proposed distributed architectures has compared its performance with the conventional centralized architecture. Thus, no conclusion can be drawn on whether a distributed architecture is really better than the centralized architecture and whether service users should bother replacing their current centralized conferencing systems with a new distributed system. So far, [9] is an early attempt to compare the two types of conferencing architectures using its proposed evaluation platform [16]. The simulation results show that, while using a much higher computational capacity, the multipoint control unit (MCU) of the centralized architecture can only support a bit rate similar to that of the distributed architecture, at the cost of a much higher delay. These results are interesting, but they need to be validated by theoretical analysis in order to be applied in more general conditions.

The rest of the paper is organized as follows. In Section 2, we propose an enriched human perception-based distributed scalable video conferencing service architecture. In Section 3, theoretical analysis and mathematical expressions based on queuing theory are constructed to compare two different criteria among three different architectures (centralized, perception-based centralized, and distributed scalable video conferencing services), and the numerical results are calculated and presented. The conclusion and future work are given in Section 4.

2 Proposed architecture of human perception-based distributed scalable video conferencing services

In general, any cost function can be applied to build the media distribution tree for scalable video coding contents. In our first proposal [10] and in this research work, the application-aware multivariable cost function proposed in [11] can be used as the optimal cost function to build the media distribution tree for the perception-based distributed architecture.

Figure 1 displays the main characteristics of our proposal. A cluster is a group of k peers which are nearest to each other according to the applied cost function. When a peer wants to join the overlay group, it first looks for the nearest cluster to join by measuring the costs to reach the leaders of all clusters. A cluster has a maximum size of k peers, depending on the network conditions. A cluster’s leader is the peer that has the minimum total cost to reach all other peers in the cluster. Here, we assume that all the important criteria, such as processing and computational capacity, available memory, and bandwidth of the leader candidate, have been fully considered in the applied multivariable cost function before calculating the cost. All leaders from the first layer then use the same cost function to calculate their costs to reach all other leaders at layer 1. These costs are then applied to form clusters and a second layer. The calculations are repeated until the maximum number of layers is reached ($l_{max} = \log_k N$). A leader l receives bit streams from its cluster peers j with a throughput of $\lambda_{j\to l}$ and a traffic variation represented by $C^2_{j\to l}$. The leader makes (k − 1) duplications of the arriving bit stream before multicasting the traffic back to other peers, at a variation of $C^2_{\psi\to l}$, and to the upper layer’s leaders (each peer within the cluster receives bit streams from all other peers in the same cluster but not its own bit stream). At the same time, it receives the bit streams from the upper layer’s leader and forwards them to its cluster members.
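The leader election and the layer count described above can be sketched in a few lines of Python. This is only an illustrative sketch: the cost matrix `cluster_cost`, the function names, and the rounding up of $\log_k N$ for group sizes that are not exact powers of k are assumptions made for the example, not part of the proposal in [10, 11].

```python
def elect_leader(cost):
    """Return the index of the peer with the minimum total cost to reach the others.

    cost[i][j] is a hypothetical, already-computed cost from peer i to peer j
    (for example, the output of a multivariable cost function).
    """
    totals = [sum(row) for row in cost]
    return min(range(len(cost)), key=lambda i: totals[i])


def max_layers(n_participants, k):
    """Smallest number of layers l with k**l >= N, i.e. log_k(N) rounded up.

    Rounding up is an assumption for the case where N is not a power of k.
    Integer arithmetic avoids floating-point error in the logarithm.
    """
    if k < 2:
        raise ValueError("cluster size k must be at least 2")
    layers, reach = 0, 1
    while reach < n_participants:
        reach *= k
        layers += 1
    return layers


# Hypothetical symmetric costs inside one cluster of three peers.
cluster_cost = [[0, 1, 4],
                [1, 0, 2],
                [4, 2, 0]]
leader = elect_leader(cluster_cost)  # peer 1 has the smallest total cost (1 + 2 = 3)
layers = max_layers(8, 2)            # the N = 8, k = 2 case of Fig. 1
```

In this toy cluster, peer 1 is elected because its total cost to the two other peers is the lowest; the same routine could be reapplied to the set of leaders to form each upper layer.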

The proposed architecture is applied when scalable video coding (or other kinds of multilayer video coding) is used for the video conferencing service. The main advantage of multilayer video coding in general and scalable video coding in particular is that the video can be encoded into a base layer bit stream and several enhancement layer bit streams.

We call the maximum predefined number of enhancement video layers at each participant $n^e_{max}$. Each participant can send an arbitrary number of enhancement video layers to the overlay group, but this should not exceed the maximum number of enhancement video layers.

In a video conferencing session, at any given time, there are normally one or just a few active speakers (active speakers are the conference participants who are giving a speech or actively participating in the discussion). The active speaker can be found automatically by comparing the output power of the participants’ microphones. The reason for treating active speakers differently is simple: if all participants were displayed at full quality in a conference session, a human being would not have enough perception capacity to follow all of them. Their terminals would also have difficulty displaying full-quality video bit streams from too many users. From the multicast tree’s point of view, an auto active speaker detector (AASD) can easily reduce the unnecessary traffic for the entire distributed system. The AASD is a functional block placed at each peer to automatically detect whether the peer is an active speaker or not by comparing its input audio powers, by visual information, or by any combination of audio–visual methods.

We call $r^b_{hl}$ and $r^{e_i}_{hl}$ the traffic rectification coefficients on the base and ith enhancement video layers from the hth participant to a cluster leader l. Let $r^{e_i}_{hl} = 1$ if the member wants to receive the ith enhancement video layer from the hth participant; it equals 0 otherwise.

A peer sends its base layer SVC stream with a bit rate of $\gamma^b_{hl}$ to its cluster leader. Besides sending base video layers, active speakers also send their ith enhancement layers to the cluster leader (at a bit rate of $\gamma^{e_i}_{hl}$). The enhancement layer bit stream from an inactive but interesting user may be desirable for some peers. In this case, those particular peers can inform their cluster leader about the interesting user(s) they want to receive enhancement video layers from. Through the network of leaders, this information is passed to the interesting users and to all the cluster leaders. After receiving the notification, the interesting users send their enhancement video layers to the group as if they were active speakers. Each leader maintains a conferee preference table (CPT) and exchanges it with all other cluster leaders. This is a record table of N rows and N columns (N is the number of participants). The AASD at each participant can continuously probe its cluster leader to see whether it is an active speaker or not so that the cluster leader can update the CPT. Therefore, the default value of the CPT can be determined by the AASD at each participant. Another option for updating the values in the CPT is that each conferee can select the interesting participant(s) it wants to receive enhancement video layers from by writing 1 to the corresponding place(s) (CPT[h,h’]) of the table. Each participant can also select whether or not it wants to have the CPT automatically updated by its AASD. Thus, the CPT can be dynamically updated by the AASD or manually maintained through the users’ selections. The CPT’s content is synchronized among all the leaders of the perception-based distributed architecture. As a result, $r^{e_i}_{hl} = 1$ if the participant is an active speaker detected by the AASD or an interesting user registered by at least one other participant; it equals 0 otherwise. A participant sends its enhancement video layers if its corresponding traffic rectification coefficient $r^{e_i}_{hl} = 1$ and does not send them otherwise. On the other hand, after receiving the enhancement video layers from its upper layer leader, each leader decides whether or not it should forward an enhancement video layer to its cluster members based on the updated information it has in its CPT. A private point-to-point video chat session can be established and maintained using this CPT mechanism.

Fig. 1 General model analysis of the perception-based distributed video conferencing service architecture when N = 8 and k = 2
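The CPT rule for the enhancement-layer rectification coefficient can be illustrated with a small Python sketch. The table layout (row = receiving participant, column = sending participant), the set `active` standing in for the AASD output, and the function name are assumptions made for this example only.

```python
def enhancement_coefficient(cpt, active, h):
    """Return 1 if participant h should send its enhancement layers, else 0.

    cpt[viewer][h] == 1 means that `viewer` registered h as interesting
    (assumed layout); `active` is the set of speakers flagged by the AASD.
    """
    registered = any(cpt[viewer][h] for viewer in range(len(cpt)) if viewer != h)
    return 1 if (h in active or registered) else 0


n = 4
cpt = [[0] * n for _ in range(n)]   # N x N table, all entries 0 by default
cpt[0][2] = 1                       # participant 0 asks for the enhancement layers of 2
active = {1}                        # the AASD reports participant 1 as the active speaker
coefficients = [enhancement_coefficient(cpt, active, h) for h in range(n)]
```

Here participants 1 (active speaker) and 2 (registered as interesting) obtain a coefficient of 1, while participants 0 and 3 send only their base layers.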

Each peer in the multicast group contributes only a portion of its computational capacity to support the conference, according to the number of enhancement video layers required by the conference, in order to maintain a steady state of the multicast system (otherwise, there will be congestion at peers). This contributed portion of computational capacity is flexible and should vary when more enhancement video layers are required. However, there are required service rates at the leaders ($M_l$) of the perception-based distributed architecture, at the central leader ($M_p$) of the perception-based centralized architecture, and at the central MCU ($M_m$) of the centralized architecture. These are the service rates required to maintain steady states at the queues processed by each architecture so that no congestion occurs.

These three different architectures are mathematically modeled and analyzed in the next section. Their performance is considered under two different criteria. The waiting time performance compares the processing and queuing mechanisms of the three architectures. The required service rates indicate how much computational capacity the important nodes need in order to support a steady state of the queue formed by each architecture. By approaching the performance from different aspects, a complete view of the proposal is achieved.

3 Theoretical analysis

In order to compare the queuing delay of the three different architectures, three queuing models are constructed.

According to [7], the approximated waiting time of each service architecture (GI/GI/1 queue of general distribution of interarrival time, general distribution of service time, one parallel server) is calculated by

$$W_q \approx \left(\frac{\rho}{1-\rho}\right)\left(\frac{C_A^2 + C_B^2}{2}\right)\left(\frac{1}{\mu}\right) g\left(\rho, C_A^2, C_B^2\right) \tag{1}$$

where

$$g\left(\rho, C_A^2, C_B^2\right) = \begin{cases} \exp\left[-\dfrac{2(1-\rho)}{3\rho}\cdot\dfrac{\left(1-C_A^2\right)^2}{C_A^2 + C_B^2}\right], & \text{if } C_A^2 < 1 \\ 1, & \text{if } C_A^2 \ge 1 \end{cases} \tag{2}$$
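Eqs. 1 and 2 translate directly into code. The following Python sketch is an illustration only; the function and argument names are hypothetical, and the case $C_A^2 = C_B^2 = 0$ (a fully deterministic queue) is excluded because the exponent in Eq. 2 is then undefined.

```python
import math


def g_factor(rho, ca2, cb2):
    """Correction factor g(rho, C_A^2, C_B^2) of Eq. 2."""
    if ca2 >= 1.0:
        return 1.0
    return math.exp(-2.0 * (1.0 - rho) / (3.0 * rho)
                    * (1.0 - ca2) ** 2 / (ca2 + cb2))


def waiting_time(rho, ca2, cb2, mu):
    """Approximate GI/GI/1 waiting time W_q of Eq. 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("steady state requires 0 < rho < 1")
    traffic_intensity = rho / (1.0 - rho)
    variability = (ca2 + cb2) / 2.0
    time_scale = 1.0 / mu
    return traffic_intensity * variability * time_scale * g_factor(rho, ca2, cb2)


# Sanity check: with C_A^2 = C_B^2 = 1 the formula reduces to the M/M/1 value
# rho / (mu * (1 - rho)), here 0.5 / (2 * 0.5) = 0.5.
w_mm1 = waiting_time(rho=0.5, ca2=1.0, cb2=1.0, mu=2.0)
```

Smoother arrivals ($C_A^2 < 1$) make `g_factor` drop below 1, so the approximation predicts a shorter wait than the uncorrected product of the three factors.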

Formula 1 has a nice “product form” of three terms: (1) a traffic-intensity factor, (2) a variability factor, and (3) a time-scale factor (each packet requires $1/\mu$ [unit time] of service). $C_B^2$ is a fixed value determined by the type of hardware used at the queuing nodes. The Squared Coefficient of Variation (SCV) value of the distribution of arriving traffic at each node ($C^2_{Ai}$) can be calculated by:

$$C^2_{Ai} = \frac{1}{\lambda_i}\left(\gamma_i C^2_{Oi} + \sum_{j=1}^{k} \lambda_j C^2_{ji}\right) \tag{3}$$

The SCV value of the distribution of departing traffic from each node ($C^2_{Di}$) can be calculated by:

$$C^2_{Di} = \rho_i^2 C^2_{Bi} + \left(1-\rho_i^2\right) C^2_{Ai} \tag{4}$$

Then,

$$C^2_{ij} = \frac{\lambda_{i\to j}}{\lambda_{Di}} C^2_{Di} + \left(1 - \frac{\lambda_{i\to j}}{\lambda_{Di}}\right) \tag{5}$$

By substituting Eq. 4 into Eq. 5, we have the final form of the SCV of service rate for the traffic generated by node i departing for node j:

$$C^2_{ij} = \frac{\lambda_{i\to j}}{\lambda_{Di}}\left[\rho_i^2 C^2_{Bi} + \left(1-\rho_i^2\right) C^2_{Ai}\right] + \left(1 - \frac{\lambda_{i\to j}}{\lambda_{Di}}\right) \tag{6}$$
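The three SCV relations (Eqs. 3 to 5, combined in Eq. 6) can be sketched as three small Python functions. The function names and the representation of incoming flows as `(rate, scv)` pairs are assumptions made for the illustration.

```python
def arrival_scv(lam_i, gamma_i, c2_oi, inflows):
    """Eq. 3: SCV of the arrivals at node i.

    inflows is a list of (lam_j, c2_ji) pairs, one per upstream node j;
    gamma_i and c2_oi describe the external traffic entering node i.
    """
    return (gamma_i * c2_oi + sum(lam_j * c2_ji for lam_j, c2_ji in inflows)) / lam_i


def departure_scv(rho_i, c2_bi, c2_ai):
    """Eq. 4: SCV of the departures from node i."""
    return rho_i ** 2 * c2_bi + (1.0 - rho_i ** 2) * c2_ai


def routed_scv(lam_ij, lam_di, c2_di):
    """Eq. 5: SCV of the share of node i's departures that goes to node j."""
    share = lam_ij / lam_di
    return share * c2_di + (1.0 - share)


# Chaining Eq. 4 and Eq. 5 gives Eq. 6.
c2_d = departure_scv(rho_i=0.5, c2_bi=2.0, c2_ai=1.0)    # 0.25 * 2 + 0.75 * 1 = 1.25
c2_ij = routed_scv(lam_ij=1.0, lam_di=4.0, c2_di=c2_d)   # 0.25 * 1.25 + 0.75 = 1.0625
```

Note that when only a small share of the departures is routed to node j, Eq. 5 pulls the SCV towards 1, which is the value of a Poisson stream.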

For all of the following models, we use an abstract concept of traffic arriving from the “multicast duplicator.” This is due to the fact that, in a multicast session, each transmitting node has to duplicate the arriving traffic before forwarding it to several multicast receivers. We model that extra traffic as the traffic from an abstract multicast duplicator (ψ) at the multicast transmitter.

3.1 Theoretical expression of the waiting time in queues

Each peer encodes a scalable video stream with a base layer and several enhancement layers. To calculate the waiting time in the three architectures, we need to properly calculate the value of $\mathrm{CoV}^2$ of the combined stream in these three cases from the available values of the base and enhancement video layers (component streams). We apply the derivation from [7]. This derivation is motivated by (a) approximating the component streams as independent renewal processes and (b) approximating the combined stream also as a renewal process. In the case of perception-based distributed architecture, we have:

$$\mathrm{CoV}^2\left(X^p_{hl}\right) = \frac{1}{\gamma^a_{hl}} \sum_{i=0}^{n^e_{max}} \gamma^i_{hl}\cdot \mathrm{CoV}^2\left(X^{e_i}_{hl}\right) \tag{7}$$

The perception-based centralized architecture can be defined as a centralized video conferencing service architecture in which the server owns the perception-based option of controlling its output bit rates based on the requirements from its video conferees. In the case of perception-based centralized architecture, we have:

$$\mathrm{CoV}^2\left(X^p_{hp}\right) = \frac{1}{\gamma^a_{hp}} \sum_{i=0}^{n^e_{max}} \gamma^i_{hp}\cdot \mathrm{CoV}^2\left(X^{e_i}_{hp}\right) \tag{8}$$

In the case of centralized architecture, we have:

$$\mathrm{CoV}^2\left(X^a_{hS}\right) = \frac{1}{\gamma^a_{hS}} \sum_{i=0}^{n^e_{max}} \gamma^i_{hS}\cdot \mathrm{CoV}^2\left(X^{e_i}_{hS}\right) \tag{9}$$
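Eqs. 7 to 9 share one structure: a rate-weighted average of the component SCVs. A minimal Python sketch follows. It assumes that index 0 of each list is the base layer and that the aggregate rate $\gamma^a$ defaults to the sum of the layer rates when it is not given; both points are assumptions of the example, as are the names.

```python
def combined_cov2(layer_rates, layer_cov2, aggregate_rate=None):
    """Rate-weighted SCV of the superposed layer streams (structure of Eqs. 7-9).

    layer_rates[i] and layer_cov2[i] describe layer i, with i = 0 assumed
    to be the base layer. If aggregate_rate (gamma^a) is omitted, the sum
    of the layer rates is used, which is an assumption of this sketch.
    """
    if aggregate_rate is None:
        aggregate_rate = sum(layer_rates)
    weighted = sum(rate * cov2 for rate, cov2 in zip(layer_rates, layer_cov2))
    return weighted / aggregate_rate


# One base layer and two enhancement layers with hypothetical values.
cov2 = combined_cov2([2.0, 1.0, 1.0], [1.0, 2.0, 4.0])   # (2 + 2 + 4) / 4 = 2.0
```

Layers whose rectification coefficient is 0 would simply contribute a rate of 0 and therefore drop out of the weighted sum.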

3.1.1 Perception-based distributed architecture

For the perception-based distributed architecture, the distributed model is shown in Fig. 1. The throughput arriving to a leader at layer l ($\lambda_{Al}$) is comprised of:

– Throughput arriving from all peers j of the cluster to the leader l ($\lambda_{j\to l}$),

– Throughput from the multicast duplicator $\psi_l$ to the leader l ($\gamma_{\psi\to l}$). It is considered as the external traffic at the leader l.

The throughput arriving from all peers of the cluster to their leader l ($\lambda_{j\to l}$) is composed of the throughput from all the video bit streams of all the conferees belonging to the sub-trees whose root is the leader l to the leader l:

$$\lambda_{j\to l} = \sum_{h=1+(j-1)k^{l-1}}^{jk^{l-1}} \gamma^p_{hl} \tag{10}$$

The throughput from the multicast duplicator $\psi_l$ to the leader l ($\gamma_{\psi\to l}$) comprises (k − 1) copies of throughput from all the peers:

$$\gamma_{\psi\to l} = (k-1)\sum_{h=1}^{N} \gamma^p_{hl} \tag{11}$$

From Eqs. 10 and 11, we can calculate the overall throughput arriving to the leader l:

$$\lambda_{Al} = k\sum_{h=1}^{N} \gamma^p_{hl} \tag{12}$$
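The index range of Eq. 10 is easier to follow on a concrete case. The Python sketch below evaluates Eqs. 10 to 12 for the N = 8, k = 2 configuration of Fig. 1; the per-participant rates in `gamma` are hypothetical unit values, and the function names are illustrative.

```python
def peer_throughput(j, l, k, gamma):
    """Eq. 10: traffic reaching a layer-l leader through its j-th cluster peer.

    gamma[h - 1] is the rate of participant h (1-based h as in the paper).
    """
    first = 1 + (j - 1) * k ** (l - 1)
    last = j * k ** (l - 1)
    return sum(gamma[first - 1:last])


def duplicator_throughput(k, gamma):
    """Eq. 11: (k - 1) copies of the traffic of all participants."""
    return (k - 1) * sum(gamma)


def leader_arrival_rate(k, gamma):
    """Eq. 12: overall throughput arriving at a leader."""
    return k * sum(gamma)


gamma = [1.0] * 8                                # eight participants, unit rate each
lam_layer1 = peer_throughput(1, 1, 2, gamma)     # h = 1..1
lam_layer2 = peer_throughput(2, 2, 2, gamma)     # h = 3..4
lam_layer3 = peer_throughput(2, 3, 2, gamma)     # h = 5..8
lam_total = leader_arrival_rate(2, gamma)        # 2 * 8 = 16
```

The number of participants aggregated behind one peer grows as $k^{l-1}$ with the layer index, which is why the upper-layer leaders carry more streams per incoming link.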

The purpose is to form an equation for calculating the SCV of times between arrivals to the leader l because it will determine the waiting time of the traffic queue at the leader l. The general form of $C^2_{Al}$ is:

$$C^2_{Al} = \frac{\vartheta_{Al}}{\lambda_{Al}} \tag{13}$$

in which $\vartheta_{Al}$ is calculated by:

$$\vartheta_{Al} = \gamma_{\psi\to l}\cdot C^2_{o\psi l} + \sum_{j=1}^{N} \lambda_{j\to l}\cdot C^2_{j\to l} \tag{14}$$

Applying Eqs. 3, 13, and 14, we can obtain the SCV of the service rate at the leader l:

$$C^2_{Al} = \frac{1}{\lambda_{Al}}\cdot\left(\gamma_{\psi\to l}\cdot C^2_{o\psi l} + \sum_{j=1}^{N} \lambda_{j\to l}\cdot C^2_{j\to l}\right) \tag{15}$$

We need to calculate the SCV of times between departures from peer j to the leader l ($C^2_{j\to l}$) and from the abstract multicast duplicator to the leader l ($C^2_{o\psi l}$). The general form of the SCV of times between departures from the abstract multicast duplicator to the leader l ($C^2_{o\psi l}$) is:

$$C^2_{o\psi l} = \sum_{j}^{k-1} C^2_{j\to l} \tag{16}$$

At the first layer, since all the peers are directly the video conferees, we have:

$$C^2_{j\to l=1} = C^2_{h=j\to l} = \mathrm{CoV}^2\left(X^p_{hl}\right) \tag{17}$$

The value of $\mathrm{CoV}^2\left(X^p_{hl}\right)$ is given by Eq. 7. We can calculate the SCV of times between departures from the abstract multicast duplicator to the leader l as the combination of the SCV of times between departures from (k − 1) peers j to the leader l:

$$C^2_{o\psi l=1} = \sum_{j=1}^{k-1} C^2_{j\to l=1} \tag{18}$$

In Eq. 15, for the first layer, all of the parameters have been given in Eq. 12 ($\lambda_{Al}$), Eq. 11 ($\gamma_{\psi\to l}$), Eq. 10 ($\lambda_{j\to l}$), Eq. 18 ($C^2_{o\psi l}$), and Eq. 17 ($C^2_{j\to l}$). Therefore, $C^2_{Al=1}$ is known for the first layer. At the second layer, according to Eqs. 4 and 5, we have:

$$C^2_{j\to l=2} = \frac{\lambda_{j\to l}}{\lambda_{Dj}}\left[\rho_j^2\cdot C^2_{Bj} + \left(1-\rho_j^2\right)\cdot C^2_{Aj}\right] + \left(1 - \frac{\lambda_{j\to l}}{\lambda_{Dj}}\right) \tag{19}$$

The traffic intensity (traffic congestion) at each peer ($\rho_j$) is calculated by $\rho_j = \lambda_j / M_l$, with $\lambda_j = \lambda_{Aj} = k\sum_{h=1}^{N}\gamma^p_{hl}$. $\lambda_{Dj}$ is the throughput departing from a peer j, calculated by:

$$\lambda_{Dj} = k\cdot\sum_{h=1}^{N} \gamma^p_{hl} \tag{20}$$

For the second layer, the SCV of service rate for the traffic arriving to the peer j ($C^2_{Aj}$) at layer 2 can be calculated from $C^2_{Al=1}$ at the first layer as:

$$C^2_{Aj} = C^2_{Al=1} \tag{21}$$

In Eq. 19, all of the parameters are known ($\lambda_{j\to l}$ in Eq. 10, $\lambda_{Dj}$ in Eq. 20, $\rho_j$, $C^2_{Aj}$ in Eq. 21). Therefore, $C^2_{j\to l=2}$ is known. We can calculate the SCV of times between departures from the abstract multicast duplicator to the leader l of the second layer as the combination of the SCV of times between departures from (k − 1) peers j to the leader l at the second layer:

$$C^2_{o\psi l=2} = \sum_{j=1}^{k-1} C^2_{j\to l=2} \tag{22}$$

In Eq. 15, for the second layer, all of the parameters have been given in Eq. 12 ($\lambda_{Al}$), Eq. 11 ($\gamma_{\psi\to l}$), Eq. 10 ($\lambda_{j\to l}$), Eq. 22 ($C^2_{o\psi l=2}$), and Eq. 19 ($C^2_{j\to l=2}$). Then, we can calculate the SCV of service rate for the traffic arriving to the leader l at layer 2 ($C^2_{Al=2}$). Recursively, we can calculate the value of $C^2_{Al}$ on all the upper layers by applying the same method as for the first and the second layers and using Eq. 15. From the value of $C^2_{Al}$ of all layers obtained from Eq. 15, we can calculate the waiting time at a leader in layer l according to Eq. 1; thus, we have:

$$W_{ql} \approx \left(\frac{\rho_l}{1-\rho_l}\right)\left(\frac{C^2_{Al} + C^2_{Bl}}{2}\right)\left(\frac{1}{\mu_l}\right) \tag{23}$$

The required service rate and the congestion rate at each leader l ($\mu_l$, $\rho_l$) are explained and calculated in more detail in Section 3.3 and Eq. 41. Since k members in a cluster have to wait in sequence to be served by the cluster leader, the total waiting time of the perception-based distributed architecture is calculated by:

$$W_{qd} = \sum_{l=1}^{l_{max}} k\,W_{ql} \tag{24}$$

in which $l_{max}$ is the maximum number of layers in the perception-based distributed architecture.

3.1.2 Centralized architecture

For the centralized architecture, the centralized model is shown in Fig. 2. The throughput arriving to the centralized MCU ($\lambda_{AS}$) is composed of:

– Throughput arriving from each peer j = h to the MCU ($\lambda_{j\to S}$),

– Throughput generated by the multicast duplicator $\psi_S$ for the MCU ($\gamma_{\psi\to S}$). It is considered as the external traffic at the MCU.

The overall throughput arriving to the centralized MCU is therefore:

$$\lambda_{AS} = \gamma_{\psi\to S} + \sum_{j=1}^{N} \lambda_{j\to S} \tag{25}$$

Fig. 2 General model analysis of the centralized video conferencing service architecture when N = 6

Applying Eq. 3, we can obtain the SCV of the service rate at the MCU server:

$$C^2_{AS} = \frac{1}{\lambda_{AS}}\cdot\left(\gamma_{\psi\to S}\cdot C^2_{o\psi S} + \sum_{j=1}^{N} \lambda_{j\to S}\cdot C^2_{j\to S}\right) \tag{26}$$

From the value of $C^2_{AS}$ obtained from Eq. 26, we can calculate the waiting time at the MCU according to Eq. 1; thus, we have:

$$W_{qm} \approx \left(\frac{\rho_S}{1-\rho_S}\right)\left(\frac{C^2_{AS} + C^2_{BS}}{2}\right)\left(\frac{1}{\mu_m}\right) \tag{27}$$

The required service rate and the congestion rate at the MCU ($\mu_m$, $\rho_m$) are explained and calculated in more detail in Section 3.3 and Eq. 42. Since N participants have to wait in sequence to be served by the MCU, the total waiting time of the centralized architecture is calculated by:

$$W_{qc} = N\,W_{qm} \tag{28}$$

3.1.3 Perception-based centralized architecture

The perception-based centralized architecture can be considered as a special case of the perception-based distributed architecture in which the cluster size is equal to the total number of participants. Therefore, there is only one single cluster of (N − 1) cluster members and one cluster leader. The perception-based centralized architecture can also be seen as a special case of the centralized architecture in which the participant can have an option to receive both the base and enhancement video layers (full quality) if the content is interesting and only the base video layer if the content is not. The throughput arriving to the perception-based centralized server p ($\lambda_{Ap}$) is composed of:

– Throughput arriving from each peer j to the perception-based centralized server p ($\lambda_{j\to p}$),

– Throughput generated by the multicast duplicator $\psi_p$ to the top leader ($\gamma_{\psi\to p}$). It is considered as the external traffic at the perception-based centralized server p.

The overall throughput arriving to the perception-based centralized server p is therefore:

$$\lambda_{Ap} = \gamma_{\psi\to p} + \sum_{j=1}^{N} \lambda_{j\to p} \tag{29}$$

Applying Eq. 3, we can obtain the SCV of the service rate at the perception-based centralized server p:

$$C^2_{Ap} = \frac{1}{\lambda_{Ap}}\cdot\left(\gamma_{\psi\to p}\cdot C^2_{o\psi p} + \sum_{j=1}^{N} \lambda_{j\to p}\cdot C^2_{j\to p}\right) \tag{30}$$

After obtaining $C^2_{Ap}$ from Eq. 30, we can calculate the waiting time at the perception-based centralized server p according to Eq. 1; since N members in a cluster have to wait in sequence to be served by the cluster leader, the total waiting time of the perception-based centralized architecture is calculated as:

$$W_{qp} \approx N\left(\frac{\rho_p}{1-\rho_p}\right)\left(\frac{C^2_{Ap} + C^2_{Bp}}{2}\right)\left(\frac{1}{\mu_p}\right) \tag{31}$$

The required service rate and congestion rate at the perception-based centralized server p ($\mu_p$, $\rho_p$) are explained and calculated in more detail in Section 3.3 and Eq. 43.

3.2 Numerical calculation of the waiting time

The newly proposed theoretical models and mathematical expressions formed in Section 3.1 allow us to evaluate the performance under the two different performance criteria for the three architectures in real time, with an arbitrary number of participants and with very heterogeneous peer contexts, such as peers with terminals of different screen sizes, different available bandwidth, different computational capacities, and a variety of users’ preferences. They can be applied to model very complex video conferencing services. Our first step here is to make a first comparison of the three architectures on the two different criteria. We apply the available traffic models of the SVC video streams, which are currently limited to the mean values of a video session presented in [1, 2], for a simple case of the distributed scalable video conferencing services. In this subsection, to provide first numerical results of the waiting time in the three architectures, we apply an averaging of some of the video conferencing session values presented in Section 3.1 so that we can apply the data provided in [1, 2].


3.2.1 Perception-based distributed architecture

Figure 1 shows the queuing model of a random cluster on layer l in the perception-based distributed architecture. Each cluster has k peers, one of which is elected to be the cluster leader based on its cost to reach all cluster members. We have:

$$\begin{cases} C^2_{Al} = \dfrac{\mathrm{CoV}^2_p\left[(k-1)^2 + k^{l-1}\right]}{k} & \text{at first layer} \\[2ex] C^2_{Al} = \dfrac{(k-1)^2\,\mathrm{CoV}^2_p + k^{l-1}\,C^2_{A(l-1)}}{k} & \text{at higher layers} \end{cases} \tag{32}$$

From the value of $C^2_{Al}$ obtained from Eq. 32, we can calculate the waiting time at a leader in layer l according to Eq. 1; we have:

$$W_{ql} \approx \left(\frac{\rho_l}{1-\rho_l}\right)\left(\frac{C^2_{Al} + C^2_{Bl}}{2}\right)\left(\frac{1}{\mu_l}\right) \tag{33}$$

The required service rate and the congestion rate at each leader l ($\mu_l$, $\rho_l$) are explained and calculated in more detail in Section 3.3 and Eq. 41. Since k members in a cluster have to wait in sequence to be served by the cluster leader, the total waiting time of the perception-based distributed architecture is calculated by:

$$W_{qd} = \sum_{l=1}^{l_{max}} k\,W_{ql} \tag{34}$$

in which $l_{max}$ is the maximum number of layers in the perception-based distributed architecture.
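The layer-by-layer recursion of Eq. 32 and the sum of Eq. 34 can be sketched in Python as follows. The sketch reads Eq. 32 with k as the denominator of both cases and, as a simplification that is not in the paper, uses the same $\rho$, $\mu$, and $C^2_B$ at every layer; the function names and input values are hypothetical.

```python
def scv_per_layer(cov2_p, k, l_max):
    """C^2_{Al} for l = 1..l_max, following Eq. 32 (k read as the denominator)."""
    values = []
    for l in range(1, l_max + 1):
        if l == 1:
            c2 = cov2_p * ((k - 1) ** 2 + k ** (l - 1)) / k
        else:
            c2 = ((k - 1) ** 2 * cov2_p + k ** (l - 1) * values[-1]) / k
        values.append(c2)
    return values


def total_waiting_distributed(c2_layers, c2_b, rho, mu, k):
    """Eq. 34: sum over layers of k * W_ql, with W_ql taken from Eq. 33.

    One common rho, mu and C_B^2 for all layers is an assumption of this sketch.
    """
    return sum(k * (rho / (1.0 - rho)) * ((c2 + c2_b) / 2.0) / mu
               for c2 in c2_layers)


layers = scv_per_layer(cov2_p=1.0, k=2, l_max=3)     # [1.0, 1.5, 3.5]
w_qd = total_waiting_distributed(layers, c2_b=1.0, rho=0.5, mu=1.0, k=2)  # 2.0 + 2.5 + 4.5
```

Sweeping `k` for a fixed number of participants with such a routine is one way to search numerically for the cluster size that gives the lowest total waiting time.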


3.2.2 Centralized architecture

Figure 2 shows the queuing model for the MCU-based architecture. Here, all N peers generate a media stream with a mean data rate of $\gamma_a$ (in megabytes per second); this is the mean over the entire video conference session of the video traffic generated by the video participant j, $\gamma_a = E_h\left(\gamma^a_{hl}\right)$. Each peer sends all its encoded video to the MCU (i.e., the peer’s output is all sent to the common MCU). The MCU then routes back N media streams to the N participating peers, each containing data from the (N − 1) other peers (assuming that each peer does not receive its own stream). We have:

$$C^2_{AS} = \frac{\mathrm{CoV}^2_a\left[(N-1)(N-2) + 1\right]}{N-1} \tag{35}$$

From the value of $C^2_{AS}$ obtained from Eq. 35, we can calculate the waiting time at the MCU according to Eq. 1; thus, we have:

$$W_{qm} \approx \left(\frac{\rho_S}{1-\rho_S}\right)\left(\frac{C^2_{AS} + C^2_{BS}}{2}\right)\left(\frac{1}{\mu_m}\right) \tag{36}$$

The required service rate and the congestion rate at the MCU ($\mu_m$, $\rho_m$) are explained and calculated in more detail in Section 3.3 and Eq. 42. Since N participants have to wait in sequence to be served by the MCU, the total waiting time of the centralized architecture is calculated by:

$$W_{qc} = N\,W_{qm} \tag{37}$$


3.2.3 Perception-based centralized architecture

When the cluster size of the perception-based distributed architecture equals the total number of participants (k = N), we obtain a new perception-based centralized architecture. The advantage of this new architecture is that it inherits the perception-based ideas of the proposed perception-based distributed architecture, such as the flexibility of reducing the unnecessary traffic due to the human-perception limitation. It requires a good leader node to manage all participants. The SCV of service rate for the perception-based centralized architecture is therefore:

$$C^2_{Ap} = \frac{\mathrm{CoV}^2_p\left[(N-1)(N-2) + 1\right]}{N-1} \tag{38}$$

After obtaining $C^2_{Ap}$ from Eq. 38, we can calculate the waiting time at the perception-based centralized server p according to Eq. 1; since N members in a cluster have to wait in sequence to be served by the cluster leader, the total waiting time of the perception-based centralized architecture is calculated by:

$$W_{qp} \approx N\left(\frac{\rho_p}{1-\rho_p}\right)\left(\frac{C^2_{Ap} + C^2_{Bp}}{2}\right)\left(\frac{1}{\mu_p}\right) \tag{39}$$
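Eqs. 35 to 39 have the same shape for both centralized variants, so a single Python sketch covers them. It reads Eqs. 35 and 38 with (N − 1) as the denominator; the function names and the numerical inputs are hypothetical.

```python
def centralized_scv(cov2, n):
    """C^2_AS (Eq. 35) or C^2_Ap (Eq. 38), with (N - 1) read as the denominator."""
    if n < 2:
        raise ValueError("at least two participants are required")
    return cov2 * ((n - 1) * (n - 2) + 1) / (n - 1)


def total_waiting_centralized(cov2, c2_b, rho, mu, n):
    """N * W_q as in Eqs. 36-37 (MCU) and Eq. 39 (perception-based server)."""
    c2_a = centralized_scv(cov2, n)
    return n * (rho / (1.0 - rho)) * ((c2_a + c2_b) / 2.0) / mu


c2_small = centralized_scv(1.0, 3)                                     # (2 * 1 + 1) / 2 = 1.5
w_small = total_waiting_centralized(1.0, 1.0, rho=0.5, mu=1.0, n=3)    # 3 * 1.25 = 3.75
```

In this form, the only difference between the centralized and the perception-based centralized cases is the value passed as `cov2` ($\mathrm{CoV}^2_a$ or $\mathrm{CoV}^2_p$) together with the corresponding $\rho$ and $\mu$.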

3.2.4 Result analysis

Figures 3 and 4 show the comparison among the total waiting time of the three architectures when the cluster sizes are k = 3, k = 5, and k = 7. The video streams are encoded with spatial SVC and two configurations of enhancement video traffic are considered. In the centralized architecture, we assume that the MCU can support up to $N_{max}$ participants at the same time, with all participants sending both their base and enhancement video layers. In the figures, there are on average three enhancement video layers transmitted from all the participants ($n_e = 3$). In the perception-based centralized architecture and the perception-based distributed architecture, all peers send their base layers and some peers send their enhancement video layers. Each leader can support at least k peers from its own cluster.

Fig. 3 Comparison of queuing waiting time among centralized, perception-based centralized, and perception-based distributed architectures at minimum traffic when $r_b = 1$, $r_e = 0.1$, and $n_e = 3$

Regarding the waiting time performance comparison between the centralized and distributed architectures, we can compare the waiting time performance of the centralized and perception-based centralized architectures with that of the perception-based distributed architecture. The total waiting time of both the centralized and perception-based centralized architectures increases exponentially with the increasing number of participants. Meanwhile, the total waiting time of the perception-based distributed architecture increases at a much lower, logarithmic speed. In Fig. 4, when full enhancement video traffic is sent to the perception-based centralized server, the perception-based centralized architecture is identical to the centralized architecture. In this figure, we can clearly see the gain in the total waiting time performance between the centralized and distributed architectures. The centralized architecture has the highest total waiting time, followed by the perception-based centralized architecture. The perception-based distributed architecture outperforms the other two in terms of the total waiting time.

Fig. 4 Comparison of queuing waiting time among centralized, perception-based centralized, and perception-based distributed architectures at full traffic when $r_b = 1$, $r_e = 1$, and $n_e = 3$

The cluster size also plays a role in the waiting time. An interesting conclusion can be drawn from the results in Figs. 3 and 4: when the number of peers in a cluster (k) increases, the total waiting time in the distributed queue increases. Therefore, for a given number of participants (e.g., N = 50), it is recommended to use a smaller cluster size to maintain a lower total waiting time.

When we compare the total waiting time in the two figures ($r_e = 0.1$ and $r_e = 1$), we see that the more unnecessary traffic is removed by our newly proposed perception-based function, the lower the total waiting time is. The newly proposed perception-based function can actually limit the unnecessary traffic and reduce the total waiting time. From the results, we can conclude that the distributed architecture gives a smaller queuing waiting time than the two centralized architectures. In particular, when the number of participants increases, the distributed architecture outperforms the two centralized architectures in terms of the queuing waiting time. The proposed mathematical models and expressions can be used for determining the optimal value of the cluster size to minimize the total waiting time. As the waiting time depends on the total traffic transmitted on the overlay network, by reducing the unnecessary traffic using our proposed human perception-based function, we can greatly reduce the total waiting time of the multiparty video conferencing service.

3.3 Required service rate

In general, a required service rate is necessary to maintain stability in a queue. According to queuing theory, this required service rate can be calculated from the following condition:

$$\rho = \frac{\lambda}{\mu} < 1 \tag{40}$$

We will calculate and analyze these requirements in detail for each architecture. If an architecture can fulfill its required service rate, it can maintain the stability of the service queues.


3.3.1 Perception-based distributed architecture

The traffic intensity at a leader of layer l is $\rho_l = \lambda_{Al}/\mu_l$. In order for the queue at the leader to be in steady-state conditions, we must have $\rho_l < 1$ or $\mu_l > \lambda_{Al}$. Assuming that the perception-based leader l is designed to support a throughput of $\lambda_{Al}$ and the system can support up to $N_{max}$ participants, we have:

$$\begin{cases} \mu_l = M_l = (k\cdot N_{max} + 1)\cdot\gamma^p_{hl} \\[1ex] \rho_l = \dfrac{\lambda_{Al}}{M_l} \end{cases} \tag{41}$$


The traffic intensity at the MCU is $\rho_m = \lambda_{A_S} / \mu_m$ (assuming that only one server is used as the MCU). In order for the queue at the MCU to be in steady-state conditions, we must have $\rho_m < 1$ or $\mu_m > \lambda_{A_S}$. Assuming that the MCU is designed to support up to $N_{max}$ participants and a throughput of $\lambda_{A_S}$, the maximum throughput to be managed at the MCU is:

$$
\begin{cases}
\mu_m = M_m = (N_{max}^{2} - N_{max} + 1) \cdot \gamma^{a}_{h_S} \\
\rho_m = \dfrac{\lambda_{A_S}}{M_m}
\end{cases} \tag{42}
$$


The perception-based centralized server is designed to support a maximum of $N_{max}$ participants. A similar requirement on the service rate applies in order to maintain a steady state:

$$
\begin{cases}
\mu_p = M_p = (N_{max}^{2} - N_{max} + 1) \cdot \gamma^{p}_{h_p} \\
\rho_p = \dfrac{\lambda_{A_p}}{M_p}
\end{cases} \tag{43}
$$

$M_p$ is the required service rate of the perception-based centralized architecture.
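The three required service rates can be evaluated side by side with a short Python sketch. It follows the multipliers of Eqs. 41 to 43 as reconstructed from the text; the function names and the per-stream rates passed as `gamma` are hypothetical placeholders, not values taken from the paper.

```python
def leader_service_rate(k: int, n_max: int, gamma: float) -> float:
    """Required service rate M_l at a leader (Eq. 41): (k * N_max + 1) * gamma."""
    return (k * n_max + 1) * gamma


def centralized_service_rate(n_max: int, gamma: float) -> float:
    """Required service rate at a central server (Eqs. 42 and 43):
    (N_max^2 - N_max + 1) * gamma.

    Pass the conventional rate for the MCU (M_m) or the perception-based
    rate for the perception-based centralized server (M_p).
    """
    return (n_max ** 2 - n_max + 1) * gamma


if __name__ == "__main__":
    n_max = 50           # maximum number of participants
    k = 5                # cluster size (illustrative choice)
    gamma_full = 1.0     # ASSUMPTION: normalized rate without the perception function
    gamma_percep = 0.4   # ASSUMPTION: reduced rate with the perception function

    print("leader      M_l =", leader_service_rate(k, n_max, gamma_percep))
    print("MCU         M_m =", centralized_service_rate(n_max, gamma_full))
    print("perc. server M_p =", centralized_service_rate(n_max, gamma_percep))
```

The leader's multiplier grows linearly with N_max for a fixed cluster size, while the centralized multiplier grows quadratically, which is why the gap widens as more participants join.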

3.3.4 Result analysis

Figures 5 and 6 compare the required service rates of the centralized, perception-based centralized, and distributed architectures for two traffic configurations (re = 0.1 and re = 1). They are obtained from Eqs. 41, 43, and 42, which give the required service rates at the leader on layer l of the perception-based distributed architecture, at the perception-based centralized server p, and at the MCU, respectively, in order to maintain steady states of their queues. If a steady state is maintained in a queue, the waiting time can be high but never infinite, meaning that all of the traffic will eventually be processed. Otherwise, the queues enter a blocked state in which no more traffic can be processed, and congestion occurs.

We analyze the results to show the effect of three aspects on the required service rates: (1) the comparison between the distributed and the centralized architectures, (2) the impact of the cluster size on the performance, and (3) the performance of the newly proposed perception-based function. It is clear from the figures that the required service rate of the perception-based distributed architecture is much smaller than that of the centralized and perception-based centralized architectures. For the same perception-based distributed architecture with a total of 50 participants, the required service rate increases in proportion to the cluster size. Therefore, for this video conference configuration, a smaller cluster size is recommended. The figures also show that, when the newly proposed perception-based function is applied, the perception-based centralized architecture requires a lower service rate than the conventional centralized architecture. The more unnecessary traffic the perception-based function removes, the lower the required service rate is. There is an obvious relationship between the required service rate and the price of a solution built on each architecture, and this relationship can be exponential. We can conclude that the distributed architecture and the newly proposed perception-based function, when applied, reduce the required service rate and therefore the cost of the video conferencing service.

Fig. 5 Comparison of required service rates among centralized, perception-based centralized, and perception-based distributed architectures at minimum traffic when rb = 1, re = 0.1, and ne = 3

Fig. 6 Comparison of required service rates among centralized, perception-based centralized, and perception-based distributed architectures at full traffic when rb = 1, re = 1, and ne = 3
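As a rough sanity check on the comparison between the distributed and centralized architectures, the multipliers of Eqs. 41 and 42 can be compared directly. The derivation below assumes, purely for illustration, that the per-stream rate $\gamma > 0$ is the same in both architectures; the paper's rates differ between architectures, so this is only an indicative bound.

```latex
\begin{align*}
(k \cdot N_{max} + 1)\,\gamma &< (N_{max}^{2} - N_{max} + 1)\,\gamma \\
k \cdot N_{max} &< N_{max}^{2} - N_{max} \\
k &< N_{max} - 1
\end{align*}
```

Under this assumption, with $N_{max} = 50$ the leader needs a lower service rate than the MCU for any cluster size $k < 49$. For example, $k = 5$ gives a multiplier of 251 at the leader against 2451 at the MCU.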

4 Conclusion

In this research, we have proposed a new enriched distributed video conferencing architecture that takes the limits of human perception into account. Mathematical models of the three architectures have been built using queuing theory and compared in terms of total waiting time and required service rate. It is worth noting that all the enriched features of the proposed video conferencing architecture have been modeled in detail and included in the mathematical analysis and expressions. The theoretical models and mathematical expressions allow us to evaluate both performance criteria for all three architectures in real time, with an arbitrary number of participants and highly heterogeneous peer contexts. They can also be used to determine the optimal cluster size for a given number of participants in the distributed video conference. Numerical results derived from the theoretical models and off-line statistical data have been obtained in the context of a multiparty multilayer video conferencing service. The results show that the newly proposed perception-based distributed architecture outperforms both the centralized and the perception-based centralized architectures on these performance criteria.

References

1. Van der Auwera G, David PT, Reisslein M (2008) Traffic and quality characterization of single-layer video streams encoded with the H.264/MPEG-4 advanced video coding standard and scalable video coding extension. IEEE Trans Broadcast 54(3 part 2):698–718

2. Van der Auwera G, David PT, Reisslein M, Karam LJ (2008) Traffic and quality characterization of the H.264/AVC scalable video coding extension. Adv MultiMedia 2008:1–27. doi:10.1155/2008/164027. http://dx.doi.org/10.1155/2008/164027

3. Baset SA, Schulzrinne H (2004) An analysis of the Skype peer-to-peer internet telephony protocol. arXiv preprint cs/0412017. http://arxiv.org/abs/cs/0412017

4. De Cicco L, Mascolo S, Palmisano V (2008) Skype video responsiveness to bandwidth variations. In: Proceedings of the 18th international workshop on network and operating systems support for digital audio and video. ACM, New York, pp 81–86

5. Deering SE (1991) Multicast routing in a datagram internetwork. Ph.D. thesis, Stanford University

6. Akkus IE, Ozkasap O, Reha Civanlar M (2010) Peer-to-peer multipoint video conferencing with layered video. J Netw Comput Appl 34:137–150

7. Gross D (2008) Fundamentals of queueing theory. Wiley, India

8. Jeong H, Abuan J, Normile J, Salsbury R, Tung BS (2011) Heterogeneous video conferencing. US Patent 8,243,905, 14 Aug 2012

9. Le TA, Nguyen H (2010) Centralized and distributed architectures of scalable video conferencing services. The second international conference on ubiquitous and future networks (ICUFN 2010). Jeju Island, Korea, pp 394–399

10. Le TA, Nguyen H (2011) Perception-based application layer multicast algorithm for scalable video conferencing. IEEE GLOBECOM 2011 - Communication software, services, and multimedia applications symposium (GC'11 - CSWS). Houston, TX, USA

11. Le TA, Nguyen H, Zhang H (2010) Multi-variable cost function for application layer multicast routing. IEEE Globecom 2010 - Communications software, services and multimedia applications symposium (GC10 - CSSMA), Miami, FL, USA

12. Lu Y, Zhao Y, Kuipers F, Van Mieghem P (2010) Measurement study of multi-party video conferencing. NETWORKING 2010:96–108

13. Ponec M, Sengupta S, Chen M, Li J, Chou PA (2009) Multi-rate peer-to-peer video conferencing: a distributed approach using scalable coding. In: Proceedings of the 2009 IEEE international conference on multimedia and expo. IEEE Press, Piscataway, pp 1406–1413

14. Silver MS (2006) Browser-based applications: popular but flawed? Inform Syst E Bus Manag 4(4):361–393

15. Spiers R, Ventura N (2009) An evaluation of architectures for IMS based video conferencing. University of Cape Town, Rondebosch, South Africa

16. Le TA, Nguyen H, Zhang H (2010) EvalSVC - an evaluation platform for scalable video coding transmission. In: 14th international symposium on consumer electronics (ISCE 2010). Braunschweig, Germany, pp 85–90
