a context-aware framework for reducing bandwidth usage of ...qyang/paper/lbvc_tmm_2016.pdf · a...

13
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 1 A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi, Qing Yang, David T. Nguyen, Ge Peng Student Member, IEEE, Gang Zhou, Senior Member, IEEE, Bo Dai, Daqing Zhang, Senior Member, IEEE, and Yantao Li, Member, IEEE Abstract—Mobile video chat apps offer users an ap- proachable way to communicate with others. As the high- speed 4G networks being deployed worldwide, the number of mobile video chat app users increases. However, video chatting on mobile devices brings users financial concerns, since streaming video demands high bandwidth and can use up a large amount of data in dozens of minutes. Lowering the bandwidth usage of mobile video chats is challenging since video quality may be compromised. In this paper, we attempt to tame this challenge. Techni- cally, we propose a context-aware frame rate adaption framework, named LBVC (Low-bandwidth Video Chat). It follows a sender-receiver cooperative principle that smartly handles the trade-off between lowering bandwidth usage and maintaining video quality. We implement LBVC by modifying an open-source app - Linphone and evaluate it with both objective experiments and subjective studies. Index Terms—Mobile Video Chats, Context Awareness, Frame Rate Adaption, Frame Interpolation, Bandwidth Reduction. I. I NTRODUCTION M OBILE video chat apps provide users with ap- proachable ways to communicate with others. It has been presented that mobile video chat apps are Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs- [email protected]. X. Qi is with NSX Group, VMware Inc., Palo Alto, CA 94304, USA (e-mail: [email protected]) Q. Yang, D. Nguyen, G. Peng, and G. Zhou are with the De- partment of Computer Science, The College of William and Mary, Williamsburg, VA 23187, USA (e-mail: {xqi, qyang, dnguyen, gpeng, gzhou}@cs.wm.edu) B. Dai is with Computational Science and Engineering, College of Computing, Georgia Institute of Technology, USA (e-mail: bo- [email protected]) D. Zhang is with the Institute of Software, Peking University, China (e-mail: [email protected]) Y. Li is with the College of Computer and Information Sci- ences, Southwest University, Chongqing 400715, China (e-mail: [email protected]) Manuscript received XXXX XX, 20XX; revised XXXX XX, 20XX. among the most popular smartphone apps [10]. Mean- while, the number of mobile video chat users contin- uously grows as mobile devices, such as smartphones, become prevalent [6] [7]. As 4G high-speed network services are deployed worldwide, network speed is no longer a major bottleneck of performing mobile video chats. Companies, such as Snapchat, that have mobile video chat apps as products draw large attractions from investigators because of their large user populations and potential commercial values. However, “mobile video chat apps consume much more bandwidth than other apps” [11]. Take Skype as an example, the upload and download bandwidth require- ments for its low quality video chats are both 300Kbps [8]. The bandwidth usage at this level can quickly use up any existing limited data plan within couple of hours. As alleviations, video encoding techniques [12] [13] [14] [15] have been proposed for reducing video streaming bandwidth. Other works [9] [16] propose solutions that further lower the bandwidth usage of video downloading based on existing encoding techniques. Our previous work [50] indicates that bandwidth usage of mobile video chats can be further reduced by dynamically adapt- ing video frame rate based on smartphone vibration. In this paper, we extend our previous design by proposing a general context-aware frame rate adaption framework, named LBVC (Low-bandwidth Video Chat). It introduces guarding context as a replaceable module that could be set as whatever context developers feel proper for video frame rate adaption. With guarding context configured, it lets the video sender adapt frame rate with respect to user-selected guarding context and frame rate to lower bandwidth usage and video receiver interpolate ‘missing’ frames to maintain video quality. LBVC also provides a user-friendly interface to help users select a guarding context to control frame rate and set an appropriate frame rate to save bandwidth. To be informative, the interface provides the estimated bandwidth usage and video quality degradation of each candidate frame rate compared to the default frame rate. To save bandwidth and alleviate video quality degrada- tion, LBVC introduces a context-aware sender-receiver

Upload: others

Post on 19-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 1

A Context-aware Framework for ReducingBandwidth Usage of Mobile Video Chats

Xin Qi, Qing Yang, David T. Nguyen, Ge Peng Student Member, IEEE, Gang Zhou, SeniorMember, IEEE, Bo Dai, Daqing Zhang, Senior Member, IEEE, and Yantao Li, Member, IEEE

Abstract—Mobile video chat apps offer users an ap-proachable way to communicate with others. As the high-speed 4G networks being deployed worldwide, the numberof mobile video chat app users increases. However, videochatting on mobile devices brings users financial concerns,since streaming video demands high bandwidth and canuse up a large amount of data in dozens of minutes.Lowering the bandwidth usage of mobile video chats ischallenging since video quality may be compromised. Inthis paper, we attempt to tame this challenge. Techni-cally, we propose a context-aware frame rate adaptionframework, named LBVC (Low-bandwidth Video Chat). Itfollows a sender-receiver cooperative principle that smartlyhandles the trade-off between lowering bandwidth usageand maintaining video quality. We implement LBVC bymodifying an open-source app - Linphone and evaluate itwith both objective experiments and subjective studies.

Index Terms—Mobile Video Chats, Context Awareness,Frame Rate Adaption, Frame Interpolation, BandwidthReduction.

I. INTRODUCTION

MOBILE video chat apps provide users with ap-proachable ways to communicate with others.

It has been presented that mobile video chat apps are

Copyright (c) 2013 IEEE. Personal use of this material is permitted.However, permission to use this material for any other purposesmust be obtained from the IEEE by sending a request to [email protected].

X. Qi is with NSX Group, VMware Inc., Palo Alto, CA 94304,USA (e-mail: [email protected])

Q. Yang, D. Nguyen, G. Peng, and G. Zhou are with the De-partment of Computer Science, The College of William and Mary,Williamsburg, VA 23187, USA (e-mail: {xqi, qyang, dnguyen, gpeng,gzhou}@cs.wm.edu)

B. Dai is with Computational Science and Engineering, Collegeof Computing, Georgia Institute of Technology, USA (e-mail: [email protected])

D. Zhang is with the Institute of Software, Peking University, China(e-mail: [email protected])

Y. Li is with the College of Computer and Information Sci-ences, Southwest University, Chongqing 400715, China (e-mail:[email protected])

Manuscript received XXXX XX, 20XX; revised XXXX XX,20XX.

among the most popular smartphone apps [10]. Mean-while, the number of mobile video chat users contin-uously grows as mobile devices, such as smartphones,become prevalent [6] [7]. As 4G high-speed networkservices are deployed worldwide, network speed is nolonger a major bottleneck of performing mobile videochats. Companies, such as Snapchat, that have mobilevideo chat apps as products draw large attractions frominvestigators because of their large user populations andpotential commercial values.

However, “mobile video chat apps consume muchmore bandwidth than other apps” [11]. Take Skype asan example, the upload and download bandwidth require-ments for its low quality video chats are both 300Kbps[8]. The bandwidth usage at this level can quickly useup any existing limited data plan within couple of hours.As alleviations, video encoding techniques [12] [13] [14][15] have been proposed for reducing video streamingbandwidth. Other works [9] [16] propose solutions thatfurther lower the bandwidth usage of video downloadingbased on existing encoding techniques. Our previouswork [50] indicates that bandwidth usage of mobilevideo chats can be further reduced by dynamically adapt-ing video frame rate based on smartphone vibration.

In this paper, we extend our previous design byproposing a general context-aware frame rate adaptionframework, named LBVC (Low-bandwidth Video Chat).It introduces guarding context as a replaceable modulethat could be set as whatever context developers feelproper for video frame rate adaption. With guardingcontext configured, it lets the video sender adapt framerate with respect to user-selected guarding context andframe rate to lower bandwidth usage and video receiverinterpolate ‘missing’ frames to maintain video quality.LBVC also provides a user-friendly interface to helpusers select a guarding context to control frame rateand set an appropriate frame rate to save bandwidth.To be informative, the interface provides the estimatedbandwidth usage and video quality degradation of eachcandidate frame rate compared to the default frame rate.

To save bandwidth and alleviate video quality degrada-tion, LBVC introduces a context-aware sender-receiver

Page 2: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 2

cooperative approach. The sender utilizes the loweruser input frame rate to encode video chats and savebandwidth unless the user selected guarding contextis detected. Once the guarding context is detected,it switches to the higher default frame rate. In thisway, LBVC aggressively reduces bandwidth usage atvideo sender and does not provide the accurate qualityguarantee of outgoing videos. The receiver makes upthe missing information in the received video chats byinterpolating intermediate frames when the instant framerate is smaller than the default one. According to existingliterature [23], frame interpolation technique adopted byreceiver may produce strong artifacts when the scenechange between consecutive frames is large. However,the frame rate adaption at the sender is designed tokeep the scene change between consecutive frames smallin order to prevent strong artifacts from the frameinterpolation.

The main contributions of this paper are summarizedas follows:

• We introduce guarding context for frame rate adap-tion in mobile video chat apps. Based on guardingcontext, we propose LBVC, a general context-awareframe rate adaption framework, to reduce bandwidthusage of mobile video chats.

• We implement LBVC by modifying an existingopen-source mobile video chat app, Linphone. Theexperiment results demonstrate that LBVC savesbandwidth by 35% compared to the default solutionwithout obviously compromising video quality.

The rest of the paper is organized as follows. We firstpresent the design of the LBVC framework. Next, wepresent its implementation and evaluation. We finallyelaborate existing works and conclude the paper.

II. LBVC DESIGN

Based on the measurement results in [50], loweringframe rate efficiently reduces bandwidth usage below thetypical frame rate range (12∼20 fps) adopted by existingmobile video chat apps. Video compression techniquestake limited effects on compressing video chats in thisframe rate range. The LBVC (Low-bandwidth VideoChat) framework is designed to exploit the opportunityfor reducing bandwidth usage by adapting frame ratewithin this low frame rate range as well as not obviouslycompromising video quality.

A. Guarding Context

In LBVC, the video quality at the receiver is deter-mined by how well the frame interpolation algorithmworks. Usually, the algorithm works well when the

scene changes between consecutive frames are small. Aguarding context in LBVC should be carefully designedby app developers to indicate whether the scene changesbetween the consecutive frames are large or not. Thescene changes are large only when the guarding contextis detected.

In LBVC, we give app developers the freedom todesign guarding contexts. However, we particularly con-sider two guarding contexts in this paper, quick objectmovement and severe device vibration. A guarding con-text may work well in one usage scenario but may not inanother. In the rest of this subsection, we will introducehow LBVC detects each guarding context and explainthe reasons of why we define them through stating theirworking and non-working usage scenarios.

Quick Object Movement. LBVC detects the quickobject movement by computing the metric based on theframe difference within a time period (See Equation 1).In the equation, || ∗ ||F denotes the Frobenius Norm of amatrix and start and end denote the starting and endingtime of each measuring period. Each frame I is of sizeM × N pixels. An object movement is considered asquick if the metric value of a time period is beyonda threshold. Otherwise, it is considered to be not quick.This metric measures the overall effects of scene changesduring a time period and approximately reflects whetherobjects within a scene move quickly or not. Althoughmore complicated object detection algorithms exist, thissimple approximate algorithm is chosen since it is fastand able to meet the real-time requirement of videostreaming.

A =end∑

t=start+1

(||It − It−1||F√

M ∗N) (1)

The reason we define this guarding context is thatit works well when the object in a video frequentlymoves but the smartphone is relative stationary. However,it requires multiple frames to detect object movement.Waiting for multiple frames adds response delay toscene changes. For example, LBVC needs to wait 260msfor even one frame at 4 fps. Therefore, this guardingcontext does not work well when a user hold the deviceon his/her unstable hands, since the frequently scenechanges are not responded in a timely manner at thesender and the frequent added delays impair perceivedvideo quality at the receiver.

Severe Device Vibration. Inertial sensor readingsfrom both accelerometer and gyroscope on smartphonesare leveraged by LBVC to detects severe device vibra-tions. Technically, we define two metrics based on thesensor reading changes [24] to measure device vibrations

Page 3: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 3

Fig. 1. LBVC Architecture.

(See Equation 2 and 3). In the equations, L1 normsof the three-dimension changes in both accelerationand (∂Ax(t), ∂Ay(t) and ∂Az(t)) and angular velocity(∂Gx(t), ∂Gy(t) and ∂Gz(t)) are calculated. In theequation, start and end denote the starting and endingtime of each measuring period.

mA =end∑

t=start

[|∂Ax(t)|+ |∂Ay(t)|+ |∂Az(t)|] (2)

mG =end∑

t=start

[|∂Gx(t)|+ |∂Gy(t)|+ |∂Gz(t)|] (3)

In LBVC, we set the sensors sampling rate as 50Hz and the measuring period as 60 milliseconds. Thismeasuring period is small enough to timely capturesmartphone vibrations. A device vibration is consideredsevere when the value of either mA or mG is beyond athreshold. Otherwise, it is considered not severe.

We adopt L1 norm because it captures the smart-phone vibrations signaled by sensor reading changesin any measured dimension. We have also investigatedthe L2 norm of the inertial sensor reading changes asthe guarding context. It achieves similar experimentalresults as L1 norm. We finally choose L1 norm since itscomputation is simpler than L2 norm and hence ableto make a quicker decision. We also consider morecomplicated models such as Support Vector Machine(SVM) or other classification models. However, we donot choose them for the following reasons, although theyhave the potential of achieving higher accuracy than L1

norm. First, their higher accuracy depends on the trainingdata, which users need to manually collect and label invarious scenarios. Second, we already achieve satisfyingresults with L1 norm, which is simple and general to allusers. The performance gain of complicated models is

limited. Third, complicated model needs more compu-tation, which results in more execution time. Therefore,we choose L1 norm, which is computationally efficient,simple and user-friendly.

The reason we define this guarding context is thatit works well when a user hold the device on his/herunstable hands, since it responses to scenes changesquickly and the delay is constantly 60 ms. However, itdoes not work well when the object in a video frequentlymoves but the device does not, since the scene changes inthis case are not caused by device vibrations and henceignored.

Finally, frame interpolation quality has different sensi-tivities to smartphone vibrations at different frame rates.For instance, the interpolation algorithm may produceintermediate frames with limited artifacts under typicaldevice vibrations when the input frame rate is set as largeas 8 fps. However, it may generate intermediate frameswith strong artifacts under typical device vibrations whenthe input frame rate is set as small as 1 fps. Therefore,app developers should carefully set the threshold valuesfor each guarding context at different frame rates. Wesuggest the threshold values are determined experimen-tally with the objective of preventing obvious artifacts.

B. LBVC Architecture

Figure 1 depicts the architecture of LBVC with thecommon video chat app components in red and newlyintroduced components described in white rectangles andtriangles. During a video chat, the communication is bi-direction and a device is both a sender and receiver.Before going to design details, we claim that the LBVSdesign is generic for typical video chat apps on mobiledevices, and is independent of lower layer componentssuch as codecs and network protocols.

Page 4: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 4

At Sender Side. Users can select a guarding con-text and set the frame rate for a video chat througha user-friendly interface. The interface describes whateach guarding context is. To be comprehensive, it alsoprovides the information for the scenarios under whichit works well and the ones under which it does not.A user selected frame rate is usually lower than thedefault frame rate. A lower candidate frame rate reducesbandwidth usage at the expense of degrading the videoquality on conversation partner side. Users not only needto know how much bandwidth they can save by selectinga frame rate, but also they should be aware of howmuch damage the lower frame rate will bring to videoquality. Therefore, we design the interface to provide theestimated bandwidth usage and associated video qualitydegradation of each candidate frame rate. The baselineis the bandwidth usage and video quality at the defaultframe rate.

A Frame Rate Controller controls the frame rate ofthe media recorder switching between the higher defaultframe rate and lower user input frame rate with respectto a user-selected guarding context. It utilizes the userinput frame rate as the target frame rate and continuouslydetects whether the selected guarding context happens ornot in run-time. Different guarding context detections re-quire different resources. For example, detecting guard-ing context of severe device vibration requires collectinginertial sensor readings, while detecting guarding contextof quick object movement requires logging recent framedifference. As we introduce in section II-A, the scenechanges are determined to be larger when the guardingcontext is detected. The Controller adopts the targetframe rate to save bandwidth and only adopts the defaultframe rate, which is usually higher than the user inputframe rate, when the guarding context is detected.

At Receiver Side. Video frames and audio samplesare decoded from the data packets received over theSIP/RTP network. To rescue video quality, we designthe receiver to interpolate intermediate frames betweenthe received frames whenever the received frame rate islower than default frame rate. The frame rate after frameinterpolation is not smaller than the default frame rate.

The frame interpolation algorithm at receiver sideplays a key role in LBVC. At the beginning, we in-vestigate two optical-flow [23] [25] based algorithms.However, their complicated assumptions and algorithmdesigns result in long execution time, which is not practi-cal for real-time video chats. For example, the algorithmin [25] takes 825 seconds on average to interpolate oneframe on a laptop, whose hardware configuration are In-tel Core i7-2760QM CPU @ 2.4GHz ×8 and a memoryof 12 Gigabytes. The input frame resolution is similar

to that in smartphone video chats. This configuration ismore advanced than state-of-the-art smartphones such asNexus 5, but optical-flow based algorithms still take sucha long execution time on it. More execution time resultsof these algorithms are listed in [26].

After the aforementioned investigation, we turn tothe cross dissolve algorithm [27]. The cross dissolvealgorithm is simple and shown in Equation 4. In theequation, there are two input frames, I1 and I2. Theinterpolated frame is the weighted average of the twoinput frames, where the weights depend on the time pointwhich the interpolated frame should locate at. From theequation, the algorithm only involves the operations ofadding two frames and multiplying frames and constants.

Iinterpolated = (1− t) ∗ I1 + t ∗ I2 (4)

Theoretically, we investigate when this algorithm isable to generate smooth object motion among the inputframes and interpolated frames. Particularly, we focuson analyzing when the algorithm produces smooth objectedge motion, which is a dominate factor of smooth objectmotion. According to existing literature [27], real imageedge tends to be smooth during image process as a resultof the blurring effects [36]. Mathematically, a blurringedge could be represented by a sine curve and the crossdissolve algorithm is converted into Equation 5 [27].

ζ sin (αx+ κ) = (1− t)η sin (αx) + t sin (αx+ θ) (5)

On the right hand side of this equation, there aretwo edges represented by two sine curves. The leftedge (η sin (αx)) is the first input edge and the rightedge (sin (αx+ θ)) is the second input edge. η is theamplitude scale and θ is the spatial translation (or phaseshift). The interpolated edge is ζ sin (αx+ κ), whichrepresents a sequence of sine curves. The parameterdetails of the interpolated edge are listed in Equation6 and 7 [27].

κ = arctant sin θ

(1− t)η + t cos θ(6)

ζ2 = η2(1− t)2 + t2 + 2(1− t)ηt cos θ (7)

The parameter κ controls the motion speed of interpo-lated edge and it is an approximately increasing functionof θ [27], which is the spatial transition between the twoinput edges. Thus, small κ (or small θ) results in smoothtransition between two input edges. The parameter ζ isthe image contrast and it is an decreasing function of θ.Small ζ (or larger θ) results in unclear edge, which istermed as ghosting effect [46]. To sum up, the smallerthe spatial transition θ is, the smoother and clearer the

Page 5: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 5

Input Frame 2

Original Frames

Interpolated Frames by Cross Dissolve

T=1.0T=0.8T=0.6T=0.2 T=0.4

TITIIT

*)1(* 211I 2I

T=0

Input Frame 1

Fig. 2. Cross Dissolve Example. The time interval between the two input frames is 300 ms.

interpolated frames are. Therefore, we design the FrameRate Controller on the sender to prevent large scenechanges between consecutive frames. In Figure 2, weprovide an intuitive example of the input and output ofthe cross dissolve algorithm.

Another thing to be considered is the delay introducedby frame interpolation. Usually a frame interpolationtechnique requires two frames as input. When one frameis received, it cannot start interpolation until the nextframe arrives. At the media player, in order to ensurea constant video playing speed, the first frame cannotbe played until the arrival of the next frame and theaccomplishment of the frame interpolation. Thus, we adda delay to the media player at the beginning of each videochat to accommodate the delay introduced by frameinterpolation. For synchronization purpose, we add thesame delay to audio playing. The length of the commondelay is determined based on the sender-side input framerate. For example, the receiver obtains the selectedframe rate (4ps) from the sender through SIP negotiationparameters [22] at the beginning of each video chat andconfigure the delay (250ms) with tolerant considerations(10ms) about dynamic networking. Overall, a 260ms(250ms+10ms) delay is added by receiver when 4 fpsis selected by users.

III. LBVC IMPLEMENTATION AND EVALUATION

A. Implementation and Configuration

We have investigated several open-source mobilevideo chat apps, including Linphone, SipDroid and CSip-

Simple. We choose Linphone [17] as the base of ourimplementation because its software structure allows usto focus on the implementation of the video processingfunctionalities. We implement the sender side interfacewith Android SDK 4.2 in Java and integrate it to theAndroid version of Linphone. The modified Linphoneis installed on a Galaxy Nexus and Nexus 4 for experi-ments.

Linphone adopts the Mediastreamer2 library for videostreaming. In the native Mediastreamer2 library, thevideo streaming procedure is implemented as a sequenceof filter graphs. Each filter in a graph is in charge of onetask such as video capturing, encoding, or displaying.At the beginning of each video stream, a thread calledTicker is scheduled to wake up every 10ms and executesthe filter graphs. We add the Frame Rate Controller asa new filter at the beginning of the Ticker thread andthe Frame Interpolation Component as another new filterbetween the video decoding and displaying filters.

The Frame Rate Controller filter collects the nec-essary resources for guarding context detection. Forthe guarding context of severe device vibration, itcollects accelerometer and gyroscope readings throughthe Android NDK sensor library API. It determineswhether the calculated vibration metrics are larger thanthe specific thresholds or not every 60ms,. The severedevice vibration is detected when the metrics are largerthan the thresholds. For the guarding context of quickobject movement, it logs the most recent frame andcomputes its difference from the next coming frame.

Page 6: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 6

Frame Rate (fps)1 2 4 6 8

To

tal B

and

wid

th (

MB

it/s

)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8Default 12 fps with LBVC DisabledSelected fps with LBVC EnabledSelected fps with LBVC Disabled

Fig. 3. Bandwidth Usage vs. Frame Rate on Galaxy Nexus. Fig. 4. Power Consumption vs. Frame Rate in a CellularNetwork on Galaxy Nexus.

Within each frame interval, it determines whether theframe difference is beyond a threshold or not. The quickobject movement is detected when the frame differenceis beyond the threshold. For both guarding contexts, ifthey are detected, the default frame rate (12 fps) is setfor the video capturing and encoding filters; otherwise,the user-specified lower frame rate is set. In this section,we evaluate LBVC with severe device vibration selectedas the guarding context through a series of experiments.

Frame Rate (fps) 1 2 4 6 8mA THRs (m/s2) 2.0 3.0 4.0 4.5 6.0mG THRs (rad/s) 2.5 3.0 3.2 3.5 4.0Common Delay (ms) 1100 620 260 240 200

TABLE ISELECTED FRAME RATES AND THEIR ASSOCIATED PARAMETERS

(THRESHOLDS AND DELAY).

Table I summarizes 5 typical frame rates that are lowerthan the default 12 fps. For each selected frame rate, thethresholds for the Frame Rate Controller to make framerate adaption decisions are listed. Also, the commondelays added for each frame rate to synchronize theaudio and video are also added listed. We suggest appdevelopers to determine these values empirically withconsiderations about bandwidth saving, video qualitydegradation and networking dynamics.

In experiment section, we compare various resultswhen LBVC is enabled and disabled at each selectedframe rate. When LBVC is disabled, Linphone usesH.264 to encode video chats by default. H.264 is a start-of-the-art encoding method that adopts motion compen-sation to compress video size [13]. From the differencesbetween the results when LBVC is enabled and thosewhen LBVC is disabled, we are able to quantitatively

infer how much cost LBVC pays for maintaining videoquality.

B. Performance Under Typical Scenarios

In this subsection, we evaluate the performance impactof LBVC at the selected frame rates listed in Table I.Technically, we recruit the same 10 pairs of subjectsand each pair performs 6-minute video chats at allselected frame rates under typical video chat scenariossuch as sitting and standing for bandwidth and powermeasurements.

We perform bandwidth usage and power consump-tion measurements over the T-Mobile HSPA+ cellularnetwork, but only perform power consumption measure-ments over a public college WiFi network. WiFi networkis not considered for bandwidth usage measurementsince there is no bandwidth cost for video streamingunder WiFi. During all video chats, we disable allunnecessary apps, services, and radios on smartphones.

1) Bandwidth Usage: To measure bandwidth usage,we adopt Shark for Root [18] to record network trafficand WireShark [19] to analyze bandwidth usage ona server. We separate the video chats for bandwidthmeasurements from those for power measurements, sincethe network traffic capturing software also consumespower.

Figure 3 illustrates the measured total bandwidth us-age of making video chats on Galaxy Nexus. The greenbars illustrate the average total bandwidth usage at theselected frame rates with LBVC disabled, while bluebars illustrate that with LBVC enabled. The red linedepicts the average total bandwidth usage at the default12 fps with LBVC disabled.

Page 7: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 7

Fig. 5. Power Consumption vs. Frame Rate in a WiFi Networkon Galaxy Nexus.

1 2 4 6 80

10

20

30

40

Frame Rate (fps)

TV

M S

co

res (

dB

)

Default 12 fps with LBVC Disabled

Selected fps with LBVC Enabled

Selected fps with LBVC Disabled

Fig. 6. TVM Scores vs. Frame Rate. Error Bars indicate theStandard Deviations.

In the figure, we observe that the bandwidth usagewith LBVC enabled is higher than that with LBVCdisabled. The extra bandwidth usage is actually the costLBVC pays to maintain video quality. It is becausethat LBVC dynamically adapts frame rate between thelower user input frame rate and higher default framerate. Compared to the bandwidth usage at the defaultframe rate, LBVC still significantly saves bandwidth.For example, the average bandwidth usage of videochats is reduced by 35% at 4 fps under typical videochat scenarios. We summarize the average bandwidthsavings in Table II. The average bandwidth savings forthe two smartphones are similar because the communica-tion is symmetric. Additionally, lower bandwidth usageat clients reduces the traffic flow competition at theRTP server, and potentially results in better networkingconditions.

We also observe that the average total bandwidthusage with LBVC enabled at 1 fps is even higher thanthat at 2 fps. This is due to the small thresholds used forframe rate adaption at 1 fps. Small thresholds allow sometypical device vibrations to trigger switches from lowerinput frame rate to higher default frame rate, resulting inlarger bandwidth usage. At 2 fps, the larger thresholdsresult in less switches to the default frame rate, and hencethe bandwidth usage with LBVC enabled is smaller.

Moreover, when LBVC is enabled, we observe thatthe bandwidth usage variances at 1 and 2 fps are largerthan those at higher frame rates. It is because that thethresholds at 1 and 2 fps are smaller than those at higherframe rates. Smaller thresholds allow device vibrationsto trigger more switches between the input frame rateand the default frame rate. The unstable runtime framerate results in large bandwidth usage variances. However,

as the frame rate increases, increased thresholds result insmaller bandwidth usage variances.

Frame Rate (fps) 1 2 4 6 8Galaxy Nexus (%) 39.1 43.2 35.2 22.3 13.0Nexus 4 (%) 38.5 41.9 35.4 21.2 13.3

TABLE IIAVERAGE BANDWIDTH SAVINGS VS. FRAME RATES ON GALAXYNEXUS AND NEXUS 4 (COMPARED TO THE DEFAULT 12 FPS WITH

LBVC DISABLED).

2) Power Consumption: In the experiments, we uti-lize the Monsoon Power Monitor [20] to measure the av-erage power consumption of performing each video chaton Galaxy Nexus. However, the power output interfaceof the monitor cannot be connected to the battery pinsof Nexus 4. Due to this hardware constraint, we use thesoftware PowerTutor [28] to measure the average powerconsumption of performing each video chat on Nexus4. In general, the measured power consumption includesthe power for sampling, processing, encoding/decoding,transmitting/receiving and playing the video and audio.When LBVC is enabled, it also includes the power ofthe sensors and frame interpolation.

Figure 4 and 5 summarize the measured average powerconsumption of all the video chats over the cellularand WiFi networks on Galaxy Nexus. The green barsillustrate the average power consumption at the differentframe rates with LBVC enabled, while the blue barsdepict that with LBVC disabled. In the figures, the powerconsumption at 12 fps is plotted as the base line, whichis to be compared with the power consumption at otherselected frame rates.

From the figures, as we expect, the power consumptionincreases as frame rate increases. At each frame rate, the

Page 8: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 8

1 2 4 6 80

20

40

60

80

100

Frame Rate (fps)

Su

bje

cti

ve

Sc

ore

s

Default 12 fps with LBVC Disabled

Selected fps with LBVC Enabled

Selected fps with LBVC Disabled

Fig. 7. Subjective Scores vs. Frame Rate. Error Bars indicatethe Standard Deviations.

Frame Rate (fps)

Pa

ir I

D

1 2 4 6 8

124686789

101112131415161718192021 1

1.5

2

2.5

3

3.5

4

Fig. 8. The Ratio of the Subjective Score with LBVC Enabledto the Subjective Score with LBVC Disabled for Each Pair.

power consumption when LBVC is enabled is higherthan that when LBVC is disabled, since the sensing andframe interpolation operations in LBVC consumes extrapower. We also observe that the average power consump-tion at 1 fps is larger than that at 2 fps. Higher powerconsumption at 1 fps not only results from the frequentadaption to the default frame rate, but also results frommore frame interpolation operations performed at 1 fps.Also, due to the unstable runtime frame rate, the powerconsumption variances with LBVC enabled at 1 fps islarger than those at higher frame rates.

Frame Rate (fps) 1 2 4 6 8Gal.N.-WiFi(%) 8.9 13.2 10.4 7.8 6.6Gal.N.-Cell(%) 9.7 10.1 8.8 5.2 3.6Nexus4-WiFi(%) 10.1 14.0 11.2 8.6 7.4Nexus4-Cell(%) 11.9 13.6 10.3 8.7 5.1

TABLE IIIAVERAGE POWER SAVINGS VS. FRAME RATES ON GALAXY

NEXUS AND NEXUS 4 (COMPARED TO THE DEFAULT 12 FPS WITHLBVC DISABLED).

The average power savings are summarized in TableIII. From the table, we learn that although LBVC doesnot save much power at the selected frame rates, thepower saving resulting from frame rate reduction is largeenough to offset the power consumption of sensing andframe calculations in LBVC. Therefore, LBVC does notintroduce extra power overhead under typical video chatscenarios.

C. Video Quality

We organize a user study to investigate the impactof the LBVC on both objective and subjective videoquality. In the study, we recruit 21 different pairs ofsubjects and there are 42 subjects in total. Each pairperforms 6-minute video chats with the LBVC being

either disabled or enabled. The video chats are doneunder different selected frame rates. To guarantee subjectdiversity, we recruit half of the subjects from Physics,Applied Science, Mathematics, Nursing, Education andRhythmic Gymnastic departments and the other halffrom Computer Science departments. During each videochat, we collect both objective and subjective scores asfollows.

For objective scores, we implement a video chatrecorder and add it to the Mediastreamer2 library inLinphone. It automatically records all video chats duringthe study. We select the Temporal Variation Metric(TVM) [21] to measure the objective video quality, sinceit is a start-of-the-art video quality metric for videostreaming on mobile devices. The recorded videos areanalyzed by MATLAB on a serve to obtain the TVMscores.

To collect subjective scores, we carry out a DoubleStimulus Continuous Quality Evaluation (DSCQE) [29].Technically, each subject rates a video chat by comparingit to a reference video. The scores range is from 0 to 100and can be explained as: very annoying (0-20); annoying(21-40); fair (41-60); good (61-80); excellent (81-100).

In DSCQE, each subject pair perform a video chatunder the default 12 fps with LBVC disabled. This videochat is used as the reference baseline. Then, we randomlychoose a frame rate from Table I. Each subject pairperform and score a video chat with the Linphone beingconfigured with that frame rate when LBVC is eitherenabled or disabled. We repeat all these steps until allthe selected frame rates have been traversed. In addition,we also collect the subjects’ opinions about the delaythey experience during each video chat.

We summarize the objective and subjective scores inFigure 6 and 7. In the figures, the green bars illustratethe average scores at the selected frame rates with

Page 9: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 9

LBVC disabled, the blue bars illustrate those with LBVCenabled and the red line depicts the average score at thedefault 12 fps of Linphone with LBVC disabled.

Objective Scores Figure 6 summarizes the TVM scorescalculated from the recorded videos. From the figure,we observe that the TVM score at 4 fps with LBVCenabled is much higher than that with LBVC disabled.Since TVM score is defined based on the log of thedifference between consecutive frames, it means LBVCcan significantly reduce the difference between consecu-tive frames at lower frame rate. Videos become smootheras a result of the significant decrease in the meansquare difference between consecutive frames. Thus, weconclude that LBVC is able to objectively alleviate videoquality degradation at lower frame rate.

Subjective Scores Figure 7 illustrates the average scoresgiven by the subjects involved in the user study. Take 4fps as example, LBVC increases the subjective score by25.3% compared to that when it is disabled. The scoreincreasing percentage is even higher at 6 fps. Thus, weconclude that LBVC is able to subjectively alleviate theperceptual video quality degradation at lower frame rate.

From the figures, we observe that the subjective scoreis harder for LBVC to maintain than the objectivescore as frame rate decreases. Here are two possibleexplanations. First, the lower the frame rate is, thelarger the scene change between consecutive frames is.Larger scene changes result in more artifacts, which mayimpair the subjects’ perceptual experience. Second, thethreshold values are smaller at lower frame rates (e.g.,1 and 2 fps), and they result in the frequent changesbetween the default frame rate and input frame rate.The frequent frame rate changes may also impair thesubjects’ perceptual experience.

1 2 4 6 80

0.002

0.004

0.006

0.008

0.01

Frame Rate (fps)

Pri

ce (

MB

it/s

core

)

Fig. 9. Average Price in Terms of Bandwidth (MBits) LBVC Paysfor Every Improved Unit of Subjective Score.

From Figure 3, 4 and 5, we observe that LBVC paysextra cost for maintaining video quality at each framerate. To better understand how much bandwidth LBVCpays for every improved unit of subjective score, wepresent Figure 9. This figure demonstrates the averageprice, in terms of bandwidth (MBits), LBVC pays forevery improved unit of subjective score. From the fig-ure, we observe that as the frame rate decreases, theprice increases. It is because that at lower frame rates,effective video content is limited and video quality isfar from being acceptable. Thus, LBVC needs to spendmore bandwidth to collected additional video content forimproving subjective score by one unit. In contrast, athigher frame rates, there is more video content. Withsmall amount of additional bandwidth, LBVC is able togain an extra unit of subjective score.

Figure 8 is a heat map of the subjective score ratioswith LBVC enabled and disabled at each frame rate. Weobserve that LBVC alleviates video quality degradationmost of the timer. Additionally, from both Figure 7 andFigure 8, we observe that at lower frame rate such as 1fps and 2 fps, the video quality improvement percentagesare larger than those at other frame rates. This is becausethe videos are extremely jerky at 1 and 2 fps, and LBVCcan accentuates the smoothness by frame rate control andframe interpolation.

Finally, from the users’ feedbacks, the common delaysadded at the beginning of each video chat are acceptable.They are even negligible when the selected frame rate is4 fps and above.

From Table II and III, and Figure 7, we notice thatat 4 fps, LBVC achieves about 35% bandwidth savingwhile still maintaining good video quality without in-troducing extra power consumption under typical videochat scenarios.

IV. RELATED WORK

The original conference version [50] of thismanuscript proposes a vibration-aware frame rate adap-tion framework for achieving low-bandwidth mobilevideo chats. The conference paper is not comprehen-sive enough since its effectiveness fails when usersmake video chat with smartphone in stationary status,such as being placed on a table. Therefore, we extendthe conference paper to a general context-aware framerate adaption framework by introducing the concept ofguarding contexts. General guarding contexts are usedfor keeping scene changes small and controlling videoframe rate. In addition to the guarding context of devicevibration appeared in the conference paper, we add theguarding context of quick object movement into the sys-tem. This new guarding context is used for dealing with

Page 10: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 10

the situation where smartphone is in stationary status.Additionally, we present the theoretical analysis for theeffectiveness of the frame interpolation algorithm weadopt. Finally, we make more efforts on the experimentsand update the figures in the experiment section in orderto incorporate the new results.

In this section, we discuss how LBVC is differentfrom other existing literatures from the perspectivesof video bitrate adaption, video streaming bandwidthreduction, video streaming power reduction and imageinterpolation.

Video Bitrate Adaption. Existing video bitrate adaptiontechniques optimize video bandwidth usage by consider-ing various contextual information, such as networkingconditions [1] [2] [5] [34] [35], video contents [37], userpreferences [4] [42], and residual power [3]. Being awareof these contexts, they control video streaming bitratein order to maximize video utility (or video quality)for video consumers [43]. Compared to these quality-oriented video bitrate adaption techniques, LBVC adjustsvideo chat bitrate to help user reduce their data planquota usage and maintain satisfiable video quality.

Recent works, such as QAVA [9] and TUBE [44]help user economically use their data plan quota forvideo streaming. LBVC is different from them fromthe following two perspectives. First, LBVC is designedto optimize the bandwidth usage of streaming liveand interactive video chats, but QAVA and TUBE aredesigned for optimizing videos downloading. Second,QAVA and TUBE optimize bandwidth usage as wellas guarantee the quality of outgoing videos by makingtrade-off between video quality and bitrate (or bandwidthusage) at video sender. Instead, LBVC takes a moreaggressive frame rate adaption approach to reduce videobitrate at video sender. It does not guarantee the qualityof outgoing videos, but it adopts extra techniques atboth video sender (context-aware frame rate control)and receiver (frame interpolation) to maintain the videoquality.

Video Streaming Bandwidth Reduction. Video com-pression techniques [12] [13] [30] [31] [14] have alsobeen proposed to reduce video streaming bandwidthusage. Some of them, such as VP8 [12] and H.264 [13],reduce video sizes by leveraging motion compensationtechnique that exploits the temporal correlation amongframes. Furthermore, SenseCoding [31] and SaVE [30]utilize inertial sensors, such as accelerometers, to en-hance the aforementioned motion estimation based meth-ods. Additionally, techniques such as compress sensing[32] and context-aware compression [33] have also beenproposed for compressing videos.

The main goal of video compression techniques is toreduce the number of necessary bits to encode videosby exploiting various contextual information from videoframes. Instead, LBVC takes a different approach byreducing the input size of video compression techniquesand this approach is orthogonal to the existing videocompression techniques.

Video Streaming Power Reduction. Empirical mea-surements performed in [45] demonstrate that the chunk-based video streaming protocols are the most powerefficient for downloading videos on mobile devices, sincenetworking components on mobile devices have morechance to sleep. In [38], the authors propose an approachthat jointly utilizes unicast and multicast to optimizethe energy consumption of video streaming on mobiledevices. SMMC [39] optimizes mobile energy consump-tion by considering demand, socialization, mobility, anddelivery simultaneously. It achieves high content deliveryefficiency and QoS at the same time. QUVoD [40] is aQoE-driven User-centric solution for Video-on-Demand(VoD) services in urban vehicular network. It achieveshigh efficiency in terms of energy consumption andquality of experience. PMCV [41] achieves the goodbalance between energy efficiency and performance bycreatively detecting and managing mobile community.GreenTube [47] optimizes power consumption for mo-bile video downloading by judiciously scheduling down-loading activities to minimize unnecessary active periodof 3G/4G radio.

In contrast, LBVC is proposed to mainly reduce band-width usage of video chats streaming on smartphones,although it has the capacity of, but not significantly,reducing power consumption of video chats.

Image Interpolation. The cross dissolve algorithm hasbeen extensively used in computer graphics. For exam-ple, it has been used to interpolate intermediate framesbetween two well-aligned images with the purpose ofcreating a face animation with smooth face transitionsin Exploring Photobios [27]. It has also been used in In-terviewVideo [48] to provide smooth transitions betweendifferent video segmentations in an interview video edit-ing tool. Optical-flow based warping algorithms, suchas [23] and [25], are also considered in LBVC’s designphase, but they are not selected because these warpingalgorithms usually involve complex operations, such asimage derivation and matrix multiplication. It makesthese algorithms computationally intensive and we fi-nally resort to the simpler cross dissolve algorithm [27]in LBVC. Works such as [49] utilize frame interpolationtechniques to interpolate the frames which are passivelylost during video transmission. Although LBVC also in-

Page 11: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 11

terpolates frames, those frames are deliberately droppedto save bandwidth.

V. CONCLUSION

In this paper, we attempt to balance the mobile videoquality and bandwidth usage by proposing LBVC, ageneral context-aware frame rate adaption framework.LBVC adapts the video frame rate with respect tothe selected guarding context at the sender side andmakes up missing information by interpolating the in-termmediate frames at the receiver side. We implementLBVC with guarding context of severe device vibrationand demonstrate its effectiveness through real systemexperiments and a user study. LBVC saves bandwidthby up to 35% under typical video chat scenarios as wellas alleviate video quality degradation.

ACKNOWLEDGMENT

The work is supported in part by U.S. NationalScience Foundation Grants CNS-1253506 (CAREER)and CNS-1250180, the National Natural Science Foun-dation of China Grant 61402380, the FundamentalResearch Funds for the Central Universities GrantXDJK2013C116.

REFERENCES

[1] X. Wang, M. Chen, Kwon. T.T., Yang L.T., and Leung V.C.M.“AMES-Cloud: A Framework of Adaptive Mobile Video Stream-ing and Efficient Social Video Sharing in the Clouds,” IEEETrans. Multi., vol. 15, no. 4, pp. 811-820, Jan. 2013.

[2] Egilmez H.E., Civanlar S., and Tekalp A.M.. “ An OptimizationFramework for QoS-Enabled Adaptive Video Streaming OverOpenFlow Networks,” IEEE Trans. Multi., vol. 15, no. 3, pp.710-715, Dec. 2012.

[3] S. Chuah, Y. Tan, and Z. Chen. “ Rate and Power Allocationfor Joint Coding and Transmission in Wireless Video ChatApplications,” IEEE Trans. Multi., vol. 17, no. 5, pp. 687-699,Mar. 2015.

[4] L. Chen, Y. Zhou, and D. Chiu. “ Smart Streaming for OnlineVideo Services,” IEEE Trans. Multi., vol. 17, no. 4, pp. 485-497,Feb. 2015.

[5] Argyriou A., Kosmanos D., and Tassiulas L. “ Joint Time-Domain Resource Partitioning, Rate Allocation, and VideoQuality Adaptation in Heterogeneous Cellular Networks,” IEEETrans. Multi., vol. 17, no. 5, pp. 736-745, Mar. 2015.

[6] US Video Call Statistics. Available: http://goo.gl/C7yjcO.[7] Mobile Video Chat User Population Trend in UK. Available:

http://goo.gl/YtEWR1.[8] Skype Bandwidth Requirement. Available: http://goo.gl/apzY9s.[9] J. Chen, A. Ghosh, J. Magutt, and M. Chiang, “Qava: quota

aware video adaptation,” in Proc. of ACM CoNEXT, Nice, France,Dec. 2012.

[10] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R.Govindan, and D. Estrin, “Diversity in smartphone usage”, inProc. of ACM MobiSys, San Francisco, California, Jun. 2010,pp.179-194.

[11] White paper: Live and on-demand streaming video with lifesizeuvc video center, Mar. 2012.

[12] RFC6386 - vp8 data format and decoding guide. Available:http://tools.ietf.org/html/rfc6386.

[13] H.264: Advanced video coding for generic audiovisual service.International Telecommunication Union, 2013.

[14] S. Shi, C.-H. Hsu, K. Nahrstedt, and R.Campbell, “Usinggraphics rendering contexts to enhance the real-time video codingfor mobile cloud gaming,” in Proc. of ACM MM, Scottsdale,Arizona, Nov.-Dec. 2011, pp. 103-112.

[15] Y. Wang, Y.-Q Zhang, and J. Ostermann, Video Processing andCommunications, 2001.

[16] L. Keller, A. Le, B. Cici, H. Seferoglu, C. Fragouli, andA. Markopoulou, “Microcast: cooperative video streaming onsmartphones,” in Proc. of ACM MobiSys, Low Wood Bay, LakeDistrict, UK, Jun. 2012, pp. 57-70.

[17] Linphone. Available: http://www.linphone.org.[18] Shark for root. Available: http://goo.gl/IRCUVw.[19] Wireshark. Available: http://www.wireshark.org/.[20] Monsoon power monitor. Available: http://goo.gl/spM1pt.[21] A. (Jack) Chan, A. Pande, E. Baik, and P. Mohapatra. “Tem-

poral quality assessment for mobile videos,” in Proc. of ACMMobicom, Istanbul, Turkey, Aug. 2012, pp.221-232.

[22] Session description protocol. Avaiable: http://goo.gl/fcphyP.[23] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. “High

accuracy optical flow estimation based on a theory for warping,”in Proc. of ECCV, vol.3024, 2004, pp. 25-36.

[24] A. Rahmati, and L. Zhong, “Context-for-wireless: Context-sensitive energy-efficient wireless data transfer,” in Proc. of ACMMobiSys, San Juan, Puerto Rico, Jun. 2007, pp.165-178.

[25] L. L. Raket, L. Roholm, A. Bruhn, and J. Weickert, “Motioncompensated frame interpolation with a symmetric optical fllowconstraint,” in Proc. of ISVC, vol. 7431, 2012, pp. 447-457.

[26] Optical-flow execution time. Available: http://goo.gl/vcS1px.[27] I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg, and S. M.

Seitz. “Exploring photobios,” ACM Trans. on Graphics, Vol. 30,Is. 4, Jul. 2011.

[28] L. Zhang, B. Tiwana, Z. Qian, Z. Wang, R.P. Dick, Z.M. Mao,and L. Yang, “Accurate online power estimation and automaticbattery behavior based power model generation for smartphones,”in Proc. of CODES/ISSS, Amsterdam, The Netherlands, Oct.2010, pp. 105-114.

[29] “Methodology for the subjective assessment of the quality oftelevision pictures,” ITU-R Rec. BT.500-11 , 2002.

[30] X. Chen, Z. Zhao, A. Rahmati, Y. Wang, and L. Zhong. “SaVE:sensor-assisted motion estimation for efficient h.264/avc videoencoding,” in Proc. of ACM MM, Beijing, China, Oct. 2009.

[31] G. Hong, A. Rahmati, Y. Wang, and L. Zhong. “Sensecoding:Accelerometer-assisted motion estimation for efficient video en-coding,” in Proc. of ACM MM, Vancouver, British Columbia,Canada, Oct. 2008, pp. 749-752.

[32] J.Meng, “Compressive sensing in wireless communications,”Ph.D. Thesis, University of Houston, 2010, 121 pages.

[33] X. Bao, T. Narayan, A. A. Sani, W. Richter, R. R. Choudhury,L. Zhong, and M. Satyanarayanan. “The case for context-awarecompression,” in Proc. of ACM HotMobile, Phoenix, Arizona,2011, pp. 77-82. 2011.

[34] J. Chen, R. Mahindra, M. A. Khojastepour, S. Rangarajan, andM. Chiang. “A scheduling framework for adaptive video deliveryover cellular networks,” in Proc. of ACM MobiCom, Miami, FL,Sep.- Oct. 2013, pp.389-400.

[35] S. Lee and K. Chung. “Combining the rate adaptation andquality adaptation schemes for wireless video streaming,” J. Vis.Comun. Image Represent., vol. 19, no. 8, pp.508-519, Dec. 2008.

[36] Marr, D. AND HildrethI, E. “Theory of edge detection,” Proc.R. Soc. Lond. B 207, 187?217, 1980.

[37] Y. Li, Z. Li, M. Chiang, and A. R. Calderbank. “Content-awaredistortion-fair video streaming in congested networks,” IEEE

Page 12: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 12

Transaction on Multimedia, vol. 11, no. 6, pp. 1182-1193, Oct.2009.

[38] S. Almowuena, M. M. Rahman, C. H. Hsu and A. A. Hassan.“Energy-Aware and Bandwidth-Efficient Hybrid Video Stream-ing Over Mobile Networks”, IEEE Transactions on Multimedia,vol.18,no.1,pp.102-115,2016.

[39] C. Xu, S. Jia, L. Zhong and G.-M. Muntean. “Socially awaremobile peer-to-peer communications for community multimediastreaming services”, IEEE Communications Magazine, vol. 53,no. 10, pp.150-156,2015.

[40] C. Xu, F. Zhao, J. Guan, H. Zhang and G.-M. Muntean. “QoE-driven User-centric VoD Services in Urban Multi-homed P2P-based Vehicular Networks”, IEEE Transactions on VehicularTechnology, vol.62, no.5, pp. 2273-2289, 2013.

[41] C. Xu, S. Jia, M. Wang, L. Zhong, H. Zhang and G.-M.Muntean. “Performance-Aware Mobile Community-Based VoDStreaming Over Vehicular Ad Hoc Networks”, IEEE Transac-tions on Vehicular Technology, vol. 64, no. 3, pp. 1201-1217,2015.

[42] W. Song, D. Tjondronegoro, and M. Docherty, “Saving bitratevs. pleasing users: where is the break-even point in mobile videoquality?” in Proc. of ACM MM, Scottsdale, Arizona, Dec. 2011,pp. 403-412.

[43] S. F. Chang, and A. Vetro, “Video adaptation: Concepts, tech-nologies, and open issues,” in Proc. IEEE, Mar. 2005, pp. 148-158.

[44] S. Ha, S. Sen, C. Joe-Wong, Y. Im, and M. Chiang, “TUBE:Time-dependent pricing for mobile data,” in Proc. of ACMSIGCOMM, Helsinki, Finland, Aug. 2012, pp. 247-258 .

[45] Y. Liu, L. Guo, F. Li, and S. Chen. “An empirical evaluationof battery power consumption for streaming data transmissionto mobile devices,” in Proc. of ACM MM, Scottsdale, Arizona,Nov. - Dec., 2011, pp. 473-482.

[46] Szeliski, R. AND Shum, H.-Y. “Creating full view panoramicimage mosaics and environment maps.” in Proc. of SIGGRAPH1997, pp 251?258.

[47] X. Li, M. Dong, Z. Ma, and F. C.A. Fernandes, “GreenTube:power optimization for mobile videostreaming via dynamic cachemanagement,” in Proc. of ACM MM, Nara, Japan, Oct. - Nov.2012, pp. 279-288.

[48] F. Berthouzoz, W. Li, and M. Agrawala, “Tools for placing cutsand transitions in interview video,” ACM Trans. Graph., vol. 31,no. 4, pp. 67:1-67:8, Jul. 2012.

[49] J. Su and R. M. Mersereau, “Motion-compensated interpolationof untransmitted frames in compressed video,” in Proc. of 30thAsilomar Conf. on Signals Systems and Computers, Asiloma,USA, Mar. 1996, pp. 100-104.

[50] X. Qi, Q. Yang, D. Nguyen, G. Zhou and G. Peng “LBVC:Towards Low-bandwidth Video Chat on Smartphones” Proc. ofthe 6th ACM Multimedia Systems Conference, Portland, Oregon,Mar. 2015, pp.1-12.

Xin Qi Dr. Xin Qi received his Ph.D. degree inComputer Science from the College of Williamand Mary in 2015. Before that, he receivedhis M.Eng. degree in Pattern Recognition andIntelligent Systems from National Key Lab-oratory of Pattern Recognition at Institute ofAutomation, Chinese Academy of Science, in2010 and BSc degree in Computer Sciencefrom Nanjing University, China, in 2007. His

research interests are mainly in mobile systems and cloud computing.He joined VMware Inc. as Member of Technical Staff in June 2015.

Qing Yang Qing Yang received his B.S fromCivil Aviation University of China in 2003 andM.S. from Chinese Academy of Sciences in2007. Since Fall 2011, he is a Ph.D student inthe Department of Computer Science, Collegeof William & Mary. His research interests aresmartphone security and energy efficiency.

David T. Nguyen David T. Nguyen has beenworking on his Ph.D. in Computer Scienceat the College of William and Mary, USA,since Fall 2011. He is working with Dr. GangZhou, and his research interests include mobilecomputing, ubiquitous computing, and wire-less networking. Before coming to W&M (Fall2011), he was a lecturer at Suffolk Universityin Boston for 2 years. He was also a lecturer

at Christopher Newport University in 2013. In 2014, he worked as aMobile Hardware Engineer in Facebook’s Connectivity Lab, MenloPark, CA.

Peng Ge Ge Peng received her B.S fromNational University of Defense Technologyin 2008. Since Fall 2011, she has been aPh.D student in the Department of ComputerScience, College of William and Mary. Herresearch interests include wireless network-ing, smartphone energy efficiency, and mobilecomputing.

Page 13: A Context-aware Framework for Reducing Bandwidth Usage of ...qyang/paper/LBVC_TMM_2016.pdf · A Context-aware Framework for Reducing Bandwidth Usage of Mobile Video Chats Xin Qi,

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XXXX 20XX 13

Gang Zhou Dr. Gang Zhou is an AssociateProfessor, and also Graduate Director, in theComputer Science Department at the Collegeof William and Mary. He received his Ph.D.degree from the University of Virginia in 2007.He has published over 70 academic papers inthe areas of sensors & ubiquitous computing,mobile computing, body sensor networks, in-ternet of things, and wireless networks. The

total citations of his papers are more than 5000 according to GoogleScholar, among which five of them have been transferred into patentsand the MobiSys’04 paper has been cited more than 800 times. Heis currently serving in the Journal Editorial Board of IEEE Internetof Things as well as Elsevier Computer Networks. He received anaward for his outstanding service to the IEEE Instrumentation andMeasurement Society in 2008. He also won the Best Paper Awardof IEEE ICNP 2010. He received NSF CAREER Award in 2013.He received a 2015 Plumeri Award for Faculty Excellence. He is aSenior Member of ACM and also a Senior Member of IEEE.

Bo Dai Bo Dai is currently a Ph.D. candidatein Computational Science and Engineering atGeorgia Tech, supervised by Prof. Le Song. Heis working on developing nonparametric statis-tical methods, including kernel methods andnonparametric graphical models, for learningfrom massive volume of complex, uncertainand high-dimensional data. In particularly, heis trying to make nonparametric models scal-

able with provable statistical and computational guarantees.

Daqing Zhang Daqing Zhang is a Chair Pro-fessor at Institute of Software, Peking Uni-versity, China. His research interests includecontext-aware computing, urban computing,mobile social networking, big data analytics,pervasive elderly care, etc.. Dr. Zhang hasserved as the general or program chair formore than 10 international conferences, givingkeynote talks at more than 16 international

events. He is the associate editor for 4 journals including ACMTransactions on Intelligent Systems and Technology, IEEE Trans. OnBig Data, etc.. He is the winner of the Ten-years CoMoRea impactpaper award at IEEE PerCom 2013, the Best Paper award at IEEEUIC 2012 and the Best Paper Runner Up award at Mobiquitous2011. Daqing Zhang obtained his Ph.D. from University of Rome“La Sapienza” in 1996.

Yantao Li Dr. Yantao Li is an AssociateProfessor in the College of Computer andInformation Sciences at Southwest University,China. He received the PhD degree from theCollege of Computer Science at ChongqingUniversity, in December 2012. From Septem-ber 2010 to September 2012, he was a vis-iting scholar supported by China ScholarshipCouncil working with professor Gang Zhou

at the Department of Computer Science in the College of Williamand Mary, USA. His research area includes wireless communicationand networking, sensor networks and ubiquitous computing, andinformation security.