Thesis Proposal

User-centric Models for Network Applications

Athula Balachandran
October 2013

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Srinivasan Seshan, Chair
Vyas Sekar
Hui Zhang
Peter Steenkiste
Aditya Akella (University of Wisconsin-Madison)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.


Abstract

Users are the true end points of various network applications (e.g., Internet video, web browsing). They sustain the advertisement-based and subscription-based revenue models that enable the growth of these applications. However, the design and evaluation of network applications are traditionally based on network-centric metrics (e.g., throughput, latency). Typically, the impact of these metrics on user behavior and quality of experience (QoE) is studied separately using controlled user studies involving a few tens of users. But with recent advancements in big data technologies, we now have the ability to collect large-scale measurements of network-centric metrics and user access patterns in the wild. Leveraging these measurements, this thesis explores the use of big data techniques, including machine learning approaches, to characterize and capture various user access patterns and to develop user-centric models of quality of experience and user behavior in the wild. Different players such as content providers, ISPs and CDNs can use these models to improve content delivery.

1 Introduction

Traditionally, the design and evaluation of network applications and protocols have been largely based on simple network-centric metrics such as throughput and latency. Studies on understanding how these network-centric metrics affect user behavior and experience are primarily done using controlled user studies involving a few tens of users [9, 32].

However, users are the true end points of these applications, since they sustain the advertisement-based and subscription-based revenue models that enable their growth. Hence, it is important to understand and architect network applications by primarily taking into account user behavior and user satisfaction. This is gaining more importance due to ever-increasing Internet traffic [4] and rising user expectations for quality (lower buffering and higher bitrates for video, lower page load times for web browsing) [1, 23, 33]. Together, these phenomena place the onus on content providers, CDNs and ISPs to understand and characterize user satisfaction and user behavior in order to explore strategies to cope with the increasing traffic and user expectations.

The traditional approach of employing small-scale controlled user studies to understand user access patterns is not applicable anymore because of the significant heterogeneity of factors (e.g., device, connectivity) in the modern network application setting. Fortunately, due to recent developments in big data technologies, we now have the ability to collect large-scale measurements of actual user access patterns in the wild. This opens up the opportunity to leverage big data techniques and analytics for designing and evaluating network applications.

My aim in this thesis is to explore the use of big data technologies, including machine learning techniques, to characterize and capture user access patterns in the wild, and to develop user-centric models of quality of experience and user behavior that can be used by different players in the ecosystem (e.g., content providers, CDNs, ISPs) to improve network applications. In order to systematically explore the problem space, I will present the following three studies:

1. First, I look at how we can use big data and machine learning techniques to understand and capture user Quality of Experience (QoE) in the wild. I perform this study using client-side


measurements of user access from Internet video providers. Despite the rich literature on video and QoE measurement, our understanding of Internet video QoE is limited because of the shift from traditional methods of measuring user experience and quality. Quality metrics that were traditionally used, like Peak Signal-to-Noise Ratio (PSNR), are replaced by average bitrate, rate of buffering, etc. Similarly, experience is now measured in terms of metrics like user engagement that are directly related to the revenue of providers. Using machine learning techniques, I develop a predictive model for video QoE from the user access data. The QoE model can be used by content providers to select bitrates and CDNs for sessions.

2. However, not all players have access to end-point (server-side or client-side) logs of user access patterns. For example, improving and monitoring QoE is an important goal for network carriers and operators. But, unlike other players (e.g., content providers and CDNs), their knowledge of user access patterns is limited to information that can be extracted from network flow traces. Next, I look at how we can overcome this challenge by using data-driven techniques to infer quality metrics and user experience measures from network flow traces, and analyze quality of experience for mobile web browsing sessions using network traces from a mobile ISP.

3. Third, I look at how we can use data-driven techniques to characterize various user access patterns that have important implications for system design. We identified various user behaviors (e.g., evolution of interest in content) that can be used to profile and classify users. We also observed several aggregate user behaviors (e.g., partial interest in content, regional interests) that have important implications for content delivery infrastructure designs. We use these observations to profile individual users and generate population models.

In this proposal, I will discuss existing results and planned work for the above three studies. In Section 2, I will discuss my work on using machine learning techniques to develop a predictive model for Internet video Quality of Experience. In Section 3, I present results and planned work towards analyzing QoE for mobile web browsing sessions using network flow traces. In Section 4, I present various viewing patterns and population models that we observed in our dataset, along with planned work to extend the results, before presenting the proposed timeline for the completion of the thesis in Section 5.

2 Developing a Predictive Model for Internet Video Quality of Experience

The growth of Internet video has been driven by the confluence of low content delivery costs and the success of subscription-based and advertisement-based revenue models [5]. Hence, there is agreement among leading industry and academic initiatives that improving users' quality of experience (QoE) is crucial to sustain these revenue models, especially as user expectations of video quality are steadily rising [1, 23, 33]. However, our understanding of Internet video QoE is limited despite a very rich history in the multimedia community [6, 7, 9].

[Figure 1: Challenges in developing a QoE model for Internet video. (a) Complex relationships: fraction of content viewed (%) vs. average bitrate (Kbps); (b) Interaction between metrics: join time (s) vs. average bitrate (Kbps); (c) Confounding factors: fraction of content viewed (%) vs. rate of buffering (/minute), for DSL/cable vs. wireless providers.]

The reason is that Internet video introduces new effects with respect to both quality and experience. First, traditional quality indices (e.g., Peak Signal-to-Noise Ratio (PSNR) [8]) are now replaced by metrics that capture delivery-related effects such as rate of buffering, bitrate delivered, bitrate switching, and join time [1, 15, 23, 37, 43]. Second, traditional methods of quantifying experience through user opinion scores are replaced by new measurable engagement measures, such as viewing time and number of visits, that more directly impact content providers' business objectives [1, 43].

In [17], we described a systematic methodology to develop a predictive model for Internet video QoE. We identified two key requirements that any such model should satisfy. First, we want an engagement-centric model that accurately predicts user engagement in the wild (measured in terms of the fraction of video viewed before quitting). Second, the model should be actionable and useful to guide the design of video delivery mechanisms; e.g., content providers can use it to evaluate cost-performance tradeoffs of different CDNs and bitrates [3, 37], and adaptive video player designers can use it to trade off bitrate, join time, and buffering [13, 25, 42].

However, meeting these requirements is very challenging. In Section 2.1, I summarize the three main challenges in developing a predictive model. In Section 2.2, I present our data-driven approach to develop a robust model. In Section 2.3, I demonstrate a practical use of the QoE model to improve content delivery, before discussing related work in Section 2.4. The results are based on data collected by conviva.com over 3 months, spanning two popular video content providers (based in the US) and consisting of around 40 million video sessions.

2.1 Challenges

We use our dataset to highlight the three main challenges in developing an engagement-centric model for video QoE:

• Complex relationships: The relationships between individual quality metrics and user engagement are very complex. These were shown by Dobrian et al., and we reconfirm some of their observations [23]. For example, one might assume that a higher bitrate should result in higher user engagement. Surprisingly, there is a non-monotonic relationship between them, as shown in Figure 1a. The reason is that videos are served at specific bitrates, and hence average bitrate values in between these standard bitrates correspond to clients that had to switch bitrates during the session. These clients likely experienced higher buffering, which led to a drop in engagement.

[Figure 2: Machine learning approach to capture QoE. (a) Compare models: prediction accuracy (%) vs. number of classes for naive Bayes, decision tree, regression, single-metric regression, and a random coin toss; (b) Split vs. feature: improvement in accuracy (%) of the split and feature-addition approaches for different confounding factors (VOD/live device, connectivity, time of day).]

• Complex interaction between metrics: The various quality metrics are interdependent in complex ways. For example, streaming video at a higher bitrate would lead to better quality. However, as shown in Figure 1b, it would take longer for the video player buffer to fill up sufficiently to start playback, leading to higher join times.

• Confounding factors: In addition to the quality metrics, several external factors also directly or indirectly affect user engagement [33]. A confounding factor can affect engagement and the quality metrics in three ways. First, some factors may affect user viewing behavior itself and result in different observed engagements. For example, we observed that live and VOD video sessions have significantly different viewing patterns [17] (not shown here). Second, the confounding factor can impact the quality metric. For example, we observed that the join time distributions for live and VOD sessions are considerably different [17] (not shown here). Finally, and perhaps most importantly, the confounding factor can affect the relationship between the quality metrics and engagement. For example, Figure 1c shows that users on wireless connectivity are more tolerant to rate of buffering than users on DSL/cable connectivity.

2.2 Approach

The three main steps in our approach to overcome the above challenges and develop a predictive model are as follows:

1. Tackling complex relationships and interactions: We cast the problem of modeling the relationship between the different quality metrics and engagement as a machine learning problem and use discrete classification algorithms. Engagement is classified based on the fraction of video that the user viewed before quitting. For example, when the number of classes is set to 5, the model tries to predict whether the user viewed 0-20%, 20-40%, 40-60%, 60-80% or 80-100% of the video before quitting. We use similar domain-specific discrete classes to bin the different quality metrics. Figure 2a compares the performance of three different machine learning algorithms: binary decision trees, naive Bayes, and classification based on linear regression. The results are based on 10-fold cross-validation [39]. We need to be careful about using machine learning as a black box on two accounts. First, the learning algorithms must be expressive enough to tackle our challenges. As observed in Figure 2a, approaches like naive Bayes that assume the quality metrics are independent variables, or simple regression techniques that implicitly assume that the relationship between quality and engagement is linear, are unlikely to work. Second, we do not want an overly complex machine learning algorithm that becomes unintuitive or unusable for practitioners. Fortunately, we find that decision trees, which are generally perceived as usable and intuitive models [35, 44], are also the most accurate. A sketch of this classification setup is shown below.
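To make this setup concrete, the minimal sketch below bins engagement into classes and compares the three classifier families with 10-fold cross-validation. It is only an illustration: the input file and column names are hypothetical (the Conviva schema is not public), and scikit-learn's logistic regression stands in for the regression-based classifier.

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

def bin_engagement(fraction_viewed, n_classes=5):
    # Map fraction of video viewed (0..1) to a class: 0-20%, 20-40%, ... for n_classes = 5.
    return (fraction_viewed * n_classes).clip(upper=n_classes - 1).astype(int)

sessions = pd.read_csv("video_sessions.csv")                  # hypothetical per-session table
quality = sessions[["avg_bitrate", "rate_of_buffering",
                    "buffering_ratio", "join_time"]]
labels = bin_engagement(sessions["fraction_viewed"])

models = {
    "decision tree": DecisionTreeClassifier(max_depth=10),
    "naive Bayes": GaussianNB(),
    "regression-based": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    acc = cross_val_score(model, quality, labels, cv=10).mean()   # 10-fold cross-validation
    print(f"{name}: {acc:.2%}")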

Confounding factor             Engagement   Quality   Q→E (qualitative)   Q→E (quantitative)
Type of video (live or VOD)        ✓            ✓              ✓                   ✓
Overall popularity (live)          ✗            ✗              ✗                   ✗
Overall popularity (VOD)           ✗            ✗              ✗                   ✗
Time since release (VOD)           ✗            ✗              ✗                   ✗
Time of day (VOD)                  ✗            ✗              ✗                   ✓
Day of week (VOD)                  ✗            ✗              ✗                   ✗
Device (live)                      ✗            ✗              ✗                   ✓
Device (VOD)                       ✓            ✓              ✗                   ✓
Region (live)                      ✗            ✗              ✗                   ✗
Region (VOD)                       ✗            ✗              ✗                   ✗
Connectivity (live)                ✗            ✗              ✗                   ✓
Connectivity (VOD)                 ✗            ✗              ✗                   ✓

Table 1: Summary of the confounding factors. A check mark indicates that a factor impacts quality, engagement, or the quality→engagement relationship. The key confounding factors that we identify and use to refine our predictive model are type of video, device, and connectivity.

2. Identifying confounding factors: As mentioned in Section 2.1, confounding factors can have the following three effects:

1. They can affect the observed engagement.
2. They can affect the observed quality metric and thus indirectly impact engagement.
3. They can impact the nature and magnitude of the quality→engagement relationship.

For (1) and (2), we use information gain analysis to identify whether there is a hidden relationship between the potential confounding factor and engagement or the quality metrics. For (3), we identify two sub-effects: the impact of the confounding factor on the quality→engagement relationship can be qualitative (i.e., the relative importance of the different quality metrics may change) or quantitative (i.e., the tolerance to one or more of the quality metrics might be different). For the qualitative sub-effect, we use the technique described in [35] to compact the decision tree separately for each class (e.g., TV vs. mobile vs. PC) and compare the tree structure for each class. For the quantitative sub-effect, we simply check if there is any significant difference in tolerance. More details can be found in [17]. We identify a few potential confounding factors from our dataset (Table 1) and perform each of these tests on all of them. We acknowledge that this list is only representative, as we only account for factors that can be measured directly and objectively. We take a very conservative stance and mark a factor as confounding if any of the tests is positive. The results are shown in Table 1. A sketch of the information-gain test is shown below.
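This is a minimal sketch of the information-gain test for effects (1) and (2), assuming the quality metrics and engagement have already been binned into categorical columns; the input file, column names and threshold are illustrative, not taken from [17].

import pandas as pd
from sklearn.metrics import mutual_info_score

sessions = pd.read_csv("video_sessions_binned.csv")     # hypothetical, already-binned columns

TARGETS = ["engagement_class", "avg_bitrate_bin", "buffering_ratio_bin", "join_time_bin"]

def information_gain(df, factor, target):
    # Mutual information between a categorical factor and a binned target column.
    return mutual_info_score(df[factor], df[target])

def flags_as_confounding(df, factor, threshold=0.05):
    # Conservative tests (1) and (2): flag the factor if it carries non-trivial
    # information about engagement or any quality metric.
    return any(information_gain(df, factor, t) > threshold for t in TARGETS)

for factor in ["type_of_video", "device", "connectivity", "region", "time_of_day"]:
    print(factor, flags_as_confounding(sessions, factor))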


3. Incorporating confounding factors: There are two candidate approaches to incorporate the confounding factors that we identified into the predictive model (a sketch of both appears after this list):

• Add as new feature: The simplest approach is to add the key confounding factors as additional features in the input to the machine learning algorithm and relearn the prediction model.

• Split data: Another possibility is to split the data based on the confounding factors (e.g., live on mobile device) and learn separate models for each split. Our predictive model would then be the logical union of multiple decision trees, one for each combination of the values of the various confounding factors.

Each of the above two approaches has its pros and cons. While the feature-addition approach has the appeal of being simple and requiring minimal modifications to the machine learning framework, it assumes that the learning algorithm is robust enough to capture the effects caused by the confounding factors. The split-data approach avoids any doubts we may have about the expressiveness of the machine learning algorithm. The challenge with the split approach is the “curse of dimensionality”: the available data per split becomes progressively sparser with an increasing number of splits. However, the fact that we have already pruned the set of possibly confounding external factors, and that the growth of Internet video will enable us to capture larger datasets, alleviates this concern.
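The sketch below contrasts the two approaches under the same assumed schema as the earlier snippets; the minimum split size is an illustrative guard against sparse splits, not a value from the thesis.

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

QUALITY = ["avg_bitrate_bin", "buffering_ratio_bin", "rate_of_buffering_bin", "join_time_bin"]
CONFOUNDERS = ["type_of_video", "device", "connectivity"]    # key factors from Table 1

def feature_addition_accuracy(df):
    # (a) Add the confounding factors as extra features and learn a single tree.
    X = pd.get_dummies(df[QUALITY + CONFOUNDERS], columns=CONFOUNDERS)
    return cross_val_score(DecisionTreeClassifier(), X, df["engagement_class"], cv=10).mean()

def split_accuracy(df, min_sessions=1000):
    # (b) Split on the confounders and learn one tree per split; report the
    #     session-weighted accuracy of the union of per-split trees.
    accs, weights = [], []
    for _, part in df.groupby(CONFOUNDERS):
        if len(part) < min_sessions:        # skip overly sparse splits
            continue
        accs.append(cross_val_score(DecisionTreeClassifier(), part[QUALITY],
                                    part["engagement_class"], cv=10).mean())
        weights.append(len(part))
    return sum(a * w for a, w in zip(accs, weights)) / sum(weights)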

We analyze the improvement in prediction accuracy that each approach gives for different datasets in Figure 2b and observe that the split method performs better than (or equivalent to) the feature-addition approach. The reason is related to the decision tree algorithm. Decision trees use information gain to identify the best attribute to branch on. Information-gain-based schemes, however, are biased towards attributes that have many levels [22]. While we bin all the quality metrics at an extremely fine level, the confounding factors have only a few categories. This biases the decision tree towards always selecting the quality metrics as more important.

Final model: We observed many users who “sample” the video and quit early if it is not of interest to them. Taking into account this domain-specific observation, we ignore these early-quitter sessions from our dataset and relearn the model, leading to a 6% increase in accuracy. Further, incorporating the three key confounding factors (type of video, device and connectivity), we propose a unified QoE model based on splitting the dataset by the confounding factors and learning multiple decision trees, one for each split. Accounting for all the confounding factors leads to a further improvement of around 18%. Our final model predicts the fraction of video viewed within the same 10% bucket as the actual user viewing duration with an accuracy of 70%.

2.3 Implications for System Design

The QoE model that we developed can be used by various principals in the Internet video ecosystem to guide system design decisions (e.g., video player designers can use the model to design efficient bitrate adaptation algorithms, and CDNs can use it to pick bitrates). We evaluate the model in the context of a (hypothetical) control plane [36] that content providers can use to choose the CDN and bitrate for each session using a global optimization framework. For this evaluation, we also need a quality model that predicts the various quality metrics for a given session. We use a simplified version of the quality prediction model proposed in prior work [36], which computes the mean performance (buffering ratio, rate of buffering and join time) for each combination of attributes (e.g., type of video, ISP, region, device) and control parameters (e.g., bitrate and CDN) using empirical estimation.
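A minimal sketch of such an empirical quality model, assuming a historical session table with the listed attribute and metric columns (names are illustrative):

import pandas as pd

ATTRS = ["type_of_video", "isp", "region", "device"]        # session attributes
CONTROLS = ["cdn", "bitrate"]                                # control parameters
METRICS = ["buffering_ratio", "rate_of_buffering", "join_time"]

def build_quality_model(history: pd.DataFrame) -> pd.DataFrame:
    # Empirical estimator: mean of each quality metric per attribute/parameter combination.
    return history.groupby(ATTRS + CONTROLS)[METRICS].mean()

def predict_quality(model: pd.DataFrame, attrs: dict, cdn: str, bitrate: int) -> pd.Series:
    # Look up the mean quality metrics for this attribute/parameter combination.
    key = tuple(attrs[a] for a in ATTRS) + (cdn, bitrate)
    return model.loc[key]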

[Figure 3: Comparing the predicted average engagement (fraction of video viewed) for the different strategies, for (a) VOD and (b) Live datasets, broken down by device type (PC, TV, Mobile, Overall): Smart QoE, Baseline, Smart CDN + Lowest BuffRatio, Smart CDN + Highest Bitrate, Smart CDN + Utility Function.]

Using this framework, we compare different strategies for picking the control parameters (CDN and bitrate):

• Smart QoE approach: This approach uses a predicted quality model and a predicted QoE model based on historical data. To choose the best control parameters for a particular session, we estimate the expected engagement for all possible combinations of CDNs and bitrates by querying the predicted quality model and the predicted QoE model with the appropriate attributes (ISP, device etc.), and assign the CDN and bitrate combination that gives the best predicted engagement (a sketch of this selection logic is shown below).

• Smart CDN approaches: We find the best CDN for a given combination of attributes (region, ISP and device) using the predicted quality model by comparing the mean performance of each CDN in terms of buffering ratio across all bitrates, and assign clients to this CDN. We implement three variants for picking the bitrate: (a) Smart CDN, highest bitrate: the client always streams at the highest available bitrate. (b) Smart CDN, lowest buffering ratio: the client is assigned the bitrate that is expected to cause the lowest buffering ratio based on the predicted quality model. (c) Smart CDN, control plane utility function: the client is assigned the bitrate that maximizes the utility function (3.7 × BuffRatio + Bitrate), which was the optimization goal in prior work [36].

• Baseline: We implemented a naive approach where the client picks a CDN and bitrate randomly.

We quantitatively evaluate the benefits of these techniques using a trace-based simulation. We use a week-long trace to simulate client attributes and arrival times. In each epoch (one-hour time slots), a number of clients with varying attributes (type of video, ISP, device) arrive. For each client session, we assign the CDN and bitrate based on the various strategies mentioned earlier. For simplicity, we assume the CDNs are sufficiently provisioned and do not degrade in performance throughout our simulation. To evaluate the performance of these strategies, we develop an actual engagement model and an actual quality model based on the empirical data from the current measurement epoch and compare the engagement predicted by these models for each session. Since the arrival patterns and the client attributes are the same for all the strategies, they have the same denominator in each epoch.
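Given a quality model and the QoE (engagement) model from Section 2.2, the Smart QoE decision for one session reduces to an argmax over the candidate control parameters. The sketch below assumes a qoe_model object exposing a predict_engagement method; that interface is hypothetical.

def smart_qoe_choice(session_attrs, cdns, bitrates, quality_model, qoe_model):
    # Enumerate all CDN/bitrate combinations, predict quality then engagement,
    # and keep the combination with the highest predicted engagement.
    best, best_engagement = None, -1.0
    for cdn in cdns:
        for bitrate in bitrates:
            quality = predict_quality(quality_model, session_attrs, cdn, bitrate)
            engagement = qoe_model.predict_engagement(session_attrs, bitrate, quality)
            if engagement > best_engagement:
                best, best_engagement = (cdn, bitrate), engagement
    return best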


Figure 3 compares the performance of the different strategies for the live and VOD datasets, broken down by device type. As expected, the baseline scheme has the worst performance. The smart QoE approach can potentially improve user engagement by up to 2× compared to the baseline scheme. We observed that the smart CDN with lowest buffering ratio scheme picks the lowest bitrates, and hence the expected engagements are lower compared to the other smart schemes. The smart CDN with utility function and smart CDN with highest bitrate approaches have very comparable performance, because the utility function favors the highest bitrate in most cases. Our smart QoE approach picks intermediate bitrates and dynamically shifts between picking lower and higher bitrates based on the various attributes and the predicted quality. Thus, it can potentially improve user engagement by more than 20% compared to the other strategies.

2.4 Related Work

Engagement in Internet video: Past measurement studies provide a simple quantitative understanding of the impact of individual quality metrics on engagement [23, 33]. We shed further light and provide a unified understanding of how all the quality metrics together impact engagement by developing a QoE model. Similarly, although previous studies have shown that a few external factors (e.g., connectivity) affect user engagement [33], there are no existing techniques to identify whether an external factor is potentially confounding. We extend our previous work [15] by developing techniques to identify external factors that are confounding and incorporating these factors into a unified QoE model.

User studies: Prior work in the multimedia community tries to assess video quality by performing subjective user studies and validating objective video quality models against the user study scores [9, 20, 32, 38, 40]. User studies are typically done at a small scale with a few hundred users, and the perceptual scores given by users in a controlled setting may not translate into measures of user engagement in the wild. The data-driven approach that we propose is scalable, and it produces an engagement-centric model.

QoE metrics in other media: There have been attempts to study the impact of network factors on user engagement and user satisfaction in the context of other media technologies. For example, in [10], the authors study the impact of bitrate, jitter and delay on call duration in Skype and propose a unified user satisfaction metric as a combination of these factors.

3 Analyzing Quality of Experience for Mobile Web Browsing Sessions

With the proliferation of smartphone applications, cellular network operators are now expected to support and provide wire-line-compatible quality of experience (QoE) for several applications. However, measuring QoE is extremely challenging for a network operator compared to other players like CDNs and content providers, for the following reasons:

• Unlike CDNs and content providers, ISPs do not have access to the client-side and server-side logs that are extensively used for QoE estimation in prior work [17].

• The physical environment plays a critical role in wireless user experience. Hence it is not practical to use active probes to measure and understand application QoE.


• Applications running over cellular networks have complex interactions with a number of different protocol layers. This leads to trade-offs between several performance characteristics (e.g., latency vs. capacity [29], average throughput vs. link disruptions [30]). Because of these complex interactions and trade-offs, the relationship between network characteristics and QoE is poorly understood.

In order to understand how different network characteristics affect user QoE, cellular network operators first need to develop techniques to extract “ground truth” benchmark data on user engagement and other quality metrics from passive network traces. I will next present my study on analyzing QoE for mobile web browsing sessions using passive network flow traces and radio access network traces collected by AT&T over two months. It looks at whether the techniques proposed and evaluated for developing a QoE model for Internet video can be extended to a new domain (web browsing) under a more constrained setting (e.g., lack of client-side logs on user engagement).

3.1 Limitations of Previous Work

Previous work on measuring web browsing QoE has focused on the impact of page load time on user experience [19, 21]. Similar to past Internet video QoE studies, these involved controlled user studies that understand the impact of page load time on user experience by installing browser plugins that monitor web page loading time and frequently prompt users for feedback on their experience (e.g., satisfied or not). This approach has two major limitations:

• First, page load time does not capture all aspects of a web browsing session. For instance, since web pages are downloaded progressively, users can interact with and start browsing a page even before it is completely loaded. To get a holistic view of the impact of network characteristics on mobile web browsing QoE, we also need to accommodate and study the impact of new metrics, both network-flow-based (e.g., TCP reset flags, partially downloaded objects) and radio-network-based (e.g., cellular handover, radio signal strength).

• Second, user studies are typically conducted in a controlled setting with a few tens of users. It is impossible to cover all possible scenarios in such a controlled setting. Leveraging the data that is collected by providers, user feedback can be replaced with engagement measured in the wild. In this study, I use web session length, measured in terms of the number of clicks that the user makes within a session, as the measure of engagement. However, as mentioned earlier, estimating engagement is challenging for a network operator, since they do not have access to client-side or server-side logs.

In what follows, I first present a few initial results and then discuss my plans to complete this study.

3.2 Preliminary Results

Estimating session length: Since ISPs do not have access to client-side or server-side logs, they need to estimate the “ground truth” on engagement (measured as web session length) from passive network traces. I worked on measuring web session length from HTTP traces. The HTTP trace consists of GET requests for web objects generated either by (1) the user clicking on a link, or (2) automatic requests for embedded objects from web pages.

[Figure 4: Selected results from analyzing mobile web browsing QoE. (a) Arrival time distribution: CDF (% of sessions) of request arrival times for embedded objects vs. page clicks; (b) Average session length vs. fraction of partially downloaded objects; (c) Average session length vs. fraction of flows with client-to-server reset flags.]

The main challenge in estimating session length is differentiating clicks from embedded objects. I first worked on classifying requests within a single domain (www.cnn.com). By investigating the domain names, I was able to find patterns for web clicks. Codifying these patterns using regular expressions, we collected ground truth for the data and classified URLs as either web clicks or embedded objects.

We explored the following three techniques to automate this classification and generalize it to other domains (a sketch of the bag-of-words classifier appears after this list):

• Using inter-arrival time: Based on the assumption that embedded objects have low inter-arrival times, since they are triggered automatically, as opposed to clicks that need human intervention, I tried to find a threshold on inter-arrival time to differentiate the two. The inter-arrival time distribution is shown in Figure 4a. Using a threshold of 2 seconds to differentiate between the two led to about 85% accuracy. Investigating the cases where misclassification occurred, we found that CNN uses various third-party services (e.g., Disqus, Scorecard Research, etc.) that regularly send beacons to understand user web browsing behavior. Although these are requests for embedded objects, this technique wrongly classifies them as clicks.

• Using domain name: We then looked at classifying based on domain names, represented as a bag of words. We obtained the training set by looking at just the first 10 seconds of a session and assuming that the first request during this period has to be a click and the rest are embedded objects. Using this training set, we used the Naive Bayes algorithm to learn a model. The model was able to classify clicks and embedded objects with 92% accuracy.

• Using domain name, URN and type of object: Incorporating information about the type of object and the URN along with the domain-name bag of words and relearning the model led to 98% classification accuracy.
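A minimal sketch of the bag-of-words Naive Bayes classifier (the second technique); the input file, column names and labels are illustrative stand-ins for the bootstrapped training set described above.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical request log: one row per HTTP GET with its domain and a
# bootstrapped label ("click" or "embedded").
requests = pd.read_csv("http_requests.csv")

# Represent the domain name as a bag of words: "www.cnn.com" -> "www cnn com".
tokens = requests["domain"].str.replace(".", " ", regex=False)
labels = requests["label"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
print("click vs. embedded accuracy:", cross_val_score(clf, tokens, labels, cv=10).mean())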

Session length vs. network-flow features: We also investigated the relationship between session length and a few network-flow features. Partial download ratio and client-to-server reset flag ratio are two network flow metrics that we measured from the traces. We observed several objects that are partially downloaded (i.e., downloaded bytes < content length). We calculated the fraction of objects in a session that are partially downloaded and performed correlation analysis


with session length, as shown in Figure 4b. We observed that sessions with a higher partial download ratio had lower session lengths. Another metric that we used to characterize a session is the fraction of TCP flows in the session that had client-to-server reset flags. We observed that the higher this ratio, the lower the session length (Figure 4c). In short, as expected, we observed that worse flow metrics resulted in lower session lengths. We plan to extend this study by incorporating more network flow metrics as well as radio network metrics.
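A sketch of this correlation analysis for the partial download ratio, assuming per-object and per-session tables with illustrative column names:

import pandas as pd
from scipy.stats import pearsonr

objects = pd.read_csv("http_objects.csv")    # hypothetical: session_id, downloaded_bytes, content_length
sessions = pd.read_csv("web_sessions.csv")   # hypothetical: session_id, session_length (number of clicks)

objects["partial"] = objects["downloaded_bytes"] < objects["content_length"]
partial_ratio = objects.groupby("session_id")["partial"].mean().rename("partial_ratio")

merged = sessions.set_index("session_id").join(partial_ratio).dropna()
r, p = pearsonr(merged["partial_ratio"], merged["session_length"])
print(f"Pearson r = {r:.2f} (p = {p:.3g})")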

3.3 Planned Work

Better session length models: We want to further improve the session length model by (1) trying other, more advanced machine learning algorithms for the click vs. embedded object classification, and (2) adding more input features (e.g., inter-arrival time) to the model. Further, we want to test whether our technique for classifying web clicks and embedded objects works for other types of websites (e.g., blogging (tumblr), shopping (amazon), social networking (twitter)). This would require collecting ground truth for these domains and testing our final classification algorithm against it.

Unified QoE model: We want to incorporate more features, including radio-network-level features (e.g., handoffs, signal strength), and develop a more holistic understanding of how network features affect web browsing QoE. The first step would be to extract these features from the dataset and then perform correlation analysis to understand the relationships. I would then try to use machine learning techniques to capture a QoE model, similar to the Internet video study. It would also be interesting to study whether the relationship between performance and engagement is affected by external confounding factors (e.g., type of website).

4 Characterizing Internet Video Viewing Behavior in the Wild for User Customization

The previous two studies looked at leveraging big data techniques to understand how network characteristics affect users' quality of experience. Another application of big data techniques is to characterize and model user behavior in the wild, in order to customize the design of network applications and delivery infrastructure towards individual users and general trends. In [16], we studied various user behaviors and their implications for CDN augmentation strategies. In this section, I will present some of these results.

The data used for this analysis was collected by conviva.com in real time using a client-side instrumentation library in the video player. We focus on data queried over two months (consisting of around 30 million video sessions) from two content providers based in the US. The first provider serves VOD objects between 35 and 60 minutes long, comprising TV series, news shows and reality show episodes. The second provider serves sports events that are broadcast while the event is happening, and hence the viewing behavior is synchronized.

Network applications can be customized based on general trends in user behavior as well as individual user access patterns. In Section 4.1, I present selected measurements of general trends in video viewing patterns that have significant implications for delivery infrastructure design. In Section 4.2, I present some observations that can be used to profile and classify individual users in order to customize the application. I then present planned work to extend these results in Section 4.3, before discussing related work in Section 4.4.

[Figure 5: User access patterns. (a) Regional effect: CDF over video objects of the Pearson correlation coefficient between per-region accesses and population, for regional vs. non-regional content; (b) Time-of-day effect: normalized number of accesses over the day for three regions; (c) Partial interest, VOD: fraction of sessions vs. fraction of the video viewed, with the fitted mixture model; (d) Partial interest, Live: fraction of sessions vs. fraction of the video viewed, with the fitted model.]

4.1 General Trends

Some of the interesting general trends in video viewing behavior that we observed are:

1. Regional effects: Typically, the number of accesses to a particular piece of content from a geographical region is strongly correlated with the total population of the region. However, in our live dataset, we observed anomalies for content with region-specific interest (e.g., when a local team is playing a game). Our data consists only of clients within the United States, and hence we classified content as regional or non-regional based on whether it appeals to a particular region within the US. Sports matches between local teams within the US (e.g., NCAA) were classified as regional, as opposed to events that are non-regional to US viewers (e.g., Eurocup soccer). Figure 5a shows the CDF of the Pearson correlation coefficient [39] between the number of accesses from each region and the population of the region (obtained from census data [2]) for all live video objects. Access rates of non-regional content show strong correlation with the population, whereas for regional matches the correlation is weak or negative.

Implications: The skewness in access rates caused by regional interest is an important factor to consider when provisioning the delivery infrastructure to handle unexpectedly high loads.
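A sketch of the regional-interest test, assuming per-access and per-region population tables with illustrative names; the population data corresponds to the census figures in [2].

import pandas as pd
from scipy.stats import pearsonr

accesses = pd.read_csv("live_accesses.csv")                        # hypothetical: video_id, region
population = pd.read_csv("region_population.csv",                  # census data [2]
                         index_col="region")["population"]

def regional_correlation(video_id):
    # Per-region access counts for this object, aligned with the population index.
    counts = (accesses[accesses["video_id"] == video_id]
              .groupby("region").size()
              .reindex(population.index, fill_value=0))
    r, _ = pearsonr(counts, population)
    return r   # near +1: population-driven; near 0 or negative: regional interest

print(regional_correlation("ncaa_game_17"))   # illustrative object id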

2. Time-of-day effects: In our VOD dataset, we clearly observe strong time-of-day effects: accesses peak in the evening with a lull at night. Further, in order to identify regional variations in peak load, we zoom into a day and plot the time series of the normalized number of accesses separately for each region in Figure 5b. Due to lack of space, we only show the results for the top three regions. The number of accesses peaks around 8 pm local time with a lull at night. We observe a difference in when the load peaks at different regions (caused by the time zone difference). We further performed auto-correlation and cross-correlation analysis to confirm that these patterns hold over the entire two months of data [16] (results not shown here).

Implications: The temporal shift in peak access times across different regions opens up new opportunities to handle peak loads; e.g., spare capacity at servers in regions 1 and 2 can be used to serve content in region 9 when access rates peak in region 9.
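A sketch of the autocorrelation check on hourly access counts (a lag-24 autocorrelation close to 1 indicates a strong diurnal pattern); the input file and column names are illustrative.

import numpy as np
import pandas as pd

accesses = pd.read_csv("vod_accesses.csv", parse_dates=["timestamp"])   # hypothetical access log
hourly = accesses.set_index("timestamp").resample("1H").size().to_numpy().astype(float)

def autocorr(x, lag):
    # Normalized autocorrelation of a 1-D series at the given lag.
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

print("autocorrelation at lag 24 h:", autocorr(hourly, 24))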


4.2 User Profiling and Classification

We also observed patterns in user behavior that can be used to profile and classify users. For instance, we observed that several users had only partial interest in the content they were viewing and quit the session without watching the content fully, for both VOD and live content. We investigated the temporal characteristics of user behavior within a video session by analyzing what fraction of a video object users typically view before quitting.

For VOD content, Figure 5c shows that, based on the fraction of video that a user viewed within a session, users can be classified into three categories: early quitters (around 10% of the users watch less than 10% of the video before quitting and might be “sampling” the video), drop-outs (users who steadily drop out of the session, possibly due to quality issues or lack of interest in the content) and steady viewers (a significant fraction of the users watch the video to completion). This distribution can be modeled using a mixture model with separate components. We also find that 4.5% of the users quit the session early in more than 75% of their sessions; i.e., these users are “serial” early quitters. Similarly, 16.6% of the users consistently viewed the video to completion; i.e., these are consistently steady viewers.
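A sketch of fitting a three-component mixture to the per-session fraction of video viewed. [16] fits domain-specific components; a Gaussian mixture is used here only as a simple stand-in, with hypothetical input names.

import pandas as pd
from sklearn.mixture import GaussianMixture

sessions = pd.read_csv("vod_sessions.csv")                 # hypothetical: fraction_viewed in [0, 1]
x = sessions["fraction_viewed"].to_numpy().reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(x)
for mean, weight in sorted(zip(gmm.means_.ravel(), gmm.weights_)):
    # Roughly: early quitters (low mean), drop-outs (middle), steady viewers (high mean).
    print(f"component mean {mean:.2f}, weight {weight:.2f}")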

We performed the same analysis for live content. As shown in Figure 5d, we observed that users watching live content can be classified into two categories: early quitters (a very large fraction of users watch less than 20% of the video before quitting the session) and drop-outs (the remaining users steadily drop out of the video session). We also profile users' viewing history and notice that around 20.7% of the clients are “serial” early quitters; i.e., they quit the session early in more than 75% of their sessions for live content. We also observe several users joining and quitting multiple times during the same event. Since our dataset consists of sporting events, one possibility is that they are checking the current score of the match.

Implications: (1) This analysis is particularly relevant in the context of augmenting CDNs with P2P-based delivery. For example, the fact that users typically view VOD objects from the start and quit early might imply higher availability for the first few chunks of the video. For live content, even though users quit quickly, they arrive randomly during the event, and hence the first part of the event may not necessarily be the most popular part. (2) Content providers and content delivery infrastructures can identify the early quitters and steady viewers and customize the allocation of resources (e.g., use P2P to serve content to early quitters who are “sampling” the video).
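A sketch of profiling users from their viewing history using the thresholds mentioned above (a session quit within the first 10% counts as an early quit, and a user is a "serial" early quitter above a 75% rate; the completion threshold is illustrative).

import pandas as pd

sessions = pd.read_csv("sessions.csv")      # hypothetical: user_id, fraction_viewed

grouped = sessions.groupby("user_id")["fraction_viewed"]
per_user = pd.DataFrame({
    "early_rate": grouped.apply(lambda f: (f < 0.10).mean()),    # share of sessions quit in the first 10%
    "steady_rate": grouped.apply(lambda f: (f > 0.90).mean()),   # share watched (nearly) to completion
})

serial_early = per_user[per_user["early_rate"] > 0.75]           # "serial" early quitters
steady = per_user[per_user["steady_rate"] > 0.75]                # consistently steady viewers
print(len(serial_early) / len(per_user), len(steady) / len(per_user))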

4.3 Planned Work

Characterizing user interest: I plan to extend the above results by further profiling individual users in terms of understanding and predicting their interest in a particular piece of content. As we saw in Section 4.2, we observed users who sample videos and leave the session early in the case of VOD. While quality issues could be one reason for this, it could also be due to lack of interest in the content. Being able to predict ahead of time whether a user is interested in a particular video has important implications for caching and provisioning decisions (e.g., P2P vs. server-based delivery) of the infrastructure. In order to characterize user interest, I plan to use the fraction of video viewed as an indicator of interest (after filtering out obvious cases where quality issues could have affected engagement). I will explore various machine learning techniques to learn a model for user interest.


Time                   Task
Nov 2013 - Jan 2014    QoE for mobile web browsing sessions
Feb 2014 - Apr 2014    User customization models
May 2014 - Aug 2014    Writing dissertation
Sept 2014              Thesis defense

Table 2: Proposed timeline

Applications of the user interest model: The user interest model that I develop can be used for the following two purposes:

• Identifying early quitters: The user interest model can be used to predict whether a user will quit the session early or watch the video to completion. I will evaluate how well the model performs in identifying early quitters.

• Understanding user tolerance: The model can also be used to understand whether user interest has an impact on tolerance to quality issues. If I observe that interest significantly impacts user tolerance, I will try to identify whether user interest is a potential confounding factor and further fine-tune the video QoE model by incorporating this factor.

4.4 Related Work

Content popularity: There have been studies to understand content popularity in user-generated content systems (e.g., [18, 28]), IPTV systems (e.g., [12, 14, 41]), and other VOD systems (e.g., [24, 31, 34]). The focus of these studies was on understanding content popularity to enable efficient content caching and prefetching. Other studies analyze the impact of recommendation systems on program popularity (e.g., [45]) or the impact of flash-crowd-like events (e.g., [26]). In contrast, this work focuses on analyzing general trends in content popularity for customizing the infrastructure design and extends these studies along two key dimensions. First, we model the longitudinal evolution of interest for different genres of video content. Second, we analyze regional variations and biases in content popularity.

User behavior: Previous studies show that many users leave after a very short duration, possibly due to low interest in the content (e.g., [11, 27]). While we reconfirm these observations, we also provide a systematic model for the fraction of video viewed by users using a mixture model with gamma distributions, and highlight key differences between live and VOD viewing behavior. Furthermore, we look at the implications of such partial user interest in the context of customizing the delivery infrastructure design.

5 Timeline

Table 2 describes the timeline for completing the remainder of this thesis. We expect to write at least one research paper as a result of the proposed research.


References

[1] Driving Engagement for Online Video. http://events.digitallyspeaking.com/akamai/mddec10/post.html?hash=ZDlBSGhsMXBidnJ3RXNWSW5mSE1HZz09.

[2] Census Bureau Divisioning. http://www.census.gov/geo/www/us_regdiv.pdf.

[3] Buyer's Guide: Content Delivery Networks. http://goo.gl/B6gMK.

[4] Cisco forecast. http://blogs.cisco.com/sp/comments/cisco_visual_networking_index_forecast_annual_update/.

[5] Cisco study. http://goo.gl/tMRwM.

[6] P.800: Methods for subjective determination of transmission quality. http://www.itu.int/rec/T-REC-P.800-199608-I/en.

[7] P.910: Subjective video quality assessment methods for multimedia applications. http://goo.gl/QjFhZ.

[8] Peak signal-to-noise ratio. http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio.

[9] VQEG. http://www.its.bldrdoc.gov/vqeg/vqeg-home.aspx.

[10] K. Chen, C. Huang, P. Huang, and C. Lei. Quantifying Skype User Satisfaction. In Proc. SIGCOMM, 2006.

[11] L. Plissonneau and E. Biersack. A Longitudinal View of HTTP Video Streaming Performance. In Proc. MMSys, 2012.

[12] Henrik Abrahamsson and Mattias Nordmark. Program popularity and viewer behavior in a large TV-on-Demand system. In Proc. IMC, 2012.

[13] Saamer Akhshabi, Lakshmi Anantakrishnan, Constantine Dovrolis, and Ali C. Begen. What Happens when HTTP Adaptive Streaming Players Compete for Bandwidth? In Proc. NOSSDAV, 2012.

[14] David Applegate, Aaron Archer, Vijay Gopalakrishnan, Seungjoon Lee, and Kadangode K. Ramakrishnan. Optimal Content Placement for a Large-Scale VoD System. In Proc. CoNEXT, 2010.

[15] Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, and Hui Zhang. A Quest for an Internet Video Quality-of-Experience Metric. In Proc. HotNets, 2012.

[16] Athula Balachandran, Vyas Sekar, Aditya Akella, and Srinivasan Seshan. Analyzing the Potential Benefits of CDN Augmentation Strategies for Internet Video Workloads. In Proc. IMC, 2013.

[17] Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica, and Hui Zhang. Developing a Predictive Model of Quality of Experience for Internet Video. In Proc. SIGCOMM, 2013.

[18] Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System. In Proc. IMC, 2007.

[19] Kuan-Ta Chen, Cheng-Chun Tu, and Wei-Cheng Xiao. OneClick: A Framework for Measuring Network Quality of Experience. In Proc. INFOCOM, 2009.

[20] Nicola Cranley, Philip Perry, and Liam Murphy. User perception of adapting video quality. International Journal of Human-Computer Studies, 2006.

[21] Heng Cui and Ernst Biersack. On the Relationship between QoS and QoE for Web sessions. Technical Report, EURECOM, 2012.

[22] Houtao Deng, George Runger, and Eugene Tuv. Bias of importance measures for multi-valued attributes and solutions. In Proc. ICANN, 2011.

[23] Florin Dobrian, Vyas Sekar, Asad Awan, Ion Stoica, Dilip Antony Joseph, Aditya Ganjam, Jibin Zhan, and Hui Zhang. Understanding the impact of video quality on user engagement. In Proc. SIGCOMM, 2011.

[24] Jeffrey Erman, Alexandre Gerber, K. K. Ramakrishnan, Subhabrata Sen, and Oliver Spatscheck. Over the top video: The gorilla in cellular networks. In Proc. IMC, 2011.

[25] Jairo Esteban, Steven Benno, Andre Beck, Yang Guo, Volker Hilt, and Ivica Rimac. Interactions Between HTTP Adaptive Streaming and TCP. In Proc. NOSSDAV, 2012.

[26] H. Yin et al. Inside the Bird's Nest: Measurements of Large-Scale Live VoD from the 2008 Olympics. In Proc. IMC, 2009.

[27] Alessandro Finamore, Marco Mellia, Maurizio Munafo, Ruben Torres, and Sanjay G. Rao. YouTube everywhere: Impact of device and infrastructure synergies on user experience. In Proc. IMC, 2011.

[28] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng. Understanding User Behavior in Large-Scale Video-on-Demand Systems. In Proc. EuroSys, 2006.

[29] Emir Halepovic, Jeffrey Pang, and Oliver Spatscheck. Can you GET me now? Estimating the time-to-first-byte of HTTP transactions with passive measurements. In Proc. IMC, 2012.

[30] Robert Hsieh and Aruna Seneviratne. A Comparison of Mechanisms for Improving Mobile IP Handoff Latency for End-to-End TCP. In Proc. MobiCom, 2003.

[31] Yan Huang, Tom Z. J. Fu, Dah-Ming Chiu, John C. S. Lui, and Cheng Huang. Challenges, Design and Analysis of a Large-scale P2P-VoD System. In Proc. SIGCOMM, 2008.

[32] A. Khan, Lingfen Sun, and E. Ifeachor. QoE prediction model and its applications in video quality adaptation over UMTS networks. IEEE Transactions on Multimedia, 2012.

[33] S. Shunmuga Krishnan and Ramesh K. Sitaraman. Video stream quality impacts viewer behavior: inferring causality using quasi-experimental designs. In Proc. IMC, 2012.

[34] Zhenyu Li, Jiali Lin, Marc-Ismael Akodjenou-Jeannin, Gaogang Xie, Mohamed Ali Kaafar, Yun Jin, and Gang Peng. Watching videos from everywhere: a study of the PPTV mobile VoD system. In Proc. IMC, 2012.

[35] Bing Liu, Minqing Hu, and Wynne Hsu. Intuitive Representation of Decision Trees Using General Rules and Exceptions. In Proc. AAAI, 2000.

[36] X. Liu, F. Dobrian, H. Milner, J. Jiang, V. Sekar, I. Stoica, and H. Zhang. A Case for a Coordinated Internet Video Control Plane. In Proc. SIGCOMM, 2012.

[37] Xi Liu, Florin Dobrian, Henry Milner, Junchen Jiang, Vyas Sekar, Ion Stoica, and Hui Zhang. A Case for a Coordinated Internet Video Control Plane. In Proc. SIGCOMM, 2012.

[38] V. Menkovski, A. Oredope, A. Liotta, and A. C. Sanchez. Optimized online learning for QoE prediction. In Proc. BNAIC, 2009.

[39] Tom Mitchell. Machine Learning. McGraw-Hill.

[40] Ricky K. P. Mok, Edmond W. W. Chan, Xiapu Luo, and Rocky K. C. Chang. Inferring the QoE of HTTP Video Streaming from User-Viewing Activities. In Proc. SIGCOMM W-MUST, 2011.

[41] Tongqing Qiu, Zihui Ge, Seungjoon Lee, Jia Wang, Qi Zhao, and Jun Xu. Modeling channel popularity dynamics in a large IPTV system. In Proc. SIGMETRICS, 2009.

[42] S. Akhshabi, A. Begen, and C. Dovrolis. An Experimental Evaluation of Rate Adaptation Algorithms in Adaptive Streaming over HTTP. In Proc. MMSys, 2011.

[43] Mark Watson. HTTP Adaptive Streaming in Practice. http://web.cs.wpi.edu/~claypool/mmsys-2011/Keynote02.pdf.

[44] Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. Detecting large-scale system problems by mining console logs. In Proc. SOSP, 2009.

[45] Renjie Zhou, Samamon Khemmarat, and Lixin Gao. The impact of YouTube recommendation system on video views. In Proc. IMC, 2010.
