
1062 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 5, MAY 2017

From Prediction to Action: Improving User Experience With Data-Driven Resource Allocation

Yanan Bao, Student Member, IEEE, Huasen Wu, Member, IEEE, Xin Liu, Member, IEEE

Abstract— Driven by the desire for a better user experience and enabled by improved data storage and processing, much recent work has studied user experience prediction in cellular networks. In this paper, moving beyond the prediction-only approach, we propose a data-driven resource allocation framework that uses data-generated prediction models to explicitly guide resource allocation for user experience improvement. In a closed-loop fashion, it further leverages and verifies the causal relation that often exists between certain feature values (e.g., bandwidth) and user experience in computer networks. As a case study, we consider how to reduce the number of user complaints in cellular networks. Our approach consists of three components: we train a logistic regression classifier to predict user experience, use the trained likelihood as the objective function to allocate network resources, and then evaluate user experience under the allocated resources to (in)validate and adjust the original model. We design a DualHet algorithm to tackle the problem of multi-dimensional resource optimization with heterogeneous users. Numerical simulations based on both synthetic and real network datasets demonstrate the effectiveness of the proposed algorithms. In particular, the simulations based on real data demonstrate up to 2× performance improvement compared with the baseline algorithm.

Index Terms— Data-driven networking, machine learning, resource allocation, non-convex optimization.

I. INTRODUCTION

WITH the explosive growth of wireless data traffic and various mobile applications, both industry and academia are increasingly focusing on user experience. In general, the user experience of a service determines user engagement, and therefore affects the revenue and long-term development of a company. With increasing storage and computation capacity, the analysis and prediction of user experience have become more feasible. A large body of literature discusses how user experience can be learned and predicted using machine learning techniques [1]–[5].

In many cases, however, user experience prediction itself is not the ultimate goal. Normally, we hope to proactively identify users with poor experience and take proper actions to improve it. For instance, cellular operators receive complaints about data services from their customers. Based on real-time network performance indicators, the complaints can be predicted to a certain degree. Then, if network operators can allocate more wireless resources to the users with poor experience, the complaints may be avoided proactively. This raises a natural problem: given limited resources, how should they be allocated to multiple users to optimize the overall experience?

Manuscript received September 22, 2016; revised January 13, 2017; accepted January 26, 2017. Date of publication March 13, 2017; date of current version May 24, 2017. This work was supported by the National Science Foundation under Grant CNS-1547461, Grant CNS-1457060, and Grant CCF-1423542.

The authors are with the Department of Computer Science, University of California at Davis, Davis, CA 95616, USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSAC.2017.2680918

Fig. 1. A closed-loop framework in data-driven resource allocation.

To answer this question, we advocate a closed-loop approach that uses data-generated prediction models to explicitly guide resource allocation for user experience optimization. This approach is illustrated in Fig. 1.

First, based on a historical dataset with labeled user experience, we construct an appropriate user experience prediction model to reflect the correlation between feature values and user experience. Then, we feed the model into the resource allocation component as the objective function to optimize resource allocation for incoming users. The output is an appropriate resource allocation and users with improved feature values. Last, the evaluation and data sampling component samples data after resource allocation, validates or invalidates the model, and adjusts the constructed prediction model as needed. The details are given in Sec. III-A.

In this framework, we leverage existing machine learning methods for user experience prediction. Specifically, in this paper, we use the logistic regression model. We focus on the resource allocation algorithms for the trained model and discuss how to adjust the classification model based on evaluating resource allocation results to further improve performance.

The proposed framework has two benefits. First, the constructed classifier captures a quantitative relationship between the feature values and the user experience. Using such a quantitative relationship and domain knowledge, we are able to allocate network resources more precisely to reduce the expected number of users with poor experience, in contrast to the typical approach of using abstract utility functions for resource allocation. Second, the framework includes an evaluation component, where users are sampled after resource allocation to validate or invalidate the hypothesized causal relationship between the feature values and the user experience. This step also provides further opportunities to adjust the constructed prediction model.

0733-8716 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

BAO et al.: IMPROVING USER EXPERIENCE WITH DATA-DRIVEN RESOURCE ALLOCATION 1063

The proposed framework also faces several challenges. First, it is typically more challenging to optimize resource allocation based on prediction models derived from real data than to use utility functions with nice properties such as convexity [6], [7]. In our data-driven resource allocation problem, the likelihood function of user complaints based on logistic regression is non-convex. Therefore, gradient-type methods are not guaranteed to reach the global optimum of these problems. Furthermore, there are typically many users in the network, and different users have different sensitivities to the network parameters. This heterogeneity makes exhaustive search for the optimal solution computationally infeasible when the network scale is large. Moreover, when there are multiple types of resources, the problem becomes more difficult because the resources are coupled.

In this paper, we present a holistic solution using the closed-loop framework in data-driven resource allocation. Specifically, we make the following contributions:

• We propose the closed-loop framework in data-driven resource allocation. Two novel aspects of this framework are 1) the resource allocation component, where we explicitly use the constructed prediction model as the objective function to optimize resource allocation [8]; and 2) the evaluation and data sampling component, which validates or invalidates the model, and adjusts the constructed prediction model as needed [9].

• We propose a DualHet algorithm to obtain near-optimal solutions for the logistic-regression-based optimization problem (Sec. V). Different from [10], the algorithm can obtain near-optimal solutions in scenarios with multiple types of resources, where each type further has heterogeneous effects on users. Moreover, we propose a perturbed version of DualHet, referred to as ε-Perturbed DualHet, that leverages the closed-loop feedback to continuously adjust the classifier and improve QoE performance.

• Finally, we evaluate the algorithms through simulations with both synthetic and real-world cellular datasets (Sec. VI). Results indicate that our algorithms can reduce the expected number of user complaints by up to 2× compared to an optimized baseline. We compare our solution with the upper bounds obtained from the dual problem, and the results show that the gap is less than 3%.

II. RELATED WORK

User experience in cellular networks has been studied extensively in recent years [1]–[3], [11]–[16]. Balachandran et al. [1] use a month-long anonymized dataset collected from a cellular network provider to study Quality of Experience (QoE) metrics including session length, abandonment rate, and partial download ratio. The relation between mobile video streaming performance and user engagement from the perspective of network operators is discussed in [2]. Using 27 TB of video streaming traffic from more than 37 million flows, the authors observe strong correlations between many network features and the abandonment or skip rates. Reference [11] uses controlled experiments and supervised learning (in particular, decision trees) to predict QoE for Skype calls based on network-level measurements. The authors conclude that measuring delay, bandwidth, and loss rate can achieve 83% accuracy in predicting QoE. Chan et al. [3] study QoE prediction for mobile video services using a temporal quality metric and linear learning models.

A large body of literature considers learning-based, cost-efficient decision making. Theoretically, our problem could be formulated as a reinforcement learning problem [17]. However, solving it directly with the standard reinforcement learning approach would face the curse of dimensionality due to the large state and action spaces (as there are many users with possibly different states in the network). Another line of study combines statistical learning and decision making. Horvitz and Mitchell [18] discuss the pipeline of data collection, predictive modeling, and decision analysis. The problem of hospital readmission for patients with congestive heart failure is considered in [19]. The authors construct a classifier to predict readmissions and propose patient-specific interventions to reduce cost. The combination of prediction and allocation of interventions reduces both the rehospitalization rate and cost. Treating resource allocation as the decision, [20] uses a prediction engine to estimate the performance of a given resource allocation and a genetic algorithm to find an optimized solution for an Infrastructure-as-a-Service (IaaS)-based cloud system. Reference [21] uses information collected from system behaviors to predict power consumption levels, CPU loads, and SLA (Service Level Agreement) timings to improve scheduling decisions in data centers. Reference [22] proposes a learning-based approach to power control for uplink interference management in 4G cellular networks. However, to the best of our knowledge, none of the existing work has considered the data-driven network resource allocation problem for improving user QoE. Moreover, the methods considered there do not apply to resource allocation in our setting, due to the complex user experience model and the large number of users.

Network Utility Maximization (NUM) has been extensively studied, e.g., in [7], [10], [23]. Our work differs from these studies in two aspects: 1) our utility function is learned from real datasets, and thus is more complicated; 2) our problem includes multiple types of resources, which makes it more challenging. Optimizing the sum of separable utility functions is discussed in [24]–[26]. Our work focuses on the case where users are heterogeneous and the parameters are independently randomly distributed real numbers. Compared with the results in [25] and [26], our algorithm has lower complexity while achieving the same theoretical performance.

The framework of data-guided resource allocation is studied in our previous work [8] and [9]. Reference [8] applies the classifier learned on labeled data to guide resource allocation on unlabeled data. However, [8] considers a simple scenario where there is only one type of resource, with homogeneous users. In contrast, this work considers more general and complex scenarios with multiple resources and heterogeneous effects of resources on different users. Reference [9] considers the same framework but focuses on the neural-network prediction model, which leads to an allocation problem that is hard to solve efficiently and to analyze, so only heuristic algorithms are proposed.

III. PROBLEM STATEMENT

In this paper, we study the problem of reducing the number of customer complaints at mobile operators. In particular, we consider a tier-1 operator in a city in southern China. The city has a population of about 6 million, and the operator has a user penetration of 2/3. The operator receives an average of 600 complaints a day from customers related to network service quality. The goal of the operator is to proactively reduce the average number of user complaints by taking appropriate actions. We describe our framework for achieving this goal as follows.

A. Data-Driven Resource Allocation Framework

To achieve this goal, we advocate a data-driven resource allocation framework, first proposed in [8] and then enhanced in [9]. As shown in Fig. 1, the framework has three components: classifier construction, resource allocation, and evaluation & data (re)sampling. First, we start with a historical dataset with labeled user experience and the corresponding feature values (including network performance metrics). Based on the data, we construct a user experience prediction model to reflect the correlation between feature values and user experience. In this step, standard machine learning techniques, such as logistic regression, Support Vector Machines (SVM), random forests, and neural networks, can be applied. In different application scenarios, these techniques may differ in prediction performance, and they also lead to different complexity in the resource allocation component. For example, in this work, we use logistic regression because it provides the best prediction result on our dataset. Furthermore, it is simple enough to allow efficient resource allocation, as shown in Sec. V. In [9], a neural network model is applied because of its generality, and the corresponding resource allocation is much more computationally expensive.

After constructing the prediction model based on labeled data, we feed the model into the resource allocation component as the objective function for optimal resource allocation. The key intuition here is that the constructed model provides the best available indication of the quantitative relationship between user features and experience. Therefore, such information helps guide resource allocation in a more quantitative manner. The output here is an appropriate resource allocation result and users with improved feature values.

Last, the evaluation and data sampling component samples data after resource allocation, validates or invalidates the model, and adjusts the constructed prediction model as needed. This component is crucial in real-world implementations. The prediction model constructed in the first step shows correlation, not necessarily causation, between the feature values and user experience. In addition to domain knowledge and field experience, this step allows us to validate or invalidate the causal relationship. Furthermore, because resource allocation may change the distribution of feature values, the original prediction model may need to be adjusted.

TABLE I
FEATURES OF THE MOBILE USER COMPLAINT DATASET

B. Applying to the Cellular Network Data

The dataset we obtained from the operator contains 569170 normal users and 1275 complaining users, which illustrates the highly imbalanced nature of user complaints. The dataset is obtained from the network monitoring system, which records 13 network performance indicators for each user. Each recorded value is the user's average over one hour. The dataset has been pre-screened to contain only complaints about data services. Complaints for other reasons, e.g., billing issues, have already been filtered out of the dataset.

Table I shows the features and their sample values. Feature 0 is a constant, determined by the logistic regression model described in Sec. IV-B. Features 1-4, 7-8, and 10 are Packet Data Protocol (PDP) success percentage, Attachment (ATT) success percentage, Routing Area Update (RAU) success percentage, session success ratio, unexpected line drops, core network failure percentage, and transmission success ratio, respectively. These network features are mostly related to the core network or existing radio front-end characteristics. We consider these features “uncontrollable” in this study, i.e., we cannot change their values by allocating resources. Furthermore, Features 5 (requested sessions), 6 (attempted connections), and 12-13 are traffic characteristics of users and are again considered “uncontrollable”. The “controllable” features are Features 9 and 11, which are the radio network failure percentage and the ratio of downlink throughput to downlink traffic volume (Feature 12), respectively.

In summary, for each user in the dataset, we have its hourly network measurements, as well as its label: a user is labeled as a complaining user (i.e., a positive, following the convention in the machine learning community) if the user called customer service at least once during the hour, and normal (i.e., negative) otherwise. Based on this dataset, one can build a prediction model that correlates network performance metrics with the likelihood of user complaints, which is discussed in more detail in Sec. IV-B.

Based on the prediction model, the next step is to proactively allocate resources to reduce the expected number of user complaints. In this work, we only consider the resources that a base station (BS) can allocate to users in its coverage area. In particular, a BS can allocate two types of resources to the users: bandwidth and local proactive reconnection. Allocating more bandwidth increases the throughput of the recipient, and thus increases the value of Feature 11, while local proactive reconnection helps the user reconnect to the BS when a connection failure happens, which improves Feature 9 (by reducing its value).

We note that, in general, the prediction model demonstrates correlation instead of causation. In this case, domain knowledge plays an important role in determining causation, which is similar to traditional utility-based resource allocation, where a causal relationship between the resource allocation and overall utility is assumed. The benefit of the data-driven approach here is that the prediction model captures a quantitative relationship between the network metrics and the user experience. By leveraging this quantitative relationship, the prediction model allows us to improve the QoE of users more explicitly, compared to using a traditional abstract utility function.

IV. MATHEMATICAL FORMULATION

We formulate the above-stated problem mathematically in this section.

A. Features and Resources

Consider users in a $D$-dimensional feature space, i.e., for user $i$, we have $x_i = [x_{i,1}, x_{i,2}, \dots, x_{i,D}]^T$, where $x_{i,d}$ is the value of feature $d$ ($d = 1, 2, \dots, D$). Each user is associated with a label: a complaining user (label 1) or a user in a normal state (label 0). Following convention, we also denote label 0 (users in the normal state) as negative and label 1 (complaining users) as positive.

There are $K$ types of resources, and the resources allocated to user $i$ are denoted by $r_i = [r_{i,1}, r_{i,2}, \dots, r_{i,K}]^T$. We assume that there is a linear relation between the allocated resources and the change of feature values.¹ Given $r_i$ amount of resources, user $i$ has its feature values updated to

$$g(x_i, r_i) = x_i + Q_i r_i, \tag{1}$$

where $Q_i$ is a $D \times K$ matrix denoting the effects of the $K$ resources on the $D$ features.

In the following, we first discuss how to build the classifier and then how to use it to allocate network resources to users more effectively.

¹A linear relation is considered in this first-step study of the data-driven resource allocation problem. This simple relation holds for the features we consider here, such as throughput. More complex and general functions will be studied in future work.

Fig. 2. ROC curves of the tested machine learning methods.

B. Learning

The first step is to construct classifiers for cellular user experience based on the dataset discussed in Sec. III. To handle the highly imbalanced data, we randomly undersample the negatives at a rate of 1/50, and use Receiver Operating Characteristic (ROC) curves as the performance metric [27]. Given a classifier, multiple pairs of false positive rate and true positive rate can be obtained by varying the decision threshold, and these pairs define an ROC curve. Specifically, based on the true label and the predicted label, each user falls into exactly one of four categories: true positive, false positive, false negative, and true negative. For example, a true positive means both the true label and the predicted label are positive, while a false positive means the predicted label is positive but the true label is negative. The false positive rate is the proportion of “false positives” among “all users whose true labels are negative” (including “false positives” and “true negatives”). The true positive rate is the proportion of “true positives” among “all users whose true labels are positive” (including “true positives” and “false negatives”).
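To make these definitions concrete, the following Python sketch (NumPy assumed; the function names are illustrative, not from the paper) computes one (false positive rate, true positive rate) pair at a given threshold and then sweeps the threshold to trace the points of an ROC curve. It assumes that a higher score means a user is more likely to complain.

```python
import numpy as np


def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when users with score >= threshold are predicted positive."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)    # predicted positive, truly positive
    fp = np.sum(y_pred & ~y_true)   # predicted positive, truly negative
    fn = np.sum(~y_pred & y_true)   # predicted negative, truly positive
    tn = np.sum(~y_pred & ~y_true)  # predicted negative, truly negative
    return fp / (fp + tn), tp / (tp + fn)


def roc_points(y_true, scores):
    """Sweep the threshold from +inf down through every distinct score."""
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    return [roc_point(y_true, scores, th) for th in thresholds]
```

For labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.4, 0.1], the sweep yields the points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), and (1, 1).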

To choose a proper classifier, we test a variety of widely used classification algorithms, including logistic regression, neural network, support vector machine, decision tree, random forest, and k-nearest neighbors. A randomly sampled 50% of the data is used to train each classifier, and the rest is used as test data. AUC (Area Under the Curve) [28], a commonly used metric for binary classification, is used to judge prediction performance. Compared with other evaluation methods such as the simpler misclassification error, it has the advantage of being insensitive to class imbalance. As shown in Fig. 2, logistic regression achieves the best classification performance, i.e., the highest AUC score. Therefore, we use logistic regression (with its trained weights) to guide resource allocation in the next step.
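AUC can also be computed without drawing the curve, as the probability that a randomly chosen positive is scored higher than a randomly chosen negative (ties counted as one half). The sketch below uses this pairwise formulation; it is an illustrative choice, not necessarily how [28] or the authors compute AUC, and its O(P·N) memory for P positives and N negatives makes a rank-based variant preferable for very large test sets.

```python
import numpy as np


def auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    # Compare every positive with every negative via broadcasting: shape (P, N).
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

For labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.4, 0.1], three of the four positive-negative pairs are correctly ordered, giving an AUC of 0.75. Because the value is an average over positive-negative pairs, uniformly undersampling the negatives leaves its expected value unchanged.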

Specifically, the logistic regression learning step is modeled as the following optimization problem:

$$\max_{w, w_0} \sum_{i=1}^{N} \left[ y_i \log\left(\frac{1}{1+\exp(-w_0 - w^T x_i)}\right) + (1-y_i)\log\left(\frac{1}{1+\exp(w_0 + w^T x_i)}\right)\right] - \|w\|_1, \tag{2}$$

where $N$ is the number of users with labels, $y_i \in \{0, 1\}$ ($i = 1, 2, \dots, N$) are the labels, and $w^T x_i$ is the inner product between $w$ and $x_i$. The classifier needs to learn the parameters $w = [w_1, w_2, \dots, w_D]^T$ and $w_0$, where $w$ contains the weights associated with the features, and $w_0$ is the intercept (corresponding to Feature 0 in Table I). To alleviate overfitting, L1 regularization (the penalty term $\|w\|_1$, subtracted from the log-likelihood being maximized) is applied [29].
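A minimal NumPy sketch of the objective in (2) follows. It assumes the L1 penalty enters with a minus sign, so that maximizing the objective shrinks the weights, and uses unit regularization weight as written; the function name is illustrative. The two log terms are evaluated with `np.logaddexp` to avoid overflow for large $|w_0 + w^T x_i|$.

```python
import numpy as np


def l1_logistic_objective(w, w0, X, y):
    """Penalized log-likelihood of Eq. (2): sum of log-probabilities minus ||w||_1.

    X has shape (N, D); y holds 0/1 labels of length N.
    """
    z = w0 + X @ w
    # log(1 / (1 + exp(-z))) = -log(1 + exp(-z)) = -logaddexp(0, -z)
    log_p_pos = -np.logaddexp(0.0, -z)
    # log(1 / (1 + exp(z))) = -logaddexp(0, z)
    log_p_neg = -np.logaddexp(0.0, z)
    return float(np.sum(y * log_p_pos + (1 - y) * log_p_neg) - np.sum(np.abs(w)))
```

In practice, one would hand this objective to an optimizer rather than evaluate it by hand; an off-the-shelf L1-penalized logistic regression solver fits an objective of this form up to how the regularization strength is scaled.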

We obtain the prediction model based on the training data, and the weights of the trained logistic regression are shown in Table I.

C. Resource Optimization

For a user with feature vector $x$, based on the classifier learned from labeled data, its probability of being positive is $\frac{1}{1+\exp(-w^T x - w_0)}$. Therefore, given $r_i$ amount of resources, the probability for user $i$ to be positive is

$$p_i = \frac{1}{1 + \exp\left(-w^T (x_i + Q_i r_i) - w_0\right)}. \tag{3}$$

For ease of notation, define $C_i = -w^T x_i - w_0$, which is the initial intercept for user $i$, $A_i = -Q_i^T w$, and

$$\eta(x) = \frac{1}{1 + \exp(x)}. \tag{4}$$

Since $-w^T (x_i + Q_i r_i) - w_0 = C_i + A_i^T r_i$, we have $p_i = \eta(C_i + A_i^T r_i)$.
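As a sanity check on this change of variables, the following sketch (NumPy; function names are illustrative) evaluates $p_i$ both directly from (1) and (3) and through $C_i$, $A_i$, and $\eta$ from (4). Writing $\eta(x) = (1 - \tanh(x/2))/2$ is an implementation choice assumed here to avoid overflow in `exp` for large $|x|$.

```python
import numpy as np


def eta(x):
    """eta(x) = 1 / (1 + exp(x)) from Eq. (4), via the identity (1 - tanh(x / 2)) / 2."""
    return 0.5 * (1.0 - np.tanh(0.5 * x))


def updated_features(x_i, Q_i, r_i):
    """Linear feature update g(x_i, r_i) = x_i + Q_i r_i from Eq. (1)."""
    return x_i + Q_i @ r_i


def p_direct(w, w0, x_i, Q_i, r_i):
    """Eq. (3): logistic probability evaluated at the updated features."""
    return 1.0 / (1.0 + np.exp(-w @ updated_features(x_i, Q_i, r_i) - w0))


def p_via_eta(w, w0, x_i, Q_i, r_i):
    """The same probability written as eta(C_i + A_i^T r_i)."""
    C_i = -w @ x_i - w0   # initial intercept of user i
    A_i = -Q_i.T @ w      # aggregated resource efficiencies, length K
    return eta(C_i + A_i @ r_i)


# Example with random parameters: D = 4 features, K = 2 resources.
rng = np.random.default_rng(1)
D, K = 4, 2
w, w0 = rng.normal(size=D), 0.3
x_i, Q_i, r_i = rng.normal(size=D), rng.normal(size=(D, K)), rng.uniform(size=K)
```

The two functions agree up to floating-point rounding, since $(Q_i^T w)^T r_i = w^T Q_i r_i$.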

Let $\eta'(x)$ denote the derivative of $\eta(x)$. Note that although we assume a common $w$ for all users, unlike [8], we consider much more general scenarios where users are under different network conditions and may have different $Q_i$'s. Due to this heterogeneity, the multi-resource allocation problem in our paper cannot be reduced to a single-resource allocation problem, and the algorithms in [8] do not apply here.

The goal of resource allocation is to reduce the expected number of positives. This objective is motivated by the need to improve KPIs (Key Performance Indicators), in this case the number of complaints. By using the classifier learned from labeled data as the objective function, given $R = [R_1, R_2, \dots, R_K]^T$ amount of resources, the resource allocation for $M$ unlabeled users can be formulated as the following optimization problem:

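$$\text{(P-0)} \quad \min_{r_1, r_2, \dots, r_M} \; \sum_{i=1}^{M} \eta(C_i + A_i^T r_i) \tag{5}$$

$$\text{s.t.} \quad \sum_{i=1}^{M} r_i \le R, \tag{6}$$

$$0 \le r_i \le B_i, \quad i = 1, 2, \dots, M, \tag{7}$$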

where $A_i = -Q_i^T w = [A_{i,1}, A_{i,2}, \dots, A_{i,K}]^T$, and $A_{i,k}$ is the aggregated resource efficiency of resource $k$ on user $i$. If $A_{i,k} \le 0$, i.e., allocating resource $k$ to user $i$ does not improve (or even impairs) its performance, we set $B_{i,k} = 0$. Therefore, without loss of generality, we assume $A_{i,k} > 0$. Constraint (7) is the user-side resource upper bound, i.e., the resource allocated to user $i$ is bounded by $B_i = [B_{i,1}, B_{i,2}, \dots, B_{i,K}]^T$, which can arise for two reasons: the user-side system configuration, and diminishing marginal benefit (when more than $B_i$ is allocated, additional resource is no longer effective in improving user experience). The inequalities in (6) and (7) hold element-wise.
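The pieces of (P-0) can be sketched as follows, assuming NumPy arrays with one row per user ($C$ of length $M$; $A$, $B$, and the allocation $r$ of shape $M \times K$; $R$ of length $K$). The helper names are hypothetical; `clip_bounds` implements the convention $B_{i,k} = 0$ whenever $A_{i,k} \le 0$.

```python
import numpy as np


def eta(x):
    """eta(x) = 1 / (1 + exp(x)), Eq. (4)."""
    return 1.0 / (1.0 + np.exp(x))


def clip_bounds(A, B):
    """Set B_{i,k} = 0 wherever resource k cannot help user i (A_{i,k} <= 0)."""
    return np.where(A > 0, B, 0.0)


def expected_complaints(C, A, r):
    """Objective (5): sum_i eta(C_i + A_i^T r_i)."""
    return float(np.sum(eta(C + np.sum(A * r, axis=1))))


def is_feasible(r, R, B, tol=1e-9):
    """Budget constraint (6) and per-user box constraints (7), element-wise."""
    return bool(np.all(r.sum(axis=0) <= R + tol)
                and np.all(r >= -tol)
                and np.all(r <= B + tol))
```

For two users with $C_i = 0$ (a predicted complaint probability of 0.5 before allocation), the zero allocation gives an expected 1.0 complaint, and any nonzero allocation lowers it because every $A_{i,k} > 0$.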

This optimization problem is non-convex, which means that gradient-based numerical methods are not guaranteed to find its global optimum. Lee et al. [10] consider a single-dimensional case, where the optimal solution can be obtained. However, in our case, the resources form a vector, which makes the problem much trickier, and the algorithm in [10] cannot be directly applied. We consider the following approach to obtain a near-optimal solution. First, we study the dual problem of the proposed optimization problem. The dual problem gives us a set of dual Lagrangian multipliers, which can be interpreted as the prices of the resources. The challenge is that the solutions obtained from the dual problem may not be feasible for the primal problem because of the non-convexity. We prove that when users' features are independently distributed, a near-optimal feasible solution can be obtained readily. The problem formulation and the designed algorithms are not limited to the user complaint reduction problem described in Sec. III. They apply to cases where logistic regression is used as the binary classifier and resource allocation has linear effects on the change of feature values.

D. Closed-Loop Feedback and Classifier Optimization

In this section, we discuss the closed-loop feedback and possible ways of classifier optimization.

First, this component serves the crucial role of validating, partially validating, or invalidating the causal relationship assumed in the resource allocation component. In other words, based on domain knowledge and the learned prediction model, the resource allocation component assumes that changing certain feature values improves the user experience. The closed-loop feedback allows us to validate or invalidate this assumption by sampling user labels after resource allocation. This is similar to the randomized tests typically used to evaluate causal relationships.
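One simple way to operationalize this validation step, offered only as an illustrative assumption rather than the paper's procedure, is to compare the number of complaints observed among users sampled after allocation with the number the model predicts for them. If users are independent, that count has mean $\sum_i p_i$ and variance $\sum_i p_i(1 - p_i)$ under the model, so a large standardized gap suggests that the assumed effect of the allocated resources does not hold.

```python
import numpy as np


def complaint_gap_score(p_pred, y_observed):
    """Standardized gap between observed and model-predicted complaint counts.

    p_pred: predicted complaint probabilities of the sampled users after allocation.
    y_observed: their observed labels (1 = complained, 0 = normal).
    """
    p = np.asarray(p_pred, dtype=float)
    y = np.asarray(y_observed, dtype=float)
    expected = p.sum()                    # mean of the complaint count under the model
    std = np.sqrt(np.sum(p * (1.0 - p)))  # its standard deviation (independent users)
    return (y.sum() - expected) / std
```

For four sampled users each predicted at 0.5, observing two complaints gives a score of 0, while observing four gives a score of 2, a gap that would prompt a closer look at the model.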

Furthermore, we note that the goal of the prediction model here is slightly different from the traditional one. Traditionally, the goal of a classifier is to maximize accuracy in predicting user labels. In contrast, the goal of the classifier here is to best guide resource allocation with respect to the ground truth. In particular, it needs to better quantify the relationship in the targeted region, i.e., where users are located in the feature space as a result of resource allocation.

Specifically, denote the ground truth by a function $G(x)$, representing the probability of being positive for a given feature vector $x$. The set of classifiers that can be expressed by a certain machine learning method (e.g., logistic regression, neural networks) is denoted by $\mathcal{F}$. Then, the task is to find the optimal $f^*$ within $\mathcal{F}$ such that

$$\text{(P-1)} \quad f^* = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{M} G\left(g(x_i, r_i^*(f))\right), \tag{8}$$

where $[r_1^*(f), r_2^*(f), \dots, r_M^*(f)]$ is an optimal solution of (P-0) when classifier $f$ is used as its objective.

If we knew the ground truth $G(x)$, we could find the optimal $f^*$ by solving (P-1). However, $G(\cdot)$ is unknown, and thus we need to approximate it based on the sampled data.

We note that this is a fairly complex problem that involves the fundamental tradeoff between exploration and exploitation, where we need to balance optimizing the user experience based on the learned classifier (exploitation) and improving the classifier by sampling more data (exploration). However, the feature space is typically huge, and exploring all of it to obtain an accurate classifier would incur a large cost, as noted in [30]. In this paper, we only study a heuristic policy for this exploration and exploitation tradeoff, as described in Section V-D.

V. NEAR OPTIMAL RESOURCE ALLOCATION

In this section, we propose a DualHet algorithm to obtain a near-optimal solution for (P-0). The key idea is dual decomposition, where we use the Lagrangian multipliers to coordinate the resource allocation among users. The key challenge is that dual decomposition does not necessarily provide an optimal primal solution due to the non-convexity of (P-0). In this section, we not only design an algorithm that finds an optimal dual solution and uses it to obtain feasible primal solutions, but also prove the near-optimality of the proposed DualHet algorithm under mild technical conditions.

A. DualHet Algorithm

First, the DualHet algorithm solves the dual problem (D-0), defined next, of the primal problem (P-0). Based on the dual solution, the key step is to obtain a feasible primal solution and demonstrate its near-optimality.

Define $\eta_i(r) = \eta(C_i + A_i^T r)$. The Lagrangian of (P-0) is

$$L(r_1, r_2, \dots, r_M, \lambda) = \sum_{i=1}^{M} \eta_i(r_i) + \lambda^T\left(\sum_{i=1}^{M} r_i - R\right), \tag{9}$$

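where $\lambda \in \mathbb{R}^{K \times 1}$ is the Lagrangian multiplier (LM). The dual problem of (P-0) is

$$\text{(D-0)} \quad \max_{\lambda} \; D(\lambda) = \sum_{i=1}^{M} u_i(\lambda) - \lambda^T R \tag{10}$$

$$\text{s.t.} \quad \lambda \ge 0, \tag{11}$$

where

$$\text{(D-}i\text{)} \quad u_i(\lambda) = \min_{r_i} \; \eta_i(r_i) + \lambda^T r_i \tag{12}$$

$$\text{s.t.} \quad 0 \le r_i \le B_i. \tag{13}$$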

Typically, $\lambda$ is interpreted as the prices of the $K$ types of resources. Given these prices, each user $i$ minimizes its own complaint likelihood plus the cost of resource consumption in a distributed manner ((D-$i$)). Intuitively, when the price of resource $k$ increases, a user tends to reduce its consumption of resource $k$, which may result in higher consumption of other types of resources. We discuss how to obtain the optimal solution of problem (D-$i$), denoted by $r_i^*(\lambda)$, in Sec. V-B. Note that (D-$i$) may have more than one optimal solution.
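The following sketch (NumPy; names are hypothetical) evaluates the dual function $D(\lambda)$ in (10) by solving each (D-$i$) with brute-force grid search over the box $[0, B_i]$, which is practical only for small $K$ and is not the method of Sec. V-B. Because a grid minimum can only overestimate $u_i(\lambda)$, the result approximates the dual value; it still lower-bounds the objective of any feasible allocation whose points lie on the grid, such as the zero allocation.

```python
import itertools
import numpy as np


def eta(x):
    """eta(x) = 1 / (1 + exp(x)), Eq. (4)."""
    return 1.0 / (1.0 + np.exp(x))


def u_i(C_i, A_i, B_i, lam, grid=51):
    """Approximate u_i(lam) of (12)-(13): grid minimum of eta_i(r) + lam^T r on [0, B_i]."""
    axes = [np.linspace(0.0, b, grid) for b in B_i]
    pts = np.array(list(itertools.product(*axes)))   # shape (grid**K, K)
    return float(np.min(eta(C_i + pts @ A_i) + pts @ lam))


def dual_value(C, A, B, R, lam):
    """D(lam) = sum_i u_i(lam) - lam^T R, Eq. (10)."""
    return sum(u_i(C[i], A[i], B[i], lam) for i in range(len(C))) - float(lam @ R)
```

Evaluating `dual_value` over a range of prices shows how the bound varies with $\lambda$; (D-0) seeks the largest such value.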

Denote one of the optimal LMs by $\lambda^*$, i.e.,

$$\lambda^* = \arg\max_{\lambda \ge 0} \; \min_{0 \le r_i \le B_i} L(r_1, r_2, \dots, r_M, \lambda). \tag{14}$$

Next, we present the DualHet algorithm, which first finds the optimal LM $\lambda^*$ and then allocates resources to users based on $\lambda^*$, while guaranteeing that the resource constraints are satisfied. Specifically, as shown in Algorithm 1, we use the subgradient method to solve the dual problem. Because $R - \sum_{i=1}^{M} r_i^*(\lambda(t))$ is a subgradient of the convex function $-D(\lambda)$ at $\lambda(t)$, the following iteration converges to the optimal LM $\lambda^*$:

$$\lambda(t+1) = \left[\lambda(t) - a(t)\left(R - \sum_{i=1}^{M} r_i^*(\lambda(t))\right)\right]^+, \tag{15}$$

where $[\cdot]^+ = \max(\cdot, 0)$ is applied element-wise, and the step size $a(t)$ needs to satisfy the following conditions [10]:

$$a(t) \to 0 \text{ as } t \to \infty, \quad \text{and} \quad \sum_{t=1}^{\infty} a(t) = \infty. \tag{16}$$

For instance,

$$a(t) = \beta / t, \tag{17}$$

for some positive constant $\beta$. After obtaining the optimal LM, resources are allocated sequentially, with users that have a single optimal solution given higher priority, as shown in Algorithm 1 (Lines 8 to 11).
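A single price update of (15) with the diminishing step size (17) can be sketched as follows; the function name and the default value of $\beta$ are assumptions for illustration. The projection $[\cdot]^+$ is taken element-wise, so a resource whose price would turn negative is priced at zero.

```python
import numpy as np


def price_update(lam, t, R, r_need, beta=0.1):
    """One step of Eq. (15): lam(t+1) = [lam(t) - a(t) (R - r_need)]^+, with a(t) = beta / t."""
    step = beta / t                                   # Eq. (17); t starts at 1
    return np.maximum(lam - step * (R - r_need), 0.0)
```

When demand exceeds supply for a resource ($r_{need,k} > R_k$), its price rises; when demand falls short, its price drops, possibly to zero. For example, with $\lambda(1) = [0.2, 0]^T$, $R = [4, 4]^T$, $r_{need} = [6, 1]^T$, and $\beta = 0.1$, the update gives $[0.4, 0]^T$.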

Algorithm 1 DualHet Algorithm
Input: Complaint likelihood functions $\eta_1(\cdot), \eta_2(\cdot), \dots, \eta_M(\cdot)$, initial LM $\lambda(0)$, available resource $R$, and convergence threshold $\epsilon$.
Output: $r_1^p, r_2^p, \dots, r_M^p$
1: while $\|\lambda(t+1) - \lambda(t)\| \ge \epsilon$ do
2:   $r_i^* = \arg\min_{r_i} \eta_i(r_i) + \lambda(t)^T r_i$ s.t. $0 \le r_i \le B_i$, for each user $i$;
3:   $r_{need} = \sum_{i=1}^{M} r_i^*$;
4:   $\lambda(t+1) = [\lambda(t) - a(t)(R - r_{need})]^+$;
5: end
6: $\lambda^* = \lambda(t+1)$;
7: $r_i^* = \arg\min_{r_i} \eta_i(r_i) + (\lambda^*)^T r_i$ s.t. $0 \le r_i \le B_i$, for each user $i$;
8: for user $i$ who has a single optimal solution $r_i^*$ do
9:   $r_i^p = r_i^*$;
10:  Update available resource: $R = R - r_i^p$;
11: end
12: for user $i$ who has multiple optimal solutions $r_i^*$ do
13:  $r_i^p = \arg\min_{r_i} \eta_i(r_i) + (\lambda^*)^T r_i$ s.t. $0 \le r_i \le \min(B_i, R)$;
14:  Update available resource: $R = R - r_i^p$;
15: end

In Algorithm 1, $r_{\text{need}}$ denotes the resources demanded by users under the current LM $\lambda(t)$, without considering the globally available resources. $r_{\text{need}}$ is then compared with $R$, and the subgradient method (Lines 3 to 4 in Algorithm 1) is applied to update the prices.

Note that the optimal value of (D-0) is a lower bound on the optimal value of the primal problem (weak duality), and this lower bound is used in our performance evaluation.
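To make the pricing loop concrete, the following Python sketch implements Algorithm 1 for the special case of a single resource type ($K = 1$), together with the dual value that serves as the lower bound. It is an illustration under stated assumptions, not the implementation used in our experiments: we assume $\eta_i(r) = \eta(C_i + A_i r)$ with the decreasing logistic $\eta(x) = 1/(1+e^{x})$ and $A_i > 0$, consistent with the KKT analysis in Sec. V-B; the helper names (`best_responses`, `dual_value`, `dualhet`) and the defaults of `beta` and `eps` are hypothetical. Because finitely many subgradient steps only approximate $\lambda^*$, the sketch also caps each single-solution user's allocation at the remaining budget, a safeguard that is unnecessary when $\lambda^*$ is exact.

```python
import math
from typing import List, Tuple


def eta(x: float) -> float:
    """Assumed complaining likelihood: decreasing logistic 1 / (1 + e^x)."""
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def best_responses(c: float, a: float, b: float, lam: float, tol: float = 1e-9) -> List[float]:
    """All minimizers of eta(c + a*r) + lam*r over 0 <= r <= b (one resource type, K = 1).

    More than one element means the user is a multi-solution user at this price.
    """
    cands = [0.0]
    if a > 0 and lam <= 0:
        cands.append(b)                              # free resource: use the upper bound
    elif a > 0 and a >= 4.0 * lam:                   # an interior stationary point exists
        z = -1.0 + (a + math.sqrt(a * a - 4.0 * a * lam)) / (2.0 * lam)
        cands.append(min(max((math.log(z) - c) / a, 0.0), b))
    vals = [eta(c + a * r) + lam * r for r in cands]
    vmin = min(vals)
    return sorted({r for r, v in zip(cands, vals) if v <= vmin + tol})


def dual_value(C, A, B, R, lam) -> float:
    """D(lam) = sum_i min_r [eta_i(r) + lam*r] - lam*R, a lower bound on the primal optimum."""
    total = 0.0
    for c, a, b in zip(C, A, B):
        r = best_responses(c, a, b, lam)[0]
        total += eta(c + a * r) + lam * r
    return total - lam * R


def dualhet(C, A, B, R, lam0=0.0, beta=0.05, eps=1e-5,
            max_iter=100_000) -> Tuple[List[float], float]:
    """Sketch of Algorithm 1 for a single resource type (K = 1)."""
    lam = lam0
    for t in range(1, max_iter + 1):
        # Lines 2-3: best responses to the current price and the total demand r_need.
        r_need = sum(best_responses(c, a, b, lam)[0] for c, a, b in zip(C, A, B))
        # Line 4: projected subgradient step with a(t) = beta / t, Eqs. (15) and (17).
        lam_next = max(lam - (beta / t) * (R - r_need), 0.0)
        done = abs(lam_next - lam) < eps
        lam = lam_next
        if done:
            break
    # Lines 7-15: single-solution users first, then multi-solution users.
    sols = [best_responses(c, a, b, lam) for c, a, b in zip(C, A, B)]
    r_p = [0.0] * len(C)
    remaining = R
    for i, s in enumerate(sols):
        if len(s) == 1:
            # Safeguard (assumption): lam is only approximately optimal, so never exceed R.
            r_p[i] = min(s[0], max(remaining, 0.0))
            remaining -= r_p[i]
    for i, s in enumerate(sols):
        if len(s) > 1:
            # Line 13: re-solve with the upper bound tightened to the remaining resource;
            # among tied minimizers, take the largest one that fits.
            cap = min(B[i], max(remaining, 0.0))
            r_p[i] = best_responses(C[i], A[i], cap, lam)[-1]
            remaining -= r_p[i]
    return r_p, lam
```

Returning every tied minimizer from `best_responses` mirrors the distinction Algorithm 1 draws between single-solution and multi-solution users; in floating point, the tolerance `tol` decides what counts as a tie. By weak duality, `dual_value(C, A, B, R, lam)` never exceeds the total likelihood of any feasible allocation, including the one returned by `dualhet`.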

B. Resource Allocation at Individual Users

In this subsection, we consider how to solve the non-convex optimization problem (D-$i$) (Lines 2, 7, and 13 in Algorithm 1). Given the resource prices $\lambda$, each user needs to decide the amount of each resource to consume. This is a challenging problem: even though it is distributed and the number of variables is reduced to $K$, the objective function is still non-convex. The analysis in this subsection also contributes to the near-optimality analysis in Sec. V-C.

The KKT conditions are necessary conditions that optimal solutions have to satisfy. Utilizing the KKT conditions, we find two properties of the optimal solutions. These properties limit the number of candidate solutions to at most $K+1$, and the optimum is selected by comparing the $K+1$ candidates.

The Lagrangian of (D-$i$) is as follows:

$$L_S(r_i, B_i, \lambda, \tau, v) = \eta\big(C_i + A_i^T r_i\big) + \lambda^T r_i - \tau^T (B_i - r_i) - v^T r_i. \tag{18}$$

Any optimal solution of (D-$i$) satisfies the following KKT conditions, for $k = 1, 2, \ldots, K$:

$$A_{i,k}\,\eta'\big(C_i + A_i^T r_i\big) + \lambda_k + \tau_k - v_k = 0, \tag{19}$$

$$v_k r_{i,k} = 0, \tag{20}$$

$$\tau_k (B_{i,k} - r_{i,k}) = 0, \tag{21}$$

$$v_k \ge 0, \tag{22}$$

$$\tau_k \ge 0. \tag{23}$$

Sort the $K$ types of resources for user $i$ in non-increasing order of $A_{i,k}/\lambda_k$, i.e., the resources are labeled such that

$$\frac{A_{i,1}}{\lambda_1} \ge \frac{A_{i,2}}{\lambda_2} \ge \cdots \ge \frac{A_{i,K}}{\lambda_K}. \tag{24}$$

Case 1: Consider the strictly decreasing case of Eq. (24), i.e., $A_{i,k+1}/\lambda_{k+1} < A_{i,k}/\lambda_k$ for all $k$. By analyzing the KKT conditions, we derive the following property.

Property 1: In the optimal solution of (D-$i$), for user $i$, the $K$ types of resources are allocated sequentially, i.e., $r_{i,k} > 0$ only if $r_{i,k'} = B_{i,k'}$ for all $k' < k$. Moreover, if

$$\eta'\Big(C_i + \sum_{k'=1}^{k-1} A_{i,k'} r_{i,k'}\Big) < -\frac{\lambda_k}{A_{i,k}} \tag{25}$$

and $B_{i,k} > 0$, we have $r_{i,k} > 0$.

Proof: For user $i$, assume that the optimal resource solution $r_i$ satisfies $-\lambda_{k+1}/A_{i,k+1} < \eta'(C_i + A_i^T r_i) \le -\lambda_k/A_{i,k}$. Then for resources $k' = k+1, k+2, \ldots, K$, $r_{i,k'}$ has to be 0, since $v_{k'}$ needs to be positive to satisfy Eqs. (19), (20), (22), and (23). Meanwhile, for resources $k' = 1, 2, \ldots, k-1$, $r_{i,k'}$ has to equal $B_{i,k'}$, since $\tau_{k'}$ needs to be positive to satisfy Eqs. (19), (21), (22), and (23). Moreover, if $\eta'(C_i + \sum_{k'=1}^{k-1} A_{i,k'} r_{i,k'}) < -\lambda_k/A_{i,k}$ and $B_{i,k} > 0$, then $r_{i,k}$ has to be positive, because otherwise Eq. (21) forces $\tau_k = 0$, and Eq. (19) cannot be satisfied with $v_k \ge 0$.

Property 1 implies that there is a unique order of resource allocation, i.e., resource $k+1$ is allocated only if all resources $k' \le k$ have reached their upper bounds. The following property gives the amount of each type of resource that should be allocated.

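Property 2: If resource $k$ is allocated, i.e., $r_{i,k} > 0$, then $r_{i,k}$ satisfies

$$r_{i,k} = \min\left\{ \frac{\eta'^{-1}(-\lambda_k, A_{i,k}) - C_i - \sum_{k'=1}^{k-1} A_{i,k'} B_{i,k'}}{A_{i,k}},\; B_{i,k} \right\}, \tag{26}$$

where $\eta'^{-1}(-\lambda, a_i) = \ln\left(-1 + \dfrac{a_i + \sqrt{a_i^2 - 4 a_i \lambda}}{2\lambda}\right)$ denotes the larger solution $x$ of $a_i\,\eta'(x) = -\lambda$.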

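Proof: Based on the KKT conditions, when $0 < r_{i,k} < B_{i,k}$, $r_{i,k}$ has to satisfy

$$A_{i,k}\,\eta'\Big(C_i + \sum_{k'=1}^{k-1} A_{i,k'} B_{i,k'} + A_{i,k} r_{i,k}\Big) + \lambda_k = 0. \tag{27}$$

The equation $a_i \eta'(r) = -\lambda$ has two distinct solutions whenever $a_i > 4\lambda$ (so that the square root in $\eta'^{-1}$ is real). However, the smaller one is a local maximum, since the second-order derivative there, $a_i^2 \eta''(r)$, is negative. Therefore, we choose the larger solution, $\eta'^{-1}(-\lambda, a_i)$.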
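The closed form of $\eta'^{-1}$ and the choice of root can be checked directly. The short derivation below is a sketch that assumes $\eta(x) = 1/(1+e^{x})$, the decreasing logistic function consistent with the sign conventions in (19) and (25), where $A_{i,k} > 0$ means that more resource lowers the likelihood; this specific form of $\eta$ is our assumption here. With $z = e^{x}$, we have $\eta'(x) = -z/(1+z)^2$, so

$$
\begin{aligned}
a\,\eta'(x) = -\lambda
&\iff \lambda z^2 + (2\lambda - a)\,z + \lambda = 0 \\
&\iff z_{\pm} = -1 + \frac{a \pm \sqrt{a^2 - 4a\lambda}}{2\lambda}, \qquad z_+ z_- = 1 .
\end{aligned}
$$

The roots are real only when $a \ge 4\lambda$, and since $z_+ z_- = 1$, the two solutions are $x_\pm = \pm\ln z_+$. Because $\eta''(x) = \eta(x)\,(1-\eta(x))\,(1-2\eta(x))$ is positive exactly when $x > 0$, the local minimum is the larger solution $x_+ = \ln z_+ \ge 0$, consistent with the choice of the larger solution in the proof above.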

In summary, combining Properties 1 and 2, we have at most $K+1$ candidate solutions $[r_i^0, r_i^1, \ldots, r_i^K]$, where $r_i^k$ denotes the solution in which resources 1 to $k$ are allocated. By selecting the best among the $K+1$ candidates, the optimal resource allocation at each user is achieved. Note that there could exist multiple optima among the $K+1$ candidates.
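The candidate enumeration can be sketched in Python as follows. The sketch assumes the decreasing logistic $\eta(x) = 1/(1+e^{x})$ with $\eta_i(r_i) = \eta(C_i + A_i^T r_i)$, as in the KKT analysis above; the function names are hypothetical. It orders resources by $A_{i,k}/\lambda_k$ as in (24) (free, useful resources first), builds one candidate per resource with all earlier resources at their upper bounds (Property 1) and the current resource set by (26), clips that value at 0 when the stationary point lies below the current operating point, and returns the best of the $K+1$ candidates. Ties in (24) (Case 2) are simply broken by the sort order.

```python
import math
from typing import List, Optional, Sequence, Tuple


def eta(x: float) -> float:
    """Assumed complaining likelihood: decreasing logistic 1 / (1 + e^x)."""
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def eta_prime_inv(lam: float, a: float) -> Optional[float]:
    """Larger root x of a * eta'(x) = -lam (a local minimum), or None if it does not exist."""
    if a <= 0:
        return None                   # resource does not lower the likelihood
    if lam <= 0:
        return math.inf               # free resource: worth using up to its bound
    if a < 4.0 * lam:
        return None                   # |eta'| <= 1/4, so a*eta'(x) > -lam everywhere
    z = -1.0 + (a + math.sqrt(a * a - 4.0 * a * lam)) / (2.0 * lam)
    return math.log(z)


def solve_user(C: float, A: Sequence[float], B: Sequence[float],
               lam: Sequence[float]) -> Tuple[List[float], float]:
    """Solve (D-i) for one user by comparing the K + 1 candidates of Properties 1 and 2."""
    K = len(A)

    def objective(r: Sequence[float]) -> float:
        x = C + sum(a * rk for a, rk in zip(A, r))
        return eta(x) + sum(l * rk for l, rk in zip(lam, r))

    def ratio(k: int) -> float:
        # Eq. (24) ordering key A_k / lambda_k; free useful resources come first.
        if lam[k] > 0:
            return A[k] / lam[k]
        return math.inf if A[k] > 0 else -math.inf

    order = sorted(range(K), key=ratio, reverse=True)
    filled = [0.0] * K                # resources already set to their upper bounds
    base = C                          # C_i + sum of A_k' * B_k' over filled resources
    candidates = [list(filled)]       # candidate r^0: allocate nothing
    for k in order:
        x_star = eta_prime_inv(lam[k], A[k])
        if x_star is None:
            r_k = 0.0
        else:
            r_k = min(max((x_star - base) / A[k], 0.0), B[k])  # Eq. (26), clipped at 0
        cand = list(filled)
        cand[k] = r_k
        candidates.append(cand)
        filled[k] = B[k]              # Property 1: later resources need this one exhausted
        base += A[k] * B[k]
    values = [objective(c) for c in candidates]
    best = min(range(len(values)), key=values.__getitem__)
    return candidates[best], values[best]
```

For example, with $C_i = -2$, $A_i = [1, 0.5]$, $B_i = [1.5, 20]$, and $\lambda = [0.05, 0.05]$, resource 1 is exhausted first and resource 2 is then allocated up to its interior stationary point, which a brute-force grid search confirms is no worse than any grid point.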

Case 2: There may be the case that for some $k$, $A_{i,k+1}/\lambda_{k+1} = A_{i,k}/\lambda_k$. If $\eta'\big(C_i + \sum_{k'=1}^{k+1} A_{i,k'} B_{i,k'}\big) \ge -\lambda_k/A_{i,k}$, any combination of $r_{i,k}$ and $r_{i,k+1}$ that satisfies

$$\eta'\Big(C_i + \sum_{k'=1}^{k-1} A_{i,k'} B_{i,k'} + A_{i,k} r_{i,k} + A_{i,k+1} r_{i,k+1}\Big) = -\frac{\lambda_k}{A_{i,k}}$$

is a candidate solution; otherwise, the resource upper bounds $B_{i,k}$ and $B_{i,k+1}$ are both consumed, and the next resource $k+2$ likely also needs to be allocated.

C. Near Optimality Analysis

In this subsection, we discuss the performance of the DualHet algorithm under certain technical conditions.

Let $V_{\text{opt}}$ denote the optimal value of (P-0). To show the near-optimality of DualHet, we first show that, given the optimal LM $\lambda^*$, the gap between DualHet and the optimal solution is bounded by the number of users with more than one optimal solution (Proposition 1). Then we discuss the possible number of multi-solution users in practice, which is shown to be small in the simulations.

Proposition 1: If there are $Q$ users with multiple solutions given $\lambda^*$, the solution $[r_1^p, r_2^p, \ldots, r_M^p]$ generated by the DualHet algorithm is feasible and satisfies

$$\sum_{i=1}^{M} \eta_i(r_i^p) \le V_{\text{opt}} + Q. \tag{28}$$

Proof (Sketch): We prove this proposition by investigating the subgradient of the dual function $D(\lambda)$, denoted by $\partial D(\lambda)$. Without loss of generality, we assume that the first $Q$ users have multiple solutions. Due to the optimality of $\lambda^*$, we have $0 \in \partial D(\lambda^*)$, and we can show that $\sum_{i=Q+1}^{M} r_i^* \le R$. Thus, $[r_1^p, r_2^p, \ldots, r_M^p]$ is feasible, because DualHet first allocates resources $r_i^p = r_i^*$ to the single-solution users and then allocates the remaining resources to the other users. Moreover, by bounding the complementary term as $(\lambda^*)^T\big[\sum_{i=1}^{M} r_i^* - R\big] \ge -\sum_{i=1}^{Q} \eta_i(r_i^*)$, we have $\sum_{i=Q+1}^{M} \eta_i(r_i^p) = \sum_{i=Q+1}^{M} \eta_i(r_i^*) \le V_{\text{opt}}$, and the conclusion then follows because $\eta_i(r_i^p) \le 1$ for all $1 \le i \le Q$. Please refer to the Appendix for more details.

Discussions: The Number of Multi-solution Users

Next, we discuss the bound on $Q$. Assume user $i$ has more than one optimal solution. In the following, we investigate the properties of $\lambda^*$ when user $i$ has multiple solutions.


Since a user's optimal solutions are a subset of its candidate solutions (Sec. V-B), user $i$ must have at least two candidate solutions achieving the same objective value in (D-$i$). There are two cases with a tie among candidate solutions.

Case 1: For some $k$, we have

$$\frac{A_{i,k+1}}{\lambda^*_{k+1}} = \frac{A_{i,k}}{\lambda^*_k}. \tag{29}$$

In this case, user $i$ could have an infinite number of candidate solutions sharing the same objective value.

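Case 2: Candidate solutions $j_1$ and $j_2$ have the same objective value, i.e.,

$$\eta_i(r_i^{j_1}) + (\lambda^*)^T r_i^{j_1} = \eta_i(r_i^{j_2}) + (\lambda^*)^T r_i^{j_2}, \tag{30}$$

where $j_1 < j_2$. According to Properties 1 and 2,

$$r_i^{j_1} = [B_{i,1}, B_{i,2}, \ldots, B_{i,j_1}, 0, 0, \ldots]^T, \tag{31}$$

and

$$r_i^{j_2} = \left[B_{i,1}, B_{i,2}, \ldots, B_{i,j_2-1},\; \min\left\{\frac{\eta'^{-1}(-\lambda^*_{j_2}, A_{i,j_2}) - C_i - \sum_{k=1}^{j_2-1} A_{i,k} B_{i,k}}{A_{i,j_2}},\; B_{i,j_2}\right\},\; 0, 0, \ldots\right]^T. \tag{32}$$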

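Substituting these two solutions into Eq. (30), we have

$$\sum_{k=j_1+1}^{j_2-1} \lambda^*_k B_{i,k} + \lambda^*_{j_2}\,\frac{\eta'^{-1}(-\lambda^*_{j_2}, A_{i,j_2}) - C_i - \sum_{k=1}^{j_2-1} A_{i,k} B_{i,k}}{A_{i,j_2}} + \eta\big(\eta'^{-1}(-\lambda^*_{j_2}, A_{i,j_2})\big) = \eta\Big(C_i + \sum_{k=1}^{j_1} A_{i,k} B_{i,k}\Big) \tag{33}$$

or

$$\sum_{k=j_1+1}^{j_2} \lambda^*_k B_{i,k} = \eta\Big(C_i + \sum_{k=1}^{j_1} A_{i,k} B_{i,k}\Big) - \eta\Big(C_i + \sum_{k=1}^{j_2} A_{i,k} B_{i,k}\Big), \tag{34}$$

depending on whether the minimum in (32) is attained by the first term or by $B_{i,j_2}$. Note that user $i$ could satisfy more than one condition in the form of (29), (33), or (34).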

Denote the set of users with multiple optimal solutions by $S$. Since each user $i$ in $S$ satisfies at least one condition in the form of (29), (33), or (34), each such user reduces the degrees of freedom of $\lambda^*$ by at least 1. In addition, when the users are heterogeneous, it is likely that the conditions (29), (33), or (34) for one user cannot be expressed by the conditions generated by the other users. Because $\lambda^*$ is a $K$-dimensional variable, $K$ users with multiple optimal solutions reduce the degrees of freedom of $\lambda^*$ to 0. In other words, if $|S| > K$, $\lambda^*$ would have to satisfy more independent conditions than it has degrees of freedom, so such an optimal LM generically does not exist. Therefore, for heterogeneous users, we will likely have $Q = |S| \le K$. According to Proposition 1, the performance gap between DualHet and the optimum is then at most $K$, the number of types of resources, independent of the number of users. Therefore, when the algorithm is deployed on a large number of users, the relative loss can be small.

We note that due to the complexity of conditions (29), (33), and (34), we are unable to obtain specific expressions for these conditions at this stage. However, as in the linear case, noise can prevent singularity [31]. The only difference in our problem is that (33) contains nonlinear terms. Since the following equation in $\lambda^*_{j_2}$ has only finitely many solutions:

$$\lambda^*_{j_2}\,\frac{\eta'^{-1}(-\lambda^*_{j_2}, A_{i,j_2}) - \theta_1}{A_{i,j_2}} + \eta\big(\eta'^{-1}(-\lambda^*_{j_2}, A_{i,j_2})\big) = \theta_2, \tag{35}$$

where $\theta_1$ and $\theta_2$ are two real numbers, we expect that if $A_{i,k}$, $B_{i,k}$, and $C_i$ are independently distributed real numbers for different users, they are unlikely to share the same solution, and thus the number of users with multiple optimal solutions will be bounded by $K$. Our simulation results in Sec. VI validate the near-optimality of our algorithm. More specific expressions of these conditions are left for future work.

D. ε-Perturbed DualHet for Classifier Optimization

Algorithm 2 ε-Perturbed DualHet
Input: Time horizon $T$, exploration probabilities $\varepsilon_t$'s; initial complaining likelihood functions $\eta_i(\cdot)$'s, $\lambda(0)$, available resource $R$, and convergence threshold $\epsilon$.
1   for $t = 1, 2, \ldots, T$ do
2       Obtain resource allocation $r_i^p$ by running DualHet (Algorithm 1) with $\eta_i(\cdot)$'s, $\lambda(0)$, $R$, and $\epsilon$;
3       Draw a random number $p \sim U([0, 1])$;
4       if $p \le \varepsilon_t$ then
5           for $i = 1, 2, \ldots, M$ do
6               if $\sum_{i'=1}^{i-1} r_{i'}^p \ge R$ then
7                   $r_i^p = 0$;
8               else
9                   $r_i^p \sim U([0, 2 r_i^p])$;
10                  $r_i^p = \min(r_i^p, B_i)$;
11              end
12          end
13          Observe the labels of the users and add the samples to the training set;
14          Retrain the model based on the new training set and update $\eta_i(\cdot)$'s;
15      end
16  end

In this subsection, we propose an ε-perturbed version of DualHet to improve the classifier through randomized exploration. As shown in Algorithm 2, we run DualHet (exploit) with probability $1 - \varepsilon_t$, and explore and update the predictive model with probability $\varepsilon_t$ (Lines 4 to 15). Specifically, when deciding to explore new samples, we randomly perturb the resource allocation obtained by DualHet while satisfying the resource constraint. With this perturbation, we are able to explore the ground truth near the decision boundary. In this paper, we let the perturbed resource follow a uniform distribution. Other distributions, e.g., a truncated normal distribution, could also be applied to generate this perturbation. The parameter $\varepsilon_t$ controls the exploitation-exploration tradeoff. Typically, if the environment does not change quickly, fewer explorations are needed as time increases, so we set $\varepsilon_t$ to decrease with $t$, e.g., $\varepsilon_t \propto 1/t$. Rigorous design and analysis of the exploitation-exploration tradeoff is left for future work.
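The exploration step of Algorithm 2 (Lines 3 to 12) can be sketched in Python as below. This is an illustrative sketch with hypothetical names: allocations are lists of per-resource amounts, the check $\sum_{i'<i} r_{i'}^p \ge R$ of Line 6 is applied separately to each resource type, and we additionally cap each perturbed amount at the budget left over, an assumption we add so that the perturbed allocation always satisfies the resource constraint. Retraining (Lines 13 to 14) is outside the scope of the sketch.

```python
import random
from typing import List, Sequence, Tuple


def perturb_allocation(r_p: Sequence[Sequence[float]], B: Sequence[Sequence[float]],
                       R: Sequence[float], eps_t: float,
                       rng: random.Random) -> Tuple[List[List[float]], bool]:
    """Exploration step of Algorithm 2, applied per resource type.

    Returns the (possibly perturbed) allocation and whether exploration happened.
    """
    if rng.random() > eps_t:                        # Lines 3-4: exploit with prob. 1 - eps_t
        return [list(r) for r in r_p], False
    K = len(R)
    used = [0.0] * K                                # sum of already perturbed allocations
    out: List[List[float]] = []
    for r_i, b_i in zip(r_p, B):                    # Line 5: visit users in order
        new_i = []
        for k in range(K):
            if used[k] >= R[k]:
                x = 0.0                             # Lines 6-7: budget exhausted
            else:
                x = rng.uniform(0.0, 2.0 * r_i[k])  # Line 9: r ~ U([0, 2 r^p])
                x = min(x, b_i[k])                  # Line 10: user-side upper bound
                # Assumption: also cap at the leftover budget, so the sum never exceeds R.
                x = min(x, R[k] - used[k])
            used[k] += x
            new_i.append(x)
        out.append(new_i)
    return out, True
```

Sampling from $U([0, 2r_i^p])$ keeps the perturbed amount centered on the DualHet allocation in expectation (before capping), so exploration concentrates near the current operating point and hence near the decision boundary.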

VI. PERFORMANCE EVALUATION

In this section, we conduct experiments on two different datasets to evaluate the performance of the proposed algorithms. The first dataset is synthetic, which allows performance evaluation in a scenario where the ground truth is known. The second dataset is the real-world mobile user complaint dataset introduced in Sec. III, which allows us to study the performance of the proposed algorithms in a realistic problem setting.

Our proposed algorithms are compared with an optimized baseline algorithm. In the baseline algorithm, each type of resource is evenly allocated to the users predicted to be positive. Denote the set of predicted positive users by $S_p$. When users have resource upper bounds, the amount of each type of resource a user receives is the minimum of its even share and its upper bound, i.e., $r_i = \min(R/|S_p|, B_i)$ for any $i \in S_p$. Whether a user is predicted to be positive depends on the choice of the cut-off point. After ranking all $M$ users by their predicted complaining likelihoods, there are at most $M+1$ possible prediction results, one for each cut-off point. Therefore, the optimized baseline tries all $M+1$ possible prediction results and chooses the one with the best resource allocation performance.
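A minimal Python sketch of the optimized baseline follows, assuming (as in the other sketches) $\eta_i(r_i) = \eta(C_i + A_i^T r_i)$ with $\eta(x) = 1/(1+e^{x})$, and ranking users by their predicted likelihood without extra resources, $\eta_i(0)$. The function names are hypothetical. For each of the $M+1$ cut-offs $m$, the top-$m$ users receive $\min(R_k/m, B_{i,k})$ of every resource $k$, and the cut-off with the lowest expected number of complaints is kept.

```python
import math
from typing import List, Sequence, Tuple


def eta(x: float) -> float:
    """Assumed complaining likelihood: decreasing logistic 1 / (1 + e^x)."""
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def user_likelihood(c: float, a: Sequence[float], r: Sequence[float]) -> float:
    """eta_i(r_i) = eta(C_i + A_i^T r_i)."""
    return eta(c + sum(ak * rk for ak, rk in zip(a, r)))


def optimized_baseline(C: Sequence[float], A: Sequence[Sequence[float]],
                       B: Sequence[Sequence[float]],
                       R: Sequence[float]) -> Tuple[List[List[float]], float]:
    """Try all M + 1 cut-offs; predicted positives get an even share of each resource."""
    M, K = len(C), len(R)
    # Rank users by predicted complaining likelihood without extra resources.
    order = sorted(range(M), key=lambda i: user_likelihood(C[i], A[i], [0.0] * K),
                   reverse=True)
    best_alloc = [[0.0] * K for _ in range(M)]               # cut-off m = 0: no positives
    best_val = sum(user_likelihood(C[i], A[i], best_alloc[i]) for i in range(M))
    for m in range(1, M + 1):
        positives = set(order[:m])                           # S_p for this cut-off
        alloc = [[min(R[k] / m, B[i][k]) if i in positives else 0.0 for k in range(K)]
                 for i in range(M)]
        val = sum(user_likelihood(C[i], A[i], alloc[i]) for i in range(M))
        if val < best_val:
            best_val, best_alloc = val, alloc
    return best_alloc, best_val
```

Because every predicted positive receives at most $R_k/m$ of resource $k$, each candidate allocation is feasible; the search over cut-offs is what makes the baseline "optimized".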

We use the optimal value of the dual problem in Sec. V-A as a lower bound, which may or may not be achievable.

A. Gaussian Distributed Data in 2D space

In this experiment, we consider a synthetic dataset generated from a known ground-truth distribution. This experiment, conducted in a low-dimensional space, also illustrates the intuition behind the algorithms.

In this dataset, positive and negative points are assumed to be distributed in a 2D space with means $(-10, -10)$ and $(10, 10)$, respectively, and covariance matrix $[8, 0; 0, 8]$. We consider a balanced dataset, which has equal numbers of positives and negatives. In this case, the probability for a point at $(u_1, u_2)$ to be positive is $p(u_1, u_2) = d_p(u_1, u_2)/(d_p(u_1, u_2) + d_n(u_1, u_2))$, where $d_p(u_1, u_2)$ and $d_n(u_1, u_2)$ are the Gaussian density functions of positives and negatives, respectively. In Fig. 3(a), the line indicates the logistic regression classifier trained from 2000 labeled data points. The 200 points to which resources will be allocated are also shown in this figure. Two types of resources are allocated to the points: resource $k$ affects only feature $k$, and the linear coefficient is denoted by $q_{i,k}$, for $k = 1, 2$. We assume that each point $i$ allocated with resources has resource upper bounds $r_{i,1} \le (20 - x_{i,1})/q_{i,1}$ and $r_{i,2} \le (20 - x_{i,2})/q_{i,2}$, i.e., after resource allocation, the points cannot leave the area bounded by $u_1 \le 20$ and $u_2 \le 20$.

Fig. 3. Illustration of resource allocation w/ synthetic data.
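For this particular setting, the ground-truth probability has a closed logistic form, which helps explain why a logistic regression classifier can fit it closely. With equal isotropic covariances $8I$ and means $\mu_p = (-10,-10)$, $\mu_n = (10,10)$, the log-density ratio is $\ln(d_p/d_n) = (\|u-\mu_n\|^2 - \|u-\mu_p\|^2)/16 = -2.5\,(u_1 + u_2)$, so $p(u_1,u_2) = 1/(1 + e^{2.5(u_1+u_2)})$ and the Bayes decision boundary is the diagonal $u_1 + u_2 = 0$. The Python sketch below checks this numerically; the function names are ours, for illustration only.

```python
import math
from typing import Tuple

MU_POS = (-10.0, -10.0)   # mean of positives
MU_NEG = (10.0, 10.0)     # mean of negatives
VAR = 8.0                 # covariance [8, 0; 0, 8]


def gauss_density(u: Tuple[float, float], mu: Tuple[float, float], var: float = VAR) -> float:
    """Isotropic 2D Gaussian density with covariance var * I."""
    d2 = (u[0] - mu[0]) ** 2 + (u[1] - mu[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)


def p_positive(u: Tuple[float, float]) -> float:
    """Ground-truth probability of being positive for a balanced dataset."""
    dp = gauss_density(u, MU_POS)
    dn = gauss_density(u, MU_NEG)
    return dp / (dp + dn)


def p_logistic(u: Tuple[float, float]) -> float:
    """Closed form: log(dp / dn) = -2.5 (u1 + u2)."""
    return 1.0 / (1.0 + math.exp(2.5 * (u[0] + u[1])))
```

Moving a positive point toward $(20, 20)$ increases $u_1 + u_2$ and lowers its probability of being positive, which matches the direction in which resources move points in Fig. 3(b).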

We assume $q_{i,1}$ and $q_{i,2}$ are chosen independently and uniformly from $[0, 1]$. After resource allocation using the DualHet algorithm, the locations of the points are shown in Fig. 3(b). The original positives are still marked with “+”, but a subset of them have been moved across the decision boundary. These moved points are likely to have their labels changed to negative. From this figure, we can see that the points are moved to a “virtual” diagonal line parallel to the decision boundary of the trained classifier. However, because different points have different $q_{i,1}$ and $q_{i,2}$, the destination (“virtual” diagonal) line is not straight. The figure also illustrates the effect of the user-side resource upper bounds ($u_1 \le 20$ and $u_2 \le 20$) on the resource allocation.

Fig. 4(a) shows the performance of the DualHet algorithm, the optimized baseline, and the lower bound. Since we have 2 types of resources, the theoretical bound on the gap (Proposition 1) is 2. However, as shown in Fig. 4(a), the solution found by the DualHet algorithm performs very close to the lower bound, which means its performance is very close to the optimum.

Fig. 4(b) plots the performance based on the ground truth. This figure shows that, compared with the algorithms we designed, the baseline usually needs 2 to 4 times more resources to achieve the same performance. With 1000 units of resource 1 and 1000 units of resource 2, our algorithm reduces the positives by about 52%, while the baseline reduces them by only 16%. When the resource is limited, e.g., less than 10 units, even though both our algorithm and the baseline reduce only a fraction of the expected complaints, the relative gain is more than 5 times. For example, when 4.0 units of resource are available, our algorithm reduces 0.257 expected complaints, while the baseline reduces 0.050 expected complaints. Fig. 4 also shows that the near-optimal solution we found has similar performance under both the learned logistic model and the ground truth, since the logistic regression classifier we learned is very close to the ground truth.

Fig. 4. Performance evaluation w/ synthetic data.

B. Cellular Customer Complaint Data

In the problem stated in Sec. III, Features 6, 9, 11, and 12 are related to resource allocation within a cell. We consider two types of resources that are allocated by a BS. The first resource is bandwidth: with more bandwidth allocated to a user, the value of Feature 11 (the ratio of downlink throughput vs. downlink volume) increases. Although the value of Feature 12 (downlink volume) cannot be affected by resource allocation, it affects Feature 11, and its effect differs across users. The other resource used by a BS is active reconnection. In cellular networks, a BS periodically checks the connections between users and itself. When there is a radio connection failure, the BS can set up a proactive reconnection quickly. The number of proactive reconnections a BS can perform each hour depends on the computing capacity of the BS's server. Allocating this reconnection resource decreases the value of Feature 9 (radio failure percentage). Since Feature 9 is a ratio, Feature 6 (attempted connections) needs to be considered as well, even though it is “uncontrollable”. Note that the number of proactive reconnections allocated to a user is bounded by the user's existing number of radio-network failures. Meanwhile, we assume bandwidth can be allocated to a user without user-side constraints. These two resources can actually be allocated at a time scale shorter than an hour. However, because a user complaint is highly related to the aggregated user experience in the past hour, our resource allocation approach serves as a longer-period policy.

TABLE II

COMPLAINTS REDUCED BY DUALHET ALGORITHM VS. BASELINE

Among the 570445 users, 50% are selected randomly for training the classifier. The remaining 50%, or 285222 users, are used for testing. They are grouped into 570 cells, with about 500 users in each cell. Since on average a user has a throughput of 4.79 KB/s and 11.42 radio failures per hour, the total bandwidth is 2.395 MB/s and the total number of radio failures is 5710 in a cell. Both the bandwidth and the active reconnections are allocated by the BS of a cell to the users in its coverage. For simplicity, we treat the allocated reconnections as a continuous variable, instead of dealing with an otherwise NP-hard problem. In reality, when the allocated reconnection is fractional, e.g., 3.6, one solution is to round it down to the nearest integer. Other heuristics, e.g., randomization, can also be designed.
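The two rounding heuristics mentioned above can be sketched as follows (Python; function names are hypothetical). Rounding down never exceeds the allocated amount, so an allocation that respects the per-cell reconnection budget stays within it. Randomized rounding, one possible form of the randomization heuristic, rounds $x$ up with probability equal to its fractional part; this preserves the allocation in expectation, $\mathbb{E}[\mathrm{round}(x)] = x$, but may exceed the budget in a particular realization.

```python
import math
import random


def round_down(x: float) -> int:
    """Deterministic rounding: never allocates more than x."""
    return math.floor(x)


def randomized_round(x: float, rng: random.Random) -> int:
    """Round x >= 0 to floor(x) + 1 with probability frac(x), else floor(x)."""
    base = math.floor(x)
    frac = x - base
    return base + (1 if rng.random() < frac else 0)
```

For example, an allocation of 3.6 reconnections becomes 3 under `round_down`, while `randomized_round` returns 4 about 60% of the time and 3 otherwise.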

With different amounts of available resources allocated by each BS, we run experiments to evaluate the number of complaints reduced. The expected number of complaints is 616.50 without resource allocation. Table II compares the complaint reduction of the DualHet algorithm with that of the optimized baseline: the reductions achieved by DualHet are presented outside the parentheses, while the baseline results are inside the parentheses. When there are 100 KB/s of additional bandwidth (4.1% of the existing throughput) and 1000 reconnections (18% of radio failures) per cell, the DualHet algorithm reduces 128.67 user complaints (20.87% of total complaints), which is more than 2× the reduction achieved by the baseline. The DualHet algorithm achieves a greater improvement over the baseline when resources are scarce. This is intuitive: when resources are scarce, judicious allocation is more important. On the other hand, when resources are abundant, each user is likely to receive sufficient resources, and thus the performance of our algorithm and that of the baseline are relatively close.

Table III shows the upper bounds on the number of complaints that can be reduced, derived from the dual problem. The upper bounds are outside the parentheses, while the ratios of the achieved complaint reductions to the upper bounds are inside the parentheses. Within cell $c$, following Proposition 1 with $Q \le K = 2$, the DualHet algorithm is suboptimal by at most $2\max_{i \in I_c} \eta_i(0)$, where $I_c$ denotes the set of users in cell $c$. Therefore, considering all 570 cells, the solution found by the DualHet algorithm is suboptimal by at most $\sum_{c=1}^{570} 2\max_{i \in I_c} \eta_i(0) = 44.89$. However, as shown by Table III, much better performance is achieved in practice: in our experiment, the DualHet algorithm incurs less than 3% performance loss compared with the optimum.

TABLE III

UPPER BOUNDS OF THE OPTIMUM AND LOWER BOUNDS OF PERFORMANCE RATIOS (DUALHET ALGORITHM)

Fig. 5. Number of complaints v.s. reconnections.

Figs. 5 and 6 illustrate the impact of one resource dimension on the expected number of complaints, with the other dimension fixed. In Fig. 5, the additional bandwidth is set to 10 KB/s or 1 MB/s. This figure shows that allocating reconnections can reduce complaints by at most about 20%, due to the user-side upper bounds on reconnections. In Fig. 6, the number of reconnections is 10 or 1000.

The simulation results show high efficiency of resource usage in improving user experience at the beginning (when the amount of available resources is small) and diminishing returns later, when resources are abundant. For example, when 10 reconnections are available, 10 KB/s of additional bandwidth reduces on average 22.03 complaints, i.e., 0.45 KB/s per complaint; at the same time, 1 MB/s of additional bandwidth reduces 154.45 complaints, i.e., 6.47 KB/s per complaint. The results show that there is low-hanging fruit in terms of improving the overall (aggregated) user experience. Therefore, it is possible for an operator to improve user experience with relatively low resource consumption, and an operator can choose a sweet spot for its operation.

Fig. 6. Number of complaints v.s. bandwidth.

C. Impact of Closed-Loop Feedback

In this subsection, we discuss the impact of closed-loop feedback by running simulations for ε-Perturbed DualHet. We note that running experiments in practical cellular networks to collect actual QoE is costly. Thus, we run simulations based on the 2D Gaussian mixture data described in Section VI-A. We let $M_-$ and $M_+$ be the numbers of samples generated from the distributions with means $(10, 10)$ and $(-10, -10)$, respectively. If we consider the logistic ground truth of Section VI-A, the predicted model converges to the ground truth quickly, because logistic regression is indeed the ground-truth model over the entire region. In such cases, the closed loop serves only to validate the causal relation, and its impact on model adjustment is negligible. In reality, we usually do not have a perfect model that captures the ground truth over the entire region. Therefore, to illustrate this impact, we consider the following case, where a sample is positive if $x_1 \le 0$ and $x_2 \le 0$, i.e.,

$$G(x) = \begin{cases} 1, & \text{if } x_1 \le 0 \text{ and } x_2 \le 0, \\ 0, & \text{otherwise.} \end{cases}$$

In these simulations, we can allocate only the second dimension of the resource, i.e., $R = [0, R_2]$ with $R_2 = 5000$. For ε-Perturbed DualHet, we set $\varepsilon_t = \varepsilon_0 / t$ for $t = 1, 2, \ldots, T$, where $\varepsilon_0 = 0.4$ is the exploration probability at $t = 1$.

Figs. 7(a) to 7(c) show the evolution of the decision boundary of the predictive model, which is trained on the samples obtained in the exploration iterations. As we can see from the figures, although the logistic regression model does not fit the ground truth, the decision boundary gradually converges to a horizontal line. Since we can only allocate the second dimension of the resource, this new model is more suitable for resource allocation (although it may generate less accurate classification results), and thus results in better QoE performance. This is demonstrated in Fig. 8: with random exploration, ε-Perturbed DualHet reduces a larger number of user complaints as soon as it finds a more suitable predictive model.

Note that we study the above simple example for the logistic regression model. A similar idea applies to more complicated cases, where more expressive machine learning models such as neural networks are needed. Interested readers are referred to our previous work [9]. It is also worth noting that, due to its hidden and complex impact, rigorous design and analysis of the exploration-exploitation tradeoff is still an open problem.


Fig. 7. Evolution of the decision boundary. The ground truth is G(x) = 1 if x1 ≤ 0 and x2 ≤ 0, and G(x) = 0 otherwise. The predictive model is trained based on the samples in the exploration iterations.

Fig. 8. Evolution of reduced positives.

VII. CONCLUSION AND FUTURE WORK

We envision that user-experience-oriented system design and resource allocation will become increasingly important in the near future. This work studies a data-driven resource allocation problem in cellular networks, where the objective is to minimize the number of user complaints based on trained logistic regression classifiers. We consider a general setting in which the same amount of allocated resources can have heterogeneous effects on different users' features. We design the DualHet algorithm to handle cases with multiple resources and heterogeneous users. Simulations using a real dataset from cellular networks show that our algorithms achieve performance very close to the optimum and can reduce up to 2× as many complaints as the optimized baseline.

There are several limitations of our work. First, we assume the resources affect the values of features independently. While this is a reasonable approximation for a set of application scenarios, we hope to generalize the model to incorporate more sophisticated scenarios. Second, since our current dataset was collected in the wild, we cannot evaluate the performance of the algorithms based on feedback from users after resource allocation. Our future work will consider the online learning and resource allocation problem, where feedback from users will potentially further tune the prediction model.

APPENDIX

PROOF OF PROPOSITION 1

We prove Proposition 1 by investigating the subgradient of the dual function $D(\lambda)$ at $\lambda^*$. Using the fact that the zero vector $\mathbf{0}$ is a subgradient of $D(\lambda)$ at $\lambda^*$, we show the feasibility and the near-optimality of the solution obtained by DualHet, respectively.

Step 1) Subgradient of the Dual Function

We first investigate the subgradient of the dual function $D(\lambda)$, denoted by $\partial D(\lambda)$. To obtain $\partial D(\lambda)$, we rewrite the dual function using the convex conjugate. Specifically, we introduce the following indicator function to capture the resource constraint:

$$\phi(r) = \begin{cases} 0, & \text{if } 0 \le r \le R, \\ \infty, & \text{otherwise.} \end{cases}$$

The convex conjugate of $\phi(r)$ is

$$\phi^*(\lambda) = \sup_{r \in \mathbb{R}^K} \big[\lambda^T r - \phi(r)\big].$$

Then the dual function can be rewritten as

$$D(\lambda) = -\phi^*(\lambda) + \sum_{i=1}^{M} \min_{0 \le r_i \le B_i} \big[\eta_i(r_i) + \lambda^T r_i\big]. \tag{36}$$

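For $\lambda \ge 0$, we can easily verify that the subdifferential of $\phi^*(\lambda)$ is

$$\partial\phi^*(\lambda) = \big\{(\tilde R_1, \tilde R_2, \ldots, \tilde R_K) : \tilde R_k = R_k \text{ if } \lambda_k > 0, \text{ and } \tilde R_k \in [0, R_k] \text{ if } \lambda_k = 0\big\}. \tag{37}$$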
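The verification is a one-line computation, sketched here assuming $R \ge 0$. Because the box constraint is separable across resource types, the supremum splits coordinate-wise:

$$\phi^*(\lambda) = \sup_{0 \le r \le R} \lambda^T r = \sum_{k=1}^{K} \sup_{0 \le r_k \le R_k} \lambda_k r_k = \sum_{k=1}^{K} R_k \max(\lambda_k, 0).$$

Each term $R_k \max(\lambda_k, 0)$ is differentiable with derivative $R_k$ when $\lambda_k > 0$, while at the kink $\lambda_k = 0$ its subdifferential is the whole interval $[0, R_k]$, which gives exactly (37). In particular, $\phi^*(\lambda) = \lambda^T R$ for $\lambda \ge 0$.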

Combining Eqs. (36) and (37), and according to Lemma 3 in [26], we have the following lemma:

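Lemma 1: The subgradient of the dual function $D(\lambda)$ is

$$\partial D(\lambda) = \mathrm{conv}\Big\{\sum_{i=1}^{M} r_i - \tilde R : r_i \in \mathcal{R}_i(\lambda),\ \tilde R \in \partial\phi^*(\lambda)\Big\}, \tag{38}$$

where $\mathcal{R}_i(\lambda)$ is the optimal individual decision set under $\lambda$ and $\mathrm{conv}\{S\}$ is the convex hull of set $S$.

Step 2) Feasibility of the DualHet Solution

We now construct a near-optimal feasible solution by leveraging the property of $\partial D(\lambda^*)$. Due to the optimality of $\lambda^*$, we know that the zero vector $\mathbf{0} \in \partial D(\lambda^*)$, i.e.,

$$\mathbf{0} \in \partial D(\lambda^*) = \mathrm{conv}\Big\{\sum_{i=1}^{M} r_i^* - \tilde R : r_i^* \in \mathcal{R}_i(\lambda^*),\ \tilde R \in \partial\phi^*(\lambda^*)\Big\}.$$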

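Combining this with the convexity of $\partial\phi^*(\lambda^*)$, we know that there exists a vector $\tilde R^* \in \partial\phi^*(\lambda^*)$ such that

$$\tilde R^* \in \mathrm{conv}\Big\{\sum_{i=1}^{M} r_i^* : r_i^* \in \mathcal{R}_i^*\Big\}, \tag{39}$$

where $\mathcal{R}_i^*$ is the optimal individual decision set under $\lambda^*$.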

When only $Q$ users have multiple solutions under $\lambda^*$, without loss of generality, we assume that the first $Q$ users have multiple solutions under $\lambda^*$, i.e., $|\mathcal{R}_i^*| > 1$ for $1 \le i \le Q$ and $|\mathcal{R}_i^*| = 1$ for $i > Q$. Then, (39) indicates that there exists an allocation decision $\tilde r = [\tilde r_1, \tilde r_2, \ldots, \tilde r_M]$ such that

$$\tilde r_i \begin{cases} \in \mathrm{conv}(\mathcal{R}_i^*), & 1 \le i \le Q, \\ = r_i^*, & i > Q, \end{cases} \tag{40}$$

and

$$\sum_{i=1}^{M} \tilde r_i = \tilde R^*, \tag{41}$$

implying that $[\tilde r_1, \tilde r_2, \ldots, \tilde r_M]$ is a feasible solution of the primal problem, and $\sum_{i=Q+1}^{M} r_i^* = \sum_{i=Q+1}^{M} \tilde r_i \le R$. In Algorithm 1, DualHet first allocates resources to the single-solution users with $r_i^p = r_i^*$ and then allocates the remaining resources to the other users. Thus, the solution $[r_1^p, r_2^p, \ldots, r_M^p]$ is feasible.

Step 3) Near-Optimality of the DualHet Solution

Note that $\mathcal{R}_i^*$ is the optimal individual decision set, and thus $\eta_i(r_i^*) + (\lambda^*)^T r_i^*$ takes the same value for all $r_i^* \in \mathcal{R}_i^*$. Let $(\tilde r_i, \zeta_i) \in \mathrm{conv}\{(r_i^*, \eta_i(r_i^*)) : r_i^* \in \mathcal{R}_i^*\}$. Then,

$$\zeta_i + (\lambda^*)^T \tilde r_i = \eta_i(r_i^*) + (\lambda^*)^T r_i^*, \quad i = 1, 2, \ldots, M. \tag{42}$$

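On the other hand, $\tilde R^* \in \partial\phi^*(\lambda^*)$ implies that $(\lambda^*)^T R = (\lambda^*)^T \tilde R^*$. Thus,

$$\begin{aligned}
(\lambda^*)^T\Big(\sum_{i=1}^{M} r_i^* - R\Big) &= (\lambda^*)^T\Big(\sum_{i=1}^{M} r_i^* - \tilde R^*\Big) = (\lambda^*)^T\Big(\sum_{i=1}^{M} r_i^* - \sum_{i=1}^{M} \tilde r_i\Big) \\
&= \sum_{i=1}^{Q} \big[\zeta_i - \eta_i(r_i^*)\big] \ge -\sum_{i=1}^{Q} \eta_i(r_i^*),
\end{aligned} \tag{43}$$

where the third equality follows from (42) together with $\tilde r_i = r_i^*$ and $\zeta_i = \eta_i(r_i^*)$ for $i > Q$, and the inequality holds because $\zeta_i \ge 0$.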

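Because

$$\sum_{i=1}^{M} \eta_i(r_i^*) + (\lambda^*)^T\Big(\sum_{i=1}^{M} r_i^* - R\Big) = L(r_1^*, r_2^*, \ldots, r_M^*, \lambda^*) \le V_{\text{opt}},$$

where the inequality follows from weak duality, (43) gives

$$\sum_{i=Q+1}^{M} \eta_i(r_i^*) \le V_{\text{opt}}.$$

The conclusion then follows from the fact that under DualHet, $r_i^p = r_i^*$ for $i > Q$ and $\eta_i(r_i^p) \le 1$ for $1 \le i \le Q$.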

REFERENCES

[1] A. Balachandran et al., “Modeling Web quality-of-experience on cellularnetworks,” in Proc. 20th MobiCom ACM, 2014, pp. 213–224.

[2] M. Z. Shafiq, J. Erman, L. Ji, A. X. Liu, J. Pang, and J. Wang,“Understanding the impact of network dynamics on mobile video userengagement,” in Proc. ACM SIGMETRICS, 2014, pp. 367–379.

[3] A. J. Chan, A. Pande, E. Baik, and P. Mohapatra, “Temporal qualityassessment for mobile videos,” in Proc. 18th Annu. Int. Conf. MobileComput. Netw. 2012, pp. 221–232.

[4] A. Balachandran, V. Sekar, A. Akella, S. Seshan, I. Stoica, and H. Zhang,“Developing a predictive model of quality of experience for Internetvideo,” ACM SIGCOMM Comput. Commun. Rev., vol. 43, no. 4,pp. 339–350, 2013.

[5] S. S. Krishnan and R. K. Sitaraman, “Video stream quality impactsviewer behavior: Inferring causality using quasi-experimental designs,”IEEE/ACM Trans. Netw., vol. 21, no. 6, pp. 2001–2014, Dec. 2013.

[6] G. Song and Y. Li, “Utility-based resource allocation and schedulingin OFDM-based wireless broadband networks,” IEEE Commun. Mag.,vol. 43, no. 12, pp. 127–134, Dec. 2005.

[7] M. Fazel and M. Chiang, “Network utility maximization with noncon-cave utilities using sum-of-squares method,” in Proc. 44th IEEE Conf.Decision Control, Eur. Control Conf. (CDC-ECC), 2005, pp. 1867–1874.

[8] Y. Bao, X. Liu, and A. Pande, “Data-guided approach for learning andimproving user experience in computer networks,” in Proc. ACML, 2015,pp. 127–142.

[9] Y. Bao, H. Wu, and X. Liu, “From prediction to action: A closed-loop approach for data-guided network resource allocation,” in Proc.22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016,pp. 1425–1434.

[10] J. W. Lee, R. R. Mazumdar, and N. B. Shroff, “non-convex optimizationand rate control for multi-class services in the Internet,” IEEE/ACMTrans. Netw., vol. 13, no. 4, pp. 827–840, Aug. 2005.

[11] T. Spetebroot, S. Afra, N. Aguilera, D. Saucez, and C. Barakat, “Fromnetwork-level measurements to expected quality of experience: TheSkype use case,” in Proc. IEEE Int. Workshop Meas. Netw. (M&N),Oct. 2015, pp. 1–6.

[12] E. Baik, A. Pande, C. Stover, and P. Mohapatra, “Video acuity assess-ment in mobile devices,” in Proc. 32nd IEEE Int. Conf. Comput.Commun., Apr. 2015, pp. 1–9.

[13] C. Yu, Y. Xu, B. Liu, and Y. Liu, “‘Can you SEE me now?’ A measure-ment study of mobile video calls,” in Proc. IEEE INFOCOM, Apr. 2014,pp. 1456–1464.

[14] Z. M. Mao, “Diagnosing mobile apps’ quality of experience: Challengesand promising directions,” IEEE Internet Comput., vol. 20, no. 1,pp. 66–69, Jan. 2016.

[15] A. Samba, Y. Busnel, A. Blanc, P. Dooze, and G. Simon, “Through-put prediction in cellular networks: Experiments and preliminaryresults,” in Proc. CoRes, 2016, May 2016. [Online]. Available:https://hal.archivesouvertes.fr/hal01311158

[16] Y. Guo, F. Qian, Q. A. Chen, Z. M. Mao, and S. Sen, “Understandingon-device bufferbloat for cellular upload,” in Proc. ACM Internet Meas.Conf., New York, NY, USA, 2016, pp. 303–317. [Online]. Available:http://doi.acm.org/10.1145/2987443.2987490

[17] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.Cambridge, U.K.: Cambridge Univ Press, 2017.


[18] E. Horvitz and T. Mitchell. (Jun. 2010). From Data to Knowledgeto Action: A Global Enabler for the 21st Century, White Paper ofComputing Community Consortium, accessed on Apr. 4, 2017. [Online].Available: http://erichorvitz.com/CCC_Data%20to%20Knowledge%20to%20Action.pdf

[19] M. Bayati et al., “Data-driven decisions for reducing readmissions for heart failure: General methodology and case study,” PLoS ONE, vol. 9, no. 10, pp. 1–9, Oct. 2014.

[20] G. Lee, N. Tolia, P. Ranganathan, and R. H. Katz, “Topology-aware resource allocation for data-intensive workloads,” in Proc. 1st ACM Asia–Pacific Workshop Syst., 2010, pp. 1–6.

[21] J. L. Berral et al., “Towards energy-aware scheduling in data centers using machine learning,” in Proc. 1st Int. Conf. Energy-Efficient Comput. Netw., 2010, pp. 215–224.

[22] S. Deb and P. Monogioudis, “Learning-based uplink interference management in 4G LTE cellular systems,” IEEE/ACM Trans. Netw., vol. 23, no. 2, pp. 398–411, Apr. 2015.

[23] M. Chiang, S. Zhang, and P. Hande, “Distributed rate allocation for inelastic flows: Optimization frameworks, optimality conditions, and optimal algorithms,” in Proc. INFOCOM, vol. 4, Mar. 2005, pp. 2679–2690.

[24] M. Udell and S. Boyd. (2014). Maximizing a Sum of Sigmoids, accessed on Apr. 4, 2017. [Online]. Available: http://web.stanford.edu/~boyd/papers/pdf/max_sum_sigmoids.pdf

[25] M. Udell and S. Boyd, “Bounding duality gap for separable problems with linear constraints,” J. Comput. Optim. Appl., vol. 64, no. 2, pp. 355–378, Jun. 2016.

[26] M. Wang. Vanishing Price of Anarchy in Large Coordinative Non-convex Optimization, accessed on Apr. 4, 2017. [Online]. Available: http://www.optimization-online.org/DB_HTML/2015/07/5021.html

[27] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: Special issue on learning from imbalanced data sets,” ACM SIGKDD Explorations Newslett., vol. 6, no. 1, pp. 1–6, 2004.

[28] D. M. W. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation,” J. Mach. Learn. Technol., vol. 2, no. 1, pp. 37–63, 2011.

[29] A. Y. Ng, “Feature selection, L1 vs. L2 regularization, and rotational invariance,” in Proc. 21st Int. Conf. Mach. Learn., 2004, p. 78.

[30] A. Slivkins, “Contextual bandits with similarity information,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 2533–2568, 2014.

[31] A. Neumaier, “Solving ill-conditioned and singular linear systems: A tutorial on regularization,” SIAM Rev., vol. 40, no. 3, pp. 636–666, 1998.

Yanan Bao (S’12) received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 2010 and 2013, respectively, and the Ph.D. degree from the University of California, Davis, CA, USA, in 2016. He is currently with Image Search, Google, Inc. His research interests include machine learning, data mining, and green communications.

Huasen Wu (S’12–M’14) received the B.S. and Ph.D. degrees from Beihang University, Beijing, China, in 2007 and 2014, respectively. From 2010 to 2012, he was a Visiting Student with the University of California at Davis (UC Davis), Davis, CA, USA, and from 2012 to 2014, he was a Research Intern with the Wireless and Networking Group, Microsoft Research Asia. He is currently a Post-Doctoral Researcher with the Department of Computer Science, UC Davis. His research interests are in stochastic learning and optimization for wireless networks, crowdsourcing, and recommendation systems.

Xin Liu (M’09) received the Ph.D. degree in electrical engineering from Purdue University in 2002. She was a Post-Doctoral Research Associate with the Coordinated Science Laboratory, UIUC. From 2012 to 2014, she was on leave from the University of California at Davis (UC Davis), Davis, CA, USA, and with Microsoft Research Asia. She is currently a Professor with the Computer Science Department, UC Davis. Her research interests are in the area of wireless communication networks, with a current focus on data-driven approaches in networking. She became a Chancellor’s Fellow in 2011. She received the Best Paper of the Year Award from the Computer Networks Journal in 2003 for her work on opportunistic scheduling. She received the NSF CAREER Award in 2005 for her research on Smart-Radio-Technology-Enabled Opportunistic Spectrum Utilization. She received the Outstanding Engineering Junior Faculty Award from the College of Engineering, UC Davis, in 2005.
