

Impact of User Patience on Auto-Scaling Resource Capacity for Cloud Services

Marcos Dias de Assunção (a,∗), Carlos H. Cardonha (b), Marco A. S. Netto (b), Renato L. F. Cunha (b)

(a) Inria, ENS de Lyon, France
(b) IBM Research, Brazil

∗ Corresponding author: [email protected]

Abstract

An important feature of most cloud computing solutions is auto-scaling, an operation that enables dynamic changes on resource capacity. Auto-scaling algorithms generally take into account aspects such as system load and response time to determine when and by how much a resource pool's capacity should be extended or shrunk. In this article, we propose a scheduling algorithm and auto-scaling triggering strategies that explore user patience, a metric that estimates the perception end-users have of the Quality of Service (QoS) delivered by a service provider, based on the ratio between expected and actual response times for each request. The proposed strategies help reduce costs with resource allocation while maintaining perceived QoS at adequate levels. Results show reductions in resource-hour consumption of up to approximately 9% compared to traditional approaches.

Keywords: Cloud computing, auto-scaling, resource management, scheduling

1. Introduction

Cloud computing has become a popular model for hosting enterprise and backend systems that provide services to end-users via Web browsers and mobile devices [1, 2]. In this context, a typical scenario often comprises a provider of computing and storage infrastructure, generally termed as Infrastructure as a Service (IaaS) or simply a cloud provider; a service provider who offers a web-based service that can be deployed on Virtual Machines (VMs) hosted on the cloud; and the clients, or end-users, of such a service.

In this work, we investigate challenges faced by service providers who compose their resource pools by allocating machines from cloud providers. Meeting end-users' expectations is crucial for the service provider's business, and in the context of Web applications, these expectations typically refer to short response times for requests. Simultaneously, there are costs associated with resource allocation, so resource pools with low utilisation are economically undesirable. Moreover, clients' demands are uncertain and fluctuate over time, so the problem of resource allocation faced by service providers is clearly non-trivial.

Elasticity, a selling point of cloud solutions, enables service providers to modify the size of resource pools in near real-time via auto-scaling strategies, hence allowing for reactions to fluctuations in clients' demand. Current strategies typically monitor and predict values of target system metrics, such as response time and utilisation level of relevant resources (e.g., CPU, memory, and network bandwidth), and employ rule-based systems to trigger auto-scaling operations whenever predefined thresholds are violated. The parameters employed by these strategies do not allow them to explore the heterogeneity of expected response times and of the tolerance to delays that end-users have towards service providers. In combination with actual response times, these two elements define the perception end-users have of a service's QoS, which we will refer to as user patience.

Previous work in other domains has shown that end-users can present heterogeneous patience levels depending on their context [3]. Moreover, in a society where human attention is increasingly becoming scarce, and where users perform multiple concurrent activities [4]¹ and use multiple devices [5]², response time might not be the sole element defining the perceptions end-users have of a service's QoS. Our previous work investigated how application instrumentation [6, 7, 8] and end-user context and profiling [9] could be used in the collection of honest signals that determine how clients interact with a service provider and what their levels of patience are when making requests.

¹ Ofcom reports that up to 53% of UK adults media multi-task while watching TV.

² Accenture's study shows users expect other devices to augment content shown on TV.


Figure 1: Service provision model and auto-scaling scenario considered in this work. (Sequence diagram between end-user, service provider, and cloud provider: (1) the service provider allocates a resource pool from the cloud to host a service; (2) users access the cloud-hosted service; (3) the service provider/cloud monitors the service, computing metrics such as resource utilisation; (4) if a metric, e.g. utilisation, is outside pre-defined limits, a scale in/out operation may be triggered; (5) the step size, i.e. the number of VMs to allocate or release, is computed; (6) the activation/deactivation of resources is triggered.)


The present work investigates how patience can be explored by auto-scaling strategies. Compared to our previous work [8], in addition to considering provisioning time, this article makes the following contributions:

• A scheduling algorithm and a patience-based auto-scaling triggering strategy for a service provider to minimise the number of resources allocated from a cloud provider, applicable to scenarios where the maximum number of available resources is unbounded (§ 3);

• Experiments with bounded and unbounded numbers of resources available to service providers that show reductions of up to nearly 9% on resource-hour consumption compared to traditional auto-scaling strategies (§ 4).

2. Problem Description

In this section we describe the scenarios investigated in this work, present the assumptions regarding capacity limitations on the pool of resources offered by the cloud provider, and explain the elements of client behaviour which are taken into account by our algorithms.

Figure 1 depicts the service hosting model and the main steps of auto-scaling operations. A service provider allocates a pool of VMs from a cloud provider to deploy its service, which is then accessed by end-users. The service provider and/or cloud provider periodically monitors the status of the pool, thus computing metrics such as resource utilisation. A metric lying outside certain lower and upper limits may trigger a scale in/out operation. A step size, the number of VMs to be allocated or released, is computed before a change to the capacity of the resource pool is requested.

Resources are pairwise indistinguishable, that is, they are VMs which have the same cost and performance characteristics. We also consider provisioning time; namely, after triggering the activation of resources, a service provider must wait for a non-negligible time before using them. Finally, the scenarios either have a maximum number of resources that can be allocated (bounded) or do not have limitations of this nature (unbounded); these two scenarios are considered because they pose different levels of stress on resource utilisation.

We take into account that resources are paid only for the periods in which they were allocated, in a per-minute billing model. This assumption is not unrealistic, as providers such as Microsoft Azure offer this type of service. Consequently, the strategies proposed in this article may allocate and deallocate a given resource several times within one hour in order to improve utilisation.

This work also considers short tasks whose execution times are all equal and in the order of a few seconds, hence reflecting scenarios typically faced by providers of web services. Nevertheless, the concepts and results can, without any loss of generality, be reproduced in scenarios with either shorter or longer tasks.


Figure 2: Prospect theory modelling variations of user patience as a function of QoS [10], where QoS can represent expectations on, for instance, request response time and jitter. (The curve plots the impact on patience against QoS delivered above or below expectation.)

We assume that expectations on response time may vary across individuals, a realistic assumption in certain practical settings; for example, whereas end-users expect to get results from Web searches in a couple of seconds at maximum, clients performing large-scale graph mining operations, which can take from minutes to hours, tend to be more tolerant.

Differences between expected and actual response times for submitted requests have an impact on users' patience (either positive or negative) whose weight decreases over time. More precisely, changes in the patience level of an end-user take place after the execution of each request and are described by a function applied to the ratio between expected and actual response times. Based on Prospect Theory [10], we assume that the negative impact of delays on users' patience exceeds the benefits of fast responses (see Figure 2).

The service provider periodically evaluates the system utilisation and/or user patience and decides whether the capacity of its resource pool should be expanded (shrunk) by requesting (releasing) resources from (to) the cloud. The questions we therefore seek to address are: (i) how to determine critical times when auto-scaling is necessary or avoidable by exploiting information on users' patience; (ii) how to determine the number of resources to be allocated or released; and (iii) how the strategies behave under scenarios with and without restrictions on the maximum number of resources that can be allocated.

3. Algorithms for Auto-Scaling and Scheduling

This section starts with a mathematical description of the challenges a service provider faces to set its resource pool capacity. Based on this formalisation, we introduce the main contributions of this work: auto-scaling strategies and a scheduling algorithm based on user patience. Table 1 summarises the employed notation.

3.1. Mathematical Description

We consider a discrete-time state-space model where time-related values are integers in T ⊆ N representing seconds. There is a set U of end-users or clients and a sequence J of incoming requests to a service; the end-user who submitted request j ∈ J is denoted by u(j).

The service provider allocates up to m resources from the cloud provider in order to host its service and handle end-users' requests, and its resource pool capacity at time t ∈ T is denoted by m_t; recall that, with auto-scaling, m_t may be different from m_{t′} for t ≠ t′. We consider that m_t ≥ b, where b ∈ N is defined a priori by the service provider; setting such a lower bound is a common practice in the industry to prevent resource shortage under low, yet bursty, load. Moreover, w_t represents the number of resources in use to handle requests at time t; note that w_t < m_t indicates that m_t − w_t active resources are not being used by the service provider.

A request occupies one resource for ρ seconds in order to be processed, ρ ∈ N; request decomposition (i.e., execution of sub-parts on different machines or at different times) and preemption are forbidden. Due to the amount of time spent with queueing and scheduling operations, though, response times are almost always higher than ρ. Based on previous interactions with the service provider, the response time that end-user u expects for each submitted request is given by ρ × β_u, where β_u ∈ R is a multiplicative factor greater than or equal to 1; we assume that β_u does not change over time. The actual response time for request j is denoted by r_j.

End-user u's patience level at instant t is denoted by φ_{t,u} ∈ R+. If φ_{t,u} is below a certain threshold τ_u ∈ [0, 1), we say that u is unhappy. Based on the ideas introduced by Prospect Theory, where the sense of losing an opportunity transcends the feeling of gaining it, φ_{t,u} behaves as follows. First, for every request j, u(j)'s reaction to response time r_j is given by

    x_j = (ρ × β_{u(j)}) / r_j,

where x_j > 1 indicates that r_j was smaller than expected, thus making u(j) positively "surprised"; x_j < 1 accounts for cases where r_j was greater, making u(j) "disappointed"; and x_j = 1 when expected and actual response times are equal. Considering the different user reactions suggested by Prospect Theory, we assume in this work that the patience level φ_{t,u} changes according to x_j as follows:

    φ_{t,u} = α_{1,u} φ_{t−1,u} + (1 − α_{1,u}) x_j,   if x_j > 1;
    φ_{t,u} = α_{2,u} φ_{t−1,u} + (1 − α_{2,u}) x_j,   otherwise,

where 0 ≤ α_{1,u}, α_{2,u} ≤ 1. Exponential smoothing is used in both cases to update the patience level. In order to make the immediate impact of slower response times higher than that of faster responses, we have α_{1,u} > α_{2,u}; moreover, different end-users may have different values of α_{1,u} and α_{2,u}. In summary, each end-user u is described by a tuple (β_u, α_{1,u}, α_{2,u}, τ_u, φ_{0,u}).
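To make this update rule concrete, the sketch below implements it directly; the class and method names (EndUser, observe_response) are ours and not part of the original system.

```python
class EndUser:
    """Tracks one end-user's patience level under the Prospect-Theory-inspired update rule."""

    def __init__(self, beta, alpha1, alpha2, tau, phi0):
        self.beta = beta      # multiplicative factor on the expected response time (>= 1)
        self.alpha1 = alpha1  # smoothing weight used when the response is faster than expected
        self.alpha2 = alpha2  # smoothing weight used when the response is slower than expected
        self.tau = tau        # threshold below which the user is considered unhappy
        self.phi = phi0       # current patience level

    def observe_response(self, rho, response_time):
        """Update patience after a request with processing time rho and the given response time."""
        x = (rho * self.beta) / response_time    # reaction x_j: > 1 faster than expected, < 1 slower
        alpha = self.alpha1 if x > 1 else self.alpha2
        self.phi = alpha * self.phi + (1 - alpha) * x
        return self.phi

    @property
    def unhappy(self):
        return self.phi < self.tau
```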

3.2. Auto-Scaling and Scheduling Strategies

In order to set an auto-scaling strategy, a service provider needs to define when a scaling operation should take place and to select adequate step sizes, which are the numbers of resources by which the resource pool capacity should extend or shrink.

Typically, resource utilisation is used to trigger scale-in and scale-out operations. Resource utilisation at time t, denoted by υ_t, is the ratio between the number w_t of resources effectively used to handle requests and the number m_t of resources allocated from the cloud (i.e., idle resources are not considered). The service provider sets parameters H and L, 0 ≤ L ≤ H ≤ 1, indicating utilisation thresholds according to which resources are activated and deactivated, respectively. Observe that, irrespective of scale-in decisions, the resource pool capacity will never fall below b.

Ideally, the resource pool capacity is modified before utilisation reaches undesired levels. For this reason, service providers typically rely on predictions of resource utilisation. In this work, this estimation is based on the measurements performed over the past i measurement intervals, for some i ∈ N. Namely, after measuring υ_t at time t, weighted exponential smoothing is used to predict the utilisation for step t + 1. If the past v ≤ i measurements (i.e., υ_{t−v}, υ_{t−v+1}, . . . , υ_t) and the forecast utilisation are below (above) the lower (upper) threshold L (H), the scheduler triggers a scale-in (scale-out) operation.
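As an illustration of this trigger logic (a sketch under our own parameter names; the paper does not prescribe the smoothing weight, so it is a placeholder here):

```python
def should_scale(util_history, low, high, v, smoothing=0.5):
    """Return 'out', 'in', or None from recent utilisation measurements and a smoothed forecast.

    util_history: utilisation measurements, oldest first.
    low, high:    thresholds L and H.
    v:            number of most recent measurements that must violate a threshold.
    """
    if len(util_history) < v:
        return None
    recent = util_history[-v:]
    # Exponentially smoothed forecast of the utilisation at the next step.
    forecast = util_history[0]
    for u in util_history[1:]:
        forecast = smoothing * u + (1 - smoothing) * forecast
    if all(u > high for u in recent) and forecast > high:
        return 'out'
    if all(u < low for u in recent) and forecast < low:
        return 'in'
    return None
```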

We propose the following auto-scaling strategies:

• Utilisation Trigger (UT): a strategy that uses system utilisation and employs a formula to identify an upper bound on step sizes for activating/releasing resource capacity;

• Unbounded-resource-pool Utilisation Trigger (UUT): uses system utilisation and employs a formula to determine an upper bound and a lower bound on step sizes for activating/releasing resource pool capacity, based on previous work [11];

• Utilisation and Patience Trigger (UPT): extends the first strategy by using user patience to compute dynamically a factor for multiplying the step sizes suggested by the formula;

• Unbounded-resource-pool Utilisation and Patience Trigger (UUPT): extends the second strategy by using user patience in order to choose dynamically a value from the step size interval defined by the formula.

Table 1: Summary of the notation used in this article.

Symbol              Description
T                   Set of time steps of the discrete-time state-space model
U                   Set of end-users of the service
J                   Sequence of requests made by end-users
u(j)                End-user who submitted request j
m                   Service provider's maximum resource pool capacity
m_t                 Service provider's resource pool capacity at time t
b                   Service provider's minimum resource pool capacity, set a priori
w_t                 Number of resources in use at time t
ρ                   Processing time of a request
β_u                 Factor that determines the response time end-user u expects after submitting requests
r_j                 Actual response time for request j
φ_{t,u}             End-user u's patience level at time t
x_j                 End-user u(j)'s reaction to response time r_j
α_{1,u}, α_{2,u}    Parameters that define how the patience level of end-user u varies over time
υ_t                 Resource utilisation at time t
L                   Lower threshold for resource utilisation
H                   Upper threshold for resource utilisation
i                   Number of intervals used for predicting resource utilisation
λ                   Percentage of m that determines the step size
s                   Resource step size (i.e., number of resources to activate/deactivate in an auto-scaling operation)
s_t                 Resource step size at time t
γ                   Aggressiveness of auto-scaling strategies when computing the step size s_t
k_j                 Number of responses with bad QoS that u(j) will accept before having the patience limit exceeded
K_t                 Average value of k_j over all requests in the service (either enqueued or in execution)
τ_u                 Patience threshold below which end-user u becomes unhappy
a_t                 Number of end-users accessing the service at time t
L_t                 Lower bound derived for step sizes in scenarios where m = ∞
U_t                 Upper bound derived for step sizes in scenarios where m = ∞
l_t                 System load at time t according to the workloads

The rest of this section details the main principles of the auto-scaling strategies and the scheduling algorithm.

3.2.1. Utilisation-based Auto-Scaling

Let λ ∈ [0, 1] be the percentage of m by which the resource pool extends or shrinks in each auto-scaling operation, i.e., let s = ⌊λ × m⌋. When setting the utilisation thresholds L and H and the step size s, service providers want to prevent the occurrence of opposite auto-scaling operations in consecutive steps. Such events take place in two situations, which can be avoided if we assume that w_t = w_{t+1}, as we discuss below.

First, we have a scale-out operation taking place at time t and a scale-in operation taking place at time t + 1 if w_t/m_t > H (i.e., w_t > H m_t), w_{t+1}/(m_t + λ × m) < L (i.e., w_{t+1} < L × (m_t + λ × m)), and w_t ≤ w_{t+1} hold simultaneously. This can be avoided if the following inequality holds:

    H m_t > L × (m_t + λ × m)  ⟹  H/L > (m_t + λ × m)/m_t  ⟹  H/L > 1 + (λ × m)/m_t.

A similar analysis indicates how to avoid a scale-in operation at time t followed by a scale-out operation at t + 1, for any t ∈ T, whenever w_t ≥ w_{t+1}. Namely, we want to avoid w_t/m_t < L (i.e., w_t < L m_t) and w_t/(m_t − λ × m) > H (i.e., w_t > H × (m_t − λ × m)) holding simultaneously; for this, we need

    H/L > 1 + (λ × m)/(m_t − λ × m).

Moreover, we have that

    (λ × m)/(m_t − λ × m) > (λ × m)/m_t  ⟹  m_t > m_t − λ × m  ⟹  λ × m > 0,

so both inequalities are satisfied if w_t = w_{t+1} and H/L ≥ 1 + (λ × m)/(m_t − λ × m). As the service provider will keep at least b machines active, we have that m_t − λ × m ≥ b. Therefore, the parameters have to obey the relation below:

    H/L > 1 + (λ × m)/b.

From the above, one derives an upper bound for the step size s = λ × m. This value is derived from borderline scenarios, though, so it yields step sizes that are too large in most cases. Therefore, service providers can choose a multiplicative factor γ ∈ [0, 1] indicating the strategy's "aggressiveness"; namely, given γ, the step size will be s × (1 − γ) and s × γ for scale-in and scale-out operations, respectively.
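As a quick numeric illustration (using the threshold values adopted later in Section 4, L = 0.4 and H = 0.7, and, purely for the sake of the example, a minimum capacity of b = 45 machines), the relation H/L > 1 + (λ × m)/b requires λ × m < b × (H/L − 1) = 45 × 0.75 = 33.75, i.e., a step size of at most 33 machines.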

Utilisation Trigger (UT) is an implementation of the ideas discussed above; namely, in order to use UT, a service provider only needs to set the utilisation thresholds L and H and the factor γ. Observe that this strategy is based solely on utilisation and assumes that m is finite.
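A minimal sketch of a UT-style step-size computation, assuming L, H, λ and γ have already been chosen to satisfy the relation above (the function and variable names are ours):

```python
import math

def ut_step_size(m, lam, gamma, scale_out):
    """Step size for UT: the upper bound s = floor(lam * m), scaled by the aggressiveness factor.

    m:         maximum resource pool capacity (finite for UT).
    lam:       fraction of m by which the pool may grow or shrink.
    gamma:     aggressiveness factor in [0, 1].
    scale_out: True for a scale-out operation, False for a scale-in operation.
    """
    s = math.floor(lam * m)
    factor = gamma if scale_out else (1 - gamma)
    return math.floor(s * factor)
```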

When a service provider is not subject to an upper bound on the number of resources it can allocate (i.e., m = ∞), the formula employed to derive upper bounds on step sizes used by UT cannot be applied. Under these circumstances, we employ the Unbounded-resource-pool Utilisation Trigger (UUT), a strategy introduced in our previous work [11] in which, for scale-out operations, the upper bound on s is given by

    s ≤ m_t (υ_t − L)/L

and the lower bound by

    s ≥ m_t (υ_t − H)/H,

whereas, for scale-in operations, the upper bound is

    s ≤ m_t (L − υ_t)/L

and the lower bound is

    s ≥ m_t (H − υ_t)/H.

Let L_t and U_t be the lower and upper bounds derived from the inequalities above at time t, respectively. In addition to L_t and U_t, UUT is also given an aggressiveness factor γ ∈ [0, 1]. Given these parameters, for scale-in operations the value of s_t is

    s_t = (1 − γ) × L_t + γ × U_t,

whereas for scale-out operations it is

    s_t = γ × L_t + (1 − γ) × U_t.

More details are provided in previous work [11].
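A sketch of the UUT step-size choice under these bounds (again with our own naming; negative bounds are clamped to zero, an implementation detail not discussed in the text):

```python
def uut_step_size(m_t, util, low, high, gamma, scale_out):
    """Pick a step size within the [L_t, U_t] interval derived for the unbounded case."""
    if scale_out:
        lower = m_t * (util - high) / high   # add at least enough resources to bring utilisation down to H
        upper = m_t * (util - low) / low     # but not so many that utilisation would drop below L
        step = gamma * lower + (1 - gamma) * upper
    else:
        lower = m_t * (high - util) / high
        upper = m_t * (low - util) / low
        step = (1 - gamma) * lower + gamma * upper
    return max(0, round(step))
```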


3.2.2. Patience-based Scheduling and Auto-Scaling

Lowest Patience First (LPF) is a scheduling algorithm that orders arriving requests based on end-users' patience levels as follows. If, at instant t, end-user u submits request j and x_j < 1, we have

    φ_{t+1,u} = α_1 φ_{t,u} + (1 − α_1) x_j ≥ α_1 φ_{t,u}.

Since

    lim_{r_j → ∞} (ρ × β_u)/r_j = 0,

φ_{t+1,u} becomes closer to α_1 φ_{t,u} as r_j → ∞. Therefore α_1 φ_{t,u} is a lower bound for φ_{t+1,u}, and the accuracy of this approximation grows as the system load (and, consequently, response times) gets larger. Using this bound, we can estimate the minimum value k_j ∈ N for which φ_{t+k_j,u} becomes smaller than τ_u:

    α_1^{k_j} φ_{t,u} ≤ τ_u  ⟺  α_1^{k_j} ≤ τ_u/φ_{t,u}  ⟺  k_j ≥ log_{α_1}(τ_u/φ_{t,u}),

where the last equivalence holds because 0 < α_1 < 1; namely, the value k_j indicates the number of submissions for which u(j) will tolerate a bad QoS before having τ_{u(j)} surpassed.

LPF sorts the requests in the scheduling queue according to k_j (smaller values first). In order to avoid starvation, k_j is set to zero if request j has been waiting for a period of time that is at least twice as large as the current average response time.
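A compact sketch of LPF ordering under these definitions (the request representation and helper names are our own, and we reuse the EndUser fields from the earlier sketch; math.log(x, base) computes the logarithm in base α_1):

```python
import math

def tolerance_budget(phi, tau, alpha1):
    """Estimated number of further bad-QoS responses before patience phi drops below tau (k_j)."""
    if phi <= tau:
        return 0                               # already unhappy
    return math.ceil(math.log(tau / phi, alpha1))

def lpf_order(queue, now, avg_response_time):
    """Sort pending requests by k_j, smallest first; requests waiting too long jump to the front."""
    def key(request):
        waited = now - request.arrival_time
        if waited >= 2 * avg_response_time:    # anti-starvation rule: treat as k_j = 0
            return 0
        return tolerance_budget(request.user.phi, request.user.tau, request.user.alpha1)
    return sorted(queue, key=key)
```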

Under UT and UUT, γ is fixed, so the service provider is forced to have the same behaviour (aggressive or conservative) in each auto-scaling operation. UPT and UUPT leverage UT and UUT, respectively, by employing user patience to set the value of γ dynamically.

Let K_t denote the average value of k_j over all the requests in the service provider (either enqueued or in execution) at time step t. Whenever the system decides that an auto-scaling operation should be performed, a sequence S containing the last k averages K_t is created and has some percentage of its largest and smallest values removed. From the remaining elements of S, the system takes the largest value K. Finally, the value of γ will be 1 − K_t/K and K_t/K for scale-out and scale-in operations, respectively.

Essentially, UPT and UUPT adapt the system behaviour according to the users' current average patience level. Namely, if K_t is high, a more aggressive policy is acceptable. Conversely, if K_t is low, a conservative approach could support the improvement of QoS.
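The dynamic choice of γ could be sketched as below; the history length and the trimming percentage are left open by the text, so the values here are placeholders:

```python
def dynamic_gamma(k_history, k_now, trim=0.1, scale_out=True):
    """Derive the aggressiveness factor gamma from recent averages K_t of k_j.

    k_history: the last few averages K_t.
    k_now:     the current average K_t.
    trim:      fraction of the largest and smallest values discarded before taking the maximum K.
    """
    s = sorted(k_history)
    cut = int(len(s) * trim)
    trimmed = s[cut:len(s) - cut] or s      # keep everything if trimming would empty the list
    k_ref = max(trimmed)                    # the reference value K
    ratio = min(k_now / k_ref, 1.0) if k_ref > 0 else 1.0   # clamp so gamma stays in [0, 1]
    return (1 - ratio) if scale_out else ratio
```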

4. Evaluation

We conducted extensive computational experiments in order to investigate the benefits and drawbacks of the algorithms presented in Section 3. We modelled scenarios where the maximum number of resources can be either bounded or unbounded, and evaluated the following combinations of task scheduling and auto-scaling triggering strategies:

• FIFO+UT or FIFO+UUT: First-In, First-Out (FIFO) scheduling with auto-scaling considering only resource utilisation.

• LPF+UT or LPF+UUT: LPF scheduling with auto-scaling considering only resource utilisation.

• LPF+UPT or LPF+UUPT: LPF scheduling with auto-scaling considering resource utilisation and end-users' patience to define the resource step size of scale-out and scale-in operations.

Our comparisons employed the following metrics:

• Percentage of dissatisfactions: percentage of requests j whose response time r_j led or kept end-user u(j)'s patience level below the threshold τ_{u(j)};

• Percentage of QoS violations: percentage of requests whose response times were longer than the amount of time expected by their end-users;

• Allocated resource-time: resource capacity, in machine-seconds, allocated to serve user requests;

• Aggregate request slowdown: the accumulated sum of all delays suffered by requests involved in QoS violations, and therefore given by Σ_{j∈J} max(r_j − β_{u(j)} × ρ, 0), as sketched in the snippet below.
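For instance, given a log of completed requests, the aggregate slowdown could be computed as follows (field names are assumptions for illustration):

```python
def aggregate_slowdown(completed_requests, rho):
    """Sum of delays beyond each end-user's expected response time (QoS violations only)."""
    return sum(
        max(req.response_time - req.user.beta * rho, 0)
        for req in completed_requests
    )
```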

The rest of this section describes the experimental setup and discusses the obtained results.

4.1. Experimental Setup

A discrete-event simulator built in-house was used to evaluate the performance of the proposed strategies. We crafted two types of workloads with variable hourly load over a 24-hour period. The rationale behind these workloads, depicted in Figure 3, is as follows:

• Normal day: consists of small peaks of utilisation during the start, middle, and towards the end of working hours. Outside these intervals, but still during working hours, the workload remains around the peak values, whereas outside the working hours the load decreases significantly.

• Peaky day: consists of tipping workload peaks, a scenario in corporations where requests need to be executed only at a few moments of the day.

Figure 3: The workloads considered in this work, where the hourly load l is employed to determine the number of end-users accessing the service. (The plot shows the load over the hours of the day for the Normal and Peaky scenarios.)

The workloads are employed to determine the number of users accessing a service at a given time t. Every 10 seconds, the number a_t of end-users accessing the service is adjusted to

    a_t = |U| × (l_t + η),

where |U| denotes the maximum number of users who can be accessing the service at any given moment, 0 ≤ l_t ≤ 1 is the load at time t according to the crafted workloads (hence representing a fraction of |U|), and η is uniformly distributed over −0.05 to 0.05 in order to add a random variation of 5%; the sum l_t + η is always clipped to the closest value in the interval [0, 1]. For instance, if at time t, |U| = 100, l_t = 0.7 and η = 0.05, the number of users accessing the system and making requests is a_t = 75. For intermediate values of t, a_t = a_{t−1}. Unless otherwise noted, the experiments described here use |U| = 1,000.
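A small sketch of this adjustment (our naming; η is drawn with random.uniform):

```python
import random

def active_users(max_users, load):
    """Number of active end-users for a given hourly load, with a ±5% random variation."""
    eta = random.uniform(-0.05, 0.05)
    fraction = min(max(load + eta, 0.0), 1.0)   # clip the sum to [0, 1]
    return int(max_users * fraction)
```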

Requests have a processing time ρ of 10 seconds, whereas an end-user's expected response time for request j is ρ multiplied by β_{u(j)} ≥ 1; e.g., if β_{u(j)} = 1.3, the expected response time is 13 seconds. Request inter-arrival times, or end-users' "think time" (the amount of time in seconds separating the arrival of a response for request j and the submission of a new request by the active end-user u(j)), are drawn uniformly from 0 to 100 for each request submission, that is, they are not constant for any end-user. The values of α_1 and α_2 are given by 0.2 and 0.1, respectively, multiplied by numbers drawn uniformly for each end-user from [0.9, 1.1]. Finally, τ_u and φ_{u,0} are random numbers drawn uniformly from [0.45, 0.55] and [0.8, 1.0], respectively. These values are used by the algorithms employed by the service provider, so it can be assumed that they have been obtained or estimated based on historical data associated with each end-user.
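Under the stated distributions, the end-user population could be drawn as in the sketch below, reusing the EndUser class from Section 3; β is passed in because its distribution is not specified here (the text only states β ≥ 1, e.g., 1.3):

```python
import random

def draw_user(beta):
    """Draw one end-user with the parameter distributions described above (beta is supplied)."""
    return EndUser(
        beta=beta,
        alpha1=0.2 * random.uniform(0.9, 1.1),
        alpha2=0.1 * random.uniform(0.9, 1.1),
        tau=random.uniform(0.45, 0.55),
        phi0=random.uniform(0.8, 1.0),
    )

def think_time():
    """Per-request think time in seconds, uniform over [0, 100]."""
    return random.uniform(0, 100)
```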

We tested bounded scenarios in which the maximum resource pool capacity was an even value between 162 and 180; in these cases, the initial capacity m_0 and the minimum capacity b are set to 25% of the maximum. For unbounded scenarios, the initial/minimum capacity is set to 45 resources. The rationale behind these values is simple: whereas it is common practice in the industry to set a minimum resource capacity for a service, we do not want this minimum capacity to be large enough to eclipse the impact of auto-scaling decisions.

The lower and upper target thresholds for resource utilisation (i.e., L and H) are 40% and 70%, respectively, which are default utilisation levels used by existing auto-scaling services, such as Amazon Elastic Compute Cloud (EC2). Resource utilisation levels are measured every 60 seconds. The value of γ is set to 0.25 for strategies UT and UUT under bounded scenarios and to 0.5 under unbounded scenarios.

Provisioning time in each scale-out operation is measured in seconds and is drawn from a Normal distribution with mean 180 and standard deviation 15; sampling is performed in an aggregate way, that is, a single value is used by all resources involved in the same operation. These values were chosen based on existing work which identified that the mean time required to provision lean VM instances in certain clouds is often over 100 seconds [12]. Moreover, since in a group of VMs certain VMs can boot more slowly, and since provisioning VMs with a Web server software stack takes longer, as demonstrated in existing work, we added an extra 80 seconds. We performed experiments with higher mean values (up to 10 minutes) and obtained similar conclusions for the proposed techniques; therefore, due to space constraints, we report only on results with provisioning time of 180 ± 15 seconds. Deallocation time was not considered in this work because it is typically much shorter than provisioning time and its effects can only be observed in scenarios where utilisation changes abruptly and quickly, which is not the case for the workloads considered in this work.

Finally, we simulated the equivalent of 4 days of activities of a service provider in order to reduce biases that single executions could have brought to the final results.

4.2. Result Analysis

Bounded number of resources. We start discussing the results of our experiments with scenarios where the maximum resource pool capacity is bounded. These results are summarised in Figures 4, 5, 6, 7, 8, and 9.

Figure 4: Bounded: Percentage of QoS violations for FIFO+UT, LPF+UT, LPF+UPT on the Normal workload.

Figure 5: Bounded: Percentage of QoS violations for FIFO+UT, LPF+UT, LPF+UPT on the Peaky workload.

Figures 4 and 5 present the percentage of requests whose response times were larger than expected. In general, LPF+UPT is superior to FIFO+UT in scenarios with 162-170 resources and worse than FIFO+UT with 172-180 resources, whereas LPF+UT is better than both in all scenarios with respect to this metric. Poor results from FIFO+UT are not surprising, as it is completely oblivious to user patience. The comparison between LPF+UT and LPF+UPT is less straightforward; LPF+UT leads to fewer QoS violations than LPF+UPT, but Figures 6 and 7 show that the latter compensates for this with significant reductions in allocated resource time (approximately 9%).

Figure 6: Bounded: Comparison of resource-time usage for FIFO+UT, LPF+UT, LPF+UPT on the Normal workload.

Figure 7: Bounded: Comparison of resource-time usage for FIFO+UT, LPF+UT, LPF+UPT on the Peaky workload.

Figure 8: Bounded: Percentage of user dissatisfaction for FIFO+UT, LPF+UT, LPF+UPT on the Normal workload.



Figure 9: Bounded: Percentage of user dissatisfaction for FIFO+UT, LPF+UT, LPF+UPT on the Peaky workload.


Additionally, the results of LPF+UPT on the percentage of QoS violations and on the percentage of dissatisfactions for scenarios with 170-180 resources show the effects of making γ dynamic. Namely, in scenarios where the maximum number of resources is large enough to satisfy demand, users' patience levels are more likely to stay high. Whenever auto-scaling operations have to be performed, the algorithm assigns "aggressive" values to γ that enforce the allocation (deallocation) of a very small (large) number of machines in scale-out (scale-in) operations. As a result, the response time for a certain number of requests may be larger than expected in situations where such delays could have been avoided. However, the results presented in Figures 8 and 9 for 172-180 resources suggest that it is possible to provide insufficient QoS for approximately 1-2% of all requests without perceptible damage to user patience in such scenarios (as illustrated by LPF+UPT in Figures 4 and 5).

Figures 8 and 9 show a clear superiority (i.e., lower percentages) of LPF+UT and a clear inferiority of FIFO+UT with respect to end-user dissatisfaction. The latter phenomenon can again be explained by the fact that FIFO does not take patience into account. Moreover, differences between LPF+UT and LPF+UPT are again counterbalanced by differences in the volume of resource-time usage for each configuration. The results of scenarios with 174-180 resources are inconclusive, as the percentage of dissatisfied users in these cases is insignificant; because queues are almost never formed in these situations, the scheduling strategies basically deliver the same solutions and the auto-scaling algorithms have enough time to adapt the resource pool.

Finally, the results show that keeping user patience at satisfactory levels is more challenging for the Peaky workload. The overall resource demands in the two scenarios are different, but whereas the differences in the volume of allocated resources are relatively low (approximately 10%), the percentages of QoS violations and dissatisfactions differ by a factor of 2. Peaky workloads model situations where resource demand increases abruptly, and under these circumstances the number of QoS violations tends to increase considerably. Therefore, it is natural to expect lower user satisfaction levels in such scenarios.

Unbounded number of resources. We now present results involving the simulation of scenarios where service providers could provision as many resources as they wished. These results are summarised in Figures 10, 11, and 12.

Figure 10: Unbounded: Percentage of QoS violations for FIFO+UT, LPF+UT, LPF+UPT.

As expected, the number of QoS violations is considerably smaller in these scenarios. Figure 10 shows that the occurrence of such events is extremely rare in both workloads (< 1%). One can also observe a potentially statistically negligible, but nevertheless interesting, phenomenon in this figure: all the strategies yielded more QoS violations in the Normal workload than in Peaky. Moreover, the differences in the percentage of dissatisfaction presented in Figure 12 reinforce this apparent contradiction. The explanation for these results lies in an observation that was irrelevant in scenarios where QoS violations take place frequently: although its changes are more abrupt, Peaky is smoother, in the sense that modifications in its upward/downward orientation are less frequent than in Normal (6 against 15); QoS violations are more likely to occur during these shifts, which justifies the observed difference. Additionally, Figure 10 shows that FIFO+UT delivered fewer QoS violations than the other two strategies for the Peaky workload; this happened because patience-aware strategies focus on the minimisation of user dissatisfaction rather than QoS violations, and sacrificing the latter may be necessary in order to excel at the former.


Figure 11: Unbounded: Comparison of resource-time usage for FIFO+UT, LPF+UT, LPF+UPT.

Figure 12: Unbounded: Percentage of user dissatisfaction for FIFO+UT, LPF+UT, LPF+UPT.

Figure 13: Unbounded: Aggregate request slowdown for FIFO+UT, LPF+UT, LPF+UPT.


Conversely, the results presented in Figure 12 suggest that the impact of QoS violations on end-user patience on Normal was considerably larger than on Peaky. Direct comparison with Figure 10 suggests that the number of violations plays a major role in this phenomenon, so we investigated the aggregate request slowdown.

The results are presented in Figure 13 and show that the values for Peaky and Normal are close; consequently, individual QoS violations typically involved much larger delays on Peaky than on Normal. From this, we conclude that several small delays are more harmful to user satisfaction than a few long delays. Finally, Figure 11 reinforces the benefit of using a patience-aware auto-scaling triggering strategy. Whereas user dissatisfaction levels for all configurations are basically the same in both workloads, LPF+UPT consumed 9% fewer resources than the other strategies.

To evaluate the impact of provisioning time on the auto-scaling strategies described in this work, we performed an experiment with provisioning times of 10 ± 5 seconds and 10,000 as the maximum number of end-users, that is, |U| = 10,000 when computing the number of users in the system a_t at time t. The minimum number of resources, b, was set to 450. This scenario presents demand peaks that exceed 2,000 VMs. Figure 14 and Figure 15 summarise the results for allocated resource-time and the percentage of dissatisfactions, respectively.

We can observe that the patterns identified in the previous experiments hold in this last scenario; there is a reduction in the amount of allocated resource time, resulting in a small increase in the number of requests that suffer from insufficient QoS. Further investigation revealed that this is the case because, although resources can be provisioned quickly, LPF delays auto-scaling operations by exploiting users' patience levels, eventually leading to dissatisfactions, since resource utilisation is measured and auto-scaling decisions are taken at discrete intervals. Although the patience-based auto-scaling strategy could be made more responsive to variations in resource utilisation, that would lead to an increase in infrastructure costs, as resources would be activated and released more often, and would eventually render the consideration of users' patience levels in auto-scaling unusable.


Figure 14: Unbounded: Comparison of resource-time usage for FIFO+UT, LPF+UT, LPF+UPT under provisioning time of 10 ± 5 seconds and |U| = 10,000 users.

Figure 15: Unbounded: Percentage of user dissatisfaction for FIFO+UT, LPF+UT, LPF+UPT under provisioning time of 10 ± 5 seconds and |U| = 10,000 users.

5. Related Work

Projects related to our work fall into the categories of scheduling, user behaviour, and cloud computing auto-scaling. Scheduling is a well-studied topic in several domains for which a large number of theoretical problems, solutions, heuristics, and practical applications have been considered [13, 14]. Some of the algorithms used include FIFO, priority-based and deadline-driven policies, and hybrid approaches that use backfilling techniques [15], among others [16, 17]. In addition to priorities and deadlines, other factors have been considered, such as fairness [18], energy consumption [19], and context-awareness [9]. Moreover, utility functions have been used to model how the importance of results to users varies over time [20, 21].

User behaviour has been explored for optimising resource management in the context of Web caching and page pre-fetching [22, 23, 24, 25, 26, 27]. The goal in previous work was to understand how users access Web pages, investigate how tolerant users are to delays, and pre-fetch or modify page content to enhance user experience. Techniques in this area focus mostly on adapting Web content and minimising the response time of user requests. Service research has also investigated the impact of delays on users' behaviour. For instance, Taylor described the concept of delays and surveyed passengers affected by delayed flights in order to test various hypotheses [28]. Brown et al. and Gans et al. investigated the impact of service delays in call centres [29, 30]. In behavioural economics, Kahneman and Tversky [10] introduced Prospect Theory to model how people make choices in situations that involve risk or uncertainty. Netto et al. introduced a scheduling strategy that considers information on how quickly users consume results generated by service providers [7]. Our previous work investigated the scheduling of user requests considering their patience and expectations, but with no auto-scaling of cloud resources [6].

Shen et al. introduced a system for automating elastic resource scaling for cloud computing [31]. Their system does not require prior knowledge of the applications running in the cloud. Other projects have considered auto-scaling in different scenarios and proposed several approaches, such as modifying the number of resources allocated for running MapReduce applications [32, 33], comparing vertical versus horizontal auto-scaling [34], minimising operational costs [35], and providing integer-model-based auto-scaling [36].

Unlike previous work, our proposed auto-scaling technique considers information on users' patience while they interact with a service offered by a service provider using resources from a cloud infrastructure.

6. Conclusions and Future Work

In this article, we extended our investigation of patience-based algorithms for auto-scaling strategies. Traditional mechanisms are based solely on resource utilisation information and other system metrics, ignoring actual users' needs with respect to their desired QoS levels. As a consequence, resources are over-provisioned in scenarios where they are not strictly needed, that is, where additional resources do not lead to significant improvements in QoS.

We conducted extensive experiments using FIFO and Lowest Patience First as scheduling algorithms, and auto-scaling triggers that rely either on utilisation alone or on user patience and utilisation. Our experiments considered scenarios where the maximum number of resources could be either bounded or unbounded, and different versions of the auto-scaling algorithms were designed to cope with both possibilities. Our experiments also considered non-trivial (i.e., larger than zero) provisioning times.

Experimental results suggest that patience-based strategies can indeed provide significant economic gains to service providers. Namely, we observed that these strategies were able to reduce resource-time usage by approximately 9% while keeping QoS at the same level as strategies based solely on utilisation. We conclude that identifying unnecessary buffers in QoS levels, i.e., levels that will not necessarily improve user experience in cloud environments, is a key element for service providers, as it may enable savings in infrastructure costs and bring competitive advantage to adopters.

We believe the proposed strategies fill an important gap in the literature. As sensors and the Internet of Things become more pervasive, more data will be available that can be used to determine the factors that influence users' patience. By using these data, one could certainly specialise and improve the techniques presented here. With respect to resource management, the strategies presented in this work handled horizontal resource elasticity and assumed that resources are pairwise indistinguishable. In future work, we aim to relax such assumptions, handling vertical elasticity and the allocation of multiple types of virtual machine instances.

Acknowledgements

Some experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the Inria ALADDIN development action with support from CNRS, RENATER and several Universities, as well as other funding bodies (see https://www.grid5000.fr). This work has also been supported and partially funded by FINEP/MCTI, under subcontract no. 03.14.0062.00.

References

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, M. Zaharia, Above the clouds: A Berkeley view of cloud computing, Technical Report UCB/EECS-2009-28, Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, USA (February 2009).

[2] R. Barga, D. Gannon, D. Reed, The client and the cloud: Democratizing research computing, IEEE Internet Computing 15 (1) (2011) 72–75.

[3] J. Ramsay, A. Barbesi, J. Preece, A psychological investigation of long retrieval times on the world wide web, Interacting with Computers 10 (1) (1998) 77–86.

[4] The communications market report: United Kingdom (August 2013).

[5] F. Venturini, Five insights into consumers' online video viewing and buying habits, Outlook, Point of View (July 2013).

[6] C. Cardonha, M. D. Assuncao, M. A. S. Netto, R. L. F. Cunha, C. Queiroz, Patience-aware scheduling for cloud services: Freeing users from the chains of boredom, in: 11th International Conference on Service Oriented Computing (ICSOC'13), 2013.

[7] M. A. S. Netto, et al., Leveraging attention scarcity to improve the overall user experience of cloud services, in: Int. Conf. on Network and Service Management (CNSM'13), 2013.

[8] R. L. F. Cunha, M. D. de Assuncao, C. Cardonha, M. A. S. Netto, Exploiting user patience for scaling resource capacity in cloud services, in: 7th IEEE International Conference on Cloud Computing (IEEE Cloud 2014), IEEE, 2014, pp. 448–455.

[9] M. D. de Assuncao, M. A. S. Netto, F. Koch, S. Bianchi, Context-aware job scheduling for cloud computing environments, in: IEEE/ACM Fifth International Conference on Utility and Cloud Computing (UCC 2012) Workshops, IEEE Computer Society, Washington, USA, 2012, pp. 255–262. doi:10.1109/UCC.2012.33.

[10] D. Kahneman, A. Tversky, Prospect theory: An analysis of decision under risk, Econometrica: Journal of the Econometric Society (1979) 263–291.

[11] M. A. S. Netto, C. Cardonha, R. L. F. Cunha, M. D. de Assuncao, Evaluating auto-scaling strategies for cloud computing environments, in: 22nd IEEE Int. Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2014), IEEE, 2014.

[12] M. Mao, M. Humphrey, A performance study on the VM startup time in the cloud, in: IEEE 5th International Conference on Cloud Computing (CLOUD 2012), 2012, pp. 423–430.

[13] J. Blazewicz, K. Ecker, G. Schmidt, J. Weglarz, Scheduling in Computer and Manufacturing Systems, 2nd Edition, 1994.

[14] M. L. Pinedo, Planning and Scheduling in Manufacturing and Services, 2nd Edition, 2007.

[15] D. Tsafrir, Y. Etsion, D. G. Feitelson, Backfilling using system-generated predictions rather than user runtime estimates, IEEE Transactions on Parallel and Distributed Systems 18 (6) (2007) 789–803.

[16] T. D. Braun, et al., A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing.

[17] D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, K. C. Sevcik, P. Wong, Theory and practice in parallel job scheduling, in: Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP'97), 1997.

[18] N. D. Doulamis, A. D. Doulamis, E. A. Varvarigos, T. A. Varvarigou, Fair scheduling algorithms in grids, IEEE Transactions on Parallel and Distributed Systems 18 (11) (2007) 1630–1648.

[19] J.-F. Pineau, Y. Robert, F. Vivien, Energy-aware scheduling of bag-of-tasks applications on master–worker platforms, Concurrency and Computation: Practice and Experience 23 (2) (2011) 145–157.

[20] C. B. Lee, A. Snavely, Precise and realistic utility functions for user-centric performance analysis of schedulers, in: Int. Symp. on High-Performance Distributed Computing (HPDC'07), 2007.

[21] A. AuYoung, et al., Service contracts and aggregate utility functions, in: 15th IEEE Int. Symp. on High Performance Distributed Computing (HPDC'06), 2006.

[22] D. F. Galletta, R. M. Henry, S. McCoy, P. Polak, Web site delays: How tolerant are users?, Journal of the Association for Information Systems 5 (1) (2004) 1–28.

[23] F. Alt, A. Sahami Shirazi, A. Schmidt, R. Atterer, Bridging waiting times on web pages, in: 14th Int. Conf. on Human-Computer Interaction with Mobile Devices and Services (MobileHCI'12), ACM, New York, NY, USA, 2012, pp. 305–308.

[24] R. Atterer, M. Wnuk, A. Schmidt, Knowing the user's every move: user activity tracking for website usability evaluation and implicit interaction, in: 15th Int. Conf. on World Wide Web (WWW'06), ACM, New York, NY, USA, 2006, pp. 203–212.

[25] C. R. Cunha, C. F. B. Jaccoud, Determining WWW user's next access and its application to pre-fetching, in: 2nd IEEE Symp. on Computers and Communications (ISCC'97), Washington, DC, USA, 1997.

[26] N. Bhatti, A. Bouch, A. Kuchinsky, Integrating user-perceived quality into web server design, Computer Networks 33 (1) (2000) 1–16.

[27] A. Bouch, A. Kuchinsky, N. Bhatti, Quality is in the eye of the beholder: meeting users' requirements for internet quality of service, in: SIGCHI Conference on Human Factors in Computing Systems, ACM, 2000, pp. 297–304.

[28] S. Taylor, Waiting for service: the relationship between delays and evaluations of service, The Journal of Marketing (1994) 56–69.

[29] L. Brown, N. Gans, A. Mandelbaum, A. Sakov, H. Shen, S. Zeltyn, L. Zhao, Statistical analysis of a telephone call center: A queueing-science perspective, Journal of the American Statistical Association 100 (2005) 36–50.

[30] N. Gans, G. Koole, A. Mandelbaum, Telephone call centers: Tutorial, review, and research prospects, Manufacturing & Service Operations Management 5 (2) (2003) 79–141.

[31] Z. Shen, S. Subbiah, X. Gu, J. Wilkes, CloudScale: elastic resource scaling for multi-tenant cloud systems, in: 2nd ACM Symposium on Cloud Computing, ACM, 2011, p. 5.

[32] Y. Chen, S. Alspaugh, R. Katz, Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads, Proc. VLDB Endow.

[33] Z. Fadika, M. Govindaraju, DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications, in: 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'11), 2011, pp. 454–463.

[34] M. Sedaghat, F. Hernandez-Rodriguez, E. Elmroth, A virtual machine re-packing approach to the horizontal vs. vertical elasticity trade-off for cloud autoscaling, in: ACM Cloud and Autonomic Computing Conference (CAC'13), ACM, 2013.

[35] M. Mao, M. Humphrey, Auto-scaling to minimize cost and meet application deadlines in cloud workflows, in: Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), IEEE, 2011, pp. 1–12.

[36] M. Mao, J. Li, M. Humphrey, Cloud auto-scaling with deadline and budget constraints, in: 11th IEEE/ACM International Conference on Grid Computing (GRID), IEEE/ACM, 2010, pp. 41–48.
