

Dynamic Task Offloading and Resource Allocation for Ultra-Reliable Low-Latency Edge Computing

Chen-Feng Liu, Student Member, IEEE, Mehdi Bennis, Senior Member, IEEE, Mérouane Debbah, Fellow, IEEE, and H. Vincent Poor, Fellow, IEEE

(Invited Paper)

Abstract—To overcome devices' limitations in performing computation-intense applications, mobile edge computing (MEC) enables users to offload tasks to proximal MEC servers for faster task computation. However, current MEC system design is based on average-based metrics, which fails to account for the ultra-reliable low-latency requirements of mission-critical applications. To tackle this, this paper proposes a new system design in which probabilistic and statistical constraints are imposed on task queue lengths by applying extreme value theory. The aim is to minimize users' power consumption while trading off the allocated resources for local computation and task offloading. Due to wireless channel dynamics, users are re-associated to MEC servers in order to offload tasks at higher rates or to access proximal servers. In this regard, a user-server association policy is proposed, taking into account the channel quality as well as the servers' computation capabilities and workloads. By marrying tools from Lyapunov optimization and matching theory, a two-timescale mechanism is proposed, in which user-server association is solved on the long timescale while a dynamic task offloading and resource allocation policy is executed on the short timescale. Simulation results corroborate the effectiveness of the proposed approach, which guarantees highly reliable task computation and lower delay compared with several baselines.

Index Terms—5G and beyond, mobile edge computing (MEC), fog networking and computing, ultra-reliable low latency communications (URLLC), extreme value theory.

I. INTRODUCTION

MOTIVATED by the surging traffic demands spurred by online video and Internet-of-Things (IoT) applications, including machine-type and mission-critical communication (e.g., augmented/virtual reality (AR/VR) and drones), mobile edge computing (MEC)/fog computing are emerging technologies that distribute computation, communication, control, and storage at the network edge [2]–[6].

This work was supported in part by the Academy of Finland project CARMA, in part by the Academy of Finland project MISSION, in part by the Academy of Finland project SMARTER, in part by the INFOTECH project NOOR, in part by the Nokia Bell-Labs project FOGGY, in part by the Nokia Foundation, and in part by the U.S. National Science Foundation under Grant CNS-1702808. This paper was presented in part at the IEEE Global Communications Conference Workshops, Singapore, December 2017 [1].

C.-F. Liu and M. Bennis are with the Centre for Wireless Communications, University of Oulu, 90014 Oulu, Finland (e-mail: chen-feng.liu@oulu.fi; mehdi.bennis@oulu.fi).

M. Debbah is with the Large Networks and System Group, CentraleSupélec, Université Paris-Saclay, 91192 Gif-sur-Yvette, France, and also with the Mathematical and Algorithmic Sciences Laboratory, Huawei France Research and Development, 92100 Paris, France (e-mail: [email protected]).

H. V. Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (e-mail: [email protected]).

When executing computation-intensive applications at mobile devices, the performance and the user's quality of experience are significantly affected by the device's limited computation capability. Additionally, intensive computations are energy-consuming, which severely shortens the lifetime of battery-limited devices. To address the computation and energy issues, mobile devices can wirelessly offload their tasks to proximal MEC servers. On the other hand, offloading tasks incurs additional latency which cannot be overlooked and should be taken into account in the system design. Hence, the energy-delay tradeoff has received significant attention and has been studied in various MEC systems [7]–[22].

A. Related Work

In [7], Kwak et al. focused on an energy minimization problem for local computation and task offloading in a single-user MEC system. The authors further studied a multi-user system, which takes into account both the energy cost and the monetary cost of task offloading [8]. Therein, the cost-delay tradeoff was investigated in terms of competition and cooperation among users and the offloading service provider. Additionally, the work [9] considered the single-user system and assumed that the mobile device is endowed with a multi-core central processing unit (CPU) to compute different applications simultaneously. In order to stabilize all task queues at the mobile device and the MEC server, dynamic task offloading and resource allocation policies were proposed in [7]–[9] by utilizing Lyapunov stochastic optimization. Assuming that the MEC server is equipped with multiple CPU cores to compute different users' offloaded tasks in parallel, Mao et al. [10] studied a multi-user task offloading and bandwidth allocation problem. Subject to the stability of task queues, the energy-delay tradeoff was investigated using the Lyapunov framework. Extending the problem of [10], the authors further took into account the server's power consumption and resource allocation in the system analysis [11]. In [12], a wireless powered MEC network was considered in which multiple users, without fixed energy supply, are wirelessly powered by a power beacon to carry out local computation and task offloading. Taking into account the causality of the harvested energy, the work in [12] aimed at maximizing energy efficiency subject to the stability of users' task queues. Therein, the tradeoff between energy efficiency and average execution delay was analyzed by stochastic optimization. Xu et al. studied another energy harvesting MEC scenario, in which the edge servers are mainly powered by solar or wind energy, whereas the cloud server has a constant grid power supply [13]. Aiming at minimizing the long-term expected cost, which incorporates the end-to-end delay and operational cost, the authors proposed reinforcement learning-based resource provisioning and workload offloading (to the cloud) for edge servers. Besides the transmission and computation delays, the work [14] took into account the cost (in terms of delay) of handover and computation migration, due to user mobility, in an ultra-dense network. Taking into account a long-term available energy constraint, an online energy-aware base station association and handover algorithm was proposed to minimize the average end-to-end delay by incorporating Lyapunov optimization and multi-armed bandit theory [14]. Ko et al. [15] analyzed the average latency performance, including communication delay and computation delay, of a large-scale spatially random MEC network. Furthermore, upper and lower bounds [15] on the average computation delay were derived by applying stochastic geometry and queuing theory.

A hybrid cloud-fog architecture was considered in [16]. The delay-tolerable computation workloads requested by the end users are dispatched from the fog devices to the cloud servers, while delay-sensitive workloads are computed at the fog devices. The studied problem was cast as a network-wide power minimization subject to an average delay requirement [16]. Focusing on the cloud-fog architecture, Lee et al. [17] studied a scenario in which a fog node distributes the offloaded tasks to the connected fog nodes and a remote cloud server for cooperative computation. To address the uncertainty of the arrival of neighboring fog nodes, an online fog network formation algorithm was proposed such that the maximal average latency among different computation nodes is minimized [17]. Considering a hierarchical cloudlet architecture, Fan and Ansari [18] proposed a workload allocation (among different cloudlet tiers) and computational resource allocation approach in order to minimize the average response time of a task request. The authors further focused on an edge computing-based IoT network in which each user equipment (UE) can run several IoT applications [19]. Therein, the objective was to minimize the average response time subject to the delay requirements of different applications. In [20], a distributed workload balancing scheme was proposed for fog computing-empowered IoT networks. Based on the broadcast information of fog nodes' estimated traffic and computation loads, each IoT device locally chooses the associated fog node in order to reduce the average latency of its data flow. In addition to the task uploading and computation phases, the work [21] also accounted for the delay in the downlink phase, where the computed tasks are fed back to the users. The objective was to minimize a cost function of the estimated average delays of the three phases. The authors in [22] studied a software-defined fog network, where the data service subscribers (DSSs) purchase the fog nodes' computation resources via the data service operators. Modeling the average latency using queuing theory in the DSS's utility, a Stackelberg game and a many-to-many matching game were incorporated to allocate fog nodes' resources to the DSSs [22].

B. Our Contribution

While conventional communication networks were engineered to boost network capacity, little attention has been paid to reliability and latency performance. Indeed, ultra-reliable and low latency communication (URLLC) is one of the pillars for enabling 5G and is currently receiving significant attention in both academia and industry [23]–[25]. Regarding the existing MEC literature, the vast majority considers the average delay as a performance metric or quality-of-service requirement [13]–[22]. In other words, these system designs focus on latency through the lens of the average. In the works addressing the stochastic nature of the task arrival process [7]–[12], the prime concern is how to maintain the mean rate stability of task queues, i.e., ensuring a finite average queue length as time evolves [26]. However, merely focusing on average-based performance is not sufficient to guarantee URLLC for mission-critical applications, which mandates a further examination in terms of bound violation probability, higher-order statistics, characterization of extreme events with very low occurrence probabilities, and so forth [24].

The main contribution of this work is to propose a URLLC-centric task offloading and resource allocation framework, taking into account the statistics of extreme queue length events. We consider a multi-user MEC architecture with multiple servers having heterogeneous computation resources. Due to the UE's limited computation capability and the additional latency incurred during task offloading, the UEs need to smartly allocate resources for local computation and decide the amount of tasks to offload via wireless transmission if the executed applications are latency-sensitive or mission-critical. Since the queue length is implicitly related to delay, we treat the former as a delay metric in this work. Motivated by the aforementioned drawbacks of average-based designs, we set a threshold for the queue length and impose a probabilistic requirement on threshold deviation as a URLLC constraint. In order to model the event of threshold deviation, we characterize its statistics by invoking extreme value theory [27] and impose another URLLC constraint in terms of higher-order statistics. The problem is cast as a network-wide power minimization problem for task computation and offloading, subject to statistical URLLC constraints on the threshold deviation and extreme queue length events. Furthermore, we incorporate the UEs' mobility and propose a two-timescale UE-server association and task computation framework. In this regard, taking into account task queue state information, servers' computation capabilities and workloads, co-channel interference, and URLLC constraints, we associate the UEs with the MEC servers, on the long timescale, by utilizing matching theory [28]. Then, given the associated MEC server, task offloading and resource allocation are performed on the short timescale. To this end, we leverage Lyapunov stochastic optimization [26] to deal with the randomness of task arrivals, wireless channels, and task queue values. Simulation results show that, considering the statistics of the extreme queue length as a reliability measure, the studied partially-offloading scheme provides more reliable task execution than the scheme without MEC servers and the fully-offloading scheme.

Table I
SUMMARY OF NOTATIONS

A_i: UE i's task arrivals
A_i^L: Split tasks for local computation
A_i^O: Split tasks for offloading
A_unit: Unit task
\bar{A}_i^L: Moving time-averaged split tasks for local computation
\bar{A}_i^O: Moving time-averaged split tasks for offloading
d_i^L: Queue length bound for Q_i^L
d_i^O: Queue length bound for Q_i^O
d_{ji}: Queue length bound for Z_{ji}
f_i: UE i's CPU-cycle frequency
f_{ji}: MEC server j's allocated CPU-cycle frequency for UE i
f_j^{max}: MEC server j's computation capability per CPU core
f: Network-wide CPU-cycle frequency vector
f_j: MEC server j's allocated CPU-cycle frequency vector
h_{ij}: Channel gain between UE i and MEC server j
I_{ij}: Aggregate interference to h_{ij}
L_i: UE i's required processing density
L: Lyapunov function
n: Time frame index
N_0: Power spectral density of AWGN
N_j: MEC server j's CPU core number
P_i: UE i's transmit power
P_i^{max}: UE i's power budget
P_i^C: UE i's long-term time-averaged computation power
P_i^T: UE i's long-term time-averaged transmit power
P: Network-wide transmit power vector
Pr(I_{ij}): Estimated distribution of I_{ij}
Q_i^L: UE i's local-computation queue length
Q_i^O: UE i's task-offloading queue length
Q_i^{L,(X)}: Virtual queue length for (15)
Q_i^{L,(Y)}: Virtual queue length for (16)
Q_i^{O,(X)}: Virtual queue length for (17)
Q_i^{O,(Y)}: Virtual queue length for (18)
Q_{ji}^{(X)}: Virtual queue length for (20)
Q_{ji}^{(Y)}: Virtual queue length for (21)
Q_i^{L,(Q)}: Virtual queue length for (28)
Q_i^{O,(Q)}: Virtual queue length for (29)
Q_{ji}^{(Z)}: Virtual queue length for (30)
Q: Combined queue vector
R_{ij}: Transmission rate from UE i to MEC server j
\bar{R}_{ij}: Moving time-averaged transmission rate from UE i to MEC server j
R_{ij}^{max}: Maximum offloading rate from UE i to MEC server j
S: Set of MEC servers
S: Number of MEC servers
t: Time slot index
T_0: Time frame length
T: Time frame
U: Set of UEs
U: Number of UEs
V: Lyapunov optimization parameter
W: MEC server's dedicated bandwidth
X_i^L: Excess value of Q_i^L
\bar{X}_i^L: Long-term time-averaged conditional expectation of X_i^L
X_i^O: Excess value of Q_i^O
\bar{X}_i^O: Long-term time-averaged conditional expectation of X_i^O
X_{ji}: Excess value of Z_{ji}
\bar{X}_{ji}: Long-term time-averaged conditional expectation of X_{ji}
Y_i^L: Square of X_i^L
\bar{Y}_i^L: Long-term time-averaged conditional expectation of Y_i^L
Y_i^O: Square of X_i^O
\bar{Y}_i^O: Long-term time-averaged conditional expectation of Y_i^O
Y_{ji}: Square of X_{ji}
\bar{Y}_{ji}: Long-term time-averaged conditional expectation of Y_{ji}
Z_{ji}: MEC server j's offloaded-task queue for UE i
β_i^L: Related weight of Q_i^L
β_i^O: Related weight of Q_i^O
β_{ji}: Related weight of Z_{ji}
\bar{β}_i^O: Estimated average of β_i^O
\bar{β}_j: Estimated average of β_{ji}
η_{ij}: Association indicator between UE i and MEC server j
η: Network-wide association vector
ε_i^L: Tolerable bound violation probability for Q_i^L
ε_i^O: Tolerable bound violation probability for Q_i^O
ε_{ji}: Tolerable bound violation probability for Z_{ji}
κ: Computation power parameter
λ_i: UE i's average task arrival rate
Ψ_i(η): UE i's utility under a matching η
Ψ_j(η): MEC server j's utility under a matching η
σ_i^L: GPD scale parameter of X_i^L
σ_i^{L,th}: Threshold for σ_i^L
σ_i^{O,th}: Threshold for the GPD scale parameter of X_i^O
σ_{ji}^{th}: Threshold for the GPD scale parameter of X_{ji}
τ: Time slot length
ξ_i^L: GPD shape parameter of X_i^L
ξ_i^{L,th}: Threshold for ξ_i^L
ξ_i^{O,th}: Threshold for the GPD shape parameter of X_i^O
ξ_{ji}^{th}: Threshold for the GPD shape parameter of X_{ji}

In contrast with the received signal strength (RSS)-based baseline, our proposed UE-server association approach achieves better delay performance for heterogeneous MEC server architectures. The performance enhancement is more remarkable in denser networks.

The remainder of this paper is organized as follows. The system model is specified in Section II. Subsequently, we formulate the latency requirements, reliability constraints, and the studied optimization problem in Section III. In Section IV, we specify in detail the proposed UE-server association mechanism as well as the latency- and reliability-aware task offloading and resource allocation framework. The network performance is evaluated numerically and discussed in Section V, which is followed by conclusions in Section VI. Furthermore, for the sake of readability, we list all notations in Table I. The notations are defined in detail in the following sections.

II. SYSTEM MODEL

The considered MEC network consists of a set U of U UEs and a set S of S MEC servers. UEs have computation capabilities to execute their own tasks locally. However, due to their limited computation capabilities to execute computation-intense applications, UEs can wirelessly offload their tasks to the MEC servers with an additional cost of communication latency. The MEC servers are equipped with multi-core CPUs such that different UEs' offloaded tasks can be computed in parallel.

Figure 1. System model and timeline of the considered MEC network. (a) System architecture: each UE splits its task arrivals A_i(t) = A_i^L(t) + A_i^O(t) into a local-computation queue Q_i^L(t), served at the local computation rate τ f_i(t)/L_i, and a task-offloading queue Q_i^O(t), served at the transmission rate τ Σ_{j} R_{ij}(t); each MEC server keeps an offloaded-task queue Z_{ji}(t), fed at rate min{Q_i^O(t) + A_i^O(t), τ R_{ij}(t)} and served at the server's task computation rate τ f_{ji}(t)/L_i. (b) Timeline of the two-timescale mechanism: UE-server association at the beginning of every time frame (every T_0 slots), and task offloading and resource allocation in every time slot.

Additionally, the computation and communication timeline is slotted and indexed by t ∈ N, where each time slot, of length τ, is consistent with the coherence block of the wireless channel. We further assume that the UEs are randomly distributed and move continuously in the network, whereas the MEC servers are located at fixed positions. Since the UE's geographic location keeps changing, the UE is incentivized to offload its tasks to a different server which is closer to the UE, provides a stronger computation capability, or has a lower workload than the currently associated one. In this regard, we consider a two-timescale UE-server association and task-offloading mechanism. Specifically, we group every T_0 successive time slots into a time frame, which is indexed by n ∈ Z^+ and denoted by T(n) = \{(n − 1)T_0, \cdots, nT_0 − 1\}. At the beginning of each time frame (i.e., on the long/slow timescale), each UE is associated with an MEC server. Let η_{ij}(n) ∈ \{0, 1\} represent the UE-server association indicator in the n-th time frame, in which η_{ij}(n) = 1 indicates that UE i can offload its tasks to server j during time frame n; otherwise, η_{ij}(n) = 0. We also assume that each UE can only offload its tasks to one MEC server at a time. The UE-server association rule can be formulated as

η_{ij}(n) ∈ \{0, 1\}, ∀ i ∈ U, j ∈ S,   \sum_{j ∈ S} η_{ij}(n) = 1, ∀ i ∈ U.   (1)

Subsequently, in each time slot (i.e., on the short/fast timescale) within the n-th frame, each UE dynamically offloads part of its tasks to the associated MEC server and computes the remaining tasks locally. The network architecture and timeline of the considered MEC network are shown in Fig. 1.

A. Traffic Model at the UE Side

Each UE runs one application in which tasks arrive in a stochastic manner. Following the data-partition model [5], we assume that each task can be computed either locally, i.e., at the UE, or remotely, i.e., at the server. Different tasks are independent and can be computed in parallel. Thus, given the task arrivals A_i(t) in time slot t, each UE i divides its arrival into two disjoint parts, in which one part A_i^L(t) is executed locally while the remaining tasks A_i^O(t) are offloaded to the server. Task splitting at UE i ∈ U can be expressed as

A_i(t) = A_i^L(t) + A_i^O(t),   A_i^L(t), A_i^O(t) ∈ \{0, A_{unit}, 2A_{unit}, \cdots\}.   (2)

Here, A_{unit} represents the unit task which cannot be further split. Moreover, we assume that task arrivals are independent and identically distributed (i.i.d.) over time with the average arrival rate λ_i = E[A_i]/τ.

Each UE has two queue buffers, one storing the split tasks for local computation and the other for offloading. For the local-computation queue of UE i ∈ U, the queue length (in bits) in time slot t is denoted by Q_i^L(t), which evolves as

Q_i^L(t + 1) = max\{Q_i^L(t) + A_i^L(t) − τ f_i(t)/L_i, 0\}.   (3)

Here, f_i(t) (in cycles/s) is UE i's allocated CPU-cycle frequency to execute tasks, while L_i accounts for the required CPU cycles per bit for computation, i.e., the processing density. The magnitude of the processing density depends on the performed application.^1 Furthermore, given a CPU-cycle frequency f_i(t), the UE consumes the amount κ[f_i(t)]^3 of power for computation, where κ is a parameter affected by the device's hardware implementation [10], [29]. For the task-offloading queue of UE i ∈ U, we denote the queue length (in bits) in time slot t as Q_i^O(t). Analogously, the task-offloading queue dynamics is given by

Q_i^O(t + 1) = max\{Q_i^O(t) + A_i^O(t) − \sum_{j ∈ S} τ R_{ij}(t), 0\},   (4)

in which

R_{ij}(t) = W \log_2\Big(1 + \frac{η_{ij}(n) P_i(t) h_{ij}(t)}{N_0 W + \sum_{i' ∈ U \setminus \{i\}} η_{i'j}(n) P_{i'}(t) h_{i'j}(t)}\Big),   (5)

∀ i ∈ U, j ∈ S, is UE i's transmission rate^2 for offloading tasks to the associated MEC server j in time slot t ∈ T(n). P_i(t) and N_0 are UE i's transmit power and the power spectral density of the additive white Gaussian noise (AWGN), respectively. W is the bandwidth dedicated to each server and shared by its associated UEs. Additionally, h_{ij} is the wireless channel gain between UE i ∈ U and server j ∈ S, including path loss and channel fading. We also assume that all channels experience block fading. In this work, we mainly consider the uplink, i.e., offloading tasks from the UE to the MEC server, and neglect the downlink, i.e., downloading the computed tasks from the server. The rationale is that, compared with the offloaded tasks before computation, the computation results typically have smaller sizes [15], [30], [31]. Hence, the overheads in the downlink can be neglected.

^1 For example, the six-queen puzzle, a 400-frame video game, the seven-queen puzzle, face recognition, and virus scanning require processing densities of 1760 cycles/bit, 2640 cycles/bit, 8250 cycles/bit, 31680 cycles/bit, and 36992 cycles/bit, respectively [7].

^2 All transmissions are encoded based on a Gaussian distribution.

In order to minimize the total power consumption of resource allocation for local computation and task offloading, the UE adopts the dynamic voltage and frequency scaling (DVFS) capability to adaptively adjust its CPU-cycle frequency [5], [29]. Thus, to allocate the CPU-cycle frequency and transmit power, we impose the following constraints at each UE i ∈ U:

κ[f_i(t)]^3 + P_i(t) ≤ P_i^{max},   f_i(t) ≥ 0,   P_i(t) ≥ 0,   (6)

where P_i^{max} is UE i's power budget.
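To illustrate how the per-slot dynamics in (3)-(6) interact, the following minimal sketch simulates one UE's local-computation and task-offloading queues. It is not the paper's algorithm: the parameter values, the fixed half/half task split, the fixed CPU-cycle frequency, and the interference-free rate in (5) are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch of the UE-side queue dynamics (3)-(4) under the power
# constraint (6); all parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)
tau = 1e-3           # slot length [s]
L_i = 1760.0         # processing density [cycles/bit]
kappa = 1e-27        # computation power parameter
P_max = 0.2          # power budget [W]
W, N0 = 1e6, 1e-20   # bandwidth [Hz], noise PSD [W/Hz]
A_unit = 100.0       # unit task [bits]

Q_L, Q_O = 0.0, 0.0  # local-computation and task-offloading queue lengths [bits]
for t in range(1000):
    A = A_unit * rng.poisson(2)                # task arrivals A_i(t)
    A_L = A_unit * ((A / A_unit) // 2)         # naive half/half split into A_i^L(t) ...
    A_O = A - A_L                              # ... and A_i^O(t), as in (2)
    f = 0.5 * (P_max / kappa) ** (1 / 3)       # fixed CPU-cycle frequency (a fraction
    P = P_max - kappa * f ** 3                 #  of the budget); the rest goes to Tx, per (6)
    h = rng.exponential(1e-13)                 # channel gain to the associated server
    R = W * np.log2(1 + P * h / (N0 * W))      # offloading rate (5), no interference
    Q_L = max(Q_L + A_L - tau * f / L_i, 0.0)  # local-computation queue update (3)
    Q_O = max(Q_O + A_O - tau * R, 0.0)        # task-offloading queue update (4)
print(Q_L, Q_O)
```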

B. Traffic Model at the Server Side

We assume that each MEC server has distinct queue buffers to store different UEs' offloaded tasks, where the queue length (in bits) of UE i's offloaded tasks at server j in time slot t is denoted by Z_{ji}(t). The offloaded-task queue length evolves as

Z_{ji}(t + 1) = max\{Z_{ji}(t) + min\{Q_i^O(t) + A_i^O(t), τ R_{ij}(t)\} − τ f_{ji}(t)/L_i, 0\}   (7)
             ≤ max\{Z_{ji}(t) + τ R_{ij}(t) − τ f_{ji}(t)/L_i, 0\},   (8)

∀ i ∈ U, j ∈ S. Here, f_{ji}(t) is server j's allocated CPU-cycle frequency to process UE i's offloaded tasks. Note that the MEC server is deployed to provide a faster computation capability for the UE. Thus, we consider the scenario in which each CPU core of the MEC server is dedicated to at most one UE (i.e., its offloaded tasks) in each time slot, and a UE's offloaded tasks at each server can only be computed by one CPU core at a time [9], [10]. The considered computational resource scheduling mechanism at the MEC server is mathematically formulated as

\sum_{i ∈ U} 1_{\{f_{ji}(t) > 0\}} ≤ N_j, ∀ j ∈ S,   f_{ji}(t) ∈ \{0, f_j^{max}\}, ∀ i ∈ U, j ∈ S,   (9)

where N_j denotes the total number of CPU cores of server j, f_j^{max} is the computation capability of one CPU core of server j, and 1_{\{\cdot\}} is the indicator function. In (9), we account for the allocated CPU-cycle frequencies to all UEs even though some UEs are not associated with this server in the current time frame. The rationale will be explained in detail in Section IV-D after formulating the concerned optimization problem. Additionally, in order to illustrate the relationship between the offloaded-task queue length and the transmission rate, we introduce inequality (8), which will be further used to formulate the latency and reliability requirements of the considered MEC system and to derive the solution of the studied optimization problem.
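As a concrete illustration of (7)-(9), the sketch below updates one server's offloaded-task queues while respecting the CPU-core scheduling constraint; serving the longest queues first is a simple heuristic assumed here for illustration, not the scheduling policy derived later in Section IV, and all inputs are placeholders.

```python
import numpy as np

# Minimal sketch of the server-side queue update (7) under the CPU-core
# scheduling constraint (9): at most N_j UEs are served per slot, each with
# the full per-core capability f_j^max.
def serve_offloaded_queues(Z_j, Q_O_plus_A, R_j, tau, L, N_j, f_max_j):
    """Z_j, Q_O_plus_A, R_j, L: per-UE arrays for a fixed server j."""
    served = np.argsort(Z_j)[::-1][:N_j]          # longest-queue-first heuristic
    f_j = np.zeros_like(Z_j)
    f_j[served] = f_max_j                         # f_ji(t) ∈ {0, f_j^max}, per (9)
    arrivals = np.minimum(Q_O_plus_A, tau * R_j)  # min{Q_i^O(t)+A_i^O(t), τ R_ij(t)}
    return np.maximum(Z_j + arrivals - tau * f_j / L, 0.0)   # update (7)

# Example: 3 CPU cores shared by the offloaded tasks of 5 UEs.
Z_next = serve_offloaded_queues(
    Z_j=np.array([4e3, 1e3, 6e3, 0.0, 2e3]),
    Q_O_plus_A=np.array([800.0] * 5),
    R_j=np.array([1e6] * 5),
    tau=1e-3, L=np.array([1760.0] * 5), N_j=3, f_max_j=2e9)
print(Z_next)
```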

III. LATENCY REQUIREMENTS, RELIABILITY CONSTRAINTS, AND PROBLEM FORMULATION

In this work, the end-to-end delays experienced by the locally-computed tasks A_i^L(t) and the offloaded tasks A_i^O(t) consist of different components. When a task is computed locally, it experiences the queuing delay (for computation) and the computation delay at the UE. If the task is offloaded to the MEC server, the end-to-end delay includes: 1) the queuing delay (for offloading) at the UE, 2) the wireless transmission delay while offloading, 3) the queuing delay (for computation) at the server, and 4) the computation delay at the server. From Little's law, we know that the average queuing delay is proportional to the average queue length [32]. However, without taking the tail distribution of the queue length into account, solely focusing on the average queue length fails to account for the low-latency and reliability requirements [24]. To tackle this, we focus on the statistics of the task queues and impose probabilistic constraints on the local-computation and task-offloading queue lengths of each UE i ∈ U as follows:

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} Pr(Q_i^L(t) > d_i^L) ≤ ε_i^L,   (10)

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} Pr(Q_i^O(t) > d_i^O) ≤ ε_i^O.   (11)

Here, d_i^L and d_i^O are the queue length bounds, while ε_i^L ≪ 1 and ε_i^O ≪ 1 are the tolerable bound violation probabilities. Furthermore, queue length bound violation also undermines the reliability of task computation. For example, if a finite-size queue buffer is overloaded, the incoming tasks will be dropped.
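In practice, the left-hand sides of (10) and (11) can be estimated from a finite queue-length trace; the small helper below, with a synthetic trace and an arbitrary bound, is only meant to make the time-averaged violation probability concrete.

```python
import numpy as np

# Minimal sketch: empirical estimate of the time-averaged bound violation
# probability in (10)/(11) from a (synthetic) queue-length trace.
def violation_probability(queue_trace, d):
    q = np.asarray(queue_trace)
    return float(np.mean(q > d))                  # (1/T) Σ_t 1{Q(t) > d}

Q_L_trace = np.abs(np.random.default_rng(1).normal(2e3, 1e3, size=10_000))
print(violation_probability(Q_L_trace, d=5e3))    # should not exceed ε_i^L
```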

In addition to the bound violation probability, let us look at the complementary cumulative distribution function (CCDF) of the UE's local-computation queue length, i.e., \bar{F}_{Q_i^L}(q) = Pr(Q_i^L > q), which reflects the queue length profile. If the monotonically decreasing CCDF decays faster as q increases, the probability of having an extreme queue length is lower. Since the prime concern in this work lies in the extreme-case events with very low occurrence probabilities, i.e., Pr(Q_i^L(t) > d_i^L) ≪ 1, we resort to principles of extreme value theory^3 to characterize the statistics and tail distribution of the extreme event \{Q_i^L(t) > d_i^L\}. To this end, we first introduce the Pickands–Balkema–de Haan theorem [27].

^3 Extreme value theory is a powerful and robust framework to study the tail behavior of a distribution. Extreme value theory also provides statistical models for the computation of extreme risk measures.

Theorem 1 (Pickands–Balkema–de Haan theorem). Consider a random variable Q, with the cumulative distribution function (CDF) F_Q(q), and a threshold value d. As the threshold d closely approaches F_Q^{-1}(1), i.e., d → sup\{q : F_Q(q) < 1\}, the conditional CCDF of the excess value X|_{Q>d} = Q − d > 0, i.e., \bar{F}_{X|Q>d}(x) = Pr(Q − d > x | Q > d), can be approximated by a generalized Pareto distribution (GPD) G(x; σ, ξ), i.e.,

\bar{F}_{X|Q>d}(x) ≈ G(x; σ, ξ) := \begin{cases} (1 + ξx/σ)^{−1/ξ}, & x ≥ 0 \text{ and } ξ > 0, \\ e^{−x/σ}, & x ≥ 0 \text{ and } ξ = 0, \\ (1 + ξx/σ)^{−1/ξ}, & 0 ≤ x ≤ −σ/ξ \text{ and } ξ < 0, \end{cases}   (12)

which is characterized by a scale parameter σ > 0 and a shape parameter ξ ∈ R.

In other words, the conditional CCDF of the excess value X|_{Q>d} converges to a GPD as d → ∞. However, from the proof of Theorem 1 in [27], we know that the GPD provides a good approximation when F_Q(d) is close to 1, e.g., F_Q(d) = 0.99. That is, depending on the CDF of Q, imposing a very large d might not be necessary for obtaining the approximate GPD. Moreover, for a GPD G(x; σ, ξ), its mean σ/(1 − ξ), variance σ^2/[(1 − ξ)^2(1 − 2ξ)], and other higher-order statistics such as skewness exist if ξ < 1, ξ < 1/2, and ξ < 1/3, respectively. Note that the scale parameter σ and the domain of x in G(x; σ, ξ) are of the same order. In this regard, we can see from (12) that G(σ; σ, 0) = e^{−1} ≈ 0.37 at x = σ and G(3σ; σ, 0) = e^{−3} ≈ 0.05 at x = 3σ. We also show the CCDFs of GPDs for various shape parameters ξ in Fig. 2, where the x-axis is indexed with respect to the normalized value x/σ. As shown in Fig. 2, the decay speed of the CCDF increases as ξ decreases. In contrast with the curves with ξ ≥ 0, we can see that the CCDF decays rather sharply when ξ ≤ −3.

Figure 2. CCDFs of the GPDs for various shape parameters ξ (ξ ∈ {1, 1/2, 1/3, 0, −1/2, −1, −3}), plotted versus the normalized value x/σ.
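The following sketch mirrors how Theorem 1 is used here: collect the excess values X = Q − d of a queue-length sample beyond a high threshold d, fit a GPD, and read off the scale and shape parameters that the URLLC constraints below bound. The exponential surrogate trace and the use of scipy.stats.genpareto are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import genpareto

# Minimal sketch of applying Theorem 1 to excess queue values.
rng = np.random.default_rng(0)
Q = rng.exponential(scale=1e3, size=100_000)   # surrogate queue-length samples
d = np.quantile(Q, 0.99)                       # threshold with F_Q(d) ≈ 0.99
excess = Q[Q > d] - d                          # X|_{Q>d} = Q - d > 0

xi_hat, _, sigma_hat = genpareto.fit(excess, floc=0)   # shape ξ and scale σ of the GPD
mean_gpd = sigma_hat / (1 - xi_hat)                    # mean, exists for ξ < 1
var_gpd = sigma_hat**2 / ((1 - xi_hat)**2 * (1 - 2 * xi_hat))  # variance, ξ < 1/2
print(xi_hat, sigma_hat, mean_gpd, var_gpd)
```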

Now, let us denote the excess value (with respect to the threshold d_i^L in (10)) of the local-computation queue of each UE i ∈ U in time slot t as X_i^L(t)|_{Q_i^L(t) > d_i^L} = Q_i^L(t) − d_i^L > 0. By applying Theorem 1, the excess queue value can be approximated by a GPD G(x_i; σ_i^L, ξ_i^L) whose mean and variance are

E[X_i^L(t) | Q_i^L(t) > d_i^L] ≈ \frac{σ_i^L}{1 − ξ_i^L},   (13)

Var(X_i^L(t) | Q_i^L(t) > d_i^L) ≈ \frac{(σ_i^L)^2}{(1 − ξ_i^L)^2 (1 − 2ξ_i^L)},   (14)

with the corresponding scale parameter σ_i^L and shape parameter ξ_i^L. From (13) and (14), we can see that the smaller σ_i^L and ξ_i^L are, the smaller the mean value and variance. Since the approximate GPD is fully characterized by the scale and shape parameters, as mentioned previously, we impose thresholds on these two parameters, i.e., σ_i^L ≤ σ_i^{L,th} and ξ_i^L ≤ ξ_i^{L,th}. The selection of the threshold values can be guided by the above discussion of the GPD, Fig. 2, and the magnitude of the metric of interest. Subsequently, applying the two parameter thresholds and Var(X_i^L) = E[(X_i^L)^2] − E[X_i^L]^2 to (13) and (14), we consider constraints on the long-term time-averaged conditional mean and second moment of the excess value of each UE's local-computation queue length, i.e., ∀ i ∈ U,

\bar{X}_i^L = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[X_i^L(t) | Q_i^L(t) > d_i^L] ≤ \frac{σ_i^{L,th}}{1 − ξ_i^{L,th}},   (15)

\bar{Y}_i^L = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[Y_i^L(t) | Q_i^L(t) > d_i^L] ≤ \frac{2(σ_i^{L,th})^2}{(1 − ξ_i^{L,th})(1 − 2ξ_i^{L,th})},   (16)

with Y_i^L(t) = [X_i^L(t)]^2. Analogously, denoting the excess value, with respect to the threshold d_i^O, of UE i's task-offloading queue length in time slot t as X_i^O(t)|_{Q_i^O(t) > d_i^O} = Q_i^O(t) − d_i^O > 0, we have the constraints on the long-term time-averaged conditional mean and second moment

\bar{X}_i^O = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[X_i^O(t) | Q_i^O(t) > d_i^O] ≤ \frac{σ_i^{O,th}}{1 − ξ_i^{O,th}},   (17)

\bar{Y}_i^O = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[Y_i^O(t) | Q_i^O(t) > d_i^O] ≤ \frac{2(σ_i^{O,th})^2}{(1 − ξ_i^{O,th})(1 − 2ξ_i^{O,th})},   (18)

for all UEs i ∈ U, in which σ_i^{O,th} and ξ_i^{O,th} are the thresholds for the characteristic parameters of the approximate GPD, and Y_i^O(t) = [X_i^O(t)]^2.

Likewise, the average queuing delay at the server is proportional to the ratio of the average queue length to the average transmission rate. Referring to (8), we consider the following probabilistic constraint:

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} Pr\Big(\frac{Z_{ji}(t)}{\bar{R}_{ij}(t − 1)} > d_{ji}\Big) ≤ ε_{ji},   (19)

∀ i ∈ U, j ∈ S, with the threshold d_{ji} and tolerable violation probability ε_{ji} ≪ 1, on the offloaded-task queue length at the MEC server. Here, \bar{R}_{ij}(t − 1) = \frac{1}{t} \sum_{τ=0}^{t−1} R_{ij}(τ) is the moving time-averaged transmission rate. Similar to the task queue lengths at the UE side, we further denote the excess value, with respect to the threshold \bar{R}_{ij}(t − 1) d_{ji}, of the offloaded-task queue length of UE i ∈ U at server j ∈ S in time slot t as X_{ji}(t)|_{Z_{ji}(t) > \bar{R}_{ij}(t−1) d_{ji}} = Z_{ji}(t) − \bar{R}_{ij}(t − 1) d_{ji} > 0 and impose the following constraints:

\bar{X}_{ji} = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[X_{ji}(t) | Z_{ji}(t) > \bar{R}_{ij}(t − 1) d_{ji}] ≤ \frac{σ_{ji}^{th}}{1 − ξ_{ji}^{th}},   (20)

\bar{Y}_{ji} = \lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[Y_{ji}(t) | Z_{ji}(t) > \bar{R}_{ij}(t − 1) d_{ji}] ≤ \frac{2(σ_{ji}^{th})^2}{(1 − ξ_{ji}^{th})(1 − 2ξ_{ji}^{th})},   (21)

with Y_{ji}(t) = [X_{ji}(t)]^2. Here, σ_{ji}^{th} and ξ_{ji}^{th} are the thresholds for the characteristic parameters of the approximate GPD.

We note that the local computation delay at the UE and the transmission delay while offloading are inversely proportional to the computation speed f_i(t)/L_i and the transmission rate \sum_{j ∈ S} R_{ij}(t) as per (3) and (4), respectively. To decrease the local computation and transmission delays, the UE should allocate a higher local CPU-cycle frequency and more transmit power, which, on the other hand, drains more energy. Since allocating a higher CPU-cycle frequency and more transmit power also further decreases the queue lengths, both (local computation and transmission) delays are implicitly taken into account in the queue length constraints (10), (11), and (15)–(18). At the server side, the remote computation delay can be neglected because one CPU core with a better computation capability is dedicated to one UE's offloaded tasks at a time. On the other hand, the server needs to schedule its computational resources, i.e., multiple CPU cores, when the number of associated UEs exceeds the number of CPU cores.

Incorporating the aforementioned latency requirements and reliability constraints, the studied optimization problem is formulated as follows:

MP:  minimize_{η(n), f(t), P(t)}  \sum_{i ∈ U} (P_i^C + P_i^T)
     subject to  (1) for UE-server association,
                 (2) for task splitting,
                 (6) and (9) for resource allocation,
                 (10), (11), and (19) for queue length bound violation,
                 (15)–(18), (20), and (21) for the GPDs,

where P_i^C = \lim_{T→∞} \frac{1}{T} \sum_{t=0}^{T−1} κ[f_i(t)]^3 and P_i^T = \lim_{T→∞} \frac{1}{T} \sum_{t=0}^{T−1} P_i(t) are UE i's long-term time-averaged power consumptions for local computation and task offloading, respectively. η(n) = (η_{ij}(n) : i ∈ U, j ∈ S) and P(t) = (P_i(t) : i ∈ U) denote the network-wide UE-server association and transmit power allocation vectors, respectively. In addition, f(t) = (f_i(t), f_j(t) : i ∈ U, j ∈ S) denotes the network-wide computational resource allocation vector, in which f_j(t) = (f_{ji}(t) : i ∈ U) is the computational resource allocation vector of server j. To solve problem MP, we utilize techniques from Lyapunov stochastic optimization and propose a dynamic task offloading and resource allocation policy in the next section.

IV. LATENCY AND RELIABILITY-AWARE TASK OFFLOADING AND RESOURCE ALLOCATION

Let us give an overview of the proposed task offloading and resource allocation approach before specifying the details. At the beginning of each time frame, i.e., every T_0 slots, we carry out a UE-server association, taking into account the wireless link strength, the UEs' and servers' computation capabilities, their historical workloads, and the URLLC constraints (11) and (17)–(21). To this end, a many-to-one matching algorithm is utilized to associate each server with multiple UEs. Afterwards, we focus on task offloading and resource allocation by solving three decomposed optimization problems, via Lyapunov optimization, in each time slot. At the UE side, each UE splits its instantaneous task arrivals into two parts, which are computed locally and offloaded respectively, while allocating the local-computation CPU-cycle frequency and the transmit power for offloading. At the server side, each MEC server schedules its CPU cores to execute the UEs' offloaded tasks. In these procedures (of task splitting and offloading, resource allocation, and CPU-core scheduling), the relevant URLLC constraints among (10), (11), and (15)–(21) are considered. The details of our proposed approach are given in the remainder of this section.

A. Lyapunov Optimization Framework

We first introduce a virtual queue Q_i^{L,(X)} for the long-term time-averaged constraint (15), with the queue evolution

Q_i^{L,(X)}(t + 1) = max\Big\{Q_i^{L,(X)}(t) + \Big(X_i^L(t + 1) − \frac{σ_i^{L,th}}{1 − ξ_i^{L,th}}\Big) × 1_{\{Q_i^L(t+1) > d_i^L\}}, 0\Big\},   (22)

in which the incoming traffic amount X_i^L(t + 1) × 1_{\{Q_i^L(t+1) > d_i^L\}} and the outgoing traffic amount \frac{σ_i^{L,th}}{1 − ξ_i^{L,th}} × 1_{\{Q_i^L(t+1) > d_i^L\}} correspond to the left-hand side and right-hand side of the inequality (15), respectively. Note that, as shown in [26], the introduced virtual queue being mean rate stable, i.e., \lim_{t→∞} E[|Q_i^{L,(X)}(t)|]/t = 0, is equivalent to satisfying the long-term time-averaged constraint (15). Analogously, for the constraints (16)–(18), (20), and (21), we respectively introduce the virtual queues

Q_i^{L,(Y)}(t + 1) = max\Big\{Q_i^{L,(Y)}(t) + \Big(Y_i^L(t + 1) − \frac{2(σ_i^{L,th})^2}{(1 − ξ_i^{L,th})(1 − 2ξ_i^{L,th})}\Big) × 1_{\{Q_i^L(t+1) > d_i^L\}}, 0\Big\},   (23)

Q_i^{O,(X)}(t + 1) = max\Big\{Q_i^{O,(X)}(t) + \Big(X_i^O(t + 1) − \frac{σ_i^{O,th}}{1 − ξ_i^{O,th}}\Big) × 1_{\{Q_i^O(t+1) > d_i^O\}}, 0\Big\},   (24)

Q_i^{O,(Y)}(t + 1) = max\Big\{Q_i^{O,(Y)}(t) + \Big(Y_i^O(t + 1) − \frac{2(σ_i^{O,th})^2}{(1 − ξ_i^{O,th})(1 − 2ξ_i^{O,th})}\Big) × 1_{\{Q_i^O(t+1) > d_i^O\}}, 0\Big\},   (25)

Q_{ji}^{(X)}(t + 1) = max\Big\{Q_{ji}^{(X)}(t) + \Big(X_{ji}(t + 1) − \frac{σ_{ji}^{th}}{1 − ξ_{ji}^{th}}\Big) × 1_{\{Z_{ji}(t+1) > \bar{R}_{ij}(t) d_{ji}\}}, 0\Big\},   (26)

Q_{ji}^{(Y)}(t + 1) = max\Big\{Q_{ji}^{(Y)}(t) + \Big(Y_{ji}(t + 1) − \frac{2(σ_{ji}^{th})^2}{(1 − ξ_{ji}^{th})(1 − 2ξ_{ji}^{th})}\Big) × 1_{\{Z_{ji}(t+1) > \bar{R}_{ij}(t) d_{ji}\}}, 0\Big\}.   (27)

Additionally, given an event B and the set of all possible outcomes Ω, we have E[1_{\{B\}}] = 1 · Pr(B) + 0 · Pr(Ω \setminus B) = Pr(B). By applying E[1_{\{B\}}] = Pr(B), constraints (10), (11), and (19) can be equivalently rewritten as

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[1_{\{Q_i^L(t) > d_i^L\}}] ≤ ε_i^L,   (28)

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[1_{\{Q_i^O(t) > d_i^O\}}] ≤ ε_i^O,   (29)

\lim_{T→∞} \frac{1}{T} \sum_{t=1}^{T} E[1_{\{Z_{ji}(t) > \bar{R}_{ij}(t−1) d_{ji}\}}] ≤ ε_{ji}.   (30)

Then, following the above steps, the corresponding virtual queues of (28)–(30) are expressed as

Q_i^{L,(Q)}(t + 1) = max\{Q_i^{L,(Q)}(t) + 1_{\{Q_i^L(t+1) > d_i^L\}} − ε_i^L, 0\},   (31)

Q_i^{O,(Q)}(t + 1) = max\{Q_i^{O,(Q)}(t) + 1_{\{Q_i^O(t+1) > d_i^O\}} − ε_i^O, 0\},   (32)

Q_{ji}^{(Z)}(t + 1) = max\{Q_{ji}^{(Z)}(t) + 1_{\{Z_{ji}(t+1) > \bar{R}_{ij}(t) d_{ji}\}} − ε_{ji}, 0\}.   (33)
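To make the virtual-queue machinery concrete, the sketch below runs the updates (22) and (31) for one UE's local-computation queue on a toy trace; the thresholds and the trace are illustrative assumptions.

```python
# Minimal sketch of the virtual-queue updates (22) and (31) for one UE's
# local-computation queue; all numbers are illustrative assumptions.
sigma_th, xi_th = 500.0, 0.2     # thresholds σ_i^{L,th}, ξ_i^{L,th}
d_L, eps_L = 5e3, 0.01           # queue bound d_i^L and violation budget ε_i^L

def update_virtual_queues(QX, QQ, Q_next):
    exceeded = Q_next > d_L
    X = Q_next - d_L if exceeded else 0.0                        # excess X_i^L(t+1)
    QX = max(QX + (X - sigma_th / (1 - xi_th)) * exceeded, 0.0)  # update (22)
    QQ = max(QQ + float(exceeded) - eps_L, 0.0)                  # update (31)
    return QX, QQ

QX, QQ = 0.0, 0.0
for Q_next in [4e3, 6e3, 5.2e3, 7e3, 3e3]:   # toy trace of Q_i^L(t+1)
    QX, QQ = update_virtual_queues(QX, QQ, Q_next)
print(QX, QQ)
```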

Now, problem MP is equivalently transformed into [26]

MP':  minimize_{η(n), f(t), P(t)}  \sum_{i ∈ U} (P_i^C + P_i^T)
      subject to  (1), (2), (6), and (9),
                  stability of the queues (22)–(27) and (31)–(33).

To solve problem MP', we let Q(t) = (Q_i^{L,(X)}(t), Q_i^{L,(Y)}(t), Q_i^{O,(X)}(t), Q_i^{O,(Y)}(t), Q_{ji}^{(X)}(t), Q_{ji}^{(Y)}(t), Q_i^{L,(Q)}(t), Q_i^{O,(Q)}(t), Q_{ji}^{(Z)}(t) : i ∈ U, j ∈ S) denote the combined queue vector for notational simplicity and express the conditional Lyapunov drift-plus-penalty for slot t as

E\Big[L(Q(t + 1)) − L(Q(t)) + \sum_{i ∈ U} V\big(κ[f_i(t)]^3 + P_i(t)\big) \,\Big|\, Q(t)\Big],   (34)

where

L(Q(t)) = \frac{1}{2} \sum_{i ∈ U} \Big([Q_i^{L,(X)}(t)]^2 + [Q_i^{L,(Y)}(t)]^2 + [Q_i^{O,(X)}(t)]^2 + [Q_i^{O,(Y)}(t)]^2 + [Q_i^{L,(Q)}(t)]^2 + [Q_i^{O,(Q)}(t)]^2\Big) + \frac{1}{2} \sum_{i ∈ U} \sum_{j ∈ S} \Big([Q_{ji}^{(X)}(t)]^2 + [Q_{ji}^{(Y)}(t)]^2 + [Q_{ji}^{(Z)}(t)]^2\Big)

is the Lyapunov function. The term V ≥ 0 is a parameter which trades off objective optimality and queue length reduction. Subsequently, plugging the inequality (max\{x, 0\})^2 ≤ x^2, all physical and virtual queue dynamics, and (8) into (34), we can derive

(34) ≤ C + E\Big[\sum_{i ∈ U} \Big[\big(Q_i^{L,(X)}(t) + Q_i^L(t) + 2Q_i^{L,(Y)}(t)Q_i^L(t) + 2[Q_i^L(t)]^3\big)\big(A_i^L(t) − τ f_i(t)/L_i\big) × 1_{\{Q_i^L(t)+A_i(t) > d_i^L\}} + Q_i^{L,(Q)}(t) × 1_{\{max\{Q_i^L(t)+A_i^L(t)−τ f_i(t)/L_i,\, 0\} > d_i^L\}}\Big]
+ \sum_{i ∈ U} \Big[\big(Q_i^{O,(X)}(t) + Q_i^O(t) + 2Q_i^{O,(Y)}(t)Q_i^O(t) + 2[Q_i^O(t)]^3\big)\big(A_i^O(t) − \sum_{j ∈ S} τ R_{ij}(t)\big) × 1_{\{Q_i^O(t)+A_i(t) > d_i^O\}} + Q_i^{O,(Q)}(t) × 1_{\{max\{Q_i^O(t)+A_i^O(t)−\sum_{j ∈ S} τ R_{ij}(t),\, 0\} > d_i^O\}}\Big]
+ \sum_{i ∈ U} \sum_{j ∈ S} \Big[\big(Q_{ji}^{(X)}(t) + Z_{ji}(t) + 2Q_{ji}^{(Y)}(t)Z_{ji}(t) + 2[Z_{ji}(t)]^3\big)\big(τ R_{ij}(t) − τ f_{ji}(t)/L_i\big) × 1_{\{Z_{ji}(t)+τ R_{ij}^{max}(t) > \bar{R}_{ij}(t−1) d_{ji}\}} + Q_{ji}^{(Z)}(t) × 1_{\{max\{Z_{ji}(t)+τ R_{ij}(t)−τ f_{ji}(t)/L_i,\, 0\} > \bar{R}_{ij}(t−1) d_{ji}\}}\Big]
+ \sum_{i ∈ U} V\big(κ[f_i(t)]^3 + P_i(t)\big) \,\Big|\, Q(t)\Big].   (35)

Here, R_{ij}^{max}(t) = W \log_2\big(1 + \frac{P_i^{max} h_{ij}(t)}{N_0 W}\big) is UE i's maximum offloading rate. Since the constant C does not affect the system performance in Lyapunov optimization, we omit its details in (35) for expression simplicity. Note that a solution to problem MP' can be obtained by minimizing the upper bound (35) in each time slot t, in which the optimality of MP' is asymptotically approached by increasing V [26]. To minimize (35), we have three decomposed optimization problems, P1, P2, and P3, which are detailed and solved in the following parts.

The first decomposed problem, which jointly associates UEs with MEC servers and allocates the UEs' computational and communication resources, is given by

P1:  minimize_{η(n), f(t), P(t)}  \sum_{i ∈ U} \sum_{j ∈ S} \big(β_{ji}(t) − β_i^O(t)\big) τ W \log_2\Big(1 + \frac{η_{ij}(n) P_i(t) h_{ij}(t)}{N_0 W + \sum_{i' ∈ U \setminus \{i\}} η_{i'j}(n) P_{i'}(t) h_{i'j}(t)}\Big) + \sum_{i ∈ U} \Big[V\big(κ[f_i(t)]^3 + P_i(t)\big) − \frac{β_i^L(t) τ f_i(t)}{L_i}\Big]
     subject to  (1) and (6),

with

β_{ji}(t) = \big(Q_{ji}^{(X)}(t) + 2Q_{ji}^{(Y)}(t) Z_{ji}(t) + 2[Z_{ji}(t)]^3 + Z_{ji}(t)\big) × 1_{\{Z_{ji}(t) + τ R_{ij}^{max}(t) > \bar{R}_{ij}(t−1) d_{ji}\}} + Q_{ji}^{(Z)}(t) + Z_{ji}(t),   (36)

β_i^O(t) = \big(Q_i^{O,(X)}(t) + 2Q_i^{O,(Y)}(t) Q_i^O(t) + 2[Q_i^O(t)]^3 + Q_i^O(t)\big) × 1_{\{Q_i^O(t) + A_i(t) > d_i^O\}} + Q_i^{O,(Q)}(t) + Q_i^O(t),   (37)

β_i^L(t) = \big(Q_i^{L,(X)}(t) + 2Q_i^{L,(Y)}(t) Q_i^L(t) + 2[Q_i^L(t)]^3 + Q_i^L(t)\big) × 1_{\{Q_i^L(t) + A_i(t) > d_i^L\}} + Q_i^{L,(Q)}(t) + Q_i^L(t).   (38)

Note that in P1, the UE's allocated transmit power is coupled with the local CPU-cycle frequency. The transmit power also depends on the wireless channel strength to the associated server and on the weight β_{ji}(t) of the corresponding offloaded-task queue; the former depends on the distance between the UE and the server, while the latter is related to the MEC server's computation capability and the number of associated UEs. Therefore, the UEs' geographic configuration and the servers' computation capabilities should be taken into account while associating the UEs with the servers. Moreover, UE-server association, i.e., η(n), and resource allocation, i.e., f(t) and P(t), are performed on two different timescales, i.e., at the beginning of each time frame and in every time slot afterwards. We solve P1 in two steps, in which the UE-server association is decided first. Then, given the association results, the UEs' CPU-cycle frequencies and transmit powers are allocated.
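As a small concreteness check, the helper below evaluates the weight β_i^L(t) in (38) from given physical and virtual queue lengths; the numerical inputs are arbitrary placeholders.

```python
# Minimal sketch of evaluating the weight β_i^L(t) in (38); inputs are placeholders.
def beta_L(Q_LX, Q_LY, Q_LQ, Q_L, A, d_L):
    indicator = 1.0 if Q_L + A > d_L else 0.0
    return (Q_LX + 2 * Q_LY * Q_L + 2 * Q_L ** 3 + Q_L) * indicator + Q_LQ + Q_L

# A heavily backlogged local-computation queue yields a much larger weight,
# steering more CPU-cycle frequency toward it when minimizing P1.
print(beta_L(Q_LX=10.0, Q_LY=2.0, Q_LQ=0.5, Q_L=3e3, A=500.0, d_L=2e3))
```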

B. UE-Server Association using Many-to-One Matching with Externalities

To associate UEs with the MEC servers, let us focus on the wireless transmission part of P1 and, thus, fix P_i(t) = P_i^{max} and f_i(t) = 0, ∀ i ∈ U, at this stage. The wireless channel gain h_{ij}(t) and the weight factors β_i^O(t) and β_{ji}(t) change dynamically in each time slot, whereas the UEs are re-associated with the servers only every T_0 slots. In order to take the impacts of h_{ij}(t), β_i^O(t), and β_{ji}(t) into account, we consider the average channel gain, i.e., letting h_{ij}(t) = E[h_{ij}], ∀ i ∈ U, j ∈ S, and the empirical averages

\bar{β}_i^O(n) = \frac{1}{(n − 1)T_0} \sum_{t=0}^{(n−1)T_0 − 1} β_i^O(t), ∀ i ∈ U,

\bar{β}_j(n) = \frac{1}{(n − 1)T_0} \sum_{k=1}^{n−1} \sum_{t=(k−1)T_0}^{kT_0 − 1} \frac{\sum_{i ∈ U} η_{ij}(k) β_{ji}(t)}{\sum_{i ∈ U} η_{ij}(k)}, ∀ j ∈ S,

as the estimates of the weight factors β_i^O(t) and β_{ji}(t). Here, \bar{β}_j(n) represents the estimated average weight factor of a single UE's offloaded-task queue (at server j), since we also take the average over all the associated UEs in each time frame. Incorporating the above assumptions, the UE-server association problem of P1 can be considered as

P1-1:  maximize_{η(n)}  \sum_{i ∈ U} \sum_{j ∈ S} \big(\bar{β}_i^O(n) − \bar{β}_j(n)\big) \log_2\Big(1 + \frac{η_{ij}(n) P_i^{max} E[h_{ij}]}{N_0 W + \sum_{i' ∈ U \setminus \{i\}} η_{i'j}(n) P_{i'}^{max} E[h_{i'j}]}\Big)
       subject to  (1).

However, due to the binary nature of the optimization variables and the non-convexity of the objective, P1-1 is an NP-hard nonlinear integer programming problem [33]. To tackle this, we invoke matching theory, which provides an efficient and low-complexity way to solve integer programming problems [34], [35] and has been utilized in various wireless communication systems [22], [35]–[38].

Defined in matching theory, matching is a bilateral assign-ment between two sets of players, i.e., the sets of UEs andMEC servers, which matches each player in one set to theplayer(s) in the opposite set [28]. Referring to the structureof problem P1-1, we consider a many-to-one matching model[39] in which each UE i ∈ U is assigned to a server wheneach server j ∈ S can be assigned to multiple UEs. Moreover,UE i is assigned to server j if server j is assigned to UE i, andvice versa. The considered many-to-one matching is formallydefined as follows.

Definition 1. The considered many-to-one matching game consists of two sets of players, i.e., U and S, and the outcome of the many-to-one matching is a function η from U ∪ S to the set of all subsets of U ∪ S with
1) |η(i)| = 1, ∀ i ∈ U,
2) |η(j)| ≤ U, ∀ j ∈ S,
3) j = η(i) ⇔ i ∈ η(j).

Having a matching η, the UE-server association indicator can be specified as, ∀ i ∈ U, j ∈ S,
η_{ij}(n) = 1 if j = η(i), and η_{ij}(n) = 0 otherwise. (39)

Note that P1-1 is equivalent to a weighted sum-rate maximization problem in which (β^O_i(n) − β_j(n)) log_2( 1 + P^{max}_i E[h_{ij}] / (N_0W + Σ_{i'∈U\i} η_{i'j}(n)P^{max}_{i'} E[h_{i'j}]) ) can be treated as UE i's weighted transmission rate provided that UE i is matched to server j. Thus, the UE's matching preference over the servers can be chosen based on the weighted rates in descending order. Since the rate is affected by the other UEs (i.e., the players in the same set of the matching game) via interference if they are matched to the same server, the UE's preference also depends on the matching state of the other UEs. This interdependency between the players' preferences is called externalities [35], in which a player's preference dynamically changes with the matching state of the other players in the same set. Thus, the UE's preference over matching states


should be adopted. To this end, given a matching η with j = η(i), we define UE i's utility as

Ψ_i(η) = (β^O_i − β_j) log_2( 1 + P^{max}_i E[h_{ij}] / (N_0W + Σ_{i'∈U\i} 1{η(i')=j} P^{max}_{i'} E[h_{i'j}]) ), ∀ i ∈ U, (40)

and server j’s utility as

Ψ_j(η) = Σ_{i∈U} (β^O_i − β_j) log_2( 1 + η_{ij}P^{max}_i E[h_{ij}] / (N_0W + Σ_{i'∈U\i} 1{η(i')=j} P^{max}_{i'} E[h_{i'j}]) ), ∀ j ∈ S. (41)

The UE's and server's matching preferences are based on their own utilities in descending order. For notational simplicity, we omit the time index n in (40) and (41). Subsequently, we adopt the notion of swap matching to deal with externalities [39].

Definition 2. Given a many-to-one matching η, a pair of UEs (i, i'), and a pair of servers (j, j') with j = η(i) and j' = η(i'), a swap matching η^{i'j'}_{ij} is
η^{i'j'}_{ij} = {η \ {(i, j), (i', j')}} ∪ {(i, j'), (i', j)}.

In other words, we have j' = η^{i'j'}_{ij}(i) and j = η^{i'j'}_{ij}(i') in the swap matching η^{i'j'}_{ij} for the UE pair (i, i') and the server pair (j, j'), whereas the matching state of the other UEs and servers remains identical in both η and η^{i'j'}_{ij}. Furthermore, in Definition 2, one member of the UE pair in the swap operation, e.g., i', can be an open spot of server j' in η with |η(j')| < U. In this situation, we have |η^{i'j'}_{ij}(j')| − |η(j')| = 1 and |η(j)| − |η^{i'j'}_{ij}(j)| = 1. Moreover, Ψ_{i'}(η) = 0 and Ψ_{i'}(η^{i'j'}_{ij}) = 0.

Definition 3. For the matching η, (i, i') is a swap-blocking pair [35] if and only if
1) ∀ u ∈ {i, i', j, j'}, Ψ_u(η^{i'j'}_{ij}) ≥ Ψ_u(η),
2) ∃ u ∈ {i, i', j, j'} such that Ψ_u(η^{i'j'}_{ij}) > Ψ_u(η).

Therefore, provided that the matching state of the remaining UEs and servers is fixed, two UEs (i, i') exchange their respectively matched servers if neither the two UEs' nor the two servers' utilities are worse off, and at least one utility is strictly better off, after the swap. The same criteria apply when UE i changes from its currently matched server j to another server j', i.e., when i' is an open spot of server j'.

Definition 4. A matching η is two-sided exchange-stable if there is no swap-blocking pair [39].

In summary, we first initialize a matching η and calculate the utilities (40) and (41). Then, we iteratively find a swap-blocking pair and update the swap matching until two-sided exchange stability is achieved. The steps for solving problem P1-1 by many-to-one matching with externalities are detailed in Algorithm 1. In each iteration of Algorithm 1, there are C(U, 2) + U(S − 1) possibilities to form a swap-blocking pair, in which C(U, 2) = U(U − 1)/2 accounts for the pairs of two different UEs and U(S − 1) accounts for the swaps consisting of a UE i ∈ U and an open spot of another MEC server j' ∈ S \ j. Given that there are more UEs than MEC servers in the considered network, the complexity of Algorithm 1 is of the order O(U^2) per iteration, i.e., it grows quadratically with the number of UEs. In contrast, consider an exhaustive search for problem P1-1: since each UE i ∈ U can only access one out of S servers, there are S^U association choices satisfying constraint (1), so the complexity of exhaustive search increases exponentially with the number of UEs.
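To make the procedure concrete, the following Python sketch mirrors the structure of Algorithm 1 under the utilities (40) and (41). It is an illustrative sketch, not the paper's implementation: all names (swap_matching, Eh for E[h_{ij}], beta_o for β^O_i(n), beta_bar for β_j(n), quota for the per-server limit U in Definition 1) are assumptions.

import numpy as np

def ue_utility(i, match, beta_o, beta_bar, Pmax, Eh, N0W):
    # Weighted average-rate utility of UE i under `match` (match[i] = index of its server), cf. (40).
    j = match[i]
    interference = sum(Pmax[k] * Eh[k, j]
                       for k in range(len(match)) if k != i and match[k] == j)
    sinr = Pmax[i] * Eh[i, j] / (N0W + interference)
    return (beta_o[i] - beta_bar[j]) * np.log2(1.0 + sinr)

def utilities(ues, servers, match, *args):
    # Utilities of the UEs and servers involved in a candidate swap, cf. (40) and (41).
    vals = [ue_utility(i, match, *args) for i in ues]
    vals += [sum(ue_utility(i, match, *args) for i in range(len(match)) if match[i] == j)
             for j in servers]
    return vals

def swap_matching(beta_o, beta_bar, Pmax, Eh, N0W, quota):
    # Algorithm 1 sketch: start from the strongest-average-channel association (line 1),
    # then keep applying swap operations until no swap-blocking pair (Definition 3) remains.
    U, S = Eh.shape
    match = np.argmax(Eh, axis=1)
    args = (beta_o, beta_bar, Pmax, Eh, N0W)
    changed = True
    while changed:
        changed = False
        # (a) UE i moves to an open spot of another server j_new.
        for i in range(U):
            for j_new in range(S):
                j_old = match[i]
                if j_new == j_old or np.count_nonzero(match == j_new) >= quota:
                    continue
                trial = match.copy(); trial[i] = j_new
                old = utilities([i], [j_old, j_new], match, *args)
                new = utilities([i], [j_old, j_new], trial, *args)
                if all(n >= o for n, o in zip(new, old)) and any(n > o for n, o in zip(new, old)):
                    match, changed = trial, True
        # (b) Two UEs i and k exchange their matched servers.
        for i in range(U):
            for k in range(i + 1, U):
                if match[i] == match[k]:
                    continue
                trial = match.copy(); trial[i], trial[k] = match[k], match[i]
                old = utilities([i, k], [match[i], match[k]], match, *args)
                new = utilities([i, k], [match[i], match[k]], trial, *args)
                if all(n >= o for n, o in zip(new, old)) and any(n > o for n, o in zip(new, old)):
                    match, changed = trial, True
    return match  # map to the indicator η_{ij}(n) as per (39)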

C. Resource Allocation and Task Splitting at the UE Side

Now denoting the representative UE i's associated MEC server as j*, we formulate the UEs' resource allocation problem of P1 as

P1-2: minimize_{f(t),P(t)} Σ_{i∈U} [ (β_{j*i}(t) − β^O_i(t)) τW log_2( 1 + P_i(t)h_{ij*}(t) / (N_0W + Σ_{i'∈U\i} η_{i'j*}P_{i'}(t)h_{i'j*}(t)) ) − β^L_i(t)τf_i(t)/L_i + V(κ[f_i(t)]^3 + P_i(t)) ]
subject to (6).

When solving problem P1-2 in a centralized manner, each UE i needs to upload its local information β^L_i(t) and β^O_i(t) to a central unit, e.g., the associated server, in every time slot. This can incur high overheads, especially in dense networks. In order to alleviate this issue, we decompose the summation (over all UEs) in the objective and let each UE i ∈ U locally allocate its CPU-cycle frequency and transmit power by solving

P1-2': minimize_{f_i(t),P_i(t)} (β_{j*i}(t) − β^O_i(t)) τW E_{I_{ij*}}[ log_2( 1 + P_i(t)h_{ij*}(t) / (N_0W + I_{ij*}(t)) ) ] − β^L_i(t)τf_i(t)/L_i + V(κ[f_i(t)]^3 + P_i(t))
subject to (6).

The expectation is with respect to the locally estimated distribution Pr(I_{ij}; t) of the aggregate interference I_{ij*}(t) = Σ_{i'∈U\i} η_{i'j*}P_{i'}(t)h_{i'j*}(t). Note that when V = 0, P1-2' is equivalent to a rate maximization problem in which the power budget is fully allocated. As V is gradually increased, more emphasis is put on power cost reduction, and the allocated values decrease correspondingly. Moreover, although the downlink is not considered in this work, we implicitly assume that the UE is informed of the executed tasks and can locally track the state of the offloaded-task queue. In other words, full information about β_{j*i}(t) is available at the UE.

Lemma 1. The optimal solution to problem P1-2' is that UE i allocates the CPU-cycle frequency
f*_i(t) = sqrt( β^L_i(t)τ / (3L_iκ(V + γ*)) )
for local computation.


Algorithm 1 UE-Server Association by Many-to-One Matching with Externalities
1: Initialize η(i) = argmax_{j∈S} E[h_{ij}], ∀ i ∈ U.
2: Calculate (40) and (41).
3: repeat
4:   Select a pair of UEs (i, i') or a UE i with an open spot i' of server j'.
5:   if (i, i') is a swap-blocking pair of the current matching η then
6:     Update η ← η^{i'j'}_{ij}.
7:     Calculate (40) and (41).
8:   end if
9: until no swap-blocking pair exists in the current matching η.
10: Transfer η to the UE-server association indicator as per (39).

The optimal allocated transmit power P*_i > 0 satisfies
E_{I_{ij*}}[ (β^O_i(t) − β_{j*i}(t))τWh_{ij*} / ((N_0W + I_{ij*} + P*_i h_{ij*}) ln 2) ] = V + γ*
if E_{I_{ij*}}[ (β^O_i(t) − β_{j*i}(t))τWh_{ij*} / ((N_0W + I_{ij*}) ln 2) ] > V + γ*. Otherwise, P*_i = 0. Furthermore, the optimal Lagrange multiplier γ* is 0 if κ[f*_i(t)]^3 + P*_i(t) < P^{max}_i. When γ* > 0, κ[f*_i(t)]^3 + P*_i(t) = P^{max}_i.

Proof. Please refer to Appendix A.
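As an illustration only (not part of the paper), the sketch below evaluates Lemma 1 numerically for one UE: the expectation over I_{ij*} is replaced by a sample average over locally estimated interference samples, and the multiplier γ* is found by bisection on the power budget. It assumes V > 0, and the names (beta_L, beta_O, beta_srv for β^L_i(t), β^O_i(t), β_{j*i}(t), I_samples, and so on) are illustrative.

import numpy as np

def marginal_gain(P, h, I_samples, beta_diff, tau, W, N0W):
    # Sample-average of E_I[ beta_diff * τ W h / ((N0 W + I + P h) ln 2) ],
    # i.e., the left-hand side of the stationarity condition in Lemma 1.
    return np.mean(beta_diff * tau * W * h / ((N0W + I_samples + P * h) * np.log(2)))

def power_for_gamma(gamma, V, h, I_samples, beta_diff, tau, W, N0W):
    # P*(γ): zero if the marginal gain at P = 0 does not exceed V + γ; otherwise the
    # root of marginal_gain(P) = V + γ (the gain is decreasing in P, so bisect).
    if marginal_gain(0.0, h, I_samples, beta_diff, tau, W, N0W) <= V + gamma:
        return 0.0
    lo, hi = 0.0, 1.0
    while marginal_gain(hi, h, I_samples, beta_diff, tau, W, N0W) > V + gamma:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if marginal_gain(mid, h, I_samples, beta_diff, tau, W, N0W) > V + gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def freq_for_gamma(gamma, V, beta_L, tau, L, kappa):
    # Closed-form CPU-cycle frequency of Lemma 1 (assumes V + γ > 0).
    return np.sqrt(beta_L * tau / (3.0 * L * kappa * (V + gamma)))

def lemma1_allocation(V, beta_L, beta_O, beta_srv, tau, W, N0W, L, kappa, h, I_samples, Pmax):
    # γ* = 0 if the unconstrained solution already respects the power budget; otherwise
    # bisect γ > 0 so that κ f^3 + P = Pmax (complementary slackness in Lemma 1).
    beta_diff = beta_O - beta_srv
    total = lambda g: (kappa * freq_for_gamma(g, V, beta_L, tau, L, kappa) ** 3
                       + power_for_gamma(g, V, h, I_samples, beta_diff, tau, W, N0W))
    if total(0.0) <= Pmax:
        gamma = 0.0
    else:
        lo, hi = 0.0, 1.0
        while total(hi) > Pmax:            # the total power cost is decreasing in γ
            hi *= 2.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if total(mid) > Pmax else (lo, mid)
        gamma = 0.5 * (lo + hi)
    return (freq_for_gamma(gamma, V, beta_L, tau, L, kappa),
            power_for_gamma(gamma, V, h, I_samples, beta_diff, tau, W, N0W))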

The second decomposed problem, P2, given in (35), determines whether the arriving task is assigned to the local-computation queue or the task-offloading queue. In this regard, each UE i ∈ U solves

P2: minimize_{A^L_i(t),A^O_i(t)} β^L_i(t)A^L_i(t) + β^O_i(t)A^O_i(t)
subject to A_i(t) = A^L_i(t) + A^O_i(t),
A^L_i(t), A^O_i(t) ∈ {0, A_unit, 2A_unit, ...}.

We can straightforwardly find an optimal solution (A^{L*}_i(t), A^{O*}_i(t)) as
(A^{L*}_i(t), A^{O*}_i(t)) = (A_i(t), 0), if β^L_i(t) ≤ β^O_i(t),
(A^{L*}_i(t), A^{O*}_i(t)) = (0, A_i(t)), if β^L_i(t) > β^O_i(t). (42)
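The split rule in (42) is a per-slot comparison of the two congestion weights; a minimal Python transcription (with illustrative names) is:

def split_arrival(A_t, beta_L, beta_O):
    # Route the whole arrival A_i(t) to the queue with the smaller weight, cf. (42).
    return (A_t, 0) if beta_L <= beta_O else (0, A_t)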

D. Computational Resource Scheduling at the Server Side

The third decomposed problem P3 given in (35) is addressed at the server side, where each MEC server j ∈ S solves

P3: maximize_{f_j(t)} Σ_{i∈U} β_{ji}(t)f_{ji}(t)/L_i
subject to (9),

to schedule its CPU cores. Note that the server considers the allocated CPU-cycle frequencies of all UEs in the objective, constraints, and optimization variables even though some UEs are not associated in the current time frame n. However, since the corresponding weight of the UE's offloaded-task queue,

Algorithm 2 Computational Resource Scheduling at the Server
1: Initialize n = 1, U_j = {i ∈ U | β_{ji}(t) > 0}, and f_{ji} = 0, ∀ i ∈ U.
2: while n ≤ N_j and U_j ≠ ∅ do
3:   Find i* = argmax_{i∈U_j} β_{ji}(t)/L_i.
4:   Let f_{ji*}(t) = f^{max}_j.
5:   Update n ← n + 1 and U_j ← U_j \ {i*}.
6: end while

Algorithm 3 Two-Timescale Mechanism for UE-Server Association, Task Offloading, and Resource Allocation
1: Initialize t = 0, predetermine the system lifetime T, and set the initial queue values of (3), (4), (7), (22)–(27), and (31)–(33) to zero.
2: repeat
3:   if t/T_0 ∈ N then
4:     Run Algorithm 1 to associate the UEs with the MEC servers.
5:   end if
6:   The UE allocates its local CPU-cycle frequency and transmit power as per Lemma 1, and splits the task arrivals for local computation and offloading according to (42).
7:   The server schedules its computational resources by following Algorithm 2.
8:   The UE updates the queue lengths in (3), (4), (22)–(25), (31), and (32).
9:   The server updates the queue lengths in (7), (26), (27), and (33).
10:  Update t ← t + 1.
11: until t > T

i.e., β_{ji}(t), is taken into account in the objective, the resource allocation at the server will not be over-provisioned. These insights are illustrated as follows.

Assume that the server was not able to complete a UE's offloaded tasks in the previous time frame n − 1 and is not associated with this UE in the current time frame n. Although the offloaded-task queue length Z_{ji}(t) does not grow anymore, those incomplete tasks will experience severe delay if they are ignored in the current time frame. Furthermore, since R_{ij}(t), which is considered in the URLLC constraints, decreases in the current frame, the weight β_{ji}(t) grows as per (26), (27), (33), and (36). In other words, the more severe the experienced delay of the ignored UE's incomplete tasks, the higher the weight β_{ji}(t). To address this severe-delay issue, the server takes all UEs into account via β_{ji}(t) in the objective. Once the offloaded tasks are completed, β_{ji}(t) remains zero in the remaining time slots. In addition, for the UEs which have not been associated with this server, we have β_{ji}(t) = 0. Therefore, the server only considers the UEs with β_{ji}(t) > 0 while scheduling the computational resources. In summary, the optimal solution to problem P3 is that server j dedicates its CPU cores to, at most, N_j UEs with the largest positive values of β_{ji}(t)/L_i. The steps of allocating the server's CPU cores are detailed in Algorithm 2.
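A compact Python rendering of this greedy rule follows; it is a sketch with illustrative names (beta for the vector of β_{ji}(t), L for the processing densities L_i, N_cores for N_j, f_max for f^{max}_j), not code from the paper.

def schedule_cpu_cores(beta, L, N_cores, f_max):
    # Dedicate at most N_j cores, one per UE, to the UEs with the largest positive
    # beta_ji(t)/L_i, each at the maximal CPU-cycle frequency (Algorithm 2).
    ranked = sorted((i for i in range(len(beta)) if beta[i] > 0),
                    key=lambda i: beta[i] / L[i], reverse=True)
    f = [0.0] * len(beta)
    for i in ranked[:N_cores]:
        f[i] = f_max
    return f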

After computing and offloading the tasks in time slot t, each UE updates its physical and virtual queue lengths in (3), (4), (22)–(25), (31), and (32), while the MEC servers update (7), (26), (27), and (33) for the next slot t + 1.


Table II
SIMULATION PARAMETERS [7], [10], [17], [40], [41]
T_0 = 100                            A_unit = 1500 bytes
κ = 10^{-27} Watt·s^3/cycle^3        W = 10 MHz
N_0 = -174 dBm/Hz                    P^{max}_i = 30 dBm
L_i ∈ [1×10^3, 4×10^4] cycle/bit     V = 0
λ_i ∈ [10, 150] kbps                 f^{max}_j = 10^{10} cycle/s
d^L_i = 100·Ā^L_i(t−1) bit           ε^L_i = 0.01
σ^{L,th}_i = 40 Mbit                 ξ^{L,th}_i = 0.3
d^O_i = 100·Ā^O_i(t−1) bit           ε^O_i = 0.01
σ^{O,th}_i = 40 Mbit                 ξ^{O,th}_i = 0.3
d_{ji} = 20 sec                      ε_{ji} = 0.01
σ^{th}_{ji} = 40 Mbit                ξ^{th}_{ji} = 0.3

Moreover, based on the transmission rate R_{ij}(t), the UE empirically estimates the statistics of I_{ij} for slot t + 1 as per Pr(I_{ij}; t + 1) = [ 1{I_{ij}(t) = I_{ij}} + (t + 1)Pr(I_{ij}; t) ] / (t + 2). The procedures of the proposed two-timescale mechanism are outlined in Algorithm 3.
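This empirical update can be kept as a dictionary of point masses; the snippet below is an illustrative sketch (the names and the initial prior are assumptions, not from the paper).

def update_interference_pmf(pmf, observed, t):
    # Pr(I; t+1) = [ 1{I(t) = I} + (t+1) Pr(I; t) ] / (t+2), with pmf a dict mapping
    # observed interference levels to probability mass.
    new_pmf = {level: (t + 1) * p / (t + 2) for level, p in pmf.items()}
    new_pmf[observed] = new_pmf.get(observed, 0.0) + 1.0 / (t + 2)
    return new_pmf

# usage: start from an assumed degenerate prior and fold in per-slot observations
pmf = {0.0: 1.0}
for t, I_t in enumerate([0.0, 1.2e-9, 1.2e-9, 3.4e-9]):   # illustrative interference values (W)
    pmf = update_interference_pmf(pmf, I_t, t)
assert abs(sum(pmf.values()) - 1.0) < 1e-12                # the masses remain normalized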

V. NUMERICAL RESULTS

We consider an indoor 100 × 100 m^2 area in which four MEC servers are deployed at the centers of four equal-size quadrants, respectively. Each server is equipped with eight CPU cores. Additionally, multiple UEs, ranging from 30 to 80, are randomly distributed. For task offloading, we assume that the transmission frequency is 5.8 GHz with the path loss model 24 log x + 20 log 5.8 + 60 (dB) [42], where x in meters is the distance between a UE and a server. Further, all wireless channels experience Rayleigh fading with unit variance, and the coherence time is 40 ms [43]. Moreover, we consider Poisson processes for task arrivals. The remaining parameters for all UEs i ∈ U and servers j ∈ S are listed in Table II.

We first verify the effectiveness of using the Pickands–Balkema–de Haan theorem to characterize the excess queue length in Fig. 3. Although the task-offloading queue is used to verify the results, the local-computation and offloaded-task queues could be used as well. Let us consider L_i = 8250 cycle/bit, λ_i = 100 kbps, and d^O_i = 10·Ā^O_i(t − 1) bit for all 30 UEs. Then, given Pr(Q^O > 10Ā^O(∞)) = 3.4 × 10^{-3} with 10Ā^O(∞) = 3.96 × 10^4, Fig. 3(a) shows the CCDFs of the exceedances X^O|_{Q^O>3.96×10^4} = Q^O − 3.96 × 10^4 > 0 and the approximated GPD, in which the latter provides a good characterization of the exceedances. Further, the convergence of the scale and shape parameters of the approximated GPD is shown in Fig. 3(b). Once convergence is achieved, characterizing the statistics of exceedances helps to locally estimate network-wide extreme metrics, e.g., the maximal queue length among all UEs as in [44], and enables us to proactively deal with the occurrence of extreme events.

Here, Ā^L_i(t − 1) = (1/t) Σ_{τ=0}^{t−1} A^L_i(τ) and Ā^O_i(t − 1) = (1/t) Σ_{τ=0}^{t−1} A^O_i(τ) denote the empirical time-averaged task arrivals.
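For readers who want to reproduce the flavor of Fig. 3 offline, the following sketch draws synthetic queue-length samples, extracts the exceedances over a high threshold, and fits a GPD by maximum likelihood with SciPy. The gamma-distributed synthetic samples and all names are illustrative assumptions; this is not the estimator used in the paper.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
queue_samples = rng.gamma(shape=2.0, scale=1.5e4, size=200_000)  # stand-in for Q^O samples (bits)
threshold = np.quantile(queue_samples, 0.9966)                   # roughly Pr(Q^O > d) = 3.4e-3
exceedances = queue_samples[queue_samples > threshold] - threshold

# Maximum-likelihood GPD fit with the location pinned to zero, since exceedances start at 0.
shape_xi, _, scale_sigma = genpareto.fit(exceedances, floc=0)
print(f"fitted GPD: scale sigma = {scale_sigma:.3g}, shape xi = {shape_xi:.3g}")

# Empirical CCDF of the exceedances versus the fitted GPD CCDF (cf. Fig. 3(a)).
grid = np.linspace(0.0, exceedances.max(), 50)
empirical_ccdf = np.array([(exceedances > v).mean() for v in grid])
fitted_ccdf = genpareto.sf(grid, shape_xi, loc=0.0, scale=scale_sigma)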

[Figure 3. Effectiveness of applying the Pickands–Balkema–de Haan theorem in the MEC network. (a) Tail distributions of the excess value of a UE's task-offloading queue and the approximated GPD of exceedances, with (σ, ξ) = (9 × 10^3, 0.23). (b) Convergence of the approximated GPD scale and shape parameters of exceedances.]

Subsequently, we measure the ratio between the task amounts split into the task-offloading queue and the local-computation queue, i.e., ζ = Ā^O_i(∞)/Ā^L_i(∞). If ζ < 1, the UE relies more on local computation; more tasks are offloaded when ζ > 1. As shown in Fig. 4, a larger fraction of the arriving tasks is offloaded for a higher processing density L or a higher task arrival rate λ. In these situations, the UE's computation capability becomes less sufficient, and extra computational resources are required as expected. Additionally, since stronger interference is incurred in denser networks, the UE lowers the offloaded task portion, especially when more computational resources of the server are required, i.e., for higher L and λ. In the scenario without MEC servers, the local


[Figure 4. Task split ratio ζ (and maximum sustainable task arrival rate λ_sus) versus processing density L, for λ = 150 kbps and λ = 40 kbps with 30 and 80 UEs, and the non-MEC baseline.]

[Figure 5. Average power consumption versus Lyapunov optimization tradeoff parameter V, for various (L, λ) settings with 30 and 80 UEs.]

computation rate cannot be less than the task arrival rate in order to maintain queue stability, i.e., 10^9/L ≥ λ. In this regard, we also plot the curve of the maximum sustainable task arrival rate λ_sus = 10^9/L without MEC servers in Fig. 4. Comparing the curves of the MEC schemes with the maximum sustainable task arrival rate, we find that ζ > 1 when λ > 10^9/L, ζ = 1 when λ = 10^9/L, and ζ < 1 otherwise. That is, λ_sus is the watershed between task offloading and local computation: more than 50% of the arriving tasks are offloaded if the task arrival rate is higher than λ_sus.
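As a quick sanity check of this watershed (using the local computation capability of 10^9 cycle/s implied above), the processing density L = 8250 cycle/bit used in Figs. 5-10 gives
\[
\lambda_{\mathrm{sus}} = \frac{10^{9}\ \text{cycle/s}}{8250\ \text{cycle/bit}} \approx 1.21\times 10^{5}\ \text{bit/s} \approx 121\ \text{kbps},
\]
so the λ = 150 kbps curves in Fig. 4 fall in the ζ > 1 (offloading-dominant) regime at this L, whereas the λ = 40 kbps curves remain in the ζ < 1 regime.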

[Figure 6. Task split ratio versus average power costs of local computation and total consumption. (a) L = 8250 cycle/bit and 30 UEs, for λ = 50, 100, and 200 kbps. (b) λ = 50 kbps, for L = 8250 cycle/bit with 30 and 80 UEs and L = 31680 cycle/bit with 30 UEs.]

Varying the Lyapunov tradeoff parameter V, we show the corresponding average power consumption in Fig. 5 and further break down the power cost in Fig. 6. As can be seen from the objective function of problem P1-2', the total power consumption is reduced with increasing V in all network settings, and the minimal power cost can be asymptotically approached. For any given total power consumption in Fig. 5, we investigate, in Fig. 6, the corresponding task split ratio and the average power consumed by local computation. When less power is consumed, more tasks are assigned to the task-offloading queue. In other words, if the UE is equipped with weak computation capability or has a lower power budget, offloading most tasks (i.e., increasing ζ) helps to meet the URLLC requirements. Given that the same fraction of arriving tasks with a specific


[Figure 7. 1) Average end-to-end delay, 2) 99th percentile of the UE's queue length, and 3) mean and standard deviation of exceedances over the 99th-percentile queue length, versus task arrival rate with L = 8250 cycle/bit. (a) Average end-to-end delay. (b) 99th percentile of the UE's queue length with 80 UEs. (c) 99th percentile of the UE's queue length with 30 UEs. (d) Mean and standard deviation of exceedances with 30 UEs.]

processing density L is computed locally, the same portion of power is consumed in local computation regardless of the arrival rate λ. In this regard, as shown in Fig. 6(a), the power consumption gap between local computation and total consumption is around 5 dB at ζ = 10 for different values of λ. However, computation-intensive tasks require more CPU cycles in local computation. Comparing the curves in Figs. 6(a) and 6(b), we find that the gap is smaller for a larger L. Moreover, since higher arrival rates and more intense processing densities demand more resources for task execution, the UE consumes more power when these two parameters L and λ increase, as expected. Additionally, for specific values of L, λ, and ζ, the locally-executed task amounts and the required computational resources for local computation are identical for different numbers of UEs. Thus, as shown in Fig. 6(b), the local power consumption is the same regardless of the network density, but each UE consumes more transmit power (and thus total power) due to stronger transmission interference in the denser network.

In addition to the MEC architecture discussed in this work, we consider two further baselines for performance comparison: (i) a baseline with no MEC servers for offloading, and (ii) a baseline that offloads all tasks to the MEC server due to the absence of local computation capability. In the following, we compare the performance of the proposed method with these two baselines for various task arrival rates in Fig. 7 and various processing densities in Fig. 8. We first compare the average end-to-end delay in Figs. 7(a)


[Figure 8. 1) Average end-to-end delay, 2) 99th percentile of the UE's queue length, and 3) mean and standard deviation of exceedances over the 99th-percentile queue length, versus processing density with λ = 100 kbps. (a) Average end-to-end delay. (b) 99th percentile of the UE's queue length with 80 UEs. (c) 99th percentile of the UE's queue length with 30 UEs. (d) Mean and standard deviation of exceedances with 30 UEs.]

and 8(a). For very small arrival rates and processing densities, the UE's computation capability is sufficient to execute tasks rapidly, whereas the transmission delay and the extra queuing delay incurred at the server degrade the performance of the fully-offloading and partially-offloading schemes. As L or λ increases, the UE's computation capability becomes less sufficient, as mentioned previously, and the two (fully- and partially-) offloading schemes eventually provide better delay performance in various network settings. Additionally, for the offloading schemes, the average end-to-end delay is larger in the denser network due to stronger interference and longer waiting times for the server's computational resources. Compared with the fully-offloading scheme, our approach achieves a remarkable delay improvement in the denser network since the fully-offloaded tasks incur tremendous queuing delay. In addition to the average end-to-end delay, we also show the 99th percentile of the UE's queue length, i.e., q_99 := F^{-1}_Q(0.99), as a reliability measure in Figs. 7(b), 7(c), 8(b), and 8(c). When the proposed approach outperforms the non-MEC scheme in terms of the average end-to-end delay, the local-computation queue has a lower 99th percentile than the 99th-percentile queue length of the non-MEC scheme. Similar results hold for the task-offloading queue and the queue of the fully-offloading scheme whenever our proposed approach has a lower average end-to-end delay. In the 30-UE case, although our proposed approach has a higher average end-to-end delay than the fully-offloading scheme, splitting the tasks decreases the load of each queue buffer and results


[Figure 9. Average end-to-end delay versus number of UEs for different UE-server association schemes (proposed vs. RSS-based), for (L, λ) = (8250 cycle/bit, 150 kbps), (31680 cycle/bit, 50 kbps), and (36992 cycle/bit, 40 kbps). (a) Four MEC servers with 2, 4, 8, 16 CPU cores. (b) Four MEC servers with 8, 8, 8, 8 CPU cores.]

in a lower queue length. Zooming in on the excess queue value over the 99th percentile in the 30-UE case, the mean and standard deviation of the exceedances, i.e., E[Q − q_99 | Q > q_99] and sqrt(Var(Q − q_99 | Q > q_99)), are investigated in Figs. 7(d) and 8(d), where we find that our approach yields a smaller magnitude and a more concentrated spread of the extreme events. Although fully offloading tasks achieves lower delay in the sparse network, the partially-offloading scheme lowers the load on the queue buffer. In practice, if the queue buffer has finite storage, our proposed approach can properly address the potential overflow issue and achieve more reliable task computation.
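For completeness, these reliability metrics can be obtained from logged queue-length samples with a generic post-processing routine such as the sketch below (illustrative names, not code from the paper).

import numpy as np

def exceedance_stats(queue_samples, percentile=99.0):
    # q99 = F_Q^{-1}(0.99), plus the mean and standard deviation of Q - q99 given
    # Q > q99, i.e., the quantities reported in Figs. 7(b)-(d) and 8(b)-(d).
    q = np.percentile(queue_samples, percentile)
    excess = queue_samples[queue_samples > q] - q
    return q, excess.mean(), excess.std()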

Finally, in Figs. 9 and 10, we show the advantage of our proposed UE-server association approach in the heterogeneous

[Figure 10. CDF of the wireless channel SNR between the UE and the associated MEC server for different UE-server association schemes (proposed vs. RSS-based) with L = 8250 cycle/bit, λ = 150 kbps, and 80 UEs. (a) Four MEC servers with 2, 4, 8, 16 CPU cores. (b) Four MEC servers with 8, 8, 8, 8 CPU cores.]

MEC architecture. As a baseline, we consider the mechanism in which the UE accesses the MEC server with the highest received signal strength (RSS). Provided that the deployed MEC servers have different computation capabilities (in terms of CPU cores), associating some UEs with a stronger-capability server despite a lower RSS can properly balance the servers' loads, even though some transmission rate is sacrificed. As a result, compared with the baseline, the waiting time for the server's computational resources and the end-to-end delay of the proposed association approach are reduced. The advantage is more prominent in the dense network since more UEs wait for the server's resources. If the servers' computation capabilities are identical, associating the UE with the server with the lower


RSS does not provide a computation gain. Thus, the proposed approach and the baseline have identical association outcomes and delay performance irrespective of the network setting. We further show the CDF of the wireless channel signal-to-noise ratio (SNR), measured as P^{max}_i E[h_{ij}]/(N_0W), between the UE and its associated server in Fig. 10. When the MEC servers are homogeneous, the proposed approach and the baseline have similar association results, as mentioned above. This is verified by the 1 dB gap at the 1st percentile of the SNR. Nevertheless, in the heterogeneous MEC architecture, the gap is 7 dB at the 1st percentile of the SNR and reduces to 1 dB at the 30th percentile.

VI. CONCLUSIONS

The goal of this work is to propose a new low-latency and reliable communication-computing system design for enabling mission-critical applications. In this regard, the URLLC requirement has been formulated with respect to the probability that the task queue length deviates from a threshold. By leveraging extreme value theory, we have characterized the statistics of the threshold deviation event, which has a low occurrence probability, and imposed another URLLC constraint on its higher-order statistics. The studied problem has been cast as the minimization of the UEs' computation and communication power subject to the URLLC constraints. Furthermore, incorporating techniques from Lyapunov stochastic optimization and matching theory, we have proposed a two-timescale framework for UE-server association, task offloading, and resource allocation. UE-server association is formulated as a many-to-one matching game with externalities, which is addressed, in the long timescale, via the notion of swap matching. In every time slot, each UE allocates its computation and communication resources and splits the task arrivals for local computation and offloading. In parallel, each server schedules its multiple CPU cores to compute the UEs' offloaded tasks. Numerical results have shown the effectiveness of characterizing the extreme queue length using extreme value theory. Partially offloading tasks provides more reliable task computation in contrast with the non-MEC and fully-offloading schemes. When the servers have different computation capabilities, the proposed UE-server association approach achieves lower delay than the RSS-based baseline, particularly in denser networks. In future work, we will investigate distributed machine learning techniques [45] to further reduce latency and improve reliability.

APPENDIX A
PROOF OF LEMMA 1

We first express the Lagrangian of problem P1-2' as
(β_{j*i}(t) − β^O_i(t)) τW E_{I_{ij*}}[ log_2( 1 + P_i(t)h_{ij*}(t) / (N_0W + I_{ij*}(t)) ) ] − β^L_i(t)τf_i(t)/L_i + γ(κ[f_i(t)]^3 + P_i(t) − P^{max}_i) + V(κ[f_i(t)]^3 + P_i(t)) − α_1 f_i(t) − α_2 P_i(t), (43)

where γ, α_1, and α_2 are the Lagrange multipliers. Taking the partial derivatives of (43) with respect to f_i and P_i, we have

∂(43)/∂f_i = −β^L_i(t)τ/L_i + 3κ(V + γ)[f_i(t)]^2 − α_1,
∂(43)/∂P_i = E_{I_{ij*}}[ (β_{j*i}(t) − β^O_i(t))τWh_{ij*} / ((N_0W + I_{ij*} + P_i h_{ij*}) ln 2) ] + V + γ − α_2.

Subsequently, since P1-2' is a convex optimization problem, we apply the Karush–Kuhn–Tucker (KKT) conditions to derive the optimal solution, in which the optimal CPU-cycle frequency f*_i(t), the optimal transmit power P*_i(t), and the optimal Lagrange multipliers (i.e., γ*, α*_1, and α*_2) satisfy

f*_i(t) = sqrt( (α*_1 + β^L_i(t)τ/L_i) / (3κ(V + γ*)) ), (44)
E_{I_{ij*}}[ (β^O_i(t) − β_{j*i}(t))τWh_{ij*} / ((N_0W + I_{ij*} + P*_i(t)h_{ij*}(t)) ln 2) ] = V + γ* − α*_2, (45)
f*_i(t) ≥ 0, α*_1 ≥ 0, α*_1 f*_i(t) = 0, (46)
P*_i(t) ≥ 0, α*_2 ≥ 0, α*_2 P*_i(t) = 0, (47)
κ[f*_i(t)]^3 + P*_i(t) − P^{max}_i ≤ 0, γ* ≥ 0, γ*(κ[f*_i(t)]^3 + P*_i(t) − P^{max}_i) = 0. (48)

Since (44) implies f*_i(t) > 0, the complementary slackness condition in (46) forces α*_1 = 0. Hence, from (44) and (46), we deduce

f*_i(t) = sqrt( β^L_i(t)τ / (3L_iκ(V + γ*)) ).

Additionally, from (45) and (47), we can find that if

E_{I_{ij*}}[ (β^O_i(t) − β_{j*i}(t))τWh_{ij*} / ((N_0W + I_{ij*}) ln 2) ] > V + γ*,

we have a positive optimal transmit power P*_i > 0 which satisfies

E_{I_{ij*}}[ (β^O_i(t) − β_{j*i}(t))τWh_{ij*} / ((N_0W + I_{ij*} + P*_i h_{ij*}) ln 2) ] = V + γ*.

Otherwise, P*_i = 0. Moreover, note that γ* = 0 if κ[f*_i(t)]^3 + P*_i(t) < P^{max}_i. When γ* > 0, κ[f*_i(t)]^3 + P*_i(t) = P^{max}_i.

REFERENCES

[1] C.-F. Liu, M. Bennis, and H. V. Poor, "Latency and reliability-aware task offloading and resource allocation for mobile edge computing," in Proc. IEEE Global Commun. Conf. Workshops, Dec. 2017, pp. 1–7.
[2] M. Chiang and T. Zhang, "Fog and IoT: An overview of research opportunities," IEEE Internet Things J., vol. 3, no. 6, pp. 854–864, Dec. 2016.
[3] M. Chiang, S. Ha, C.-L. I, F. Risso, and T. Zhang, "Clarifying fog computing and networking: 10 questions and answers," IEEE Commun. Mag., vol. 55, no. 4, pp. 18–20, Apr. 2017.
[4] S. Wang, X. Zhang, Y. Zhang, L. Wang, J. Yang, and W. Wang, "A survey on mobile edge networks: Convergence of computing, caching and communications," IEEE Access, vol. 5, pp. 6757–6779, 2017.
[5] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, "A survey on mobile edge computing: The communication perspective," IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, Fourth quarter 2017.


[6] C. Mouradian, D. Naboulsi, S. Yangui, R. H. Glitho, M. J. Morrow, and P. A. Polakos, "A comprehensive survey on fog computing: State-of-the-art and research challenges," IEEE Commun. Surveys Tuts., vol. 20, no. 1, pp. 416–464, First quarter 2018.
[7] J. Kwak, Y. Kim, J. Lee, and S. Chong, "DREAM: Dynamic resource and task allocation for energy minimization in mobile cloud systems," IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2510–2523, Dec. 2015.
[8] Y. Kim, J. Kwak, and S. Chong, "Dual-side optimization for cost-delay tradeoff in mobile edge computing," IEEE Trans. Veh. Technol., vol. 67, no. 2, pp. 1765–1781, Feb. 2018.
[9] Z. Jiang and S. Mao, "Energy delay tradeoff in cloud offloading for multi-core mobile devices," IEEE Access, vol. 3, pp. 2306–2316, 2015.
[10] Y. Mao, J. Zhang, S. H. Song, and K. B. Letaief, "Power-delay tradeoff in multi-user mobile-edge computing systems," in Proc. IEEE Global Commun. Conf., Dec. 2016, pp. 1–6.
[11] ——, "Stochastic joint radio and computational resource management for multi-user mobile-edge computing systems," IEEE Trans. Wireless Commun., vol. 16, no. 9, pp. 5994–6009, Sep. 2017.
[12] S. Mao, S. Leng, K. Yang, Q. Zhao, and M. Liu, "Energy efficiency and delay tradeoff in multi-user wireless powered mobile-edge computing systems," in Proc. IEEE Global Commun. Conf., Dec. 2017, pp. 1–6.
[13] J. Xu, L. Chen, and S. Ren, "Online learning for offloading and autoscaling in energy harvesting mobile edge computing," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 3, pp. 361–373, Sep. 2017.
[14] Y. Sun, S. Zhou, and J. Xu, "EMM: Energy-aware mobility management for mobile edge computing in ultra dense networks," IEEE J. Sel. Areas Commun., vol. 35, no. 11, pp. 2637–2646, Nov. 2017.
[15] S. Ko, K. Han, and K. Huang, "Wireless networks for mobile edge computing: Spatial modeling and latency analysis," IEEE Trans. Wireless Commun., vol. 17, no. 8, pp. 5225–5240, Aug. 2018.
[16] R. Deng, R. Lu, C. Lai, T. H. Luan, and H. Liang, "Optimal workload allocation in fog-cloud computing toward balanced delay and power consumption," IEEE Internet Things J., vol. 3, no. 6, pp. 1171–1181, Dec. 2016.
[17] G. Lee, W. Saad, and M. Bennis, "An online secretary framework for fog network formation with minimal latency," in Proc. IEEE Int. Conf. Commun., May 2017, pp. 1–6.
[18] Q. Fan and N. Ansari, "Workload allocation in hierarchical cloudlet networks," IEEE Commun. Lett., vol. 22, no. 4, pp. 820–823, Apr. 2018.
[19] ——, "Application aware workload allocation for edge computing-based IoT," IEEE Internet Things J., vol. 5, no. 3, pp. 2146–2153, Jun. 2018.
[20] ——, "Towards workload balancing in fog computing empowered IoT," IEEE Trans. Netw. Sci. Eng., 2019, to be published.
[21] M. Molina, O. Muñoz, A. Pascual-Iserte, and J. Vidal, "Joint scheduling of communication and computation resources in multiuser wireless application offloading," in Proc. IEEE 25th Annu. Int. Symp. Pers., Indoor, Mobile Radio Commun., Sep. 2014, pp. 1093–1098.
[22] H. Zhang, Y. Xiao, S. Bu, D. Niyato, F. R. Yu, and Z. Han, "Computing resource allocation in three-tier IoT fog networks: A joint optimization approach combining Stackelberg game and matching," IEEE Internet Things J., vol. 4, no. 5, pp. 1204–1215, Oct. 2017.
[23] B. Soret, P. Mogensen, K. I. Pedersen, and M. C. Aguayo-Torres, "Fundamental tradeoffs among reliability, latency and throughput in cellular networks," in Proc. IEEE Global Commun. Conf. Workshops, Dec. 2014, pp. 1391–1396.
[24] M. Bennis, M. Debbah, and H. V. Poor, "Ultrareliable and low-latency wireless communication: Tail, risk, and scale," Proc. IEEE, vol. 106, no. 10, pp. 1834–1853, Oct. 2018.
[25] C.-P. Li, J. Jiang, W. Chen, T. Ji, and J. Smee, "5G ultra-reliable and low-latency systems design," in Proc. Eur. Conf. Netw. Commun., Jun. 2017, pp. 1–5.
[26] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. San Rafael, CA, USA: Morgan and Claypool, Jun. 2010.
[27] S. Coles, An Introduction to Statistical Modeling of Extreme Values. London, U.K.: Springer, 2001.
[28] A. E. Roth and M. A. O. Sotomayor, Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. New York, NY, USA: Cambridge University Press, 1992.
[29] T. D. Burd and R. W. Brodersen, "Processor design for portable systems," J. VLSI Signal Process. Syst., vol. 13, no. 2-3, pp. 203–221, Aug. 1996.
[30] P. Mach and Z. Becvar, "Mobile edge computing: A survey on architecture and computation offloading," IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1628–1656, Third quarter 2017.
[31] C. You, K. Huang, H. Chae, and B. Kim, "Energy-efficient resource allocation for mobile-edge computation offloading," IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, Mar. 2017.
[32] J. D. C. Little, "A proof for the queuing formula: L = λW," Operations Research, vol. 9, no. 3, pp. 383–387, 1961.
[33] R. Hemmecke, M. Köppe, J. Lee, and R. Weismantel, "Nonlinear integer programming," in 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art, M. Jünger, T. M. Liebling, D. Naddef, G. L. Nemhauser, W. R. Pulleyblank, G. Reinelt, G. Rinaldi, and L. A. Wolsey, Eds. Berlin/Heidelberg, Germany: Springer, 2010, ch. 15, pp. 561–618.
[34] Y. Gu, W. Saad, M. Bennis, M. Debbah, and Z. Han, "Matching theory for future wireless networks: Fundamentals and applications," IEEE Commun. Mag., vol. 53, no. 5, pp. 52–59, May 2015.
[35] J. Zhao, Y. Liu, K. K. Chai, Y. Chen, and M. Elkashlan, "Many-to-many matching with externalities for device-to-device communications," IEEE Wireless Commun. Lett., vol. 6, no. 1, pp. 138–141, Feb. 2017.
[36] M. S. Elbamby, M. Bennis, and W. Saad, "Proactive edge computing in latency-constrained fog networks," in Proc. Eur. Conf. Netw. Commun., Jun. 2017, pp. 1–6.
[37] C. Perfecto, J. Del Ser, and M. Bennis, "Millimeter-wave V2V communications: Distributed association and beam alignment," IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2148–2162, Sep. 2017.
[38] B. Di, L. Song, and Y. Li, "Sub-channel assignment, power allocation, and user scheduling for non-orthogonal multiple access networks," IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7686–7698, Nov. 2016.
[39] E. Bodine-Baron, C. Lee, A. Chong, B. Hassibi, and A. Wierman, "Peer effects and stability in matching markets," in Proc. 4th Int. Symp. Algorithmic Game Theory, 2011, pp. 117–129.
[40] A. Al-Shuwaili and O. Simeone, "Energy-efficient resource allocation for mobile edge computing-based augmented reality applications," IEEE Wireless Commun. Lett., vol. 6, no. 3, pp. 398–401, Jun. 2017.
[41] A. P. Miettinen and J. K. Nurminen, "Energy efficiency of mobile clients in cloud computing," in Proc. 2nd USENIX Conf. on Hot Topics in Cloud Computing, Jun. 2010.
[42] Radiocommunication Sector of ITU, "P.1238-9: Propagation data and prediction methods for the planning of indoor radiocommunication systems and radio local area networks in the frequency range 300 MHz to 100 GHz," ITU-R, Tech. Rep., Jun. 2017.
[43] Z. Pi and F. Khan, "System design and network architecture for a millimeter-wave mobile broadband (MMB) system," in Proc. 34th IEEE Sarnoff Symp., May 2011, pp. 1–6.
[44] C.-F. Liu and M. Bennis, "Ultra-reliable and low-latency vehicular transmission: An extreme value theory approach," IEEE Commun. Lett., vol. 22, no. 6, pp. 1292–1295, Jun. 2018.
[45] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, "Wireless network intelligence at the edge," CoRR, vol. abs/1812.02858, pp. 1–32, 2018.

Chen-Feng Liu (S'17) received the B.S. degree from National Tsing Hua University, Hsinchu, Taiwan, in 2009, and the M.S. degree in communications engineering from National Chiao Tung University, Hsinchu, in 2011. He is currently pursuing the Ph.D. degree with the University of Oulu, Oulu, Finland. In 2012, he joined Academia Sinica as a Research Assistant. In 2014, he was a Visiting Researcher with the Singapore University of Technology and Design, Singapore. He was also a Visiting Ph.D. Student with the University of Houston, Houston, TX, USA, and New York University, New York, NY, USA, in 2016 and 2018, respectively. His current research interests include 5G communications, mobile edge computing, ultra-reliable low latency communications, and wireless artificial intelligence.


Mehdi Bennis (S'07–AM'08–SM'15) received the M.Sc. degree from the École Polytechnique Fédérale de Lausanne, Switzerland, and the Eurecom Institute, France, in 2002, and the Ph.D. degree in electrical engineering in 2009, focused on spectrum sharing for future mobile cellular systems. He is currently an Associate Professor with the University of Oulu, Finland, and an Academy of Finland Research Fellow. He has authored or co-authored over 100 research papers in international conferences, journals, book chapters, and patents. His main research interests are in radio resource management, heterogeneous networks, game theory, and machine learning. He has given numerous tutorials at IEEE flagship conferences. He was a recipient of several prizes, including the 2015 Fred W. Ellersick Prize from the IEEE Communications Society (ComSoc), the 2016 IEEE ComSoc Best Tutorial Prize, the 2017 EURASIP Best Paper Award for the Journal on Wireless Communications and Networking, and the Best Paper Award at the European Conference on Networks and Communications 2017.

Mérouane Debbah (S'01–AM'03–M'04–SM'08–F'15) received the M.Sc. and Ph.D. degrees from the École Normale Supérieure Paris-Saclay, France. In 1996, he joined the École Normale Supérieure Paris-Saclay. He was with Motorola Labs, Saclay, France, from 1999 to 2002, and also with the Vienna Research Center for Telecommunications, Vienna, Austria, until 2003. From 2003 to 2007, he was an Assistant Professor with the Mobile Communications Department, Institut Eurecom, Sophia Antipolis, France. From 2007 to 2014, he was the Director of the Alcatel-Lucent Chair on Flexible Radio. Since 2007, he has been a Full Professor with CentraleSupélec, Gif-sur-Yvette, France. Since 2014, he has been a Vice-President of the Huawei France Research Center and the Director of the Mathematical and Algorithmic Sciences Lab. He has managed 8 EU projects and more than 24 national and international projects. His research interests lie in fundamental mathematics, algorithms, statistics, information, and communication sciences. He is an IEEE Fellow, a WWRF Fellow, and a Membre émérite SEE. He was a recipient of the ERC Grant MORE (Advanced Mathematical Tools for Complex Network Engineering) from 2012 to 2017. He was a recipient of the Mario Boella Award in 2005, the IEEE Glavieux Prize Award in 2011, and the Qualcomm Innovation Prize Award in 2012. He has received 19 best paper awards, among which the 2007 IEEE GLOBECOM Best Paper Award, the Wi-Opt 2009 Best Paper Award, the 2010 Newcom++ Best Paper Award, the WUN CogCom Best Paper 2012 and 2013 Awards, the 2014 WCNC Best Paper Award, the 2015 ICC Best Paper Award, the 2015 IEEE Communications Society Leonard G. Abraham Prize, the 2015 IEEE Communications Society Fred W. Ellersick Prize, the 2016 IEEE Communications Society Best Tutorial Paper Award, the 2016 European Wireless Best Paper Award, the 2017 EURASIP Best Paper Award, the 2018 IEEE Marconi Prize Paper Award, and the Valuetools 2007, Valuetools 2008, CrownCom 2009, Valuetools 2012, SAM 2014, and 2017 IEEE Sweden VT-COM-IT Joint Chapter best student paper awards. He is an Associate Editor-in-Chief of the journal Random Matrix: Theory and Applications. He was an Associate Area Editor and Senior Area Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2011 to 2013 and from 2013 to 2014, respectively.

H. Vincent Poor (S'72–M'77–SM'82–F'87) received the Ph.D. degree in EECS from Princeton University in 1977. From 1977 until 1990, he was on the faculty of the University of Illinois at Urbana-Champaign. Since 1990 he has been on the faculty at Princeton, where he is currently the Michael Henry Strater University Professor of Electrical Engineering. During 2006 to 2016, he served as Dean of Princeton's School of Engineering and Applied Science. He has also held visiting appointments at several other universities, including most recently at Berkeley and Cambridge. His research interests are in the areas of information theory and signal processing, and their applications in wireless networks, energy systems and related fields. Among his publications in these areas is the recent book Information Theoretic Security and Privacy of Information Systems (Cambridge University Press, 2017).

Dr. Poor is a member of the National Academy of Engineering and the National Academy of Sciences, and is a foreign member of the Chinese Academy of Sciences, the Royal Society, and other national and international academies. He received the Marconi and Armstrong Awards of the IEEE Communications Society in 2007 and 2009, respectively. Recent recognition of his work includes the 2017 IEEE Alexander Graham Bell Medal, Honorary Professorships at Peking University and Tsinghua University, both conferred in 2017, and a D.Sc. honoris causa from Syracuse University also awarded in 2017.