TRANSCRIPT
Sensitivity in Communication Networks: An Optimization Theoretic Perspective
Malcolm Egan, INSA Lyon, INRIA
29th June 2018
Motivation
Figure: Zimmermann, IEEE Transactions on Communications, 28(4), 1980.
Communication network design is aided by conceptual models with general structure:
- Open Systems Interconnection (OSI)
- TCP/IP
Motivation
- Another, complementary model of communication:
Figure: Roth, Introduction to Coding Theory.
Motivation
- These models simplify design by breaking it into three steps:
  1. Optimize each component.
  2. Optimize the coupling between each component.
  3. Repeat.
- Why? Joint optimization of the whole system is often intractable.
- Think of user scheduling + energy harvesting + power control + security + ...

Question: Is there a way to characterize the impact of the coupling between optimized components?
A Generic Problem in Networking
Consider two sets X and Y.
\[
\begin{aligned}
\underset{x \in X}{\text{maximize}} \quad & f(x; y) \\
\text{subject to} \quad & g_i(x; y) \le 0, \; i \in I, \\
& h_j(x; y) = 0, \; j \in J.
\end{aligned}
\]
f, g_i, h_j are continuous on X × Y.
The parameter x ∈ X is the optimization variable.
The parameter y ∈ Y is the coupling variable or side information.
The function f (x ; y) is the network utility function.
Example I: Power Control in Parallel Channels
Figure: LaSorte et al., Proc. GLOBECOM 2008.
Figure: Lozano et al., IEEE Trans. Info. Theory, 2006.
Example I: Power Control in Parallel Channels
\[
\begin{aligned}
\underset{p_k \in \mathbb{R},\; k=1,2,\dots,n}{\text{maximize}} \quad & \sum_{k=1}^{n} \frac{1}{2} \log\!\left(1 + \frac{|h_k|^2 p_k}{\sigma_N^2 + I_k}\right) \\
\text{subject to} \quad & \sum_{k=1}^{n} p_k \le P_{\max}, \\
& p_k \ge 0, \quad k = 1, 2, \dots, n.
\end{aligned}
\]
The optimization variable is the power vector p = [p_1, ..., p_n].
The coupling variables are I_k, P_max, and σ_N.
The network utility function is the sum-rate.
Example II: Queue Stability
Figure: Moghadam et al., IEEE ICC, 2015
Figure: Afshang et al., Asilomar, 2015
Example II: Queue Stability
\[
\begin{aligned}
\underset{p_0,\,\beta_0,\,m_0}{\text{maximize}} \quad & p_0\, \rho_0 \log(1+\beta_0)\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}} \\
\text{s.t.} \quad & \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0} \le \epsilon, \quad
\theta_0 = \frac{p_0\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}}{1 - \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0}}, \\
& \theta_0 \ge p_0 \left[\,\sum_{i=1}^{1+m_0} \binom{1+m_0}{i} (-1)^{i+1} e^{-\rho_p \lambda_0 (i-1) \kappa d^2 \beta_0^{2/\alpha}}\right]^{-1} > \mu_0.
\end{aligned}
\]
Nardelli, P., et al., “Throughput optimization in wireless networks under stability and packet loss constraints,” IEEE Transactions on Mobile Computing, vol. 13, no. 8, pp. 1883-1895, 2014.
Example II: Queue Stability
\[
\begin{aligned}
\underset{p_0,\,\beta_0,\,m_0}{\text{maximize}} \quad & p_0\, \rho_0 \log(1+\beta_0)\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}} \\
\text{s.t.} \quad & \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0} \le \epsilon, \quad
\theta_0 = \frac{p_0\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}}{1 - \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0}}, \\
& \theta_0 \ge p_0 \left[\,\sum_{i=1}^{1+m_0} \binom{1+m_0}{i} (-1)^{i+1} e^{-\rho_p \lambda_0 (i-1) \kappa d^2 \beta_0^{2/\alpha}}\right]^{-1} > \mu_0.
\end{aligned}
\]
The optimization variables are the channel access probability, SINR threshold, and number of retransmissions.

The coupling variables include the density of the network and packet arrival rates.

The network utility function is the throughput.
Example III: Information Capacity in Non-Gaussian Noise
\[
y(t) = \underbrace{r_0^{-\eta/2} h_0(t)\, x_0(t)}_{\text{user}} \;+\; \underbrace{N(t)}_{\mathcal{CN}(0,\sigma^2)} \;+\; \underbrace{\sum_{i \in \Phi \setminus \{0\}} r_i^{-\eta/2} h_i(t)\, x_i(t)}_{I(t)}
\]
Egan, M., et al., “Wireless communication in dynamic interference,” Proc. GLOBECOM, 2017.

de Freitas, M., et al., “Capacity bounds for additive symmetric alpha-stable noise channels,” IEEE Trans. Info. Theory, 2017.
Example III: Information Capacity in Non-Gaussian Noise
Consider a stationary, memoryless additive noise channel
Y = X + N
\[
\sup_{\mu \in \mathcal{P}} \; I(\mu, P_{Y|X}) \quad \text{subject to} \quad \mu \in \Lambda.
\]
The optimization variable is μ, an element of the set of probability measures on (C, B(C)).
The coupling variables are constraints Λ and the channel PY |X .
The network utility function is the mutual information.
A Strategy
- Consider a differentiable function f : R^n → R, which admits a Taylor series representation
\[
f(x + \|e\| \bar{e}) = f(x) + \|e\|\, Df(x)^T \bar{e} + o(\|e\|),
\]
where \bar{e} = e/\|e\| is unit norm.
- This yields
\[
|f(x + \|e\| \bar{e}) - f(x)| \le \|Df(x)\| \|e\| + o(\|e\|),
\]
i.e., the sensitivity.
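As a quick numerical sanity check (with a hypothetical smooth utility f chosen for this sketch, not one from the talk), the first-order sensitivity bound can be verified by finite differences:

```python
import numpy as np

# Hypothetical smooth utility: f(x) = log(1 + x1*x2) on positive inputs.
def f(x):
    return np.log(1.0 + x[0] * x[1])

def grad_f(x):
    d = 1.0 + x[0] * x[1]
    return np.array([x[1] / d, x[0] / d])

x = np.array([2.0, 3.0])
e = np.array([1.0, -1.0])
e_bar = e / np.linalg.norm(e)              # unit-norm direction

for t in [1e-1, 1e-2, 1e-3]:               # t plays the role of ||e||
    actual = abs(f(x + t * e_bar) - f(x))  # actual change in utility
    bound = np.linalg.norm(grad_f(x)) * t  # first-order sensitivity bound
    print(t, actual, bound)                # actual stays below the bound
```

The gap between the actual change and the bound is the o(‖e‖) term, which shrinks faster than t.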
Question: what is the directional derivative of the optimal value function of an optimization problem?
A Strategy
- For vector-valued, smooth optimization problems there is a well-developed theory.
- E.g., the following classical proposition.
Proposition
Let the real-valued function f(x, y) : R^n × R → R be twice differentiable on a compact convex subset X of R^{n+1} and strictly concave in x. Let x*(y) be the maximizer of f(·, y) on X and denote ψ(y) = f(x*(y), y). Then the derivative of ψ(y) is
\[
\psi'(y) = f_y(x^*(y), y).
\]
- Generalizations due to Danskin and Gol'shtein:
  - Non-convexity.
  - Vector coupling parameters.
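The proposition can be checked numerically on a toy instance (the function below is an invented example, chosen only because its maximizer has a closed form):

```python
import numpy as np

# Illustrative instance: f(x, y) = -x^2 + 2xy is strictly concave in x
# with maximizer x*(y) = y, so psi(y) = f(x*(y), y) = y^2.
def f(x, y):
    return -x**2 + 2.0 * x * y

def x_star(y):
    # argmax_x f(x, y), available in closed form here
    return y

def psi(y):
    return f(x_star(y), y)

y = 1.5
h = 1e-6
# numerical derivative of the optimal value function
psi_prime = (psi(y + h) - psi(y - h)) / (2.0 * h)
# proposition: psi'(y) = f_y(x*(y), y) = 2 * x*(y)
envelope = 2.0 * x_star(y)
print(psi_prime, envelope)   # both close to 3.0
```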
A Strategy
The strategy applies directly to problems in networking with optimization and coupling parameters in R^n.
For example:
- Example I: Power control
- Example II: Queue stability
Example III has optimization variables that do not lie in R^n. We will return to this case later.
Example I: Power Control in Parallel Channels
Recall we are interested in the optimization problem:
\[
\begin{aligned}
R^*(I) = \underset{p_k \in \mathbb{R},\; k=1,2,\dots,n}{\text{maximize}} \quad & \sum_{k=1}^{n} \frac{1}{2} \log\!\left(1 + \frac{|h_k|^2 p_k}{\sigma_N^2 + I}\right) \\
\text{subject to} \quad & \sum_{k=1}^{n} p_k \le P_{\max}, \\
& p_k \ge 0, \quad k = 1, 2, \dots, n.
\end{aligned}
\]
Let p_k^* be the optimal solution for I = 0.
Applying the strategy, we obtain
\[
|R^*(I) - R^*(0)| \le \left|\, \sum_{k=1}^{n} \frac{1}{2}\, \frac{|h_k|^2 p_k^*}{\sigma_N^2}\, \frac{1}{\sigma_N^2 + |h_k|^2 p_k^*} \right| |I| + o(I).
\]
Example I: Power Control in Parallel Channels
\[
|R^*(I) - R^*(0)| \le \left|\, \sum_{k=1}^{n} \frac{1}{2}\, \frac{|h_k|^2 p_k^*}{\sigma_N^2}\, \frac{1}{\sigma_N^2 + |h_k|^2 p_k^*} \right| |I| + o(I).
\]
Remarks:
1. To understand the impact of the interference, the optimization problem only needs to be solved once.
2. The impact of the interference is an approximately linear change in the bound.
3. The approach generalizes to MIMO channels (more later).
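Remark 1 can be sketched numerically. The block below assumes illustrative channel gains and a standard water-filling solver (none of the numbers come from the talk): the problem is solved once at I = 0, and the rate loss for small I is then bounded without re-solving.

```python
import numpy as np

def waterfill(gains, pmax):
    """Water-filling allocation maximizing sum_k 0.5*log(1 + g_k*p_k)
    subject to sum(p_k) <= pmax, p_k >= 0, via bisection on the level."""
    lo, hi = 0.0, pmax + 1.0 / gains.min()
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / gains, 0.0)
        if p.sum() > pmax:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - 1.0 / gains, 0.0)

def sum_rate(p, h2, sigma2, I):
    return 0.5 * np.sum(np.log(1.0 + h2 * p / (sigma2 + I)))

# Illustrative parameters (not from the talk)
h2 = np.array([1.0, 0.5, 2.0, 0.8])   # |h_k|^2
sigma2 = 1.0
pmax = 5.0

p_star = waterfill(h2 / sigma2, pmax)          # optimal powers at I = 0
R0 = sum_rate(p_star, h2, sigma2, 0.0)

# Sensitivity slope from the bound: evaluated at the I = 0 solution only.
slope = abs(np.sum(0.5 * (h2 * p_star / sigma2) / (sigma2 + h2 * p_star)))

for I in [0.01, 0.05, 0.1]:
    p_I = waterfill(h2 / (sigma2 + I), pmax)   # re-solve only to check
    actual = abs(sum_rate(p_I, h2, sigma2, I) - R0)
    print(I, actual, slope * I)                # actual loss vs. linear bound
```

Since each per-channel rate is convex and decreasing in I, the optimal value R*(I) is convex, so the actual loss never exceeds the linear term slope·|I| in this instance.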
Example II: Queue Stability
Recall we are now concerned with the optimization problem
\[
\begin{aligned}
\underset{p_0,\,\beta_0,\,m_0}{\text{maximize}} \quad & p_0\, \rho_0 \log(1+\beta_0)\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}} \\
\text{s.t.} \quad & \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0} \le \epsilon, \quad
\theta_0 = \frac{p_0\, e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}}{1 - \left(1 - e^{-\rho_p \lambda_0 \kappa d^2 \beta_0^{2/\alpha}}\right)^{1+m_0}}, \\
& \theta_0 \ge p_0 \left[\,\sum_{i=1}^{1+m_0} \binom{1+m_0}{i} (-1)^{i+1} e^{-\rho_p \lambda_0 (i-1) \kappa d^2 \beta_0^{2/\alpha}}\right]^{-1} > \mu_0.
\end{aligned}
\]
This is a non-convex problem. What is the impact of the probability that the transmitter has a non-empty queue, ρ_0?
Example II: Queue Stability
Theorem (Danskin's Theorem)
Let 𝒳 be a metric space and U a normed space, and let X be a compact subset of 𝒳. Suppose that for all x ∈ X the function f(x, ·) is differentiable, and that f(x, u) and D_u f(x, u) are continuous on X × U. If U is finite dimensional, then the optimal value function v(u) = inf_{x∈X} f(x, u) is directionally differentiable, with
\[
v'(u, d) = \min_{x \in S(u)} D_u f(x, u)\, d,
\]
where v'(u, d) is the directional derivative of v(u) in the direction d, and S(u) ⊂ X is the set of optimal x ∈ X for fixed u ∈ U.
(Note: the feasible set X is independent of u.)
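A toy illustration of the theorem (an invented instance, not from the talk): the optimal value function below is non-differentiable at u = 0, but Danskin's formula still delivers its directional derivatives.

```python
import numpy as np

# f(x, u) = x * u on the compact set X = [-1, 1] (discretized here).
# v(u) = min_x x*u = -|u| is not differentiable at u = 0.
X = np.linspace(-1.0, 1.0, 2001)

def v(u):
    return np.min(X * u)

def danskin_directional(u, d, tol=1e-9):
    # S(u): minimizers of f(., u) over X; at u = 0 every x is optimal
    vals = X * u
    S = X[vals <= vals.min() + tol]
    return np.min(S * d)

u, d = 0.0, 1.0
t = 1e-6
numeric = (v(u + t * d) - v(u)) / t          # one-sided difference quotient
print(numeric, danskin_directional(u, d))    # both equal -1.0 here
```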
Example II: Queue Stability
Translating this result into our setting, let Λ(ρ_{0,r}) be the set of optimal (p_0^*, β_0^*, m_0^*) (channel access probability, SINR threshold, and number of retransmissions) for a given ρ_{0,r}.
Then, for any ρ0 ∈ [0, 1],
\[
|R^*(\rho_{0,r}) - R^*(\rho_0)| \le \left|\, \max_{(p_0^*,\, \beta_0^*,\, m_0^*) \in \Lambda(\rho_{0,r})} p_0^* \log(1+\beta_0^*)\, e^{-\rho_p \lambda_0 \kappa d^2 (\beta_0^*)^{2/\alpha}} \right| |\rho_{0,r} - \rho_0| + o(|\rho_{0,r} - \rho_0|)
\]
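A sketch of this bound under simplifying assumptions: all parameter values below are invented, the exponent's ρ_p is treated as a constant independent of ρ_0 (so the objective is linear in ρ_0), and only the outage constraint is enforced, via a coarse grid search rather than a true solver.

```python
import numpy as np
from itertools import product

# Illustrative parameters only (not from the talk).
rho_p, lam0, kappa, d, alpha, eps = 0.5, 0.1, 1.0, 1.0, 4.0, 0.3

def success_prob(beta0):
    return np.exp(-rho_p * lam0 * kappa * d**2 * beta0**(2.0 / alpha))

def utility(p0, beta0):
    # throughput per unit rho_0: p0 * log(1 + beta0) * P_success
    return p0 * np.log(1.0 + beta0) * success_prob(beta0)

def feasible(beta0, m0):
    # outage constraint (1 - P_success)^(1 + m0) <= eps
    return (1.0 - success_prob(beta0))**(1 + m0) <= eps

grid = [(p0, b0, m0)
        for p0, b0, m0 in product(np.linspace(0.05, 1.0, 20),
                                  np.linspace(0.1, 10.0, 50),
                                  range(1, 4))
        if feasible(b0, m0)]

def R_star(rho0):
    return rho0 * max(utility(p0, b0) for p0, b0, m0 in grid)

g_max = max(utility(p0, b0) for p0, b0, m0 in grid)  # Danskin-type slope
rho0, rho0_r = 0.6, 0.7
actual = abs(R_star(rho0_r) - R_star(rho0))
print(actual, g_max * abs(rho0_r - rho0))  # equal here, by linearity in rho_0
```

Because the objective is linear in ρ_0 under this simplification, the o(|ρ_{0,r} − ρ_0|) term vanishes and the bound is tight.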
Example II: Queue Stability
Remarks:
1. It is possible to apply sensitivity analysis techniques to non-convex problems.
2. Provides a means of decoupling the problem of ensuring the queue is occupied from the problem of designing the access protocol.
3. In principle, it is possible to work in the infinite-dimensional setting if an appropriate derivative exists.
When do infinite dimensional problems arise?
Example III: Information Capacity in Non-Gaussian Noise
- Consider general constraints (allowing for continuous inputs).
\[
C(\Lambda) = \sup_{\mu \in \mathcal{P}} \; I(X; Y) \quad \text{subject to} \quad \mu \in \Lambda,
\]
- E.g., Λ = Λ_p = {μ : E_μ[|X|^p] ≤ b}.
- The discrete approximation of C(Λ) is then defined as
\[
C(\Lambda_\Delta) = \sup_{\mu \in \mathcal{P}} \; I(X; Y) \quad \text{subject to} \quad \mu \in \Lambda_\Delta,
\]
where
\[
\Lambda_\Delta = \bigcup_{\bar{\Delta} > \Delta} \mathcal{P}(\bar{\Delta}\mathbb{Z}) \cap \Lambda, \qquad \Delta\mathbb{Z} = \{\Delta z : z \in \mathbb{Z}\}.
\]
Example III: Information Capacity in Non-Gaussian Noise
- The capacity sensitivity is
\[
C_{\Lambda \to \Lambda_\Delta} = |C(\Lambda) - C(\Lambda_\Delta)|,
\]
- i.e., the cost of discreteness.
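As a numerical illustration of the cost of discreteness (a sketch only: an amplitude constraint stands in for a moment constraint, a single lattice ΔZ is used rather than the full constraint set, and all parameters are invented), Blahut-Arimoto can be run on progressively finer input lattices:

```python
import numpy as np

# Gaussian channel with inputs restricted to Delta*Z intersected with
# an amplitude constraint [-A, A]; capacity computed by Blahut-Arimoto.
A, sigma = 2.0, 1.0
y_grid = np.linspace(-A - 5 * sigma, A + 5 * sigma, 400)

def capacity_lattice(delta, iters=300):
    xs = np.arange(-np.floor(A / delta), np.floor(A / delta) + 1) * delta
    # channel law W(y|x), normalized over the discretized output grid
    W = np.exp(-(y_grid[None, :] - xs[:, None])**2 / (2 * sigma**2))
    W /= W.sum(axis=1, keepdims=True)
    p = np.full(len(xs), 1.0 / len(xs))      # input distribution
    for _ in range(iters):                   # Blahut-Arimoto updates
        q = p @ W                            # output marginal
        D = np.sum(W * np.log(W / q[None, :]), axis=1)  # KL(W(.|x) || q)
        p = p * np.exp(D)
        p /= p.sum()
    q = p @ W
    return np.sum(p * np.sum(W * np.log(W / q[None, :]), axis=1))  # nats

for delta in [2.0, 1.0, 0.5, 0.25]:
    # capacity per lattice spacing; non-decreasing as delta shrinks,
    # up to numerical tolerance, since the lattices here are nested
    print(delta, capacity_lattice(delta))
```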
Question: how do we bound the capacity sensitivity when the constraint is perturbed?

Egan, M., Perlaza, S.M. and Kungurtsev, V., “Capacity sensitivity in additive non-Gaussian noise channels,” Proc. IEEE International Symposium on Information Theory, Aachen, Germany, Jun. 2017.
Egan, M. and Perlaza, S.M., “Capacity approximation of continuouschannels by discrete inputs,” Proc. CISS Invited Paper, 2018.
Example III: Information Capacity in Non-Gaussian Noise
Question: how do we bound the capacity sensitivity when the constraint is perturbed?
A recipe:
(i) Establish continuity.
(ii) Obtain a bound via regular subgradients.
Example III: Information Capacity in Non-Gaussian Noise
Theorem (Egan, Perlaza 2018)
Let Λ be a non-empty compact subset of P. If the mutual information I(·, p_N) is weakly continuous on Λ, then C(Λ_Δ) → C(Λ) as Δ → 0.
(i) Gaussian model:
- p_N(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), σ > 0.
- Λ = {μ : E_μ[X²] ≤ b}, b > 0.

(ii) Cauchy model:
- p_N(x) = 1/(πγ(1 + (x/γ)²)), γ > 0.
- Λ = {μ : E_μ[|X|^r] ≤ b}, b > 0.

(iii) Inverse Gaussian model:
- p_N(x) = √(λ/(2πx³)) exp(−λ(x−γ)²/(2γ²x)), x > 0, λ, γ > 0.
- Λ = {μ : E_μ[X] ≤ b}, b > 0.
Example III: Information Capacity in Non-Gaussian Noise
Definition
Consider a function f : R^n → R and a point x̄ ∈ R^n with f(x̄) finite. A vector v ∈ R^n is a regular subgradient of f at x̄, denoted v ∈ ∂f(x̄), if there exists δ > 0 such that for all x ∈ B_δ(x̄),
\[
f(x) \ge f(\bar{x}) + v^T (x - \bar{x}) + o(|x - \bar{x}|).
\]
- Related to subgradients in convex optimization.
- What are conditions for existence?
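A small numerical check of the definition (an invented example, not from the talk): for f(x) = |x| at x̄ = 0, any v with |v| ≤ 1 satisfies the subgradient inequality, and the o(|x − x̄|) term is not even needed.

```python
import numpy as np

# For f(x) = |x| at xbar = 0, v is a regular subgradient iff |v| <= 1,
# since |x| >= v*x holds exactly on a neighborhood of 0.
f = np.abs
xbar = 0.0
xs = np.linspace(-1.0, 1.0, 10001)   # sample points in B_delta(xbar)

def is_regular_subgradient(v):
    return bool(np.all(f(xs) >= f(xbar) + v * (xs - xbar) - 1e-12))

print(is_regular_subgradient(0.5))   # True
print(is_regular_subgradient(1.5))   # False: slope exceeds 1
```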
Example III: Information Capacity in Non-Gaussian Noise
Theorem (Rockafellar and Wets 1997)
Suppose f : R^n → R is finite and lower semicontinuous at x̄ ∈ R^n. Then there exists a sequence x_k →_f x̄ with ∂f(x_k) ≠ ∅ for all k.
Rockafellar, R. and Wets, R., Variational Analysis. Berlin Heidelberg: Springer-Verlag, 1997.
Example III: Information Capacity in Non-Gaussian Noise
Theorem (Egan, Perlaza 2018)
Suppose that Λ is a non-empty compact subset of P and the mutual information I : P → R is weakly continuous on Λ. If C = sup_{μ∈Λ} I(μ, p_N) < ∞, then for all ε > 0 there exists v ∈ R such that, for Δ sufficiently small,
\[
C(\Lambda) - C(\Lambda_\Delta) - \epsilon \le |v| \Delta + o(\Delta)
\]
holds.
Discussion: Design Implications
- Desirable to optimize the power control p:
\[
\begin{aligned}
R^*(\mathrm{vec}(\mathbf{W})) = \max_{\mathbf{p}} \quad & \sum_{i=1}^{k} \log\!\left(1 + \frac{|\mathbf{h}_i^\dagger \mathbf{w}_i|^2\, \frac{p_i}{\|\mathbf{w}_i\|^2}}{\sigma^2 + \sum_{j=1,\, j \ne i}^{k} |\mathbf{h}_i^\dagger \mathbf{w}_j|^2\, \frac{p_j}{\|\mathbf{w}_j\|^2}}\right) \\
\text{subject to} \quad & \sum_{i=1}^{k} p_i \le p_{\max}, \quad p_i \ge 0, \; i = 1, \dots, k. \qquad (1)
\end{aligned}
\]
- No simple solution! (Except when W = I.)
- What is the effect of changing the precoder?
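A sketch of problem (1) for k = 2 users, with randomly drawn channels and a brute-force search over the power split (all values are illustrative): the optimal rate under the identity precoder is compared against perturbed precoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-user instance of problem (1); channels drawn at random for this sketch.
k, sigma2, pmax = 2, 1.0, 5.0
H = (rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))) / np.sqrt(2)

def sum_rate(W, p):
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)   # unit-norm columns
    G = np.abs(H.conj() @ Wn)**2          # G[i, j] = |h_i^dag w_j|^2
    rates = 0.0
    for i in range(k):
        interf = sigma2 + sum(G[i, j] * p[j] for j in range(k) if j != i)
        rates += np.log(1.0 + G[i, i] * p[i] / interf)
    return rates

def R_star(W, n_grid=201):
    # exhaustive search over the power split p = (t, pmax - t)
    ts = np.linspace(0.0, pmax, n_grid)
    return max(sum_rate(W, np.array([t, pmax - t])) for t in ts)

W0 = np.eye(k, dtype=complex)
E = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
for enorm in [0.01, 0.1, 0.5]:
    W = W0 + enorm * E / np.linalg.norm(E)     # perturbed precoder
    print(enorm, abs(R_star(W) - R_star(W0)))  # rate change per perturbation size
```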
Discussion: Design Implications
Figure: Plot of rate loss for varying error norm ‖e‖, with pmax = 5 dB and n = k antennas. (Axes: error norm from 10⁻³ to 10⁰ on a log scale versus average rate loss in nats, 0 to 0.6; curves: approximate UB for n = 4 and n = 2, LB for n = 4 and n = 2.)
Discussion: Design Implications
In the MIMO optimization problem, using more complicated (but higher-performance) signal processing leads to higher-complexity algorithms.

Sensitivity analysis provides a means of understanding the complexity-performance tradeoffs involved in system optimization.

Sensitivity analysis allows us to ask when the performance gains are worth the complexity costs.
Conclusions
The design of modern communication networks involves the optimization of many coupled components.

Sensitivity analysis of optimization problems provides tools to understand how coupling impacts optimized components.
Key applications of sensitivity analysis:
- Coupling parameter selection.
- Impact of imperfect models.
A well-developed theory is waiting to be applied.