arXiv:1904.04376v1 [eess.SP] 8 Apr 2019
On Kaczmarz Signal Processing Technique in
Massive MIMO
Victor Croisfelt Rodrigues∗ and Taufik Abrao∗
∗ Department of Electrical Engineering (DEEL), State University of Londrina (UEL),
Londrina, Brazil
E-mail: victorcroisfelt@gmail.com, taufik@uel.br
Abstract
To exploit the benefits of massive multiple-input multiple-output (M-MIMO) technology in scenarios where
base stations need to be inexpensive and thus possibly equipped with simple hardware, the computational
complexity of classical signal processing schemes for spatial multiplexing must be reduced. This calls for
suboptimal or "relaxed" schemes that perform the required combining/precoding steps well while
achieving low computational complexity. An approach based on the iterative Kaczmarz algorithm (KA) has
recently been investigated with this intention; it performs well without knowledge of any a priori
statistics of the channel responses. Herein, modifications to this first KA-based attempt are proposed, aiming
to improve its rate of convergence and thereby its performance-computational complexity trade-off for
the M-MIMO combining/precoding problems. Such improvements are motivated by the negative effects
of pathloss and shadowing, originally presented here, on the rate of convergence of the KA, which
consequently worsen its achievable spectral efficiency (SE). Specifically, the randomized version of KA (rKA)
is used when conceiving the signal processing schemes due to its global convergence properties, and our
adaptations can be seen as a fairness policy for algorithm initialization that guarantees the convergence of all
combining/precoding vectors, whether they belong to users located at the cell center or at the cell
edge. Two different rKA-based schemes are discussed, of which only one is considered applicable
given the constraints placed by the underlying scenario. To show that the proposed method outperforms the
original proposal, a mathematical characterization is presented along with numerical results that employ more
realistic system and channel conditions. More precisely, the simulations cover the least-squares and minimum
mean-squared error estimators of the channel responses, and take into account both uncorrelated and correlated
Rayleigh fading channels. The spatial correlation of the latter is modeled using the exponential correlation
model combined with large-scale fading variations over the array, a combination that is shown to be
detrimental to algorithm performance. The computational complexity of the modified, most promising rKA-based
scheme, which obtains the receive combining and transmit precoding matrices, is derived
in terms of complex multiplication/division matrix operations. Computational complexity gains are found for
the relaxed strategy in moderately and highly loaded M-MIMO scenarios, but this computational
advantage comes at the cost of SE when compared with the performance afforded by classical schemes.
Index Terms
Massive MIMO; Combining; Precoding; Kaczmarz algorithm; Computational complexity.
I. INTRODUCTION
Massive multiple-input multiple-output (M-MIMO) allows all user equipment
(UE) to concurrently exploit the same time-frequency resources and communicate through space-division
multiple access. Although non-linear M-MIMO signal processing techniques are typically credited as
optimal implementation strategies, it is well known that linear schemes achieve an adequate
performance-computational complexity trade-off that allows for a good spatial differentiation of the UEs. This is a
direct consequence of the use of massive antenna arrays that, basically, suppress the interference
among UEs [1], [2]. In each coherence block, therefore, the base station (BS) needs to estimate the
channel responses between the BS and UEs' antennas in order to spatially differentiate the UEs. This
knowledge is acquired through a pilot training phase, which is generally made feasible by the
adoption of a time-division duplex (TDD) architecture. Because the amount of channel state
information (CSI) grows with the number of BS and UE antennas adopted in the system,
the classical computation of the combining/precoding schemes usually results in a substantially large
computational complexity. For this reason, it is desirable to further reduce the computational complexity
of the canonical linear schemes, or simply canonical schemes, so that low-cost processing
hardware can be used at the BS while the benefits brought by
a large number of antennas are still exploited during communications, a combination that would also yield
attractive economic gains.
Particularly, a good strategy to reduce the computational complexity of signal processing schemes
has been the search for suboptimal techniques. These so-called "relaxed" schemes are meant to lessen
the complexity of the canonical ones while achieving similar capacity results.
The recent literature contains different proposals to reach these objectives, such as
the polynomial expansion method, which reformulates the classical schemes based on the Taylor series
[3]; this approach is also known as the truncated polynomial expansion (TPE). This technique
provides a comparable performance in terms of spectral efficiency (SE) with a reduced computational
complexity when compared to canonical schemes. For the TPE-based regularized zero-forcing (RZF),
the achieved performance is close to that of the classical RZF implementation, while
the computational complexity is equivalent to that obtained for the simple maximum-ratio scheme [3].
Although TPE delivers good results, it relies heavily on knowledge of
the second-order statistics of the channel responses, that is, the channel covariance matrices
must be known at the receiver. As can be seen from [4], estimating the second-order statistics is
challenging and computationally prohibitive, especially when the number of BS antennas, M,
becomes large, which renders TPE undesirable in the context under consideration even before taking
into account the severe storage requirements.
In contrast to TPE, the authors of [5] proposed the application of the Kaczmarz algorithm (KA) over
the combining/precoding problems of M-MIMO. A remarkable advantage of the KA-based methods
over TPE or even other schemes lies in the fact that the former do not require knowledge
of the covariance matrices to achieve adequate performance with compelling complexity. The
procedure was initially proposed by Kaczmarz in [6] as an iterative technique for solving determined
and overdetermined (OD) sets of linear equations (SLEs). With the increasing popularity of stochastic
gradient techniques and machine learning, KA has been revived [7] and applied to other problems,
such as solving quadratic equations [8]. Recently, a randomized version of the KA (rKA) to solve
consistent OD SLEs was introduced and analyzed in [9]. This random approach of the KA allows
an expected exponential rate of convergence and is therefore much more reliable, unlike the case of
classical KA convergence that depends tremendously on the way that the equations are arranged in the
SLE. For M-MIMO, the authors of [5] derived a mathematical framework to examine the performance
of the KA, more specifically rKA, applied to emulate some linear signal processing techniques. The
rKA-based schemes of [5] consider the resolution of the signal estimation problem, i.e., they are
used to estimate in the receiver the signal sent/transmitted by/for each user in each symbol period.
Herein, we re-consider the emulation of some canonical linear schemes through the rKA as originally
proposed in [5] with the intention of reducing their computational complexities, thus giving further
insights about the algorithm capabilities and possible improvements to it for the case where the BS
is composed of a massive number of antennas, but is now equipped with simple and cheap hardware.
Different from [5], our aim is to obtain the receive combining matrix through rKA and then
acquire the transmit precoding matrix by resorting to the uplink-downlink (UL-DL) duality. In contrast,
the authors of [5], motivated by scenarios with slow UE mobility,
directly describe the rKA as a solver of the signal estimation problem in each symbol period. The analysis
here considers the emulation of the classical zero-forcing (ZF) and RZF solutions by using the
rKA, under the motivation that these techniques lead to a good performance-computational complexity
tradeoff [2]. In addition to presenting a more general point of view with respect to mobility,
we propose modifications to the rKA-based signal processing strategies suggested in [5],
demonstrating the effectiveness of our proposals theoretically and numerically. The refinements to
the algorithm were motivated by the detrimental effect of pathloss and shadowing on
the rKA functionality, as revealed in the conference paper [10], where we broadly introduced the proposed
modifications that are further elaborated here. To answer whether the rKA is a computationally efficient
approach in general, a computational complexity analysis in terms of complex matrix-to-matrix
multiplications/divisions is also derived. This study is performed because the authors of [5]
only presented the complexity of the rKA-based schemes in big-O notation, which
does not support accurate conclusions. We have additionally adopted more realistic M-MIMO
channel conditions, where classical channel estimators are incorporated in the simulations.
Besides that, spatially correlated channels are generated by the adoption of the exponential correlation
model with large-scale fading variations over the array, thus demonstrating the impact of two different
sources of spatial correlation on the convergence of proposed algorithms: (a) one generated from the
proximity of the array elements and (b) the other having its origin in the unequal contribution of
power from the scatterers chaotically distributed along the propagation medium. This paper is then
organized as follows. Section II first describes the M-MIMO system model in question, defining the
SE bound that we shall use and the combining/precoding signal estimation problems as optimization
problems; indeed, the latter will be where the Kaczmarz’s framework is employed as a possible solver,
thereby emulating the capacity results originating from the canonical solutions. The key properties of
the KA and its random variant, the rKA, are discussed in Section III, while they are applied as a way
to solve, in two different ways, the combining/precoding optimization problems in Section IV. Section
V evaluates the computational complexity of the rKA-based schemes 1. Numerical results aiming at a
better characterization of the rKA-based scheme that presents the best appeal in a much more realistic
M-MIMO context are demonstrated in Section VI. The main conclusions are then summarized in
Section VII.
1Note the use of the randomized version of KA here. In fact, as will be seen and motivated further, the rKA provides more reliable
results, which corroborates the future focus on signal processing schemes based on it.
Notations: Italic letters denote scalars, whereas boldface uppercase and lowercase letters represent matrices
and column vectors, respectively. The superscript $(\cdot)^{H}$ denotes the Hermitian transpose and
$\mathcal{N}_{\mathbb{C}}(\cdot, \cdot)$ stands for the circularly symmetric complex Gaussian distribution. $\mathbb{E}\{x\}$ denotes the expected value
of a random variable $x$. The $N \times N$ identity matrix is indicated by $\mathbf{I}_N$, whereas a vector of $N$ zeros and
an $N \times N$ matrix of zeros are represented, respectively, as $\mathbf{0}_N$ and $\mathbf{0}_{N\times N}$. $\|\cdot\|_2$ and $\|\cdot\|_F$ are
the $\ell_2$-norm and Frobenius norm, respectively. The inner product between two vectors is denoted as
$\langle\cdot,\cdot\rangle$. The concatenation is represented by $[\cdot,\cdot]$ and the element in row $i$ and column $j$ of $\mathbf{A}$ is
selected as $[\mathbf{A}]_{i,j}$. In particular, the $i$th element of a vector $\mathbf{a}$ is written as $a_i$.
II. SYSTEM MODEL
In this section, a canonical single-cell M-MIMO system is presented, aiming to introduce the SE bound
used to evaluate the signal processing techniques and to state the signal estimation as an optimization
problem to be solved later with the Kaczmarz framework. The setup considered here comprises
a BS equipped with $M$ antennas serving $K$ single-antenna UEs. Presupposing that the coherence blocks,
each with a length of $\tau_c$ symbols (coherence interval), have statistically independent realizations
(block-fading model) [1], [2], let $\mathbf{g}_k \in \mathbb{C}^{M}$ denote the channel from UE $k$ to the BS. In addition,
a correlated Rayleigh fading model is adopted, whereby the channel can be written as $\mathbf{g}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_M, \mathbf{R}_k)$.
The covariance matrix $\mathbf{R}_k \in \mathbb{C}^{M\times M}$ thus embodies effects such as pathloss and spatial channel
correlation, both of which correspond to large-scale propagation phenomena, while the complex Gaussian
distribution accounts for the small-scale fading [11]. At this point, one can assume a pilot training
phase to obtain the CSI that follows the features of the TDD scheme. The pilot sequences assigned to
the UEs are regarded as mutually orthogonal, since it is desired to combat the intra-cell interference.
As a consequence of this process, the BS acquires the estimated channel matrix $\mathbf{G} \in \mathbb{C}^{M\times K}$, whose
$k$th column, $\mathbf{g}_k \in \mathbb{C}^{M}$, represents the estimated channel vector of UE $k$.
A. Uplink Data Transmission
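In the UL, the BS receives the data symbols transmitted by all UEs, yielding the following linearly
combined signal: $\mathbf{y} = \sqrt{\rho_{\mathrm{ul}}}\sum_{i=1}^{K} \mathbf{g}_i s_i + \mathbf{n}$, where $\rho_{\mathrm{ul}}$ is the normalized UL transmit power,
$s_i \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ stands for the data symbol sent by UE $i$, and $\mathbf{n} \in \mathbb{C}^{M} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_M, \mathbf{I}_M)$ denotes the receiver
noise. The performance of a selected receive combining vector, $\mathbf{v}_k \in \mathbb{C}^{M}$, will be evaluated using the
UL SE metric obtained through the application of the so-called use-and-forget technique$^2$. The net
ergodic, lower-bounded UL SE for UE $k$ is then [1], [2]
$$\mathrm{SE}^{\mathrm{ul}}_k = \frac{\tau_{\mathrm{ul}}}{\tau_c}\log_2\left(1 + \gamma^{\mathrm{ul}}_k\right) \quad \text{[bit/s/Hz]}, \tag{1}$$
with the effective signal-to-interference-plus-noise ratio (SINR) given as
$$\gamma^{\mathrm{ul}}_k = \frac{\rho_{\mathrm{ul}}\left|\mathbb{E}\{\mathbf{v}_k^{H}\mathbf{g}_k\}\right|^2}{\rho_{\mathrm{ul}}\sum_{i=1}^{K}\mathbb{E}\{|\mathbf{v}_k^{H}\mathbf{g}_i|^2\} - \rho_{\mathrm{ul}}\left|\mathbb{E}\{\mathbf{v}_k^{H}\mathbf{g}_k\}\right|^2 + \mathbb{E}\{\|\mathbf{v}_k\|_2^2\}}, \tag{2}$$
where $\tau_{\mathrm{ul}}$ is the number of symbols of the coherence interval spent in the UL. Supposing that only the UL takes place,
$\tau_{\mathrm{ul}} = \tau_c - \tau_p$, with $\tau_p$ being the length of the pilot training phase. The expectations in (2) are taken with
respect to all the randomness related to the channel estimates and combining schemes.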
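As a minimal sketch of how (1) and (2) can be evaluated numerically, the expectations may be replaced by sample averages over independent channel realizations. The function name `uatf_ul_se`, the array layout, and the test scenario (uncorrelated Rayleigh fading, perfect CSI, maximum-ratio combining, and the values of `tau_ul` and `tau_c`) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def uatf_ul_se(V, G, rho_ul, tau_ul, tau_c):
    """Monte Carlo estimate of the SINR in (2) and the SE in (1).

    V, G: complex arrays of shape (n_real, M, K) whose columns hold, per
    realization, the combining vectors v_k and the channels g_i.
    """
    # inner[n, k, i] = v_k^H g_i for realization n
    inner = np.einsum("nmk,nmi->nki", V.conj(), G)
    # |E{v_k^H g_k}|^2: average the diagonal terms, then square the magnitude
    signal = np.abs(np.einsum("nkk->nk", inner).mean(axis=0)) ** 2
    # sum_i E{|v_k^H g_i|^2}
    total = (np.abs(inner) ** 2).mean(axis=0).sum(axis=1)
    # E{||v_k||_2^2}
    noise = (np.abs(V) ** 2).sum(axis=1).mean(axis=0)
    sinr = rho_ul * signal / (rho_ul * total - rho_ul * signal + noise)
    se = (tau_ul / tau_c) * np.log2(1.0 + sinr)
    return sinr, se

# Illustrative scenario (assumption): i.i.d. Rayleigh channels, perfect CSI
rng = np.random.default_rng(0)
n_real, M, K, rho_ul = 4000, 64, 4, 1.0
G = (rng.standard_normal((n_real, M, K))
     + 1j * rng.standard_normal((n_real, M, K))) / np.sqrt(2)
V = G  # maximum-ratio combining, used here only as a simple test case
sinr, se = uatf_ul_se(V, G, rho_ul, tau_ul=190, tau_c=200)
```

Any combining scheme can be plugged in through `V`, as long as it is computed per realization from the corresponding channel estimate.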
B. Downlink Data Transmission
The BS sends data to its respective UEs during the DL, where each data stream has to be precoded
in order to direct it spatially towards the desired UE. This directional beam control is performed
through a transmit precoding vector $\mathbf{w}_k \in \mathbb{C}^{M}$ for the $k$th UE. Thus, the first stage of the
DL is to steer the information signal to be transmitted by the BS via a precoding process stated as
$\mathbf{x} = \sum_{i=1}^{K}\mathbf{w}_i\varsigma_i$, where $\varsigma_i \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ is the message signal sent by the BS to UE $i$. An important
remark is that $\|\mathbf{w}_i\|_2^2 = 1$, such that the transmit precoding process does not alter
the signal power. The precoded signal is then transmitted by the BS, while the UEs await
its reception. Recall that, since the TDD scheme is assumed, the UL and DL channels are reciprocal
within each $\tau_c$-block. Keeping this in mind, UE $k$ receives
$y_k = \sqrt{\rho_{\mathrm{dl}}}\,\mathbf{g}_k^{H}\mathbf{x} + n_k = \sqrt{\rho_{\mathrm{dl}}}\sum_{i=1}^{K}\mathbf{g}_k^{H}\mathbf{w}_i\varsigma_i + n_k$, where $\rho_{\mathrm{dl}}$
is the normalized DL transmit power and $n_k$ is the receiver noise. It can be seen that each UE is affected by all UEs' precoding
vectors and, for this reason, the selection of $\mathbf{w}_i$ becomes a difficult problem to solve. A heuristic
way to select the precoding vectors across the UEs is through the use of the UL-DL
duality, which is discussed and exploited further in the sequel. UE $k$ can estimate
its respective message signal by relying on the average precoded channel $\mathbb{E}\{\mathbf{g}_k^{H}\mathbf{w}_k\}$. This procedure
allows the derivation of a hardening bound for the DL SE similar to (1), with an analogous definition of $\tau_{\mathrm{dl}}$,
which is given in [2, p. 317] and relies heavily on attaining the channel hardening effect. The SE
bound of the DL is not directly evaluated here, since the UL evaluation is sufficient to demonstrate
the effectiveness of the proposals without loss of generality.
2This denomination stems from the exploitation of the channel estimates only to compute the receive combining vector, whereas the
CSI is not used in the detection process.
C. Combining/Precoding Schemes: Problem Statement
Throughout this section, the ZF and RZF receive combining schemes are presented and then,
relying on the UL-DL duality, their respective transmit precoding vectors are obtained. For ease of
exposition, the receive combining and transmit precoding matrices are defined as $\mathbf{V} \in \mathbb{C}^{M\times K}$ and
$\mathbf{W} \in \mathbb{C}^{M\times K}$, with $\mathbf{v}_k$ and $\mathbf{w}_k$ being their $k$th columns, respectively. It is furthermore demonstrated
that the introduced canonical schemes can be seen as the solutions of optimization problems. As we
will see, the optimization problems can be treated as SLEs and can thus be naturally solved through
rKA, as conducted in Section IV, giving rise to rKA-based signal processing schemes.
The RZF is a sophisticated approach that regularizes the estimated
channel matrix by the signal-to-noise ratio (SNR). This regularization factor mathematically aids
the inversion of $\mathbf{G}^{H}\mathbf{G}$ and, physically, balances interference suppression against desired signal
maximization. The RZF is given as [2]
$$\mathbf{V}^{\mathrm{RZF}} = \mathbf{G}\left(\mathbf{G}^{H}\mathbf{G} + \xi\mathbf{I}_K\right)^{-1}, \tag{3}$$
where $\xi = 1/\rho_{\mathrm{ul}}$ is the regularization factor, defined as the inverse of the SNR. This approach is
recommended for scenarios where a good estimation performance is achieved and the
inter-cell interference is not too strong; the latter condition is relevant only in the multi-cell case.
Notice that if the SNR is high, $\xi \to 0$, and/or a large $M$ is applied, $\mathrm{tr}(\mathbf{G}^{H}\mathbf{G}) \gg \mathrm{tr}(\xi\mathbf{I}_K)$, RZF can
be approximated as [1], [2]
$$\mathbf{V}^{\mathrm{ZF}} = \mathbf{G}\left(\mathbf{G}^{H}\mathbf{G}\right)^{-1}. \tag{4}$$
The term ZF stems from the fact that $\mathbf{G}^{H}\mathbf{V}^{\mathrm{ZF}} = \mathbf{I}_K$, noting that $\mathbf{V}^{\mathrm{ZF}}$ is the pseudo-inverse of
$\mathbf{G}^{H}$. Consequently, the ZF aims to cancel all the intra-cell interference while preserving the power
level of the desired signals. When the receive combining is applied, namely $(\mathbf{V}^{\mathrm{ZF}})^{H}\mathbf{y}$, however, the
real channels differ from the estimated ones, which translates into residual interference [2].
In addition, note that not all the UEs have high SNR; hence, the ZF is expected to perform
worse than RZF. It should be clear that the ZF can be understood as the
RZF case with $\xi = 0$.
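The canonical schemes in (3) and (4) can be sketched in a few lines of NumPy. The dimensions, the SNR value, and the i.i.d. Rayleigh stand-in for the estimated channel matrix are assumptions made only for illustration; a linear solve is used in place of an explicit matrix inverse, which is a common numerical choice rather than something prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, rho_ul = 64, 8, 10.0          # illustrative sizes and SNR (assumptions)
xi = 1.0 / rho_ul                   # regularization factor, xi = 1 / rho_ul

# Stand-in for the estimated channel matrix G (i.i.d. Rayleigh, assumption)
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
gram = G.conj().T @ G               # K x K Gram matrix G^H G

# (3): V_RZF = G (G^H G + xi I_K)^{-1}; the solve returns (.)^{-1} G^H,
# whose Hermitian transpose is V_RZF because the Gram matrix is Hermitian
V_rzf = np.linalg.solve(gram + xi * np.eye(K), G.conj().T).conj().T

# (4): V_ZF = G (G^H G)^{-1}
V_zf = np.linalg.solve(gram, G.conj().T).conj().T

# Zero-forcing property: G^H V_ZF should equal I_K up to rounding
zf_error = np.linalg.norm(G.conj().T @ V_zf - np.eye(K))
```

Both schemes require the solution of a K x K linear system per coherence block, which is the cost that the relaxed schemes discussed later try to avoid.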
In order to separate and estimate the signals of each UE, the receive combining scheme is applied
to the received signal at the BS. This can be expressed as
$$\mathbf{s} = \mathbf{V}^{H}\mathbf{y}, \tag{5}$$
where $\mathbf{s} \in \mathbb{C}^{K}$ is the vector with the estimates of the data symbols transmitted by all UEs in one
communication instant and $\mathbf{V}$ can be either the RZF or the ZF filter. In view of that, the RZF scheme
can be stated as the optimal solution of the signal estimation problem above, as
presented in Corollary 1 and proved in the sequel, based on [5].
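Corollary 1. From (3) and (5), the RZF signal estimate is
$$\mathbf{s} = (\mathbf{V}^{\mathrm{RZF}})^{H}\mathbf{y} = \left(\mathbf{G}^{H}\mathbf{G} + \xi\mathbf{I}_K\right)^{-1}\mathbf{G}^{H}\mathbf{y}, \tag{6}$$
thus, it is possible to infer that $\mathbf{s}$ is an optimum solution of the estimation problem, since it is the
minimizer of $\arg\min_{\boldsymbol{\varrho}\in\mathbb{C}^{K}} \|\mathbf{G}\boldsymbol{\varrho} - \mathbf{y}\|_2^2 + \xi\|\boldsymbol{\varrho}\|_2^2$.
Proof. The first derivative of the cost function above with respect to $\boldsymbol{\varrho}$ is $2\mathbf{G}^{H}(\mathbf{G}\boldsymbol{\varrho} - \mathbf{y}) + 2\xi\boldsymbol{\varrho}$.
Setting this derivative to zero yields the optimal solution $\boldsymbol{\varrho}^{\star} = (\mathbf{G}^{H}\mathbf{G} + \xi\mathbf{I}_K)^{-1}\mathbf{G}^{H}\mathbf{y}$; hence,
the minimizer coincides with $\mathbf{s}$.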
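The same can be done for the ZF by setting $\xi = 0$; however, the analysis hereafter focuses on the
more general case characterized by the RZF scheme. The optimization
problem given in Corollary 1 can be rewritten compactly as the minimization of $\|\mathbf{B}\boldsymbol{\varrho} - \mathbf{y}_0\|_2^2$ over $\boldsymbol{\varrho} \in \mathbb{C}^{K}$, where $\mathbf{B} = [\mathbf{G}; \sqrt{\xi}\mathbf{I}_K]$ is
an $(M+K)\times K$ matrix that stacks the estimated channel matrix and the square root of the regularization factor,
and $\mathbf{y}_0 = [\mathbf{y}; \mathbf{0}_K]$ is an $(M+K)$-dimensional vector that stacks the received signal and a zero vector. Thus, $\mathbf{s}$ can be
expressed as
$$\mathbf{s} = (\mathbf{B}^{H}\mathbf{B})^{-1}\mathbf{B}^{H}\mathbf{y}_0 = (\mathbf{G}^{H}\mathbf{G} + \xi\mathbf{I}_K)^{-1}\mathbf{G}^{H}\mathbf{y}. \tag{7}$$
These definitions will be used to properly apply the rKA as an alternative way to solve the optimization
problem established in Corollary 1.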
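The equivalence stated in (7) can be checked numerically with a short sketch: an ordinary least-squares solve on the stacked pair (B, y0) should return the RZF estimate of (6). The problem sizes and the randomly drawn G and y are assumptions used only for this check.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, xi = 32, 4, 0.1               # illustrative values (assumptions)

def crandn(*shape):
    """Draw circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

G = crandn(M, K)                    # stand-in for the estimated channel matrix
y = crandn(M)                       # stand-in for the received signal

# Stacked quantities: B = [G; sqrt(xi) I_K] and y0 = [y; 0_K]
B = np.vstack([G, np.sqrt(xi) * np.eye(K)])
y0 = np.concatenate([y, np.zeros(K)])

# Left-hand side of (7): least-squares solution of the stacked problem
s_stacked, *_ = np.linalg.lstsq(B, y0, rcond=None)

# Right-hand side of (7): regularized solution written with G only
s_rzf = np.linalg.solve(G.conj().T @ G + xi * np.eye(K), G.conj().T @ y)
```

The two vectors agree up to rounding, which is what allows the regularized problem to be handled by a solver designed for plain linear systems.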
As anticipated, the selection of the transmit precoding vectors is a very complex problem
owing to its coupled nature, i.e., each precoding vector is affected by all the other precoding
vectors of the system. A heuristic way to solve it is the UL-DL duality, a direct consequence of
considering TDD mode, by which the transmit precoding vector of UE $k$ can be computed as [2]
$$\mathbf{w}_k = \frac{\mathbf{v}_k}{\|\mathbf{v}_k\|_2} \tag{8}$$
for any receive combining scheme. Recall that the normalization ensures that the
transmit precoding vector does not alter the transmitted power. The above definition allows
us to obtain the transmit precoding vectors for ZF, RZF, and also KA- or rKA-based schemes without
the exact solution of the precoding problem. This result is explored in depth in the next sections.
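Relation (8) amounts to a column-wise normalization of the combining matrix. A minimal sketch follows; the helper name `precoder_from_combiner` and the random combining matrix are hypothetical and serve only as an example.

```python
import numpy as np

def precoder_from_combiner(V):
    """Apply (8) column-wise: w_k = v_k / ||v_k||_2."""
    return V / np.linalg.norm(V, axis=0, keepdims=True)

# Usage with an arbitrary M x K combining matrix (illustrative assumption)
rng = np.random.default_rng(3)
V = rng.standard_normal((16, 4)) + 1j * rng.standard_normal((16, 4))
W = precoder_from_combiner(V)
column_powers = np.sum(np.abs(W) ** 2, axis=0)   # each entry should be 1
```

Because only the norm of each column is needed, the same helper applies unchanged to combining vectors produced by any of the schemes discussed here.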
III. KACZMARZ AND RANDOMIZED KACZMARZ ALGORITHMS
The mathematical concepts behind the KA are now briefly described based on Kaczmarz's
seminal work [6], in addition to those underlying the rKA, an extension of KA provided by Strohmer
and Vershynin in [9], which has a rigorous rate of convergence that allows more reliable results when
compared to the classical KA. For this purpose, we consider, throughout this section, the canonical
and generic SLE $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m\times n}$ is the matrix of constant coefficients, $\mathbf{x} \in \mathbb{C}^{n}$ is the
vector of unknowns, and $\mathbf{b} \in \mathbb{C}^{m}$ is the vector of known offset coefficients. Note that, for
an underdetermined (UD) SLE ($m < n$), $\mathbf{A}$ is assumed to be a full row-rank matrix, whereas for
an OD SLE ($m > n$), $\mathbf{A}$ is assumed to be a full column-rank matrix. Besides that, the SLE is deemed consistent,
which means that it possesses at least one solution.
The KA can be written as [6], [9]
$$\mathbf{x}^{t+1} = \mathbf{x}^{t} + \frac{b_i - \langle\mathbf{a}_i, \mathbf{x}^{t}\rangle}{\|\mathbf{a}_i\|_2^2}\,\mathbf{a}_i, \tag{9}$$
where $t$ is the iteration index and $\mathbf{a}_i = ([\mathbf{A}]_{i,:})^{H} \in \mathbb{C}^{n}$ is the $i$th round-robin selected row of $\mathbf{A}$; similarly,
$b_i$ is the $i$th element of $\mathbf{b}$. It is important to make clear that the selected row $i$ is a deterministic variable
for the classical KA and can then be computed cyclically, for instance, as $i = \mathrm{mod}(t, m) + 1$. We
describe the KA in detail as follows. First, the KA is initialized with an arbitrary solution, $\mathbf{x}^{0}$, seeking
to find the true solution, $\mathbf{x}$. In a given KA iteration $t$, it then takes a row $i$ that is successively
chosen following the order of the equations in $\mathbf{A}$, i.e., sweeping from $\mathbf{a}_1$ to $\mathbf{a}_m$. After that, the KA computes
the residual $b_i - \langle\mathbf{a}_i, \mathbf{x}^{t}\rangle$. If $\langle\mathbf{a}_i, \mathbf{x}^{t}\rangle = b_i$, the solution is not updated, resulting
in $\mathbf{x}^{t+1} = \mathbf{x}^{t}$; if not, the KA projects the current approximate solution, $\mathbf{x}^{t}$, onto the hyperplane
defined by $\langle\mathbf{a}_i, \mathbf{x}\rangle = b_i$, thus enforcing consistency with the $i$th equation and producing the update $\mathbf{x}^{t+1}$.
These successive orthogonal projections tend to lead $\mathbf{x}^{0}$ to $\mathbf{x}$ after a suitable number of KA iterations.
In summary, the KA is an efficient way to find the solution of an SLE, which strives to guide
an arbitrary guess in the direction of the desired answer, obeying the maximum energy perturbation
principle [5].
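The cyclic procedure described above can be sketched as follows, assuming the inner-product convention $\langle\mathbf{a}, \mathbf{x}\rangle = \mathbf{a}^{H}\mathbf{x}$. The function name `kaczmarz`, the zero default for the initial guess, and the small consistent test system are illustrative choices and not part of the original description.

```python
import numpy as np

def kaczmarz(A, b, n_iter, x0=None):
    """Classical (cyclic) Kaczmarz iteration (9) for a consistent SLE A x = b."""
    m, n = A.shape
    x = np.zeros(n, dtype=complex) if x0 is None else np.array(x0, dtype=complex)
    for t in range(n_iter):
        i = t % m                                # 0-based form of i = mod(t, m) + 1
        a_i = A[i].conj()                        # a_i = ([A]_{i,:})^H
        residual = b[i] - np.vdot(a_i, x)        # b_i - <a_i, x^t>
        # Orthogonal projection onto the hyperplane <a_i, x> = b_i
        x = x + residual / np.vdot(a_i, a_i).real * a_i
    return x

# Small consistent OD system used as an example (assumption)
rng = np.random.default_rng(4)
m, n = 30, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true
x_hat = kaczmarz(A, b, n_iter=3000)
```

Each iteration touches a single row of A, so its cost is linear in n; the number of sweeps needed, however, depends on how the rows are ordered.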
Although the KA is an extremely powerful solver, mainly for OD SLEs, its rate of convergence
is not well understood and has therefore not been satisfactorily explored. This is because the KA's rate of
convergence is a difficult metric to estimate, since it depends strongly on the way the
equations are arranged in $\mathbf{A}$; this affects the so-called KA update schedule, namely, the manner in which
the equations are selected as the number of iterations grows. Eventually, some works identified that the
random choice of the rows of $\mathbf{A}$ can guarantee a rigorous rate of convergence for the KA. From this
observation, [9] proposed the rKA scheme, which assures an expected exponential rate of convergence;
this is of capital importance to ensure a reliable KA solution in terms of reproducible results.
The rKA randomly and independently chooses the rows of $\mathbf{A}$ in its update schedule, based on a
metric that measures the relevance of the rows. In this case, the random KA variant described
in [9] can be expressed as
$$\mathbf{x}^{t+1} = \mathbf{x}^{t} + \frac{b_{r(t)} - \langle\mathbf{a}_{r(t)}, \mathbf{x}^{t}\rangle}{\|\mathbf{a}_{r(t)}\|_2^2}\,\mathbf{a}_{r(t)}, \tag{10}$$
where the randomly picked row in rKA iteration $t$ is denoted as $r(t) \in \{1, 2, \dots, m\}$. In particular, each
$r(t)$ has a sample probability $P_{r(t)} \in [0, 1]$, and the sampling
distribution over all rows is collected in the vector $\mathbf{p} = (P_1, \dots, P_m) \in \mathbb{R}_{+}^{m}$. It should be observed that
$\sum_{r(t)=1}^{m} P_{r(t)} = 1$.
This sample probability is defined to minimize the expected mean-squared error (MSE) at a given
rKA iteration $t$, which occurs when [9]
$$P_{r(t)} := \frac{\|\mathbf{a}_{r(t)}\|_2^2}{\|\mathbf{A}\|_F^2}. \tag{11}$$
In fact, the above $P_{r(t)}$ is suboptimal$^3$ and can be interpreted as a quantity that measures the fraction
of energy of a given row with respect to the total energy of the system. Although suboptimal, this
probability distribution has demonstrated effectiveness and, hence, will be adopted in this work. The
main convergence result of the rKA established in [5] and [9] is summarized by the following corollary.
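Corollary 2. The expected rate of convergence of the rKA is
$$\mathbb{E}\{\|\mathbf{x}^{t} - \mathbf{x}^{\star}\|_2^2\} \leq (1 - \kappa_{\mathcal{X}}(\mathbf{A}))^{t}\,\|\mathbf{x}^{0} - \mathbf{x}^{\star}\|_2^2, \tag{12}$$
with $\mathbf{x}^{\star}$ being the optimal solution and $\mathbf{x}^{0}$ an arbitrary initial guess. Moreover, $\kappa_{\mathcal{X}}(\mathbf{A})$ is termed
the average gain and defined as
$$\kappa_{\mathcal{X}}(\mathbf{A}) := \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\|\mathbf{A}\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{A}\|_F^2\|\boldsymbol{\vartheta}\|_2^2}, \tag{13}$$
where one should observe that the above metric is totally independent of the sample probability and
is related to the condition number$^4$ of $\mathbf{A}$.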
3 Optimization of the probability may not be useful in the M-MIMO scenario, since the CSI changes every $\tau_c$. Indeed, as will be
demonstrated, $\mathbf{A}$ is directly linked to $\mathbf{G}$. Altogether, optimization problems would have to be solved for each $\tau_c$, and these
solutions impose an unwanted hardware load [5], [12], given the posed BS restrictions.
4 Remember that $\|\mathbf{A}\|_2^2\|\mathbf{A}^{-1}\|_F^2$ is usually defined as the condition number of $\mathbf{A}$.
Proof. The proof can be found in [5, p. 11].
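A minimal sketch of the rKA in (10), with rows sampled according to (11), is given next. The function name `randomized_kaczmarz`, the zero default for the initial guess, and the test system are assumptions introduced only for illustration.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iter, rng, x0=None):
    """Randomized Kaczmarz (10) with the row-energy sampling rule (11)."""
    m, n = A.shape
    row_energy = np.sum(np.abs(A) ** 2, axis=1)    # ||a_r||_2^2 for every row
    p = row_energy / row_energy.sum()              # (11): ||a_r||^2 / ||A||_F^2
    x = np.zeros(n, dtype=complex) if x0 is None else np.array(x0, dtype=complex)
    # Rows are drawn independently across iterations
    for r in rng.choice(m, size=n_iter, p=p):
        a_r = A[r].conj()                          # a_r = ([A]_{r,:})^H
        x = x + (b[r] - np.vdot(a_r, x)) / row_energy[r] * a_r
    return x

# Consistent OD system used as an example (assumption)
rng = np.random.default_rng(5)
m, n = 30, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b, n_iter=3000, rng=rng)
```

Compared with the cyclic variant, the only changes are the precomputed sampling distribution and the random row index, which is what makes the expected-error bound of Corollary 2 applicable.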
To gain further insights, the above corollary bounds in (12) the number of rKA iterations (projections)
required to bring the initial solution $\mathbf{x}^{0}$ close to $\mathbf{x}^{\star}$, based on the average gain provided by
the rKA. The expected error on the left-hand side of (12) decays exponentially with $t$.
Note that the higher the factor $(1 - \kappa_{\mathcal{X}}(\mathbf{A}))$, the more rKA iterations are needed to reduce
the initial error $\|\mathbf{x}^{0} - \mathbf{x}^{\star}\|_2^2$ to a target level. It is thus desirable to have a high value
of $\kappa_{\mathcal{X}}(\mathbf{A})$ so as to shrink the number of rKA iterations necessary to achieve the expected
error level. This means that there is a direct connection between $\kappa_{\mathcal{X}}(\mathbf{A})$ and the convergence
of the rKA. As a result, the best case of convergence is represented by the maximum value that the
average gain can assume, while its minimum stands for the worst-case scenario of convergence. It is
also worth mentioning explicitly that the number of rKA iterations is approximately inversely proportional to the
average gain. Because of the above convergence result, which brings additional reliability, our focus,
from now on, is the use of rKA in the design of signal processing schemes based on the Kaczmarz
methodology$^5$.
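To make the link between the average gain and the iteration count concrete, the sketch below evaluates (13) and inverts the bound (12). It assumes that the set $\mathcal{X}$ is the whole of $\mathbb{C}^{n}$ and that $\mathbf{A}$ has at least as many rows as columns, in which case the minimum in (13) equals the squared smallest singular value of $\mathbf{A}$ divided by its squared Frobenius norm; the function names are hypothetical.

```python
import numpy as np

def average_gain(A):
    """kappa in (13), assuming X = C^n and m >= n: sigma_min(A)^2 / ||A||_F^2."""
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    return sigma[-1] ** 2 / np.sum(sigma ** 2)

def iterations_for(A, eps):
    """Smallest t with (1 - kappa)^t <= eps, i.e., the bound (12) shrinks the
    initial squared error by a factor eps (assumes 0 < kappa < 1)."""
    kappa = average_gain(A)
    return int(np.ceil(np.log(eps) / np.log1p(-kappa)))

# Worked example: singular values 2 and 1 give kappa = 1 / (4 + 1) = 0.2
A = np.diag([2.0, 1.0])
kappa = average_gain(A)
t_needed = iterations_for(A, eps=1e-3)
```

For small gains, log(1 - kappa) is close to -kappa, so the required number of iterations behaves like log(1/eps)/kappa, in line with the inverse proportionality noted above.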
IV. OBTAINING THE COMBINING/PRECODING MATRICES VIA RANDOMIZED KACZMARZ
ALGORITHM
The rKA is now presented as a valid approach to solve the receive combining problem established
in Corollary 1, which can be interpreted as the rKA emulating the solution offered by the RZF
scheme. More precisely, two rKA-based RZF strategies are presented to obtain the receive combining
matrix, from which the transmit precoding matrix can be derived using the UL-DL duality described
in (8), without loss of generality. An alternative way to solve the optimization problem considered in
Corollary 1 is to state it as the resolution of an OD SLE given as $\mathbf{B}\boldsymbol{\varrho} = \mathbf{y}_0$. Notice that this
SLE is OD, since the number of equations, $M + K$, is greater than the number of unknowns, $K$, in
a typical M-MIMO scenario [13]. Moreover, observe that this problem is not consistent, because of
the arbitrariness inserted by the receiver noise in the signal received by the BS, $\mathbf{y}$, so that
the OD SLE has, in general, no exact solution. If the rKA is applied to solve this problem,
a convergence bias or residual error is eventually introduced in the solution due to the inconsistency.
In [5], this problem was circumvented by splitting $\mathbf{B}\boldsymbol{\varrho} = \mathbf{y}_0$ with $\boldsymbol{\varrho}^{\star} = \mathbf{s}$ into two stages. The first
5 It should be made clear that the rKA-based receive combining and transmit precoding can also be implemented with
the KA; however, the convergence will then rely heavily on the way the equations are ordered in the SLE to be solved.
stage remodels the OD SLE into a consistent form, leading to a UD SLE; then, another OD SLE
is derived based on the originally stated problem ($\mathbf{B}\boldsymbol{\varrho} = \mathbf{y}_0$). We present this in depth below.
A. The Two Steps in Solving the SLE
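The first goal is to obtain an estimate of $\mathbf{y}_0$, the term that embodies the inconsistency. To this end,
$\mathbf{y}_0 \in \mathbb{C}^{M+K}$ is first defined as
$$\mathbf{y}_0 := \mathbf{B}\mathbf{s} \overset{(a)}{=} \mathbf{B}(\mathbf{B}^{H}\mathbf{B})^{-1}\mathbf{G}^{H}\mathbf{y}, \tag{14}$$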
where in (a) the relation exposed in (7) was applied. By making a parallel with the original SLE,
BHy0 = , the y0 can be estimated via the following SLE:
$$\mathbf{B}^{H}\mathbf{y}_0 = (\mathbf{B}^{H}\mathbf{B})(\mathbf{B}^{H}\mathbf{B})^{-1}\mathbf{G}^{H}\mathbf{y} = \mathbf{G}^{H}\mathbf{y}. \tag{15}$$
The above SLE is UD inasmuch as the number of equations, $K$, is smaller than the number of unknowns,
$M + K$. Moreover, it is consistent, since its right-hand side lies within the subspace generated by
the columns of $\mathbf{B}^{H}$. Although not optimum, $\mathbf{y}_0$ can be estimated through rKA. That being
the case, it is natural to derive a second SLE given as $\mathbf{B}\mathbf{s} = \mathbf{y}_0$, which is both OD and
consistent. The solution of this second SLE based on rKA, however, is not strictly
necessary, seeing that the estimate of $\mathbf{s}$ can be acquired from the last $K$ rows of $\mathbf{y}_0$. This means
that only part of the UD SLE has to be solved, which can be easily seen when $\mathbf{y}_0$ in
$\mathbf{B}\mathbf{s} = \mathbf{y}_0$ is replaced by its definition in (14), which gives
Bs = y0 = B(BHB)−1GHy
s = BHB(BHB)−1GHy(a)= BHy0,
(16)
where in (a) the equality in (15) was used.
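The first step can be sketched by running the rKA on the UD SLE (15), started from the zero vector so that the iterates stay in the column space of $\mathbf{B}$ and approach the quantity defined in (14). Since the last $K$ rows of $\mathbf{B}$ are $\sqrt{\xi}\mathbf{I}_K$, the last $K$ entries of $\mathbf{B}\mathbf{s}$ equal $\sqrt{\xi}\mathbf{s}$; the division by $\sqrt{\xi}$ below follows from that reading of the definition of $\mathbf{B}$ and is stated here as an assumption, as are the problem sizes and the iteration budget.

```python
import numpy as np

rng = np.random.default_rng(6)
M, K, xi = 64, 8, 0.1                       # illustrative values (assumptions)

def crandn(*shape):
    """Draw circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

G, y = crandn(M, K), crandn(M)              # stand-ins for the estimate and the received signal
B = np.vstack([G, np.sqrt(xi) * np.eye(K)]) # (M + K) x K
rhs = G.conj().T @ y                        # right-hand side of (15)

# Rows of B^H are the Hermitian-transposed columns of B
col_energy = np.sum(np.abs(B) ** 2, axis=0)
p = col_energy / col_energy.sum()           # sampling rule (11) applied to B^H

y0_hat = np.zeros(M + K, dtype=complex)     # zero initialization
for r in rng.choice(K, size=2000, p=p):
    b_r = B[:, r]                           # ([B^H]_{r,:})^H
    y0_hat += (rhs[r] - np.vdot(b_r, y0_hat)) / col_energy[r] * b_r

# Signal estimate from the last K entries, rescaled by sqrt(xi) (assumption)
s_rka = y0_hat[M:] / np.sqrt(xi)
s_rzf = np.linalg.solve(G.conj().T @ G + xi * np.eye(K), rhs)
```

In this example the rKA estimate matches the RZF estimate of (6) up to numerical precision, without forming or inverting the Gram matrix.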
B. Parallel rKA-Based RZF Scheme
The receive combining matrix can be obtained through the implementation of $K$ rKAs in parallel,
of which the $k$th one solves an SLE of the form $\mathbf{B}^{H}\mathbf{c} = \mathbf{e}_k$. Adopting the Kaczmarz
notation, $\mathbf{c}^{t} = [\mathbf{u}^{t}; \mathbf{z}^{t}] \in \mathbb{C}^{M+K}$ with $\mathbf{u}^{t} \in \mathbb{C}^{M}$ and $\mathbf{z}^{t} \in \mathbb{C}^{K}$. Further, $\mathbf{e}_k \in \mathbb{C}^{K}$ is the canonical
basis vector with $[\mathbf{e}_k]_k = 1$ and $0$ otherwise. This analogous problem is essentially founded on the UD
SLE in (15) and the property given in (16), and its parallel resolution is described
in Algorithm 1. Note that, in the parallelized form, $\mathbf{e}_k$ replaces $\mathbf{G}^{H}\mathbf{y}$ from the
underlying SLE in (15), which only activates the resolution of the equations related to the $k$th rKA. In
other words, the $k$th rKA solves for the receive combining vector associated
with UE $k$. In fact, the $k$th rKA solution does not actually yield $\mathbf{v}_k$, but it provides the vector $\mathbf{d}_k \in \mathbb{C}^{K}$
that is given solely by the last $K$ rows of $\mathbf{c}^{t}$, i.e., $\mathbf{z}^{t}$; thereby, $\mathbf{v}_k = \mathbf{G}\mathbf{d}_k$.
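The parallel scheme can be sketched by stacking the $K$ state vectors as the columns of one matrix and updating them together. This is not a transcription of Algorithm 1: the function name `parallel_rka_rzf`, the sharing of the same random row among all $K$ solvers, the zero initialization, and the rescaling of $\mathbf{z}^{t}$ by $1/\sqrt{\xi}$ to obtain $\mathbf{d}_k$ (which follows from the last $K$ rows of $\mathbf{B}$ being $\sqrt{\xi}\mathbf{I}_K$) are assumptions of this illustration.

```python
import numpy as np

def parallel_rka_rzf(G, xi, n_iter, rng):
    """Emulate the RZF combining matrix with K rKAs solving B^H c = e_k."""
    M, K = G.shape
    B = np.vstack([G, np.sqrt(xi) * np.eye(K)])
    col_energy = np.sum(np.abs(B) ** 2, axis=0)   # squared norms of the rows of B^H
    p = col_energy / col_energy.sum()             # sampling rule (11)
    C = np.zeros((M + K, K), dtype=complex)       # column k holds c^t of the kth rKA
    E = np.eye(K)                                 # column k is e_k
    for r in rng.choice(K, size=n_iter, p=p):     # one shared row draw per iteration
        b_r = B[:, r]
        residual = E[r] - b_r.conj() @ C          # [e_k]_r - <b_r, c_k^t> for all k
        C += np.outer(b_r, residual) / col_energy[r]
    D = C[M:] / np.sqrt(xi)                       # d_k from the last K rows (assumption)
    return G @ D                                  # v_k = G d_k, column by column

# Comparison against the canonical RZF on an illustrative channel (assumption)
rng = np.random.default_rng(7)
M, K, xi = 64, 8, 0.1
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
V_rka = parallel_rka_rzf(G, xi, n_iter=2000, rng=rng)
V_rzf = G @ np.linalg.inv(G.conj().T @ G + xi * np.eye(K))
```

With equal-power users, as in this example, the row probabilities are nearly uniform and all $K$ columns converge at a similar pace.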
Notably, the rKAs can be envisioned to run concurrently in different hardware
units, where each unit independently or quasi-independently reaches its own receive combining
vector. This independence is closely related to the way that the randomness of the update schedule in
(10) is interpreted in a given context, i.e., how the random selection of rows is associated across the
hardware units. With this in mind, two possibilities can give satisfactory results: (a) all the
$K$ rKAs share the random rows, $r(t)$, and (b) each rKA draws its own random rows. This
discussion is deepened in Section V, which deals with computational complexity; however,
note that both approaches provide suitable convergence, as established in Corollary 2.
In spite of the fact that Algorithm 1 was indirectly6 assessed in [5], some changes are included here.
To put them in context, it is reasonable to infer that the sample probability $P_{r(t)}$ is generally affected
by pathloss and shadowing, owing to its dependence on the estimated channel powers (see $P_{r(t)}$ in
step 10 of Algorithm 1). If p is detrimentally affected, the rKA's update schedule is also degraded. At
this point, it must be noted that if row k is never selected in the kth rKA, the solution update does
not occur at all, making the desired convergence unattainable. This also stems from the use
of zero vectors to initialize the state variables in $c^0$. Zero vectors are nevertheless desirable, since
other arbitrary values, like a vector of ones, can cause a power bias in the rKA-estimated receive
combining vectors. In summary, the UEs close to the BS (center-UEs) have significantly higher values
of $P_{r(t)}$ than the UEs located at the edge of the cell (edge-UEs). This disparity leads to poor or
impossible convergence of the combining vectors of edge-UEs obtained through Algorithm 1. More
insightful results about the consequences of large-scale effects on p are presented in Section VI-A.
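To make the stalling effect concrete, consider the kth rKA started from zero vectors with purely random row selection. Assuming the draws of r(t) are independent across iterations (an assumption made here only for this illustration), the probability that row k is never drawn in $T_{rKA}$ iterations, so that the iterate never leaves zero, is

```latex
\Pr\{\text{row } k \text{ is never drawn in } T_{\mathrm{rKA}} \text{ iterations}\}
  = \left(1 - P_k\right)^{T_{\mathrm{rKA}}},
\qquad
P_k = \frac{\|g_k\|_2^2 + \xi}{\|G\|_F^2 + K\xi}.
```

For a hypothetical edge-UE with $P_k = 10^{-3}$, this probability is still about $0.999^{100} \approx 0.905$ after 100 iterations, whereas it vanishes quickly for center-UEs with large $P_k$.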
Striving to overcome this negative effect, we observed that a suitable, non-random way to
initialize the kth rKA, and thus guarantee that it updates, is to force the selection of the kth row
in the first iteration, as performed in step 8 of Algorithm 1. This procedure can also provide a better
average initial guess, which translates into a better average gain that improves the rate of convergence
presented in Corollary 2, as shall be evidenced in the sequel. What makes the proposed procedure an
6Recall that the authors of [5] discuss the rKA as an approach to directly obtain s, motivated by slow mobility scenarios. Herein, we
analyze a more generic case, where obtaining the combining/precoding matrices individually is of paramount importance.
Algorithm 1 Parallel (PARL) approach of rKA to estimate the RZF receive combining matrix
1: Input: Estimated channel matrix $G \in \mathbb{C}^{M \times K}$, inverse level of SNR $\xi$, number of UEs $K$, and number of rKA iterations $T_{rKA}$.
2: Initialization: Specify the state vectors $u^t \in \mathbb{C}^M$ and $z^t \in \mathbb{C}^K$ with $u^0 = 0_M$ and $z^0 = 0_K$; also specify $D^{RZF} \in \mathbb{C}^{K \times K}$ as the factorized version of $G^H V^{RZF}$.
3: Procedure:
4: for $i \leftarrow 1$ to $K$ do
5:   Compute the canonical basis vector, $e_i \in \mathbb{C}^K$, where $[e_i]_i = 1$ and $[e_i]_j = 0$, $\forall j \neq i$.
6:   for $t \leftarrow 0$ to $T_{rKA} - 1$ do
7:     if $t = 0$ then
8:       Pick the ith row of $G^H$, $g^H_{r(t)} \in \mathbb{C}^{1 \times M}$, for $r(t) = i$.
9:     else
10:       Pick the r(t)th row of $G^H$, $g^H_{r(t)} \in \mathbb{C}^{1 \times M}$, with $r(t) \in \{1, 2, \ldots, K\}$ drawn based on p with $P_{r(t)} = \frac{\|g_{r(t)}\|_2^2 + \xi}{\|G\|_F^2 + K\xi}$.
11:     end if
12:     Compute the residual $\eta^t := \frac{[e_i]_{r(t)} - \langle g_{r(t)}, u^t \rangle - \xi z^t_{r(t)}}{\|g_{r(t)}\|_2^2 + \xi}$.
13:     Update $u^{t+1} = u^t + \eta^t g_{r(t)}$.
14:     Update $z^{t+1}_{r(t)} = z^t_{r(t)} + \eta^t$ and repeat the other positions: $z^{t+1}_j = z^t_j$, $\forall j \neq r(t)$.
15:   end for
16:   Update $[D^{RZF}]_{:,i} = z^{T_{rKA}-1}$.
17: end for
18: Output: $V^{RZF} = G D^{RZF}$.
attractive way to initialize Algorithm 1 is the fact that edge-UEs can now fairly obtain their receive
combining vectors, despite the strong preference given to the center-UEs in the computation of
$P_{r(t)}$, making the parallel (PARL) rKA-based RZF scheme more robust in practical regimes and,
consequently, enhancing its convergence towards the desired solutions. One can additionally conjecture
that the proposed initialization approach, which can be seen as a hybrid between KA and
rKA, is not strictly necessary for UEs with high values of $P_{r(t)}$, for the simple reason that their rows
are more likely to be drawn when generating r(t). As a consequence, an optimization problem to find
which of the K rKAs must be force-initialized could be formulated. Note, however, that the gain provided
by such an optimization is merely one more iteration for each rKA, and solving it
may incur a considerable and undesired computational complexity. This unattractive optimization
procedure was therefore not considered in this work.
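The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal, sequential illustration rather than the authors' implementation: the K rKAs are run one after another instead of on separate hardware units, each rKA draws its own rows, the final state after the last update is stored in $D$, and the result is compared against the closed form $G(G^H G + \xi I_K)^{-1}$, which is assumed here to be the RZF matrix being emulated. The function name `parl_rka_rzf`, the sizes, and the seed are hypothetical.

```python
import numpy as np

def parl_rka_rzf(G, xi, T, rng):
    """Sketch of Algorithm 1: K rKAs, the ith one solving B^H c = e_i."""
    M, K = G.shape
    row_pow = np.sum(np.abs(G) ** 2, axis=0) + xi      # ||g_r||^2 + xi per UE
    p = row_pow / row_pow.sum()                        # sampling distribution (step 10)
    D = np.zeros((K, K), dtype=complex)
    for i in range(K):
        u = np.zeros(M, dtype=complex)                 # u^0 = 0_M
        z = np.zeros(K, dtype=complex)                 # z^0 = 0_K
        for t in range(T):
            # Forced first pick (step 8), random picks afterwards (step 10).
            r = i if t == 0 else rng.choice(K, p=p)
            g = G[:, r]
            # Residual (step 12); np.vdot conjugates its first argument,
            # so np.vdot(g, u) equals g^H u.
            e_ir = 1.0 if r == i else 0.0
            eta = (e_ir - np.vdot(g, u) - xi * z[r]) / row_pow[r]
            u = u + eta * g                            # step 13
            z[r] += eta                                # step 14
        D[:, i] = z
    return G @ D                                       # V = G D (step 18)

# Hypothetical test setup (assumption: i.i.d. complex Gaussian channel estimates).
rng = np.random.default_rng(1)
M, K, xi = 20, 4, 1.0
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

V_rka = parl_rka_rzf(G, xi, T=1000, rng=rng)
V_rzf = G @ np.linalg.inv(G.conj().T @ G + xi * np.eye(K))
rel_err = np.linalg.norm(V_rka - V_rzf) / np.linalg.norm(V_rzf)
```

With a well-conditioned toy channel such as this one, `rel_err` becomes very small; removing the forced first pick leaves some columns at zero until their own row happens to be drawn, which is the effect the proposed initialization avoids.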
The convergence performance of Algorithm 1 is established in Theorem 1. Based on this result, some
comments about the average gain bounds are made in Remarks 1 and 2. Recall that the limits of
$\kappa_{\mathcal{X}}(B^H)$ are inversely proportional to the respective bounds on the number of rKA iterations. These
remarks will hence be important for the computational evaluation of the rKA-based RZF schemes.
Theorem 1. Consider the kth UD SLE $B^H c = e_k$, where $B^H \in \mathbb{C}^{K \times (M+K)}$ is the matrix of constant
coefficients, $c \in \mathbb{C}^{M+K}$ is the vector of unknowns, and $e_k \in \mathbb{C}^K$ is the canonical basis vector. When
the rKA is applied to solve this system, as done in Algorithm 1, it converges towards the optimal solution $c^\star$
with the average error bounded as
$$\mathbb{E}\{\|c^t - c^\star\|_2^2\} \le \left(1 - \kappa_{\mathcal{X}}(B^H)\right)^t \|c^{(1)} - c^\star\|_2^2, \tag{17}$$
with
$$\kappa_{\mathcal{X}}(B^H) = \frac{\lambda_{\min}(B^H B)}{\|B\|_F^2} \overset{(a)}{=} \frac{\lambda_{\min}(G^H G) + \xi}{\|G\|_F^2 + K\xi}, \tag{18}$$
where in (a), B is replaced by its definition stated in (7). Note that $c^{(1)}$ can be seen as the initial
solution of the rKA, which also provides an average gain based on the KA:
$$\kappa^{(1)}_{\mathcal{X}}(B^H) = \min_{\vartheta \in \mathcal{X},\, \vartheta \neq 0} \frac{\|b_k^H \vartheta\|_2^2}{\|b_k\|_2^2 \, \|\vartheta\|_2^2}, \tag{19}$$
which strongly depends on the row index k. Observe that the gap $c^{(1)} - c^\star$ can thus be
made smaller on average than that provided by adopting a zero or any other random vector
as the initial solution.
Proof. The proof is given in Appendix B.
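Step (a) in (18) can be checked numerically. The sketch below assumes the stacking $B = [G; \sqrt{\xi} I_K]$ (as in step 3 of Algorithm 2), for which $B^H B = G^H G + \xi I_K$ and $\|B\|_F^2 = \|G\|_F^2 + K\xi$; the helper name `average_gain`, the dimensions, and the random test matrix are illustrative.

```python
import numpy as np

def average_gain(G, xi):
    """Evaluate kappa in (18) two ways: through B and directly through G."""
    M, K = G.shape
    B = np.vstack([G, np.sqrt(xi) * np.eye(K)])
    lam_B = np.linalg.eigvalsh(B.conj().T @ B).min()
    kappa_B = lam_B / np.linalg.norm(B, "fro") ** 2
    lam_G = np.linalg.eigvalsh(G.conj().T @ G).min()
    kappa_G = (lam_G + xi) / (np.linalg.norm(G, "fro") ** 2 + K * xi)
    return kappa_B, kappa_G

# Hypothetical test matrix (assumption: i.i.d. complex Gaussian entries).
rng = np.random.default_rng(2)
G = (rng.standard_normal((32, 8)) + 1j * rng.standard_normal((32, 8))) / np.sqrt(2)
kappa_B, kappa_G = average_gain(G, xi=0.1)
```

Both evaluations agree up to floating-point error, and the value never exceeds $1/K$ because $\|G\|_F^2$ is the sum of the K eigenvalues of $G^H G$.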
Remark 1 (Average Gain Bounds: Generic Correlated Covariance Matrices with Gaussian Estimated
Channel Matrix). If the elements of G are distributed as $\mathcal{N}_{\mathbb{C}}(0, 1)$ (see footnote 7), the eigenvalues of G are upper
bounded by 1 and then, if all eigenvalues are 1, we have $\|G\|_F^2 = K$. Thereby, the average gain stated
in (18) is bounded as follows [5]:
$$\frac{\lambda_{\min}(G^H G)}{\|G\|_F^2} \le \frac{\lambda_{\min}(G^H G) + \xi}{\|G\|_F^2 + K\xi} \le \frac{1}{K}, \tag{20}$$
where $\xi \ge 0$. From the above inequality, one can presume that the higher the $\xi$, the faster
the convergence of Algorithm 1, since $\kappa_{\mathcal{X}}(B^H)$ also becomes higher. Consequently, the ZF
filter case, i.e., $\xi = 0$, is expected to converge more slowly than when emulating
7For the canonical channel estimators analyzed in the numerical results, this distribution does not hold, since the estimated powers
are certainly smaller than the true ones.
the RZF scheme through rKA (see footnote 8). In addition, it should be observed that edge-UEs have a
better average gain than center-UEs; namely, even though the edge-UEs are rarely selected by the
randomized approach, they reach the solution more quickly than center-UEs.
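The monotonic dependence on $\xi$ claimed in Remark 1 can be verified directly from (18). Writing $a = \lambda_{\min}(G^H G)$ and $b = \|G\|_F^2$ (shorthand introduced only for this check), differentiation with respect to $\xi$ gives

```latex
\frac{\partial}{\partial \xi}\,\frac{a + \xi}{b + K\xi}
  = \frac{(b + K\xi) - K(a + \xi)}{(b + K\xi)^2}
  = \frac{b - Ka}{(b + K\xi)^2} \;\ge\; 0,
```

since $b = \mathrm{tr}(G^H G)$ is the sum of the K eigenvalues of $G^H G$ and hence $b \ge Ka$. The average gain is therefore non-decreasing in $\xi$, starting at $a/b$ for $\xi = 0$ (the ZF case) and approaching $1/K$ as $\xi \to \infty$, in agreement with (20).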
Remark 2 (Average Gain Bounds: Uncorrelated Rayleigh Fading with Gaussian Estimated Channel
Matrix). When uncorrelated Rayleigh fading is considered, i.e., $R_k = \beta_k I_M$,
where $\beta_k$ is the average large-scale fading coefficient for UE k, and the elements of G are distributed
as $\mathcal{N}_{\mathbb{C}}(0, 1)$, the limits of $\kappa_{\mathcal{X}}(B^H)$ can be strictly established based on random matrix theory. This
is done by solving the characteristic polynomial of $G^H G$, finding its minimum, and then applying the
statistical characteristics of G, yielding [5]:
$$\kappa_{\mathcal{X}}(B^H) \ge \frac{\lambda_{\min}(G^H G)}{\|G\|_F^2} = \frac{(\sqrt{M} - \sqrt{K})^2}{MK} = \frac{\left(1 - \sqrt{K/M}\right)^2}{K}, \tag{21}$$
where $K/M$ is the UE-BS antenna ratio, also known as the loading factor in M-MIMO. From this result,
the number of rKA iterations is directly tied to the value of the loading factor.
Interestingly, as $K/M$ approaches 1, the average gain tends to zero. This means that the
number of rKA iterations needed to cover the worst-case convergence condition goes to infinity,
i.e., convergence can be considered unattainable in that case. One needs to keep in mind, however,
that the UE-BS antenna ratio in M-MIMO is typically $K/M \in [0, 0.5]$ [1], [2].
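The link between the loading factor and the iteration count can be made tangible with a short calculation. The sketch below evaluates the lower bound in (21) and, as an illustration only, the smallest t for which the contraction factor $(1 - \kappa)^t$ in (17) drops below a tolerance `eps`. The helper names and the choice `eps = 1e-2` are assumptions made here; the resulting numbers bound the squared solution error under the i.i.d. Gaussian model and are not directly comparable to the SE-gap iteration counts reported in the numerical results.

```python
import math

def kappa_lower_bound(M, K):
    """Lower bound in (21): (1 - sqrt(K/M))^2 / K."""
    return (1.0 - math.sqrt(K / M)) ** 2 / K

def iterations_for_tolerance(kappa, eps):
    """Smallest integer t with (1 - kappa)^t <= eps, for 0 < kappa < 1 and 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(1.0 - kappa))

M = 100
bounds = {K: kappa_lower_bound(M, K) for K in (10, 30, 50)}
iters = {K: iterations_for_tolerance(kappa, eps=1e-2) for K, kappa in bounds.items()}
```

The bound shrinks and the iteration estimate grows as K/M increases from 0.1 to 0.5, and `kappa_lower_bound(M, M)` is exactly zero, matching the observation that convergence becomes unattainable as the loading factor approaches 1.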
C. Linear Operator rKA-Based RZF Scheme
The following key property of rKA has to be introduced so that we can present the second rKA-based
method for obtaining the RZF receive combining matrix.
Corollary 3. Consider any given SLE in its trivial form $Ax = b$, where $A \in \mathbb{C}^{m \times n}$ is the matrix
of constant coefficients, $x \in \mathbb{C}^n$ is the vector of unknowns, and $b \in \mathbb{C}^m$ is the vector of known offset
coefficients, with m and n representing the number of equations and unknowns, respectively. One can
solve this SLE via rKA with an initial guess $x^0$, yielding in the tth rKA iteration an approximate
solution denoted as $x^t$. This estimate can be expressed in terms of a linear operator (LO) realized
over A as $x^t = L^t(A) b$, where $L^t(A) \in \mathbb{C}^{n \times m}$ depends only on A and on the randomness of
the rKA row selection; it does not depend on b.
8The results that will be obtained for the rKA emulating the RZF scheme can be easily expanded to the ZF case, but, since ZF has
a worse rate of convergence than RZF, the expected computational gains must be reduced for the former scheme.
Proof. The proof follows [5, p. 20].
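The linearity stated in Corollary 3 can be illustrated numerically. The sketch below assumes a zero initial guess, $x^0 = 0$ (with a nonzero guess the map from b to $x^t$ is affine rather than linear), and fixes one realization of the row sequence r(t); rows are drawn uniformly here for simplicity, since the linearity does not depend on how the rows are drawn. The function name `kaczmarz_fixed_rows` and the test sizes are hypothetical.

```python
import numpy as np

def kaczmarz_fixed_rows(A, b, rows):
    """Kaczmarz iterations from x^0 = 0 along a given row sequence."""
    x = np.zeros(A.shape[1], dtype=complex)
    for r in rows:
        a = A[r].conj()                     # a_r, such that the rth row of A is a_r^H
        x = x + (b[r] - A[r] @ x) / np.vdot(a, a) * a
    return x

# Hypothetical test data.
rng = np.random.default_rng(3)
m, n = 5, 12
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
rows = rng.integers(0, m, size=50)          # one shared realization of r(t)

# Build L^t(A) column by column from the canonical basis vectors.
L = np.column_stack([kaczmarz_fixed_rows(A, e, rows) for e in np.eye(m)])

# Apply it to an unrelated right-hand side and compare with a direct run.
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x_direct = kaczmarz_fixed_rows(A, b, rows)
```

For the same row sequence, `L @ b` matches `x_direct`, which is what allows the operator to be computed once and reused for every received signal.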
As a result of the consistent OD SLE $Bs = y_0$ and Corollary 3, it is easy to see that $s^t = L^t(B) y_0$
is similar to the signal estimation expression in (5), with the proviso that $L^t(B)$ is a $K \times (M+K)$
matrix, whereas $V^H$ is $K \times M$; the same can be said for $y_0$ and $y$. It is therefore possible to see
that the first M columns of the LO approach the Hermitian transpose of the RZF receive combining matrix
for a suitable number of rKA iterations. This is the concept behind Algorithm 2, but the proposed
procedure was conceived to estimate the LO over $B^H$, based on $B^H s' = y'_0$, where
$s' = Bs \in \mathbb{C}^{M+K}$ and $y'_0 = B^H y_0 \in \mathbb{C}^K$. In this case, the control of p does not depend on M + K
rows, but on the K rows that represent the UEs. Again, this is desirable, since
the large-scale effects negatively impact the distribution range of p. To reduce this harmful
influence, the first K iterations are used to initialize the rKA, as done in step 7 of Algorithm 2.
Theorem 2 establishes the convergence performance of Algorithm 2; its proof is quite similar
to the one conducted for Theorem 1. Remarks 1 and 2 are also valid here.
Theorem 2. Let $B^H s' = y'_0$ be a UD SLE with $B^H \in \mathbb{C}^{K \times (M+K)}$ denoting the matrix of constant
coefficients, $s' \in \mathbb{C}^{M+K}$ the vector of unknowns, and $y'_0 \in \mathbb{C}^K$ the vector of known consistent
estimates of the received signal. As can be seen from Algorithm 2, the given SLE can be solved
through rKA. The convergence of this procedure is limited by the average error between the solution
obtained in rKA iteration t and the optimal $(s')^\star$, given as
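$$\mathbb{E}\{\|(s')^t - (s')^\star\|_2^2\} \le \left(1 - \kappa_{\mathcal{X}}(B^H)\right)^t \|(s')^{(K)} - (s')^\star\|_2^2, \tag{22}$$
with
$$\kappa_{\mathcal{X}}(B^H) = \frac{\lambda_{\min}(B^H B)}{\|B\|_F^2}. \tag{23}$$
The initial solution $(s')^{(K)}$ also carries an average gain, which is
$$\kappa^{(K)}_{\mathcal{X}}(B^H) = \min_{\vartheta \in \mathcal{X},\, \vartheta \neq 0} \sum_{k=1}^{K} \frac{\|b_k^H \vartheta\|_2^2}{\|b_k\|_2^2 \, \|\vartheta\|_2^2}. \tag{24}$$
The above average gain can be better than that provided by any other arbitrary solution.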
V. COMPUTATIONAL COMPLEXITY ANALYSIS
Now that the canonical and proposed rKA-based schemes have been described, we turn our
attention to finding a fair way to analyze and compare their computational complexities. This investigation
helps to answer whether the rKA-based strategies pave the way for
computationally light signal processing schemes. Our main goal is to define functions that
describe the computational cost of the signal processing schemes in terms of matrix-to-matrix
multiplications and matrix inversions, together with the same operations
for vectors. This approach is motivated by the work conducted in [2, App. B,
p. 558], wherein the authors give a very detailed framework for the costs of those elementary
matrix/vector operations. To emphasize, only complex multiplications/divisions are taken into account
in the complexity functions, while additions and subtractions are neglected
on the premise that the latter are not as heavy as the former from a hardware point of view. With this
in mind, the examination of the computational cost is divided into the UL and DL phases.
Algorithm 2 Linear Operator (LO) approach of rKA to estimate the RZF receive combining matrix
1: Input: Estimated channel matrix $G \in \mathbb{C}^{M \times K}$, inverse level of SNR $\xi$, number of UEs $K$, and number of rKA iterations $T_{KA}$.
2: Initialization: Define the linear operator $L^t(B^H) \in \mathbb{C}^{(M+K) \times K}$ with $L^0(B^H) = 0_{(M+K) \times K}$.
3: Obtain $B = [G; \sqrt{\xi} I_K]$.
4: Procedure:
5: for $t \leftarrow 0$ to $T_{KA} - 1$ do
6:   if $t < K$ then
7:     Pick the kth row of $B^H$, $b^H_{r(t)} \in \mathbb{C}^{1 \times (M+K)}$, where $r(t) = k = t + 1$ for $t \in \{0, 1, \ldots, K - 1\}$.
8:   else
9:     Pick the r(t)th row of $B^H$, $b^H_{r(t)} \in \mathbb{C}^{1 \times (M+K)}$, with $r(t) \in \{1, 2, \ldots, K\}$ drawn based on p with $P_{r(t)} = \frac{\|b_{r(t)}\|_2^2}{\|B\|_F^2}$.
10:   end if
11:   Compute the canonical basis vector, $e_{r(t)} \in \mathbb{C}^K$, where $[e_{r(t)}]_{r(t)} = 1$ and $[e_{r(t)}]_j = 0$, $\forall j \neq r(t)$.
12:   Update $L^{t+1}(B^H) = \left(I_{M+K} - \frac{b_{r(t)} b^H_{r(t)}}{\|b_{r(t)}\|_2^2}\right) L^t(B^H) + \frac{b_{r(t)} e^H_{r(t)}}{\|b_{r(t)}\|_2^2}$.
13: end for
14: Output: $V^{RZF} = [L^{T_{KA}-1}(B^H)]_{1:M,:}$.
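Algorithm 2 can be sketched in NumPy along the same lines. This is an illustrative reading, not the authors' code: indices are 0-based, so the first K iterations visit rows 0 to K-1 in order, the operator after the last update is returned, and the result is compared with the closed form $G(G^H G + \xi I_K)^{-1}$, assumed here to be the RZF matrix being emulated. The name `lo_rka_rzf`, the sizes, and the seed are hypothetical.

```python
import numpy as np

def lo_rka_rzf(G, xi, T, rng):
    """Sketch of Algorithm 2: iterate the linear operator L^t(B^H)."""
    M, K = G.shape
    B = np.vstack([G, np.sqrt(xi) * np.eye(K)])         # step 3
    col_pow = np.sum(np.abs(B) ** 2, axis=0)            # ||b_r||^2 for each UE r
    p = col_pow / col_pow.sum()                         # sampling distribution (step 9)
    L = np.zeros((M + K, K), dtype=complex)             # L^0 = 0
    for t in range(T):
        # Deterministic sweep over the K rows first (step 7), random afterwards.
        r = t if t < K else rng.choice(K, p=p)
        b = B[:, r]                                     # b_r; the rth row of B^H is b_r^H
        # Step 12: (I - b b^H / ||b||^2) L, then add b e_r^H / ||b||^2.
        L = L - np.outer(b, b.conj() @ L) / col_pow[r]
        L[:, r] += b / col_pow[r]
    return L[:M, :]                                     # first M rows (step 14)

# Hypothetical test setup (assumption: i.i.d. complex Gaussian channel estimates).
rng = np.random.default_rng(5)
M, K, xi = 20, 4, 1.0
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

V_lo = lo_rka_rzf(G, xi, T=1500, rng=rng)
V_rzf = G @ np.linalg.inv(G.conj().T @ G + xi * np.eye(K))
rel_err_lo = np.linalg.norm(V_lo - V_rzf) / np.linalg.norm(V_rzf)
```

Each iteration here updates an $(M+K) \times K$ matrix rather than a single vector, which gives a feel for why the LO approach is the most expensive option in the complexity comparison.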
A. Computational Cost: Uplink
From its inception, the idea behind the PARL rKA-based RZF scheme is to distribute the computation of
the receive combining matrix over K low-complexity hardware units, i.e., each unit has
simple, cheap, and dedicated hardware responsible for obtaining the receive combining vector of its
respective UE through the rKA. From Algorithm 1, it can be observed that most of the computational
load executed by each dedicated unit is related to the computation of the sampling distribution
p in step 10 and of the residual $\eta^t$ in step 12. In particular, the computational cost assigned to p is
taken to be that of knowing all the sample probabilities, $P_{r(t)}$, so that generating
the values of r(t) amounts to drawing random numbers based on p. This generation can be done in a
distributed or centralized way, as already mentioned. The former stands for a scenario in which each
hardware unit computes its own p and, hence, independently generates its values of r(t), while
the latter denotes a scenario where a central hardware point computes p and afterwards shares
the drawn values of r(t) with the K hardware units (which are then freed from computing p). As
each hardware unit has to compute its own p, the distributed fashion is the most computationally
heavy. The centralized method, however, can lead to a substantial loss of time, which represents a
fraction of the available coherence interval, due to the communication with the central hardware point. We
should therefore expect an impact on performance when adopting the centralized way, especially in
overcrowded scenarios where the loading factor is close to the upper limit
defined here as $K/M = 0.5$. A computational comparison between both ways of generating the r(t)
values is carried out in Section VI-D.
Another key discussion point for the PARL rKA-based RZF scheme is related to the signal recovery
problem described in (5). Since each kth unit actually computes $d_k$ and not $v_k$
in Algorithm 1, it is appropriate to perform the following computations at a central hardware point
so as to obtain the signal estimate at a given communication instant:
$$V^{PARL} = GD, \tag{25a}$$
$$s = (V^{PARL})^H y, \tag{25b}$$
where (25a) is computed only once and takes $MK^2$ complex operations, while the signal recovery
in (25b) takes $MK$ and has to be computed for all $\tau_{ul}$ instants. A centralized fashion is considered
for these computations, because the time spent in obtaining the information needed to perform the
calculations in a distributed or centralized way seems to be approximately equal. A brief note on the
computational complexity of Algorithm 2 is now in order. The linear operator (LO) rKA-based RZF
scheme has the computation of p in step 9 and the update of the LO in step 12 as the main hardware
loads. Unlike the PARL approach, Algorithm 2 is essentially co-processed (centralized).
Table I summarizes the complexity functions obtained for the rKA-based schemes emulating
the classical RZF strategy, in addition to the canonical results from [2]; both were obtained with
the computational framework provided by the same work. Recall that this mathematical structure
takes into account only complex multiplications and divisions from elementary matrix-to-matrix
(and vector-to-vector) operations. The reception column gives the computational cost associated with
the signal estimation expressed in (5). Some remarks can then be made regarding these results. The
computation of the receive combining matrix for Algorithm 1 seems to be the lightest
computationally, but this value depends heavily on the number of rKA iterations, $T_{rKA}$. The PARL rKA-based
RZF scheme has, however, the greatest reception complexity and, when considering the
distributed fashion, the calculation of the sampling distribution adds a considerable extra computational
floor. The LO procedure (Algorithm 2) is by far the most costly in computational terms; thus, it will
not be considered in the following analysis. It is important to emphasize, however, that this procedure
is useful in the design of networks that envision applying rKA-based techniques. This is
corroborated by its rate of convergence being fairly similar to the one obtained for the PARL rKA-based
RZF scheme, as we observe through simulation results, and also by the fact that simulation tools, like
Matlab, have powerful built-in functions to efficiently execute matrix operations.
TABLE I
COMPUTATIONAL COMPLEXITY FOR RECEIVE COMBINING SCHEMES PER COHERENCE INTERVAL.
| Scheme | Receive combining matrix: multiplications | Receive combining matrix: divisions | Reception: multiplications | Sampling distribution: multiplications |
|---|---|---|---|---|
| RZF | $\frac{3K^2 M}{2} + \frac{3KM}{2} + \frac{K^3 - K}{3}$ | $K$ | $\tau_{ul} MK$ | – |
| ZF | $\frac{3K^2 M}{2} + \frac{KM}{2} + \frac{K^3 - K}{3}$ | $K$ | $\tau_{ul} MK$ | – |
| PARL rKA-based RZF (Algorithm 1) | $2M T_{KA}$ | – | ⋄ $\tau_{ul}(MK) + MK^2$ | ◦ $2MK$ |
| LO rKA-based RZF (Algorithm 2) | $(K^3 + 2K^2 M + 2K^2 + KM^2 + 3KM + 2K + M^2 + 2M)\, T_{KA}$ | – | $\tau_{ul} MK$ | $2K^2 + 2MK$ |
⋄ Reception for Algorithm 1 results from a centralized computational approach (see (25b)).
◦ PARL sampling distribution can be obtained by either a centralized way (computational cost is ignored) or a distributed way (2M for each of the K hardware units).
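For a feel of the magnitudes involved, the multiplication counts for the receive combining matrix in Table I can be evaluated with a few helper functions. The function names below are hypothetical, and the expressions are transcribed from the table entries taken at face value; whether the PARL entry should be read per hardware unit or in total is not settled by this excerpt, so no conclusion on which scheme is cheaper is drawn here.

```python
from fractions import Fraction

def rzf_multiplications(M, K):
    """Table I, RZF row: 3K^2 M / 2 + 3KM / 2 + (K^3 - K) / 3."""
    return Fraction(3 * K**2 * M, 2) + Fraction(3 * K * M, 2) + Fraction(K**3 - K, 3)

def parl_multiplications(M, T):
    """Table I, Algorithm 1 row: 2 M T_KA."""
    return 2 * M * T

def lo_multiplications(M, K, T):
    """Table I, Algorithm 2 row: per-iteration polynomial times T_KA."""
    per_iteration = (K**3 + 2 * K**2 * M + 2 * K**2 + K * M**2
                     + 3 * K * M + 2 * K + M**2 + 2 * M)
    return per_iteration * T

# Example with M = 100 antennas and K = 10 UEs (values listed in Table II).
M, K = 100, 10
rzf_cost = rzf_multiplications(M, K)
lo_cost_per_iteration = lo_multiplications(M, K, T=1)
parl_cost_per_iteration = parl_multiplications(M, T=1)
```

For these values, a single LO iteration already costs several times the full canonical RZF computation, in line with the remark that Algorithm 2 is by far the most costly.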
B. Computational Cost: Downlink
The computational complexity of the DL phase is composed of two components: (a)
transmit precoding matrix computations, which are specified in (8), and (b) the transmission stage,
which denotes the computational burden of computing x (the precoded signal) defined in
Section II-B. In the special case provided by the UL-DL duality, the computational cost
is the same for all the schemes, both canonical and rKA-based. For the latter, it is important
to remember that the receive combining matrix is already available at any given symbol period inside
$\tau_c$, as shown by the outputs of Algorithms 1 (see also (25a)) and 2. The computation of
the precoding matrices consumes $MK$ complex operations, whereas the calculation of the precoded
signals takes $\tau_{dl} MK$.
VI. NUMERICAL RESULTS
Let us now consider a square-network layout9 to gather some qualitative results on the applicability
of the Kaczmarz methodology in an M-MIMO scenario. A single-cell case adopting the parameters
shown in Table II is considered throughout the simulations. The cell covers an area of 0.25 × 0.25
km^2 with a BS at its center equipped with M antennas spatially multiplexing K UEs. The UEs are uniformly
distributed inside the cell area with a minimum distance of 35 m to the BS. The average large-scale
fading denoting the pathloss effect for UE k is evaluated as $\beta_k = \Gamma - 10\alpha \log_{10}(d_k)$ [dB], where $d_k$
is the Euclidean distance between UE k and the BS, $\Gamma$ is the pathloss constant term at a reference
distance of 1 m, and $\alpha$ is the pathloss exponent. Unless otherwise stated, the ordinary values of
$\Gamma = -35.3$ dB and $\alpha = 3.76$ are used, based on a non-line-of-sight dense urban scenario [1], [2].
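The pathloss expression can be evaluated for the two extreme UE positions of this layout. The sketch below is illustrative: it assumes the BS sits exactly at the center of the 250 m × 250 m square, so that the farthest possible UE is at half the diagonal, and it ignores shadowing; the function name `large_scale_fading_db` is hypothetical.

```python
import math

def large_scale_fading_db(d_m, gamma_db=-35.3, alpha=3.76):
    """beta_k in dB for a UE at distance d_m meters: Gamma - 10 * alpha * log10(d_k)."""
    return gamma_db - 10.0 * alpha * math.log10(d_m)

# Closest allowed UE (minimum distance of 35 m to the BS).
beta_near = large_scale_fading_db(35.0)

# Farthest UE: a corner of the cell, at half the diagonal of the square
# (assumption: BS exactly at the center of the cell).
d_corner = math.hypot(125.0, 125.0)
beta_far = large_scale_fading_db(d_corner)

gap_db = beta_near - beta_far
```

Under these assumptions, the nearest and farthest UEs differ by roughly 26 dB in average channel gain before shadowing, which is the kind of disparity that skews the sampling distribution toward center-UEs.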
TABLE II
PARAMETERS CONSIDERING A SQUARE-NETWORK LAYOUT FOR A SINGLE-CELL M-MIMO SCENARIO.
| Parameter | Value |
|---|---|
| Single-cell area | 0.25 × 0.25 km^2 |
| BS antennas (M) | 100 |
| # UEs (K) | 10; 30; 50 |
| Bandwidth | 20 MHz |
| Receiver noise power | −91 dBm |
| UL transmit power per UE | 20 dBm |
| Antenna correlation factor (r) | 0.5 |
| Coherence interval ($\tau_c$) | 200 symbols |
| Pilot length ($\tau_p$) | K symbols |
| Pathloss exponent ($\alpha$) | 1.00; 3.76; 4.00 |
| Pathloss constant term ($\Gamma$) | −35.3 dB |
| Shadowing standard deviation ($\sigma$) | 4 dB |
| Channel estimation | LS and MMSE |
9The simulation settings considered here are motivated by the numerical results realized in [2].
Canonically, the covariance matrix model for uncorrelated Rayleigh fading is $R_k = \beta_k 10^{f/10} I_M$,
where $f \sim \mathcal{N}(0, \sigma^2)$ stands for the large-scale fading (shadowing) with zero mean and standard
deviation $\sigma$, which is constant over all elements of the array [1], [13]. Motivated by the results obtained
in [11], we also consider a spatial correlation model that combines the exponential correlation model
for a uniform linear array (ULA) with large-scale fading variations over the array [14], [15], [16].
The exponential correlation model accounts for spatial antenna correlation, while shadowing fluctuations
represent the power variability across the antenna elements caused by the disordered distribution of the
scatterers in the environment. It should be emphasized that this last effect is empirically justified by
the conclusions of [14]. That being said, the (m, n)th element of the covariance matrix for the
kth UE under this spatially correlated channel is defined as $[R_k]_{m,n} = \beta_k r^{n-m} e^{i(n-m)\theta_k} 10^{(f_m+f_n)/20}$,
where $m, n \in \{1, \ldots, M\}$, $r \in [0, 1]$ is the antenna correlation factor, and $\theta_k$ is the angle of arrival
of the kth UE, which is uniformly distributed in $[-\pi, +\pi)$ when considering a horizontal ULA.
$f_1, \ldots, f_m, \ldots, f_M \sim \mathcal{N}(0, \sigma^2)$ are independent and identically distributed random variables that form
the random fluctuations of the large-scale fading with zero mean and standard deviation $\sigma$. We take
this opportunity to define moderate spatial correlation for the model just presented as the scenario in
which r = 0.5 and $\sigma = 4$ dB, as considered in [4]. This moderate spatial correlation setting will be
widely used in the simulations below.
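A possible way to generate such a covariance matrix is sketched below. One assumption is made loudly: the exponent of r is taken as $|n - m|$, as in the standard exponential correlation model, because this is what keeps the matrix Hermitian for $r < 1$; the text as transcribed here shows $n - m$. The function name `correlated_covariance` and the test values (M = 8, $\theta_k = \pi/6$, linear-scale $\beta_k = 1$) are hypothetical.

```python
import numpy as np

def correlated_covariance(M, beta, r, theta, sigma_db, rng):
    """Return (R, f) with [R]_{m,n} = beta r^{|n-m|} e^{1j (n-m) theta} 10^{(f_m+f_n)/20}."""
    idx = np.arange(M)
    diff = idx[None, :] - idx[:, None]                  # n - m for every entry (m, n)
    f = rng.normal(0.0, sigma_db, size=M)               # per-antenna shadowing in dB
    scale = 10.0 ** ((f[:, None] + f[None, :]) / 20.0)
    # ASSUMPTION: |n - m| in the exponent of r, so that R is Hermitian.
    R = beta * r ** np.abs(diff) * np.exp(1j * diff * theta) * scale
    return R, f

rng = np.random.default_rng(4)
R, f = correlated_covariance(M=8, beta=1.0, r=0.5, theta=np.pi / 6,
                             sigma_db=4.0, rng=rng)
eigenvalues = np.linalg.eigvalsh(R)
```

With r = 0.5 and $\sigma$ = 4 dB (the moderate correlation setting), the diagonal of `R` carries the per-antenna power fluctuations $\beta_k 10^{f_m/10}$, and the eigenvalue spread in `eigenvalues` reflects the concentration of power in fewer eigendirections.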
As we are considering a single cell with mutually orthogonal pilots, the channel estimation process
is only noise limited. Some important observations about the covariance matrices of
the channel estimates can then be made to further clarify the evaluation carried out in this section.
One of the effects of channel estimation on the eigenvalues of the true channel's covariance matrix
is generally the reduction of their magnitudes, since every estimate contains error [2]. This error
is higher for the least-squares (LS) estimator than for the minimum mean-squared error
(MMSE) estimator, where the difference is due to whether or not prior statistical information is used. Another
known key result is that the stronger the eigenvalue associated with a given eigendirection, the easier
it becomes to estimate the channel in this spatial direction [2]. Note that a higher eigenvalue
actually translates into an increase of the SNR, which explains the better channel estimate. Since spatial
correlation impacts the eigenvalues of the channel covariance matrix, where a few of them carry a
large fraction of the power, as shown in [11] for the considered model, spatially correlated channels
are therefore amenable to a better estimate. For strong spatial correlation, fewer eigenvalues have
considerable power, further enhancing the channel estimation in the spatial directions that
are the most significant for establishing the link between the BS antennas and a UE. However, when
the structure of the channel is not duly exploited, as in the case of the LS estimator, the scaling of the
channel estimate is difficult to set correctly, and thus only the channel direction can be estimated
decently [11].
A. Impacts Detrimental to Sample Probability
As can be seen from step 10 of Algorithm 1, the sample probability quantifies the relative power
of a given UE with respect to the system. $P_{r(t)}$ is therefore mainly related to the power of the estimated
channels, which in our model also accounts for the pathloss and shadowing effects. One can then
assume that this value is very sensitive to large/small-scale fading and the channel estimation procedure,
and is even affected by spatial correlation10; some of these effects are illustrated further on.
Fig. 1. CDFs of the sample probability $P_{r(t)}$ defined in step 10 of Algorithm 1 and averaged over independent small-scale fading
realizations with $K/M = 0.1$, r = 0.5, and $\sigma = 4$ dB. The probabilities were also generated for two different values of $\alpha$ and when
applying the LS/MMSE channel estimators. The true channel case indicates perfect knowledge of CSI at the BS.
By treating Pr(t) as a random variable, Fig. 1 gives the cumulative distribution functions (CDFs)
of the average sample probability, specified in step 10 of Algorithm 1, for the PARL rKA-based RZF
10The values of M and K influence p as well.
scheme. It is clear that $P_{r(t)}$ is strongly influenced by pathloss, since the CDF curves are shifted
to the left as $\alpha$ grows. This result reaffirms the well-known fact that edge-UEs are bound to
have a lower power level than the center-UEs; ergo, the former also have lower values of $P_{r(t)}$,
which become more and more distant from those obtained for center-UEs as $\alpha$ increases. Note that the
channel estimate has a reasonable impact as well. The curves for the MMSE estimator lie further to
the left for $\alpha = 4$, showing that this estimator can yield smaller values of $P_{r(t)}$ in comparison
to LS. This is a direct consequence of the fact that the MMSE channel estimator works well under
low SNR. When using the MMSE estimator, there is a small shift to the right of the spatially correlated
curves relative to the uncorrelated case, which can be explained by the improvement in channel
estimation made possible by the moderate level of spatial correlation. Recall that this
positive result is only partly obtained for the LS estimator, and therefore the difficulty of setting
a reasonable estimate for the power scaling factor becomes more evident in this case. In summary,
one can conclude that the edge-UEs may have very low values of $P_{r(t)}$ when compared to center-UEs
in scenarios of practical interest. This fact hinders the selection of the equations associated with the
edge-UEs, as can be seen in step 10 of Algorithm 1, making it impossible to perform the rKA's update
schedule correctly. As a result, the genuine rKA applied to this more practical context has restricted
performance.
To illustrate the consequences of a poorly scaled sampling distribution on performance, Fig. 2 shows,
as a function of the number of rKA iterations and for different channel estimates, the average UL SE
per UE achieved by the equivalent PARL rKA-based RZF scheme indirectly conceived in [5]
and by the one proposed here in Algorithm 1. We stress that the former
does not deal with the practical effect visualized in Fig. 1, whereas our proposed methodology takes
into account these harmful differences in $P_{r(t)}$ between the center- and edge-UEs through
a fair forced initialization process. It is clearly visible that the proposed scheme outperforms its
counterpart [5], since our proposal always converges faster to the reference bounds given by
the RZF SE. Besides that, the average UL SEs obtained are in agreement with the results of [11],
where the authors adopted the same spatial correlation model. The superior performance
of spatially correlated channels stems from the fact that the spatial correlation arising from antenna
correlation and unequal contribution of the antennas is, on average, beneficial to the SE bound
defined in (1), following the effects on the channel estimation explained above. In addition, it is
important to keep in mind that the use-and-forget technique, used to define (1), is very sensitive to
Fig. 2. Average UL SE per UE as a function of $T_{rKA}$ for the canonical RZF combining and its emulations performed by the PARL
rKA-based schemes. The first PARL algorithm follows the ideas of [5] and does not have an initialization (abbreviated as init.) procedure;
the other, proposed (abbreviated as prop.) by us in Algorithm 1, has a hybrid initialization that guarantees the convergence of all receive
combining vectors. Three different channel estimates are considered: (a) idealistic case where the true channel is known at the receiver,
(b) LS channel estimates, and (c) MMSE channel estimates. The scenario under evaluation consists of $K/M = 0.1$, r = 0.5, and $\sigma = 4$ dB.
the assurance of channel hardening, which, in turn, is dramatically impacted by the spatial correlation
phenomenon, as reported in [2].
In conclusion, large-scale effects have a noticeable impact on performance, and taking them into account in
the design of rKA-based schemes applied to M-MIMO yields better results. We emphasize that
the improvements achieved by the proposed algorithm do not imply any increase in computational
complexity compared to the original idea; our proposal is also appealing because of its
broader applicability. Some other inferences can be drawn from the results obtained here.
For instance, when the true channel is known at the receiver side, more rKA iterations seem to be
necessary to approach the SE bounds than in the case of MMSE channel estimates, which in turn
needs more rKA iterations than the worst channel estimation case, given by LS channel estimates.
This and other effects are better characterized in the sequel.
B. Rate of Convergence: Number of Iterations Required
Our concern now is to define a notion of convergence based on the number of rKA iterations
required to reach a predefined error limit by considering the canonical RZF SE as the true value.
This will give us more insight into how the algorithm converges and will also help us establish
a way to evaluate its computational complexity in the sequel. The percentage gap between the
SE of the canonical RZF scheme and that of its emulation through Algorithm 1 is shown in
Fig. 3 as a function of the average number of rKA iterations for uncorrelated and moderately correlated
scenarios, different channel estimates, and different loading factors. Starting with the last, the
figure shows that convergence improves for smaller values of K/M, as anticipated in Remark 2.
This is mainly due to the expansion of the solution subspace when
solving the SLE, which increases the interference among UEs. A puzzling result is then observed in
Fig. 3: the better the CSI is known at the receiver, the worse the
algorithm converges. The reason comes from Remark 1, in which it was observed
that a larger value of ξ tends to help convergence. In fact, the better the channel
estimation, the better the characterization of the power of the estimated channels, which in turn
lowers the average value of ξ. Estimating the channels with a more naive approach therefore improves
the convergence of Algorithm 1; however, one should bear in mind that the achievable average SE is
also diminished. We also note that convergence is better for the uncorrelated case
than for moderately spatially correlated channels. To explain this, we must again return to
Remark 1, where we observe that the average gain is severely impacted by $\lambda_{\min}(\mathbf{G}^{H}\mathbf{G})$. Since
spatial correlation decreases the fraction of eigenvalues that carries most of the power, it imposes
smaller values of $\lambda_{\min}(\mathbf{G}^{H}\mathbf{G})$, thus worsening the convergence of the algorithm. The penalty brought
by the considered spatial correlation model is detailed in the next section. Overall, the different
effects indicated here are consistent with the theory developed in the paper and with the results discussed
in [5], but here from a more comprehensive point of view.
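The dependence of the iteration count on the loading factor can be explored with a minimal sketch of the generic randomized Kaczmarz iteration. This is not Algorithm 1 of the paper: the regularized Gram system built by the hypothetical helper `build_regularized_system`, the i.i.d. complex Gaussian channels, the value of `xi`, and the residual-based stopping rule are all assumptions chosen only for illustration.

```python
import numpy as np


def build_regularized_system(M, K, xi=1.0, seed=0):
    """Hypothetical test system (G^H G + xi I) x = G^H y with i.i.d. channels."""
    rng = np.random.default_rng(seed)
    G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    y = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    A = G.conj().T @ G + xi * np.eye(K)
    b = G.conj().T @ y
    return A, b


def randomized_kaczmarz(A, b, tol=1e-2, max_iter=100_000, seed=0):
    """Generic rKA for a consistent system A x = b.

    Rows are sampled with probability proportional to their squared norm.
    Returns the estimate and the number of iterations that were run.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(np.abs(A) ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n, dtype=complex)
    b_norm = np.linalg.norm(b)
    for t in range(1, max_iter + 1):
        z = rng.choice(m, p=probs)
        # Project the current estimate onto the hyperplane of equation z.
        residual_z = b[z] - A[z] @ x
        x = x + (residual_z / row_norms_sq[z]) * A[z].conj()
        # Stop once the relative residual falls below the tolerance.
        if np.linalg.norm(A @ x - b) <= tol * b_norm:
            return x, t
    return x, max_iter


if __name__ == "__main__":
    M = 100
    for K in (10, 30):
        A, b = build_regularized_system(M, K)
        _, iters = randomized_kaczmarz(A, b)
        print(f"K/M = {K / M:.1f}: stopped after {iters} rKA iterations")
```

The stopping rule here uses the relative residual of the SLE, whereas the paper measures the SE gap with respect to the canonical RZF scheme, so the iteration counts printed by this sketch are not comparable with those read from Fig. 3.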
Recalling that the BS is equipped with simple, cheap hardware, we now focus on getting the
average number of rKA iterations necessary to achieve a defined error tolerance for the most realistic
implementation scenario in which LS channel estimation is most likely to be employed due to its
computational simplicity [11], [14].¹¹ For loading factors equal to 0.1, 0.3, and 0.5, respectively, the
average numbers of rKA iterations needed to reach an error of 10% are roughly: 93, 589, and 1799
for uncorrelated channels, and 95, 655, and 1983 for moderately correlated channels. An error
of 1%, in turn, demands 293, 1815, and 4960 iterations for uncorrelated channels, and 333, 1903, and 5062 for moderately
correlated channels, in the same order of loading factors.¹²

¹¹ The MMSE estimator has been used so far only to provide insights as the ideal case of channel estimation. It goes, however,
against the adopted paradigm, which considers a BS equipped with simple hardware that is unable to estimate the second-order
moments of the channel responses.
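The way such iteration counts are read from a sampled convergence curve (linear interpolation between neighboring points, then rounding up) can be sketched as follows. The function name `iterations_for_gap` and the sample values in the usage lines are hypothetical and are not data from Fig. 3.

```python
import math

import numpy as np


def iterations_for_gap(iters, gap_percent, target_percent):
    """Iteration count, rounded up, at which a sampled gap curve first
    reaches target_percent, using linear interpolation between samples.

    Assumes iters is increasing and gap_percent is decreasing.
    """
    iters = np.asarray(iters, dtype=float)
    gap = np.asarray(gap_percent, dtype=float)
    below = np.nonzero(gap <= target_percent)[0]
    if below.size == 0:
        raise ValueError("target gap is never reached on the sampled curve")
    j = int(below[0])
    if j == 0:
        return math.ceil(iters[0])
    # Interpolate between sample j-1 (above target) and sample j (at or below).
    t0, t1 = iters[j - 1], iters[j]
    g0, g1 = gap[j - 1], gap[j]
    t_cross = t0 + (g0 - target_percent) * (t1 - t0) / (g0 - g1)
    return math.ceil(t_cross)


# Hypothetical curve sampled every 100 iterations.
sample_iters = [100, 200, 300]
sample_gap = [20.0, 8.0, 3.0]
print(iterations_for_gap(sample_iters, sample_gap, 10.0))
```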
Fig. 3. Percentage gap between the UL SEs of the canonical RZF scheme and its correspondent emulation performed through the
PARL rKA-based scheme (Algorithm 1) as a function of the average number of rKA iterations with M = 100, r = 0.5, and σ = 4 dB.
C. Rate of Convergence: Spatial Correlation Effects
Before moving on to the complexity analysis, we set out to better understand how the considered spatial
correlation model impacts the algorithm convergence. To this end, the results of Fig. 3 were extended,¹³
as can be seen in Fig. 4, by considering both the r and σ dimensions for K/M = 0.1. When r varies and
σ = 0 dB, although there are small convergence improvements in very particular regions, the general
behavior tells us that antenna spatial correlation is detrimental to the performance of the PARL
rKA-based RZF scheme. The same result is obtained in the opposite case, wherein only the
spatial correlation arising from the environment disorder is considered, thus confirming the findings
obtained previously. Fig. 3 also corroborates the extension of the idea that better channel estimates
deteriorate the convergence of Algorithm 1. Again, the results on spatial
correlation effects stem, strictly speaking, from the relevant changes in the eigenstructure of G
induced by the degree of spatial correlation, which depends on the values of r and σ (see Remark 1).

¹² These values were obtained through linear interpolation between the points of the curves in Fig. 3, wherein an interval of 100
average iterations was adopted between neighboring points. We also highlight that the reported values were rounded up to
the nearest integer.
¹³ For the sake of clarity, we have omitted the true channel case; but we stress that the obtained results are still valid in this
context.
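The effect of antenna correlation on the smallest eigenvalue of the Gram matrix can be probed numerically with a simplified sketch. It assumes a real-valued exponential correlation matrix in the spirit of [16], no shadowing (σ = 0 dB), and the same correlation matrix for all UEs. These simplifications, the helper names, and the chosen sizes are illustrative assumptions and do not reproduce the channel model of the paper.

```python
import numpy as np


def exp_corr_matrix(M, r):
    """Real exponential correlation matrix with entries r^|i-j| (0 <= r < 1)."""
    idx = np.arange(M)
    return r ** np.abs(idx[:, None] - idx[None, :])


def mean_lambda_min(M, K, r, trials=50, seed=0):
    """Monte Carlo average of lambda_min(G^H G) for G = R^(1/2) W."""
    rng = np.random.default_rng(seed)
    # Cholesky factor L with L L^H = R, so each column of L W has covariance R.
    L = np.linalg.cholesky(exp_corr_matrix(M, r))
    total = 0.0
    for _ in range(trials):
        W = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        G = L @ W
        # eigvalsh returns eigenvalues in ascending order.
        total += np.linalg.eigvalsh(G.conj().T @ G)[0]
    return total / trials


if __name__ == "__main__":
    for r in (0.0, 0.5, 0.9):
        print(f"r = {r:.1f}: mean lambda_min = {mean_lambda_min(100, 10, r):.2f}")
```

Under these assumptions the averaged smallest eigenvalue shrinks as r grows, which agrees with the qualitative argument drawn from Remark 1. The sketch says nothing about the shadowing dimension σ.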
(a) Percentage gap as a function of $T_{\mathrm{rKA}}$ and r with σ = 0 dB. (b) Percentage gap as a function of $T_{\mathrm{rKA}}$ and σ with r = 0.
Fig. 4. Percentage gap between the SEs of the canonical RZF scheme and its corresponding emulation performed through the PARL
rKA-based scheme (Algorithm 1) as a function of the average number of rKA iterations ($T_{\mathrm{rKA}}$) with K/M = 0.1. The graph also considers
the variation of the gap in: (a) the antenna correlation r dimension and (b) the shadowing standard deviation σ dimension.
D. Computational Complexity: Canonical RZF Scheme vs PARL rKA-Based RZF Scheme
By utilizing the average number of rKA iterations obtained in Section VI-B for the aforementioned scenario of
interest, which comprises moderately spatially correlated channels and LS channel
estimation, we are now in a position to analyze the computational cost of both the canonical RZF scheme and
the PARL rKA-based RZF scheme; it is equally important to recall that Algorithm 1 can
be implemented in a distributed or a centralized way. Since the computational
complexities for the DL are the same (see Section V), Table III captures the trade-off between the tolerated
error in average performance and the average computational cost (CC), based solely on Table I (UL
regime), for both ways of implementing the PARL rKA-based RZF scheme and using RZF as the cost
reference. We define and also show in Table III an auxiliary metric called the reference ratio (RR) as
the relative difference, in percent, between the CC of the proposed and canonical schemes when we consider the classical
strategy as the reference value. Since positive values of RR denote a computational loss
of Algorithm 1, negative values are sought in order to answer whether Kaczmarz's methodology is a strong
candidate to reduce the canonical computational complexity, declared as one of the main goals of this
paper. The results in Table III show that substantial computational gains are obtained for an
error bound of 10% with K/M = 0.3 (moderately crowded) and K/M = 0.5 (overcrowded), under both
channel types. Note that the price to pay for these computational benefits is an expected decrease in average
performance of 10%. On the other hand, when we require Algorithm 1 to operate more accurately, that is, when we
reduce the error bound to 1%, no computational gains are seen in favor of the PARL rKA-based RZF
scheme. Algorithm 1 can therefore markedly reduce the computational complexity of the classical RZF
scheme, but it cannot closely match the canonical achievable SE while still offering an efficient balance
between performance and computational complexity. One can lastly observe that the centralized way
of executing the proposed scheme is computationally lighter than the distributed approach. We
stress, however, that the centralization process causes some expected losses in performance that are
not being taken into account.

TABLE III
AVERAGE COMPUTATIONAL COMPLEXITY COST AND REFERENCE RATIOS FOR THE CANONICAL AND PROPOSED SCHEME.
Entries for the PARL rKA-based RZF scheme are given as uncorrelated/correlated channels.

Error bound of 10%
K/M   RZF CC   Distributed: Average CC   Distributed: RR [%]   Centralized: Average CC   Centralized: RR [%]
0.1   16840    20600/21000               22.33/24.70           18600/19000               10.45/12.83
0.3   148520   123800/137000             −16.64/−7.76          117800/131000             −20.68/−11.80
0.5   424200   369800/406600             −12.82/−4.15          359800/396600             −15.18/−6.51

Error bound of 1%
K/M   RZF CC   Distributed: Average CC   Distributed: RR [%]   Centralized: Average CC   Centralized: RR [%]
0.1   16840    60600/68600               259.86/307.36         58600/66600               247.98/295.49
0.3   148520   369000/386600             148.45/160.30         363000/380600             144.41/156.26
0.5   424200   1002000/1022400           136.21/141.02         992000/1012400            133.85/138.66

CC - Computational Cost; RR - Reference Ratio.
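The RR entries of Table III are numerically consistent with a relative difference of the proposed CC with respect to the RZF CC, expressed in percent. This reading is inferred from the table values rather than from an explicit formula in the text, and the function name `reference_ratio` is hypothetical. A short check on the distributed, uncorrelated entries for an error bound of 10%:

```python
def reference_ratio(cc_proposed, cc_rzf):
    """Relative CC difference of the proposed scheme w.r.t. RZF, in percent.

    Negative values mean that the proposed scheme is cheaper than RZF.
    """
    return 100.0 * (cc_proposed - cc_rzf) / cc_rzf


# (K/M, RZF CC, distributed CC for a 10% error bound, uncorrelated channels),
# copied from Table III.
rows = [
    (0.1, 16840, 20600),
    (0.3, 148520, 123800),
    (0.5, 424200, 369800),
]

for loading, cc_rzf, cc_prop in rows:
    print(f"K/M = {loading}: RR = {reference_ratio(cc_prop, cc_rzf):.2f} %")
```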
VII. CONCLUSIONS
In this paper, we revisited the problem of computational relaxation for canonical
signal processing schemes, given the recently proposed strategy that adopts premises based on the
KA/rKA as a relevant attempt to avoid any prior statistical knowledge of the channels. As a first
contribution, we have re-interpreted the original proposal, aiming to derive the procedure of obtaining
the combining/precoding matrices through rKA, thus generalizing the paradigm to scenarios with
higher mobility and seeking new insights. Some modifications to the original
rKA-based schemes were then proposed, relying on the more pragmatic observation, first shown here, that
large-scale effects are expected to degrade the algorithm performance. Specifically, we have proposed a hybrid
initialization approach for the rKA-based schemes that has numerically outperformed the methods
available in the literature, since it ensures a faster average convergence of rKA under uncorrelated and
correlated fading scenarios, whatever channel estimation procedure is used. Subsequently,
a computational complexity framework for the two different rKA-based RZF schemes was
derived in terms of matrix-to-matrix (vector-to-vector) multiplications/divisions. On the one hand, we have
numerically shown that, if a performance loss of 10% is acceptable, the most promising rKA-based scheme
can substantially reduce the computational complexity of the canonical RZF scheme for moderately
crowded and overcrowded scenarios in which a BS with simple and cheap hardware is employed.
On the other hand, the proposed scheme cannot match the performance of the canonical
RZF strategy with a computationally efficient solution. Still, a theoretical characterization of the
convergence function with respect to the different system parameters would be valuable to better
understand the applicability of the proposed rKA-based schemes. This is left as a potential
avenue for future research.
APPENDIX A: USEFUL RESULTS
The following definitions consider an SLE in its canonical form $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m \times n}$
is the matrix of constant coefficients, $\mathbf{x} \in \mathbb{C}^{n}$ is the vector of unknowns, and $\mathbf{b} \in \mathbb{C}^{m}$ is the vector
of known offset coefficients. The subspace generated by the columns of $\mathbf{A}^{H}$ (that is, by the Hermitian-transposed
rows of $\mathbf{A}$) is denoted as $\mathcal{X} \subset \mathbb{C}^{n}$.
This SLE is solved via rKA, in which the random row selection of $\mathbf{A}$ is denoted by a random variable
$R \in \{1, 2, \dots, m\}$ and the specific row picked at iteration $t$ is $z \in \{1, 2, \dots, m\}$. It is assumed
that row $z$ is sampled with a generic probability $P_z \in [0, 1]$, whereby it is possible to define a probability
vector $\mathbf{p} = (P_1, \dots, P_m) \in \mathbb{R}^{m}_{+}$. A notable property is that
$\sum_{z=1}^{m} P_z = 1$.
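Definition 1. Random Rank-1 Projection Operator. The random rank-1 projection operator of a randomly
chosen row $R$ is [5]: $\mathbf{P}_R := (\mathbf{a}_R \mathbf{a}_R^{H}) / \|\mathbf{a}_R\|_2^2$, where $\mathbf{P}_R$ is an $n \times n$ matrix.
Definition 2. Average Random Rank-1 Projection Operator. From Definition 1, it is possible to specify
the average projection operator as [5]: $\mathbf{P} := \mathbb{E}\{\mathbf{P}_R\} = \sum_{z=1}^{m} P_z (\mathbf{a}_z \mathbf{a}_z^{H}) / \|\mathbf{a}_z\|_2^2$, where the expectation
is w.r.t. $R$. Thus, $\mathbf{P}$ is an $n \times n$ positive-semidefinite matrix with $\mathrm{tr}(\mathbf{P}) = \sum_{z=1}^{m} P_z = 1$.
Definition 3. Average Gain. The average gain is a suitable metric that measures how well the average
projection operator exploits $\mathcal{X}$ when seeking to obtain $\mathbf{x}$, given as [5]: $\kappa_{\mathcal{X}}(\mathbf{A}, \mathbf{p}) :=
\min_{\boldsymbol{\vartheta} \in \mathcal{X}, \boldsymbol{\vartheta} \neq \mathbf{0}} (\boldsymbol{\vartheta}^{H} \mathbf{P} \boldsymbol{\vartheta}) / \|\boldsymbol{\vartheta}\|_2^2$, where one observes that $\kappa_{\mathcal{X}}$ relies on $\mathbf{A}$ and $\mathbf{p}$. Besides, it is possible to
infer that $\kappa_{\mathcal{X}}(\mathbf{A}, \mathbf{p}) \in [0, 1/\min\{m, n\}]$. This last result is a consequence of $\mathrm{tr}(\mathbf{P}) = 1$ and of the fact that
the dimension of the subspace $\mathcal{X}$ is generally determined by $\min\{m, n\}$.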
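Definitions 1 to 3 can be checked numerically with a minimal sketch. It assumes that $\mathbf{a}_z$ denotes the Hermitian-transposed row $z$ of $\mathbf{A}$, so that $\mathcal{X}$ is spanned by the vectors $\mathbf{a}_z$. The function names and the random test matrix are hypothetical.

```python
import numpy as np


def average_projection(A, p):
    """Average rank-1 projection operator of Definition 2 (n x n matrix)."""
    m, n = A.shape
    P = np.zeros((n, n), dtype=complex)
    for z in range(m):
        a = A[z].conj()  # Hermitian-transposed row z, i.e. a_z in C^n
        P += p[z] * np.outer(a, a.conj()) / np.vdot(a, a).real
    return P


def average_gain(A, p, rtol=1e-10):
    """Average gain of Definition 3: minimum Rayleigh quotient of P over X."""
    P = average_projection(A, p)
    # Orthonormal basis Q of X, the span of the columns of A^H.
    U, s, _ = np.linalg.svd(A.conj().T, full_matrices=False)
    rank = int(np.sum(s > rtol * s[0]))
    Q = U[:, :rank]
    # Restricting P to X turns the minimization into a smallest-eigenvalue problem.
    return float(np.linalg.eigvalsh(Q.conj().T @ P @ Q)[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
    p = np.full(6, 1 / 6)  # uniform row sampling, for illustration only
    print("tr(P) =", np.trace(average_projection(A, p)).real)
    print("average gain =", average_gain(A, p), "upper bound =", 1 / min(A.shape))
```

Because the eigenvalues of $\mathbf{P}$ restricted to $\mathcal{X}$ are nonnegative and sum to one, the smallest of them cannot exceed the reciprocal of the dimension of $\mathcal{X}$, which is the bound stated in Definition 3.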
APPENDIX B: PROOF OF THEOREM 1
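Using Definition 2 and (11), the average projection operator of $\mathbf{B}^{H}$ is [5]

$$\mathbf{P} = \sum_{z=1}^{m} \frac{\|\mathbf{b}^{H}_{r(t)}\|_2^2}{\|\mathbf{B}\|_F^2} \, \frac{\mathbf{b}^{H}_{r(t)} \mathbf{b}_{r(t)}}{\|\mathbf{b}_{r(t)}\|_2^2} = \frac{\mathbf{B}\mathbf{B}^{H}}{\|\mathbf{B}\|_F^2}. \tag{26}$$

From Definition 3 and (26), the average gain provided by $\mathbf{B}^{H}$ can be obtained as [5]

$$\kappa_{\mathcal{X}}(\mathbf{B}^{H}) = \min_{\boldsymbol{\vartheta} \in \mathcal{X}, \boldsymbol{\vartheta} \neq \mathbf{0}} \frac{\|\mathbf{B}^{H}\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{B}^{H}\|_F^2 \|\boldsymbol{\vartheta}\|_2^2} \overset{(a)}{=} \min_{\boldsymbol{\varepsilon} \in \mathbb{C}^{K}, \boldsymbol{\varepsilon} \neq \mathbf{0}} \frac{\|\mathbf{B}^{H}\mathbf{B}\boldsymbol{\varepsilon}\|_2^2}{\|\mathbf{B}^{H}\|_F^2 \|\mathbf{B}\boldsymbol{\varepsilon}\|_2^2} = \frac{\lambda_{\min}(\mathbf{B}^{H}\mathbf{B})}{\|\mathbf{B}\|_F^2}, \tag{27}$$

wherein (a) uses the fact that any vector in $\mathcal{X} \subset \mathbb{C}^{M+K}$, which is the subspace generated by the
columns of $\mathbf{B}$ (that is, by the Hermitian-transposed rows of $\mathbf{B}^{H}$), can be written as $\mathbf{B}\boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \in \mathbb{C}^{K}$, as done in [5]. Finally, the
first, deterministic iteration of Algorithm 1 can be characterized through Definition 1: $\mathbf{P}_k = (\mathbf{b}^{H}_{k}\mathbf{b}_{k}) / \|\mathbf{b}_k\|_2^2$.
As $k$ is deterministic, $\mathbf{P} = \mathbf{P}_k$ and then, completing the proof, the average gain provided by the first
iteration is

$$\kappa^{1}_{\mathcal{X}}(\mathbf{B}^{H}) = \min_{\boldsymbol{\vartheta} \in \mathcal{X}, \boldsymbol{\vartheta} \neq \mathbf{0}} \frac{\boldsymbol{\vartheta}^{H}\mathbf{P}\boldsymbol{\vartheta}}{\|\boldsymbol{\vartheta}\|_2^2} = \min_{\boldsymbol{\vartheta} \in \mathcal{X}, \boldsymbol{\vartheta} \neq \mathbf{0}} \frac{\|\mathbf{b}^{H}_{k}\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{b}_k\|_2^2 \|\boldsymbol{\vartheta}\|_2^2}. \tag{28}$$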
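The closed forms in (26) and (27) can be verified numerically with a short sketch. It assumes that $\mathbf{B}$ has size $(M+K) \times K$ with full column rank and that row $z$ of $\mathbf{B}^{H}$ is sampled with probability proportional to its squared norm. The function name, the sizes, and the random matrix are hypothetical.

```python
import numpy as np


def average_gain_bh(B):
    """Average projection operator and average gain for the SLE matrix B^H,
    built directly from Definitions 2 and 3 with squared-norm row sampling."""
    fro_sq = np.linalg.norm(B, "fro") ** 2
    dim = B.shape[0]
    P = np.zeros((dim, dim), dtype=complex)
    for z in range(B.shape[1]):
        b = B[:, z]  # Hermitian-transposed row z of B^H
        norm_sq = np.vdot(b, b).real
        prob = norm_sq / fro_sq
        P += prob * np.outer(b, b.conj()) / norm_sq
    # Orthonormal basis of X, the span of the columns of B.
    Q, _ = np.linalg.qr(B)
    kappa = float(np.linalg.eigvalsh(Q.conj().T @ P @ Q)[0])
    return P, kappa


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M, K = 8, 4  # illustrative sizes only
    B = rng.standard_normal((M + K, K)) + 1j * rng.standard_normal((M + K, K))
    P, kappa = average_gain_bh(B)
    fro_sq = np.linalg.norm(B, "fro") ** 2
    closed_form = np.linalg.eigvalsh(B.conj().T @ B)[0] / fro_sq
    print("P matches B B^H / ||B||_F^2:", np.allclose(P, B @ B.conj().T / fro_sq))
    print("kappa (direct) =", kappa, " kappa (closed form) =", closed_form)
```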
REFERENCES
[1] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge, UK: Cambridge University
Press, 2016.
[2] E. Björnson, J. Hoydis, and L. Sanguinetti, Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency. Delft, The
Netherlands: Now Publishers, 2017.
[3] A. Kammoun, A. Müller, E. Björnson, and M. Debbah, “Linear Precoding Based on Polynomial Expansion: Large-Scale Multi-Cell
MIMO Systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 861–875, 2014.
[4] E. Björnson, L. Sanguinetti, and M. Debbah, “Massive MIMO with Imperfect Channel Covariance Information,” in 2016 50th
Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 2016, pp. 974–978.
[5] M. N. Boroujerdi, S. Haghighatshoar, and G. Caire, “Low-Complexity Statistically Robust Precoder/Detector Computation for
Massive MIMO Systems,” IEEE Transactions on Wireless Communications, vol. 17, no. 10, pp. 6516–6530, 2018.
[6] S. Kaczmarz, “Angenäherte Auflösung von Systemen linearer Gleichungen,” Bulletin International de l’Académie Polonaise des
Sciences et des Lettres. Classe des Sciences Mathématiques et Naturelles. Série A, Sciences Mathématiques, vol. 35, pp. 355–357,
1937.
[7] D. Needell, N. Srebro, and R. Ward, “Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,”
Advances in Neural Information Processing Systems, pp. 1017–1025, 2014.
[8] Y. Chi and Y. M. Lu, “Kaczmarz Method for Solving Quadratic Equations,” IEEE Signal Processing Letters, vol. 23, no. 9, pp.
1183–1187, 2016.
[9] T. Strohmer and R. Vershynin, A Randomized Solver for Linear Systems with Exponential Convergence. Springer, Berlin,
Heidelberg, 2006, vol. 4110.
[10] V. Croisfelt Rodrigues, J. C. Marinello Filho, and T. Abrao, “Kaczmarz Precoding and Detection for Massive MIMO Systems,”
accepted for IEEE Wireless Communications and Networking Conference, Marrakech, Morocco, 2019.
[11] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO Has Unlimited Capacity,” IEEE Transactions on Wireless
Communications, vol. 17, no. 1, pp. 574–590, 2018.
[12] L. Dai, M. Soltanalian, and K. Pelckmans, “On the Randomized Kaczmarz Algorithm,” IEEE Signal Processing Letters, vol. 21,
no. 3, pp. 330–333, 2014.
[13] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Transactions on Wireless
Communications, vol. 9, no. 11, pp. 3590–3600, 2010.
[14] X. Gao, O. Edfors, F. Tufvesson, and E. G. Larsson, “Massive MIMO in Real Propagation Environments: Do All Antennas
Contribute Equally?” IEEE Transactions on Communications, vol. 63, no. 11, pp. 3917–3928, 2015.
[15] X. Gao, O. Edfors, F. Rusek, and F. Tufvesson, “Massive MIMO Performance Evaluation Based on Measured Propagation Data,”
IEEE Transactions on Wireless Communications, vol. 14, no. 7, pp. 3899–3911, 2015.
[16] S. Loyka, “Channel Capacity of MIMO Architecture Using the Exponential Correlation Matrix,” IEEE Communications Letters,
vol. 5, no. 9, pp. 369–371, 2001.