
On Kaczmarz Signal Processing Technique in Massive MIMO

Victor Croisfelt Rodrigues∗ and Taufik Abrao∗

∗ Department of Electrical Engineering (DEEL), State University of Londrina (UEL), Londrina, Brazil

E-mail: victorcroisfelt@gmail.com, taufik@uel.br

Abstract

To exploit the benefits of massive multiple-input multiple-output (M-MIMO) technology in scenarios where base stations need to be inexpensive, and thus possibly equipped with simple hardware, the computational complexity of classical signal processing schemes for spatial multiplexing must be reduced. This calls for suboptimal or "relaxed" schemes that perform the required combining/precoding steps well while achieving low computational complexity. An approach based on the iterative Kaczmarz algorithm (KA) has recently been investigated with this intention, with the advantage of performing well without any a priori knowledge of the channel statistics. Herein, modifications to this first KA-based attempt are proposed, aiming to improve its rate of convergence and thus to provide a better performance-computational complexity trade-off for the M-MIMO combining/precoding problems. These improvements are motivated by negative effects of pathloss and shadowing on the rate of convergence of the KA, first reported here, which consequently degrade its achievable spectral efficiency (SE). Specifically, the randomized version of the KA (rKA) is used to design the signal processing schemes owing to its global convergence properties, and our adaptations can be seen as a fairness policy for algorithm initialization that guarantees the convergence of all combining/precoding vectors, whether they belong to users located at the cell center or at the cell edge. Two different rKA-based schemes are discussed, of which only one is considered applicable given the constraints imposed by the underlying scenario. To show that the proposed method outperforms the original proposal, a mathematical characterization is presented along with numerical results obtained under more realistic system and channel conditions. More precisely, the simulations include the least-squares and minimum mean-squared error estimators of the channel responses and consider both uncorrelated and correlated Rayleigh fading channels. The spatial correlation of the latter is modeled using the exponential correlation model combined with large-scale fading variations over the array, a combination that is shown to be detrimental to the algorithm performance. The computational complexity of the modified and most promising rKA-based scheme, which obtains the receive combining and transmit precoding matrices, is derived in terms of the number of complex multiplications/divisions required by the matrix operations. Computational complexity gains are found for this relaxed strategy in moderately and highly loaded M-MIMO scenarios, but this computational advantage comes at the cost of a lower SE compared with that afforded by classical schemes.

Index Terms

Massive MIMO; Combining; Precoding; Kaczmarz algorithm; Computational complexity.

I. INTRODUCTION

Massive multiple-input multiple-output (M-MIMO) allows all user equipments (UEs) to concurrently exploit the same time-frequency resources through space-division multiple access. Although non-linear M-MIMO signal processing techniques are typically credited as optimal implementation strategies, it is well known that linear schemes achieve an adequate performance-computational complexity trade-off while still providing good spatial separation of the UEs. This is a direct consequence of the use of massive antenna arrays, which essentially suppress the interference among UEs [1], [2]. To spatially separate the UEs, the base station (BS) therefore needs to estimate, in each coherence block, the channel responses between its antennas and those of the UEs. This knowledge is acquired through a pilot training phase, which is generally feasible under a time-division duplex (TDD) architecture. Since the dimension of this channel state information (CSI) grows with the number of BS and UE antennas adopted in the system, the classical computation of the combining/precoding schemes usually results in a substantially large computational complexity. For this reason, it is desirable to further reduce the computational complexity of canonical linear schemes, or simply canonical schemes, so that low-capability processing hardware can be used at the BS while the benefits of a large number of antennas are still retained; such a combination would also bring attractive economic gains.

In particular, a good strategy to reduce the computational complexity of signal processing schemes has been the search for suboptimal techniques. These so-called "relaxed" schemes are envisaged to lessen the complexity of the canonical ones while achieving similar capacity. The recent literature contains different proposals toward these objectives, such as the polynomial expansion method, which reformulates the classical schemes based on the Taylor series [3]; this approach is also known as truncated polynomial expansion (TPE). This technique provides comparable performance in terms of spectral efficiency (SE) with a reduced computational complexity when compared to canonical schemes. For TPE-based regularized zero-forcing (RZF), the achieved performance is close to that of the conventional RZF implementation, while the computational complexity is equivalent to that of the simple maximum-ratio scheme [3]. Although TPE delivers good results, it relies heavily on the second-order statistics of the channel responses, that is, the channel covariance matrices must be known at the receiver. As can be seen from [4], estimating these second-order statistics is challenging and computationally prohibitive, especially when the number of BS antennas, M, becomes large, which renders TPE undesirable in the context under consideration, even before its severe storage requirements are taken into account.

In contrast to TPE, the authors of [5] proposed applying the Kaczmarz algorithm (KA) to the combining/precoding problems of M-MIMO. A remarkable advantage of KA-based methods over TPE and other schemes is that they do not require knowledge of the covariance matrices to achieve adequate performance with compelling complexity. The KA was initially proposed by Kaczmarz in [6] as an iterative technique for solving determined and overdetermined (OD) sets of linear equations (SLEs). With the increasing popularity of stochastic gradient techniques and machine learning, the KA has been revived [7] and applied to other problems, such as solving quadratic equations [8]. Recently, a randomized version of the KA (rKA) for solving consistent OD SLEs was introduced and analyzed in [9]. This randomized approach attains an expected exponential rate of convergence and is therefore much more reliable than the classical KA, whose convergence depends strongly on the order in which the equations are arranged in the SLE. For M-MIMO, the authors of [5] derived a mathematical framework to examine the performance of the KA, more specifically the rKA, when applied to emulate some linear signal processing techniques. The rKA-based schemes of [5] address the signal estimation problem, i.e., they are used to estimate the signal transmitted by (or for) each user in each symbol period.

Herein, we reconsider the emulation of some canonical linear schemes through the rKA, as originally proposed in [5], with the intention of reducing their computational complexity, thus giving further insight into the capabilities of the algorithm and into possible improvements for the case where the BS has a massive number of antennas but is equipped with simple and cheap hardware. Unlike [5], our aim is to obtain the receive combining matrix through the rKA and then acquire the transmit precoding matrix by resorting to the uplink-downlink (UL-DL) duality; in contrast, the authors of [5], motivated by scenarios with slow UE mobility, apply the rKA directly to the signal estimation problem in each symbol period. The analysis here considers the emulation of the classical zero-forcing (ZF) and RZF solutions by the rKA, motivated by the fact that these techniques lead to a good performance-computational complexity trade-off [2]. In addition to presenting a more general point of view with respect to mobility, we propose modifications to the rKA-based signal processing strategies suggested in [5] and demonstrate the effectiveness of our proposals theoretically and numerically. These refinements are motivated by the detrimental effect of pathloss and shadowing on the operation of the rKA, as revealed in the conference paper [10], where we briefly introduced the proposed modifications that are further elaborated here. To assess whether the rKA yields a computationally efficient approach in general, a computational complexity analysis in terms of complex matrix multiplications/divisions is also derived. This study is needed because the authors of [5] only presented the complexity of the rKA-based schemes in big-O notation, which does not support accurate conclusions. We have additionally adopted more realistic M-MIMO channel conditions, in which classical channel estimators are incorporated into the simulations. Besides, spatially correlated channels are generated by adopting the exponential correlation model with large-scale fading variations over the array, thus demonstrating the impact of two different sources of spatial correlation on the convergence of the proposed algorithms: (a) one originating from the proximity of the array elements, and (b) the other originating from the unequal power contributions of the scatterers randomly distributed throughout the propagation medium.

This paper is organized as follows. Section II first describes the M-MIMO system model in question, defining the SE bound that we shall use and stating the combining/precoding signal estimation problems as optimization problems; the latter is where the Kaczmarz framework is employed as a possible solver, thereby emulating the capacity results of the canonical solutions. The key properties of the KA and its random variant, the rKA, are discussed in Section III, and they are applied in Section IV to solve, in two different ways, the combining/precoding optimization problems. Section V evaluates the computational complexity of the rKA-based schemes.¹ Section VI presents numerical results that characterize the most appealing rKA-based scheme in a more realistic M-MIMO context. The main conclusions are summarized in Section VII.

¹ Note that the randomized version of the KA is used here. As will be seen and motivated further, the rKA provides more reliable results, which justifies the subsequent focus on signal processing schemes based on it.

Notations: Italic letters denote scalars, whereas boldface uppercase and lowercase letters represent matrices and column vectors, respectively. The superscript $(\cdot)^{\mathrm{H}}$ denotes the Hermitian transpose, and $\mathcal{N}_{\mathbb{C}}(\cdot,\cdot)$ stands for the circularly symmetric complex Gaussian distribution. $\mathbb{E}\{x\}$ denotes the expected value of a random variable $x$. The $N \times N$ identity matrix is indicated by $\mathbf{I}_N$, whereas a vector of $N$ zero elements and an $N \times N$ matrix of zero elements are represented as $\mathbf{0}_N$ and $\mathbf{0}_{N\times N}$, respectively. $\|\cdot\|_2$ and $\|\cdot\|_F$ are the $\ell_2$-norm and the Frobenius norm, respectively. The inner product between two vectors is denoted by $\langle\mathbf{a},\mathbf{b}\rangle = \mathbf{a}^{\mathrm{H}}\mathbf{b}$. Horizontal and vertical concatenation are represented by $[\cdot,\cdot]$ and $[\cdot;\cdot]$, respectively, and the entry in row $i$ and column $j$ of $\mathbf{A}$ is denoted by $[\mathbf{A}]_{i,j}$. In particular, the $i$th element of a vector $\mathbf{a}$ is denoted by $a_i$.

II. SYSTEM MODEL

In this section, a canonical single-cell M-MIMO system is presented, aiming to introduce the SE bound used to evaluate the signal processing techniques and to state signal estimation as an optimization problem, to be solved later through the Kaczmarz framework. The setup consists of a BS equipped with $M$ antennas serving $K$ single-antenna UEs. Assuming a block-fading model, in which coherence blocks of $\tau_c$ symbols (the coherence interval) have statistically independent channel realizations [1], [2], let $\mathbf{g}_k \in \mathbb{C}^{M}$ denote the channel from UE $k$ to the BS. A correlated Rayleigh fading model is adopted, whereby $\mathbf{g}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_M, \mathbf{R}_k)$. The covariance matrix $\mathbf{R}_k \in \mathbb{C}^{M\times M}$ thus embodies effects such as pathloss and spatial channel correlation, both of which are large-scale propagation phenomena, while the complex Gaussian distribution accounts for small-scale fading [11]. A pilot training phase that follows the TDD protocol is assumed for CSI acquisition. The pilot sequences assigned to the UEs are mutually orthogonal in order to combat intra-cell interference. As a result of this process, the BS acquires the estimated channel matrix $\hat{\mathbf{G}} \in \mathbb{C}^{M\times K}$, whose $k$th column, $\hat{\mathbf{g}}_k \in \mathbb{C}^{M}$, is the estimated channel vector of UE $k$.

A. Uplink Data Transmission

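In the UL, the BS receives the data symbols transmitted by all UEs, yielding the received signal

$$\mathbf{y} = \sqrt{\rho^{\mathrm{ul}}}\sum_{i=1}^{K}\mathbf{g}_i s_i + \mathbf{n},$$

where $\rho^{\mathrm{ul}}$ is the normalized UL transmit power, $s_i \sim \mathcal{N}_{\mathbb{C}}(0,1)$ is the data symbol sent by UE $i$, and $\mathbf{n} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_M, \mathbf{I}_M)$ denotes the receiver noise. The performance of a selected receive combining vector, $\mathbf{v}_k \in \mathbb{C}^{M}$, is evaluated using the UL SE obtained through the so-called use-and-forget technique.² A lower bound on the net ergodic UL SE of UE $k$ is then [1], [2]

$$\mathrm{SE}_k^{\mathrm{ul}} = \frac{\tau_{\mathrm{ul}}}{\tau_c}\log_2\left(1+\gamma_k^{\mathrm{ul}}\right) \quad \text{[bit/s/Hz]}, \tag{1}$$

with the effective signal-to-interference-plus-noise ratio (SINR) given as

$$\gamma_k^{\mathrm{ul}} = \frac{\rho^{\mathrm{ul}}\left|\mathbb{E}\{\mathbf{v}_k^{\mathrm{H}}\mathbf{g}_k\}\right|^2}{\rho^{\mathrm{ul}}\sum_{i=1}^{K}\mathbb{E}\{|\mathbf{v}_k^{\mathrm{H}}\mathbf{g}_i|^2\} - \rho^{\mathrm{ul}}\left|\mathbb{E}\{\mathbf{v}_k^{\mathrm{H}}\mathbf{g}_k\}\right|^2 + \mathbb{E}\{\|\mathbf{v}_k\|_2^2\}}, \tag{2}$$

where $\tau_{\mathrm{ul}}$ is the number of symbols of each coherence interval spent in the UL, so that $\tau_{\mathrm{ul}}/\tau_c$ is the corresponding fraction. Supposing that only the UL takes place, $\tau_{\mathrm{ul}} = \tau_c - \tau_p$, where $\tau_p$ is the length of the pilot training phase. The expectations in (2) are taken with respect to all the randomness related to the channels, the channel estimates, and the combining schemes.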
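To make the bound in (1) and (2) concrete, the expectations can be replaced by sample means over channel realizations. The following is a minimal Monte Carlo sketch, not the authors' simulator: it assumes uncorrelated fading with $\mathbf{R}_k = \beta_k\mathbf{I}_M$, a hypothetical channel estimate formed as the true channel plus independent Gaussian error (instead of the LS or MMSE estimators), RZF combining as in (3) of Section II-C, and an illustrative function name `uatf_se` with arbitrary parameter values.

```python
import numpy as np


def uatf_se(M=64, K=8, rho_ul=10.0, betas=None, err_var=0.1,
            tau_c=200, tau_p=8, n_real=500, seed=0):
    """Monte Carlo estimate of the UL SE lower bound (1)-(2) with RZF combining.

    Illustrative assumptions: R_k = beta_k * I_M (uncorrelated fading) and a
    hypothetical estimate g_hat_k = g_k + e_k, with e_k ~ N_C(0, err_var * beta_k * I_M).
    """
    rng = np.random.default_rng(seed)
    betas = np.ones(K) if betas is None else np.asarray(betas, dtype=float)
    xi = 1.0 / rho_ul  # regularization factor: inverse of the SNR

    def cn(*shape):
        # i.i.d. circularly symmetric complex Gaussian samples with unit variance
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    mean_gain = np.zeros(K, dtype=complex)  # sample mean of v_k^H g_k
    mean_pow = np.zeros((K, K))             # sample mean of |v_k^H g_i|^2
    mean_norm2 = np.zeros(K)                # sample mean of ||v_k||_2^2
    for _ in range(n_real):
        G = cn(M, K) * np.sqrt(betas)                        # true channels (columns)
        G_hat = G + cn(M, K) * np.sqrt(err_var * betas)      # hypothetical estimates
        V = G_hat @ np.linalg.inv(G_hat.conj().T @ G_hat + xi * np.eye(K))  # RZF (3)
        VhG = V.conj().T @ G                                 # entry [k, i] is v_k^H g_i
        mean_gain += np.diag(VhG) / n_real
        mean_pow += np.abs(VhG) ** 2 / n_real
        mean_norm2 += np.sum(np.abs(V) ** 2, axis=0) / n_real

    num = rho_ul * np.abs(mean_gain) ** 2
    den = rho_ul * mean_pow.sum(axis=1) - num + mean_norm2   # denominator of (2)
    gamma = num / den
    tau_ul = tau_c - tau_p  # UL-only assumption: tau_ul = tau_c - tau_p
    return (tau_ul / tau_c) * np.log2(1.0 + gamma)


se = uatf_se(n_real=200)
print(np.round(se, 2))
```

Lowering `err_var` toward zero should raise the estimated SE, since the combining vectors then better match the true channels.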

B. Downlink Data Transmission

The BS sends data to its UEs during the DL, where each data-bearing signal has to be precoded so as to spatially direct it toward the desired UE. This directional beam control is performed through a transmit precoding vector $\mathbf{w}_k \in \mathbb{C}^{M}$ for the $k$th UE. Thus, the first stage of the DL is to steer the information signals to be transmitted by the BS via the precoding process $\mathbf{x} = \sum_{i=1}^{K}\mathbf{w}_i\varsigma_i$, where $\varsigma_i \sim \mathcal{N}_{\mathbb{C}}(0,1)$ is the message signal sent by the BS to UE $i$. Note that $\|\mathbf{w}_i\|_2^2 = 1$, so that the transmit precoding process does not alter the signal power. The precoded signal is then transmitted by the BS and received by the UEs. Recall that, since the TDD scheme is assumed, the UL and DL channels are reciprocal within each $\tau_c$-block. Keeping this in mind, UE $k$ receives $y_k = \sqrt{\rho^{\mathrm{dl}}}\,\mathbf{g}_k^{\mathrm{H}}\mathbf{x} + n_k = \sqrt{\rho^{\mathrm{dl}}}\sum_{i=1}^{K}\mathbf{g}_k^{\mathrm{H}}\mathbf{w}_i\varsigma_i + n_k$, where $\rho^{\mathrm{dl}}$ is the normalized DL transmit power and $n_k$ is the receiver noise at UE $k$. It can be seen that each UE is affected by the precoding vectors of all UEs and, for this reason, the selection of $\mathbf{w}_i$ is a difficult problem. A heuristic way to select the precoding vectors across the UEs is to use the UL-DL duality, which is discussed and exploited in the sequel. UE $k$ can estimate its message signal by using the average precoded channel $\mathbb{E}\{\mathbf{g}_k^{\mathrm{H}}\mathbf{w}_k\}$. This procedure allows the derivation of a hardening bound for the DL SE similar to (1), with an analogous definition of $\tau_{\mathrm{dl}}$, which is given in [2, p. 317] and relies heavily on channel hardening. The DL SE bound is not directly evaluated here, since the UL evaluation is sufficient to demonstrate the effectiveness of the proposals without loss of generality.

² This denomination stems from the fact that the channel estimates are used only to compute the receive combining vector, whereas the CSI is not used in the detection process.

C. Combining/Precoding Schemes: Problem Statement

In this subsection, the ZF and RZF receive combining schemes are presented and, relying on the UL-DL duality, their respective transmit precoding vectors are obtained. For ease of exposition, the receive combining and transmit precoding matrices are defined as $\mathbf{V} \in \mathbb{C}^{M\times K}$ and $\mathbf{W} \in \mathbb{C}^{M\times K}$, with $\mathbf{v}_k$ and $\mathbf{w}_k$ being their $k$th columns, respectively. Furthermore, it is shown that the introduced canonical schemes can be seen as solutions of optimization problems. As we will see, these optimization problems can be treated as SLEs and can thus be naturally solved through the rKA, as done in Section IV, giving rise to rKA-based signal processing schemes.

RZF is a sophisticated approach that regularizes the inversion of the estimated channel Gram matrix by a term that depends on the signal-to-noise ratio (SNR). Mathematically, this regularization factor aids the inversion of $\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}}$; physically, it balances the interference suppression and the desired-signal maximization provided by this scheme. The RZF combining matrix is given by [2]

$$\mathbf{V}^{\mathrm{RZF}} = \hat{\mathbf{G}}\left(\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}} + \xi\mathbf{I}_K\right)^{-1}, \tag{3}$$

where $\xi = 1/\rho^{\mathrm{ul}}$ is the regularization factor, defined as the inverse of the SNR. This approach is recommended in scenarios where good channel estimation performance is achieved and the inter-cell interference is not too strong, the latter condition being relevant only in a multi-cell setting.

Notice that if the SNR is high ($\xi \to 0$) and/or $M$ is large, so that $\mathrm{tr}(\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}}) \gg \mathrm{tr}(\xi\mathbf{I}_K)$, RZF can be approximated as [1], [2]

$$\mathbf{V}^{\mathrm{ZF}} = \hat{\mathbf{G}}\left(\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}}\right)^{-1}. \tag{4}$$

The term ZF stems from the fact that $\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{V}^{\mathrm{ZF}} = \mathbf{I}_K$, noting that $\mathbf{V}^{\mathrm{ZF}}$ is the pseudo-inverse of $\hat{\mathbf{G}}^{\mathrm{H}}$. Consequently, ZF aims to cancel all the intra-cell interference while preserving the power level of the desired signals. However, when the receive combining is applied, namely, $(\mathbf{V}^{\mathrm{ZF}})^{\mathrm{H}}\mathbf{y}$, the true channels differ from the estimated ones, which translates into residual interference [2]. In addition, not all UEs have a high SNR; hence, ZF is expected to perform worse than RZF. Clearly, ZF can be interpreted as the special case of RZF with $\xi = 0$.
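A short NumPy sketch of (3) and (4) can be used to check the properties stated above numerically: $\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{V}^{\mathrm{ZF}} = \mathbf{I}_K$, $\mathbf{V}^{\mathrm{ZF}}$ equals the pseudo-inverse of $\hat{\mathbf{G}}^{\mathrm{H}}$, and RZF approaches ZF as $\xi \to 0$. The helper names `rzf_combiner` and `zf_combiner` and the i.i.d. Gaussian test matrix are assumptions made for illustration.

```python
import numpy as np


def rzf_combiner(G_hat, xi):
    """RZF combining matrix (3): G_hat (G_hat^H G_hat + xi I_K)^{-1}."""
    K = G_hat.shape[1]
    gram = G_hat.conj().T @ G_hat + xi * np.eye(K)
    # Solve the K x K Hermitian system instead of forming the inverse explicitly:
    # solve(gram, G_hat^H) = gram^{-1} G_hat^H, whose Hermitian transpose is G_hat gram^{-1}.
    return np.linalg.solve(gram, G_hat.conj().T).conj().T


def zf_combiner(G_hat):
    """ZF combining matrix (4), i.e., RZF with xi = 0."""
    return rzf_combiner(G_hat, 0.0)


rng = np.random.default_rng(1)
M, K = 64, 8
G_hat = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
V_zf = zf_combiner(G_hat)
V_rzf = rzf_combiner(G_hat, xi=0.1)
print(np.round(np.abs(G_hat.conj().T @ V_zf), 6))  # should be the identity matrix
```

Solving the $K \times K$ system with `np.linalg.solve` is numerically preferable to an explicit inverse; this Gram-matrix solve is part of the cost that the relaxed schemes discussed here try to avoid.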

In order to separate and estimate the signals of the UEs, the receive combining scheme is applied to the signal received at the BS. This can be expressed as

$$\hat{\mathbf{s}} = \mathbf{V}^{\mathrm{H}}\mathbf{y}, \tag{5}$$

where $\hat{\mathbf{s}} \in \mathbb{C}^{K}$ is the vector with the estimates of the data symbols transmitted by all UEs in one symbol period and $\mathbf{V}$ can be either the RZF or the ZF combining matrix. In view of that, the RZF estimate can be stated as the optimal solution of the signal estimation problem above, as presented in Corollary 1 and proved below, based on [5].

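Corollary 1. From (3) and (5), the RZF signal estimate is

$$\hat{\mathbf{s}} = (\mathbf{V}^{\mathrm{RZF}})^{\mathrm{H}}\mathbf{y} = \left(\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}} + \xi\mathbf{I}_K\right)^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}; \tag{6}$$

thus, $\hat{\mathbf{s}}$ is an optimal solution of the estimation problem, since it solves

$$\underset{\boldsymbol{\chi}\in\mathbb{C}^{K}}{\arg\min}\;\|\hat{\mathbf{G}}\boldsymbol{\chi} - \mathbf{y}\|_2^2 + \xi\|\boldsymbol{\chi}\|_2^2.$$

Proof. The first derivative of the above cost function with respect to $\boldsymbol{\chi}$ is $2\hat{\mathbf{G}}^{\mathrm{H}}(\hat{\mathbf{G}}\boldsymbol{\chi} - \mathbf{y}) + 2\xi\boldsymbol{\chi}$. Since the cost function is convex, setting this derivative to zero yields the optimal solution $\boldsymbol{\chi}^{\star} = (\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}} + \xi\mathbf{I}_K)^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}$, which coincides with $\hat{\mathbf{s}}$ in (6).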

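The same can be done for ZF by considering $\xi = 0$; however, the more general case characterized by the RZF scheme is assumed hereafter. The optimization problem in Corollary 1 can be rewritten compactly as $\|\mathbf{B}\boldsymbol{\chi} - \mathbf{y}_0\|_2^2$, where $\mathbf{B} = [\hat{\mathbf{G}}; \sqrt{\xi}\,\mathbf{I}_K]$ is an $(M+K)\times K$ matrix stacking the estimated channel matrix and the identity matrix scaled by the square root of the regularization factor, and $\mathbf{y}_0 = [\mathbf{y}; \mathbf{0}_K]$ is an $(M+K)$-dimensional vector stacking the received signal and a zero vector. Thus, $\hat{\mathbf{s}}$ can be expressed as

$$\hat{\mathbf{s}} = (\mathbf{B}^{\mathrm{H}}\mathbf{B})^{-1}\mathbf{B}^{\mathrm{H}}\mathbf{y}_0 = (\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}} + \xi\mathbf{I}_K)^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}. \tag{7}$$

These definitions will be used to properly apply the rKA as an alternative way to solve the optimization problem established in Corollary 1.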
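Corollary 1 and (7) can be checked numerically: solving the stacked least-squares problem $\min_{\boldsymbol{\chi}}\|\mathbf{B}\boldsymbol{\chi}-\mathbf{y}_0\|_2^2$ with a generic solver should return the RZF estimate, and the derivative used in the proof should vanish there. The sketch below uses an arbitrary random $\hat{\mathbf{G}}$ and $\mathbf{y}$ as illustrative data.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, xi = 32, 4, 0.5


def cn(*shape):
    # i.i.d. N_C(0, 1) samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


G_hat = cn(M, K)
y = cn(M)  # an arbitrary received vector, for illustration only

# Closed-form RZF estimate (6)/(7).
s_rzf = np.linalg.solve(G_hat.conj().T @ G_hat + xi * np.eye(K), G_hat.conj().T @ y)

# Stacked least-squares form: minimize ||B chi - y0||_2^2 with B = [G_hat; sqrt(xi) I_K].
B = np.vstack([G_hat, np.sqrt(xi) * np.eye(K)])
y0 = np.concatenate([y, np.zeros(K)])
s_ls, *_ = np.linalg.lstsq(B, y0, rcond=None)

# Derivative from the proof of Corollary 1 (up to the factor 2) at the solution.
grad = G_hat.conj().T @ (G_hat @ s_rzf - y) + xi * s_rzf
print(np.linalg.norm(s_rzf - s_ls), np.linalg.norm(grad))
```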

As anticipated, selecting the transmit precoding vectors is a very complex problem owing to its self-dependence, i.e., each UE is affected by all the precoding vectors of the system. A heuristic way to solve it is the UL-DL duality, a direct consequence of the TDD mode, by which the transmit precoding vector of UE $k$ can be computed as [2]

$$\mathbf{w}_k = \frac{\mathbf{v}_k}{\|\mathbf{v}_k\|_2} \tag{8}$$

for any receive combining scheme. Recall that the normalization ensures that the transmit precoding vector does not alter the transmitted power. The above definition allows us to obtain the transmit precoding vectors for ZF, RZF, and also KA- or rKA-based schemes without solving the precoding problem exactly. This result is exploited in the next sections.
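The duality rule (8) amounts to normalizing each column of the combining matrix. A minimal sketch, assuming an RZF combining matrix built from a random $\hat{\mathbf{G}}$ and a hypothetical helper name `duality_precoder`, also forms the precoded DL signal $\mathbf{x} = \sum_i \mathbf{w}_i\varsigma_i$.

```python
import numpy as np


def duality_precoder(V):
    """Transmit precoding via UL-DL duality (8): normalize each column of V."""
    return V / np.linalg.norm(V, axis=0, keepdims=True)


rng = np.random.default_rng(3)
M, K, xi = 32, 4, 0.1
G_hat = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
V = G_hat @ np.linalg.inv(G_hat.conj().T @ G_hat + xi * np.eye(K))  # RZF (3)
W = duality_precoder(V)

varsigma = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
x = W @ varsigma  # precoded DL signal: sum_i w_i * varsigma_i
print(np.linalg.norm(W, axis=0))  # unit-norm precoding vectors
```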

III. KACZMARZ AND RANDOMIZED KACZMARZ ALGORITHMS

The mathematical concepts behind the KA are now briefly described based on Kaczmarz's seminal work [6], together with those underlying the rKA, an extension of the KA proposed by Strohmer and Vershynin in [9], which has a provable rate of convergence that leads to more reliable results than the classical KA. For this purpose, we consider throughout this section the canonical and generic SLE $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m\times n}$ is the matrix of constant coefficients, $\mathbf{x} \in \mathbb{C}^{n}$ is the vector of unknowns, and $\mathbf{b} \in \mathbb{C}^{m}$ is the vector of known right-hand-side terms. Note that, for an underdetermined (UD) SLE ($m < n$), $\mathbf{A}$ is assumed to have full row rank, whereas for an OD SLE ($m > n$), $\mathbf{A}$ is assumed to have full column rank. Besides, the SLE is assumed consistent, which means that it possesses at least one solution; in the OD case, this solution is unique, given the full column rank of $\mathbf{A}$.

The KA can be written as [6], [9]

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \frac{b_i - \langle\mathbf{a}_i, \mathbf{x}_t\rangle}{\|\mathbf{a}_i\|_2^2}\,\mathbf{a}_i, \tag{9}$$

where $t$ is the iteration index, $\mathbf{a}_i = ([\mathbf{A}]_{i,:})^{\mathrm{H}} \in \mathbb{C}^{n}$ is the $i$th row of $\mathbf{A}$, selected in round-robin fashion, and $b_i$ is the $i$th element of $\mathbf{b}$. Note that the selected row $i$ is deterministic in the classical KA and can be computed cyclically, for instance, as $i = \mathrm{mod}(t, m) + 1$. The KA operates as follows. First, it is initialized with an arbitrary guess, $\mathbf{x}_0$, and seeks the true solution, $\mathbf{x}^{\star}$. At each iteration $t$, it takes a row $i$ following the order of the equations in $\mathbf{A}$, i.e., sweeping from $\mathbf{a}_1$ to $\mathbf{a}_m$. The KA then computes the residual $b_i - \langle\mathbf{a}_i, \mathbf{x}_t\rangle$: if $\langle\mathbf{a}_i, \mathbf{x}_t\rangle = b_i$, the solution is not updated, i.e., $\mathbf{x}_{t+1} = \mathbf{x}_t$; otherwise, the KA projects the current approximate solution, $\mathbf{x}_t$, onto the hyperplane defined by $\langle\mathbf{a}_i, \mathbf{x}\rangle = b_i$, so that the update $\mathbf{x}_{t+1}$ satisfies the $i$th equation. These successive orthogonal projections tend to lead $\mathbf{x}_0$ to $\mathbf{x}^{\star}$ after a suitable number of KA iterations. In summary, the KA is an efficient way to find the solution of an SLE, which strives to guide an arbitrary initial guess toward the desired answer, obeying the maximum energy perturbation principle [5].
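The cyclic update (9) can be sketched in a few lines of NumPy. The function name and the test system are illustrative; the inner product is taken as $\langle\mathbf{a},\mathbf{x}\rangle = \mathbf{a}^{\mathrm{H}}\mathbf{x}$, which is what `np.vdot` computes, so that $\langle\mathbf{a}_i,\mathbf{x}\rangle = [\mathbf{A}]_{i,:}\mathbf{x}$. Indices are zero-based, so `t % m` plays the role of $\mathrm{mod}(t,m)+1$.

```python
import numpy as np


def kaczmarz_cyclic(A, b, n_sweeps=100, x0=None):
    """Classical KA (9) with round-robin row selection."""
    m, n = A.shape
    x = np.zeros(n, dtype=complex) if x0 is None else np.array(x0, dtype=complex)
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)      # ||a_i||_2^2 for every row
    for t in range(n_sweeps * m):
        i = t % m                                    # zero-based mod(t, m) + 1
        a_i = A[i].conj()                            # a_i = ([A]_{i,:})^H
        residual = b[i] - np.vdot(a_i, x)            # b_i - <a_i, x_t>
        x = x + (residual / row_norms2[i]) * a_i     # project onto hyperplane i
    return x


rng = np.random.default_rng(0)
A = (rng.standard_normal((20, 5)) + 1j * rng.standard_normal((20, 5))) / np.sqrt(2)
x_true = rng.standard_normal(5) + 1j * rng.standard_normal(5)
b = A @ x_true                                       # consistent OD SLE (m = 20 > n = 5)
x_hat = kaczmarz_cyclic(A, b, n_sweeps=200)
print(np.linalg.norm(x_hat - x_true))
```

Reordering the rows of `A` changes the sequence of projections and, in general, the speed of convergence, which is the dependence on the update schedule discussed next.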

Although the KA is an extremely powerful solver, mainly for OD SLEs, its rate of convergence is not well known and has therefore not been satisfactorily explored. This is because the rate of convergence of the KA is difficult to estimate, as it depends strongly on the way the equations are arranged in $\mathbf{A}$; this arrangement determines the so-called KA update schedule, namely, the order in which the equations are selected as the number of iterations grows. Later, some works identified that a random choice of the rows of $\mathbf{A}$ can guarantee a rigorous rate of convergence for the KA. From this observation, [9] proposed the rKA scheme, which ensures an expected exponential rate of convergence; this is of capital importance to obtain reliable KA solutions in terms of reproducible results.

The rKA chooses the rows of $\mathbf{A}$ randomly and independently in its update schedule, based on a metric that measures the relevance of each row. In this case, the random KA variant described in [9] can be expressed as

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \frac{b_{r(t)} - \langle\mathbf{a}_{r(t)}, \mathbf{x}_t\rangle}{\|\mathbf{a}_{r(t)}\|_2^2}\,\mathbf{a}_{r(t)}, \tag{10}$$

where the row randomly picked at rKA iteration $t$ is denoted as $r(t) \in \{1, 2, \ldots, m\}$. In particular, each $r(t)$ is drawn with sampling probability $P_{r(t)} \in [0,1]$, and the sampling distribution over all rows is collected in the vector $\mathbf{p} = (P_1, \ldots, P_m) \in \mathbb{R}_{+}^{m}$. It should be observed that $\sum_{r=1}^{m} P_r = 1$. This sampling probability is designed to reduce the expected mean-squared error (MSE) at a given rKA iteration $t$, and is given by [9]

$$P_{r(t)} := \frac{\|\mathbf{a}_{r(t)}\|_2^2}{\|\mathbf{A}\|_F^2}. \tag{11}$$

In fact, the above $P_{r(t)}$ is suboptimal³ and can be interpreted as the fraction of the total energy of $\mathbf{A}$ contained in a given row. Although suboptimal, this probability distribution has proven effective and, hence, is adopted in this work. The main convergence result of the rKA established in [5] and [9] is summarized by the following corollary.

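Corollary 2. The expected rate of convergence of the rKA is

$$\mathbb{E}\{\|\mathbf{x}_t - \mathbf{x}^{\star}\|_2^2\} \le \left(1 - \kappa_{\mathcal{X}}(\mathbf{A})\right)^{t}\|\mathbf{x}_0 - \mathbf{x}^{\star}\|_2^2, \tag{12}$$

with $\mathbf{x}^{\star}$ being the optimal solution and $\mathbf{x}_0$ an arbitrary initial guess. Moreover, $\kappa_{\mathcal{X}}(\mathbf{A})$ is called the average gain and is defined as

$$\kappa_{\mathcal{X}}(\mathbf{A}) := \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\|\mathbf{A}\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{A}\|_F^2\,\|\boldsymbol{\vartheta}\|_2^2}, \tag{13}$$

where one should observe that the above metric does not depend explicitly on the sampling probability and is inversely related to the scaled condition number⁴ of $\mathbf{A}$; for $\mathcal{X} = \mathbb{C}^{n}$ and a full-column-rank $\mathbf{A}$, it equals the inverse of the squared scaled condition number.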

³ Optimizing the sampling probabilities may not be useful in the M-MIMO scenario, since the CSI changes every $\tau_c$. Indeed, as shall be demonstrated, $\mathbf{A}$ is directly linked to $\hat{\mathbf{G}}$. Therefore, an optimization problem would have to be solved in every coherence block, which imposes an unwanted hardware load [5], [12], given the restrictions placed on the BS.

⁴ Recall that the scaled condition number of $\mathbf{A}$ is defined in [9] as $\|\mathbf{A}\|_F\|\mathbf{A}^{-1}\|_2$, where $\mathbf{A}^{-1}$ denotes the left inverse of $\mathbf{A}$.

Proof. The proof can be found in [5, p. 11].
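The randomized update (10) with the sampling probabilities (11) differs from the cyclic KA only in how the row is chosen. A minimal sketch, with an illustrative function name `kaczmarz_randomized` and a consistent OD test system whose rows have deliberately unequal energies (so that $\mathbf{p}$ is non-uniform):

```python
import numpy as np


def kaczmarz_randomized(A, b, n_iter=2000, x0=None, seed=0):
    """rKA (10): rows drawn i.i.d. with P_r = ||a_r||_2^2 / ||A||_F^2, as in (11)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n, dtype=complex) if x0 is None else np.array(x0, dtype=complex)
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)
    p = row_norms2 / row_norms2.sum()             # sampling distribution (11)
    rows = rng.choice(m, size=n_iter, p=p)        # random update schedule r(t)
    for r in rows:
        a_r = A[r].conj()                         # a_r = ([A]_{r,:})^H
        x = x + ((b[r] - np.vdot(a_r, x)) / row_norms2[r]) * a_r
    return x, p


rng = np.random.default_rng(1)
m, n = 40, 5
scales = np.linspace(0.5, 2.0, m)                 # unequal row energies
A = scales[:, None] * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true
x_hat, p = kaczmarz_randomized(A, b, n_iter=5000, seed=2)
print(np.linalg.norm(x_hat - x_true), p.min(), p.max())
```

Drawing the whole schedule in advance with `Generator.choice` keeps the loop simple; rows with larger energy are visited more often, exactly as (11) prescribes.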

To gain further insight, (12) relates the number of rKA iterations (projections) to the expected distance between the current solution and $\mathbf{x}^{\star}$, based on the average gain provided by the rKA. The expected squared error on the left-hand side of (12) thus decays exponentially with $t$. Note that the closer $(1 - \kappa_{\mathcal{X}}(\mathbf{A}))$ is to one, the more rKA iterations are needed to reduce the initial error $\|\mathbf{x}_0 - \mathbf{x}^{\star}\|_2^2$ to a desired level. It is thus desirable to have a high value of $\kappa_{\mathcal{X}}(\mathbf{A})$, so as to reduce the number of rKA iterations needed to reach the expected error level. This means that there is a direct connection between $\kappa_{\mathcal{X}}(\mathbf{A})$ and the convergence of the rKA. As a result, the best case of convergence corresponds to the maximum value that the average gain can assume, while its minimum corresponds to the worst case. It is also worth mentioning explicitly that the number of rKA iterations needed to reach a given error level is approximately inversely proportional to the average gain: from (12), reducing the error bound by a factor $\epsilon$ requires $t \ge \ln\epsilon/\ln(1-\kappa_{\mathcal{X}}(\mathbf{A})) \approx \ln(1/\epsilon)/\kappa_{\mathcal{X}}(\mathbf{A})$ iterations for small $\kappa_{\mathcal{X}}(\mathbf{A})$. Because of the above convergence result, which brings additional reliability, our focus from now on is the use of the rKA in the design of signal processing schemes based on Kaczmarz's methodology.⁵
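The quantities in Corollary 2 can be evaluated for a given matrix. Assuming $\mathcal{X} = \mathbb{C}^n$ and a full-column-rank $\mathbf{A}$ (an assumption made here for illustration), the minimum in (13) equals $\sigma_{\min}^2(\mathbf{A})/\|\mathbf{A}\|_F^2$, i.e., the inverse of the squared scaled condition number. The helper names below are hypothetical; `iterations_needed` returns the smallest $t$ for which the factor $(1-\kappa_{\mathcal{X}}(\mathbf{A}))^t$ in (12) falls below a target $\epsilon$.

```python
import math

import numpy as np


def average_gain(A):
    """Average gain (13) for X = C^n and a full-column-rank A.

    The minimum in (13) is attained by the right singular vector of the
    smallest singular value, giving sigma_min(A)^2 / ||A||_F^2.
    """
    s = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
    return s[-1] ** 2 / np.sum(s ** 2)       # ||A||_F^2 = sum of squared singular values


def iterations_needed(kappa, eps):
    """Smallest t such that (1 - kappa)^t <= eps, i.e., the bound (12) drops by eps."""
    return math.ceil(math.log(eps) / math.log(1.0 - kappa))


rng = np.random.default_rng(0)
A = (rng.standard_normal((100, 8)) + 1j * rng.standard_normal((100, 8))) / np.sqrt(2)
kappa = average_gain(A)
print(kappa, iterations_needed(kappa, 1e-6), math.log(1e6) / kappa)
```

For small $\kappa_{\mathcal{X}}(\mathbf{A})$, the exact count and the approximation $\ln(1/\epsilon)/\kappa_{\mathcal{X}}(\mathbf{A})$ are close; for instance, $\kappa = 0.01$ and $\epsilon = 10^{-3}$ give 688 and about 691 iterations, respectively.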

IV. OBTAINING THE COMBINING/PRECODING MATRICES VIA RANDOMIZED KACZMARZ ALGORITHM

The rKA is now presented as a valid approach to solve the receive combining problem established in Corollary 1, which can be interpreted as the rKA emulating the solution offered by the RZF scheme. More precisely, two rKA-based RZF strategies are presented to obtain the receive combining matrix, from which the transmit precoding matrix can be derived using the UL-DL duality in (8), without loss of generality. An alternative way to solve the optimization problem in Corollary 1 is to state it as the SLE $\mathbf{B}\boldsymbol{\chi} = \mathbf{y}_0$. Notice that this SLE is OD, since the number of equations, $M + K$, is greater than the number of unknowns, $K$, in a typical M-MIMO scenario [13]. Moreover, observe that this problem is not consistent: $\mathbf{y}_0$ does not lie in the column space of $\mathbf{B}$ (for instance, because of the randomness introduced by the receiver noise in the signal received by the BS, $\mathbf{y}$), so the OD SLE has no exact solution. If the rKA is applied directly to this problem, a convergence bias or residual error is introduced in the solution due to the inconsistency.
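The residual error mentioned above can be observed with a small experiment. The sketch below, which uses illustrative sizes, a fixed seed, and a hypothetical noisy $\mathbf{y}$, applies the rKA update (10) with probabilities (11) directly to the inconsistent SLE $\mathbf{B}\boldsymbol{\chi} = \mathbf{y}_0$: after a burn-in, the iterate keeps fluctuating around the RZF solution (7) instead of converging to it, because every projection lands on a hyperplane that does not contain that solution.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K, xi = 32, 4, 0.1


def cn(*shape):
    # i.i.d. N_C(0, 1) samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


G_hat = cn(M, K)
y = G_hat @ cn(K) + cn(M)                       # hypothetical noisy received signal
B = np.vstack([G_hat, np.sqrt(xi) * np.eye(K)])
y0 = np.concatenate([y, np.zeros(K)])
s_rzf = np.linalg.solve(B.conj().T @ B, B.conj().T @ y0)   # LS/RZF solution (7)

row_norms2 = np.sum(np.abs(B) ** 2, axis=1)
p = row_norms2 / row_norms2.sum()               # sampling probabilities (11)
chi = np.zeros(K, dtype=complex)
errors = []
for t, r in enumerate(rng.choice(M + K, size=20000, p=p)):
    a_r = B[r].conj()
    chi = chi + ((y0[r] - np.vdot(a_r, chi)) / row_norms2[r]) * a_r   # rKA step (10)
    if t >= 10000:                              # record the error after a burn-in
        errors.append(np.linalg.norm(chi - s_rzf))

print(min(errors), np.mean(errors))
```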

In [5], this problem was circumvented by splitting the solution of $\mathbf{B}\boldsymbol{\chi} = \mathbf{y}_0$, with $\boldsymbol{\chi}^{\star} = \hat{\mathbf{s}}$, into two stages. The first remodels the OD SLE into a consistent form, leading to a UD SLE; then, another OD SLE is derived based on the originally stated problem ($\mathbf{B}\boldsymbol{\chi} = \mathbf{y}_0$). We present this in depth below.

⁵ It should be clear that the receive combining and transmit precoding based on the rKA can also be implemented with the KA; however, convergence would then strongly depend on the way the equations are ordered in the SLE to be solved.

A. The Two Steps in Solving the SLE

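The first goal is to obtain an estimate of $\mathbf{y}_0$, a term that embodies the inconsistency. To this end, $\hat{\mathbf{y}}_0 \in \mathbb{C}^{M+K}$ is defined as

$$\hat{\mathbf{y}}_0 := \mathbf{B}\hat{\mathbf{s}} \overset{(a)}{=} \mathbf{B}(\mathbf{B}^{\mathrm{H}}\mathbf{B})^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}, \tag{14}$$

where in (a) the relation in (7) was applied. By analogy with the original SLE, $\mathbf{B}\boldsymbol{\chi} = \mathbf{y}_0$, $\hat{\mathbf{y}}_0$ can be estimated via the following SLE:

$$\mathbf{B}^{\mathrm{H}}\hat{\mathbf{y}}_0 = (\mathbf{B}^{\mathrm{H}}\mathbf{B})(\mathbf{B}^{\mathrm{H}}\mathbf{B})^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y} = \hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}. \tag{15}$$

The above SLE is UD, since the number of equations, $K$, is smaller than the number of unknowns, $M + K$. Moreover, it is consistent, because $\mathbf{B}^{\mathrm{H}}$ has full row rank and its columns therefore span $\mathbb{C}^{K}$, which contains $\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}$. Furthermore, $\hat{\mathbf{y}}_0$ in (14) lies in the column space of $\mathbf{B}$ and is thus the minimum-norm solution of (15), which is the solution reached by the rKA when it is initialized with a zero vector, since every update in (10) adds a multiple of a column of $\mathbf{B}$. Although not necessarily optimal, the rKA can therefore be used to estimate $\hat{\mathbf{y}}_0$. That being the case, it is natural to derive a second SLE, $\mathbf{B}\hat{\mathbf{s}} = \hat{\mathbf{y}}_0$, which is both OD and consistent. Solving this second SLE with the rKA, however, is not strictly necessary, since $\hat{\mathbf{s}}$ can be obtained from the last $K$ rows of $\hat{\mathbf{y}}_0$. This means that only the UD SLE has to be solved, which can be seen by replacing $\hat{\mathbf{y}}_0$ in $\mathbf{B}\hat{\mathbf{s}} = \hat{\mathbf{y}}_0$ by its definition in (14) and left-multiplying both sides by $\mathbf{B}^{\mathrm{H}}$:

$$\mathbf{B}^{\mathrm{H}}\mathbf{B}\hat{\mathbf{s}} = \mathbf{B}^{\mathrm{H}}\mathbf{B}(\mathbf{B}^{\mathrm{H}}\mathbf{B})^{-1}\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y} \overset{(a)}{=} \mathbf{B}^{\mathrm{H}}\hat{\mathbf{y}}_0, \tag{16}$$

where in (a) the equality in (15) was used. Indeed, since $\hat{\mathbf{y}}_0 = \mathbf{B}\hat{\mathbf{s}} = [\hat{\mathbf{G}}\hat{\mathbf{s}}; \sqrt{\xi}\,\hat{\mathbf{s}}]$, the last $K$ entries of $\hat{\mathbf{y}}_0$ equal $\sqrt{\xi}\,\hat{\mathbf{s}}$.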
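The two-step idea can be sketched as follows: the rKA is run on the consistent UD SLE (15) from a zero initialization, so that the iterates stay in the column space of $\mathbf{B}$ and converge to $\hat{\mathbf{y}}_0$ in (14), and $\hat{\mathbf{s}}$ is then read from the last $K$ entries after division by $\sqrt{\xi}$. Sizes, seed, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, xi = 32, 4, 0.1


def cn(*shape):
    # i.i.d. N_C(0, 1) samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


G_hat = cn(M, K)
y = G_hat @ cn(K) + cn(M)                       # hypothetical noisy received signal
B = np.vstack([G_hat, np.sqrt(xi) * np.eye(K)])
s_rzf = np.linalg.solve(G_hat.conj().T @ G_hat + xi * np.eye(K), G_hat.conj().T @ y)

# rKA on the consistent UD SLE (15): B^H y0_hat = G_hat^H y (K equations, M + K unknowns).
A = B.conj().T                                  # K x (M + K) coefficient matrix
b = G_hat.conj().T @ y
row_norms2 = np.sum(np.abs(A) ** 2, axis=1)     # ||g_hat_k||^2 + xi
p = row_norms2 / row_norms2.sum()               # sampling probabilities (11)
y0_hat = np.zeros(M + K, dtype=complex)         # zero start keeps iterates in range(B)
for r in rng.choice(K, size=3000, p=p):
    a_r = A[r].conj()                           # equals the rth column of B
    y0_hat = y0_hat + ((b[r] - np.vdot(a_r, y0_hat)) / row_norms2[r]) * a_r

# The last K entries of y0_hat equal sqrt(xi) * s_hat, see (14).
s_hat = y0_hat[M:] / np.sqrt(xi)
print(np.linalg.norm(s_hat - s_rzf))
```

Only $K$ equations are involved, and the second OD SLE never has to be solved, which is the point made above.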

B. Parallel rKA-Based RZF Scheme

The receive combining matrix can be obtained by running $K$ rKAs in parallel, the $k$th of which solves an SLE of the form $\mathbf{B}^{\mathrm{H}}\mathbf{c} = \mathbf{e}_k$. Adopting the Kaczmarz notation, $\mathbf{c}_t = [\mathbf{u}_t; \mathbf{z}_t] \in \mathbb{C}^{M+K}$, with $\mathbf{u}_t \in \mathbb{C}^{M}$ and $\mathbf{z}_t \in \mathbb{C}^{K}$. Further, $\mathbf{e}_k \in \mathbb{C}^{K}$ is the canonical basis vector with $[\mathbf{e}_k]_k = 1$ and zeros elsewhere. This problem is essentially founded on the UD SLE in (15) and the property in (16), and its parallel solution is described in Algorithm 1. Note that, in the parallelized form, $\mathbf{e}_k$ replaces $\hat{\mathbf{G}}^{\mathrm{H}}\mathbf{y}$ in the SLE (15), so that only the $k$th equation has a nonzero right-hand side in the $k$th rKA. In other words, the $k$th rKA computes the receive combining vector associated with UE $k$. In fact, the $k$th rKA does not directly yield $\mathbf{v}_k$, but provides the vector $\mathbf{d}_k \in \mathbb{C}^{K}$ given solely by the last $K$ rows of $\mathbf{c}_t$, i.e., $\mathbf{z}_t$; thereby, $\mathbf{v}_k = \hat{\mathbf{G}}\mathbf{d}_k$.

Notably, each rKA can run concurrently on a different hardware unit, with each unit independently or quasi-independently obtaining its own receive combining vector. This independence depends on how the randomness of the update schedule in (10) is interpreted in a given context, i.e., how the random selection of rows is associated across the hardware units. Two possibilities give satisfactory results: (a) all $K$ rKAs share the same random rows $r(t)$, and (b) each rKA draws its own random rows. This discussion is deepened in Section V, which deals with computational complexity; note, however, that both approaches provide suitable convergence, as established in Corollary 2.
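The two ways of associating the random rows across hardware units can be sketched as follows. This is a minimal illustration, assuming the sampling distribution $P_r = (\|\mathbf{g}_r\|_2^2 + \xi)/(\|\mathbf{G}\|_F^2 + K\xi)$, i.e., the squared row norms of $\mathbf{B}^H$ normalized by $\|\mathbf{B}\|_F^2$; the channel norms, seeds, and function names are hypothetical, and indices are 0-based rather than the 1-based indices of the text.

```python
import random

def sampling_distribution(row_norms_sq, xi):
    """P_r = (||g_r||^2 + xi) / (||G||_F^2 + K xi), i.e. ||b_r||^2 / ||B||_F^2."""
    total = sum(row_norms_sq) + len(row_norms_sq) * xi
    return [(n + xi) / total for n in row_norms_sq]

def draw_rows(p, T, rng):
    """Draw T row indices r(t) (0-based) i.i.d. according to p."""
    return rng.choices(range(len(p)), weights=p, k=T)

norms = [120.0, 80.0, 0.5]   # hypothetical ||g_r||^2: two center-UEs and one edge-UE
p = sampling_distribution(norms, xi=0.1)
T = 20

# (a) shared randomness: a central unit draws r(t) once and all K rKAs reuse it
shared = draw_rows(p, T, random.Random(7))
schedules_shared = [shared for _ in range(len(p))]

# (b) self-realized randomness: every hardware unit runs its own generator
schedules_indep = [draw_rows(p, T, random.Random(100 + k)) for k in range(len(p))]
print(p)
```

With these numbers the edge-UE row receives a probability of roughly 0.3%, which already hints at the issue discussed next.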

Although Algorithm 1 was indirectly⁶ assessed in [5], some changes are introduced here. To put them in context, note that the sample probability $P_{r(t)}$ is generally affected by pathloss and shadowing through its dependence on the estimated channel powers (see $P_{r(t)}$ in step 10 of Algorithm 1). If $\mathbf{p}$ is detrimentally affected, the rKA's update schedule is also degraded. Observe that if row $k$ is never selected in the $k$th rKA, no update occurs at all, making the desired convergence unattainable. This happens because zero vectors are used to initialize the state variables in $\mathbf{c}_0$: for any row $r(t) \neq k$, the numerator of the residual in step 12 is then zero, so the state never leaves the origin. Zero vectors are nevertheless desirable, since other arbitrary values, such as a vector of ones, can cause a power bias in the rKA-estimated receive combining vectors. In summary, UEs close to the BS (center-UEs) have significantly higher values of $P_{r(t)}$ than UEs located at the cell edge (edge-UEs). This disparity leads to poor or even impossible convergence of the edge-UEs' combining vectors when they are obtained through a purely randomized rKA. Further insight into the consequences of large-scale effects on $\mathbf{p}$ is given in Section VI-A.
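The stall caused by zero initialization can be reproduced with a few lines. This sketch implements a single update of the $k$th rKA (steps 12 to 14 of Algorithm 1, reading $\langle \mathbf{g}_{r(t)}, \mathbf{u}_t\rangle$ as $\mathbf{g}_{r(t)}^H\mathbf{u}_t$) and, as an extreme and hypothetical case, never draws the edge-UE row $k$; a realistic edge-UE row would be drawn rarely rather than never. Sizes, seed, and names are illustrative assumptions.

```python
import random

def rka_step(G, xi, k, r, u, z):
    """One update of the k-th rKA on row r of G^H (steps 12-14 of Algorithm 1); G is M x K."""
    M = len(G)
    g_r = [G[m][r] for m in range(M)]                           # column r of G
    inner = sum(g_r[m].conjugate() * u[m] for m in range(M))    # g_r^H u_t
    target = 1.0 if r == k else 0.0                             # [e_k]_r
    eta = (target - inner - xi * z[r]) / (sum(abs(x) ** 2 for x in g_r) + xi)
    u = [u[m] + eta * g_r[m] for m in range(M)]
    z = list(z)
    z[r] += eta
    return u, z

rng = random.Random(3)
M, K, xi, k = 6, 4, 0.2, 3           # hypothetical sizes; k = 3 plays the edge-UE
G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(K)] for _ in range(M)]
u, z = [0j] * M, [0j] * K            # zero initialization, as in Algorithm 1
for t in range(50):
    r = rng.choice([0, 1, 2])        # the edge-UE row k = 3 is never selected
    u, z = rka_step(G, xi, k, r, u, z)
stalled = all(v == 0 for v in u + z)     # every residual was zero: no update happened

u, z = rka_step(G, xi, k, k, u, z)       # forcing row k, as step 8 does for t = 0
started = any(v != 0 for v in u + z)
print(stalled, started)
```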

Striving to overcome this negative effect, we observed that a suitable, non-random way to initialize the $k$th rKA, and thus guarantee its updating, consists of forcing the selection of the $k$th row in the first iteration, as performed in step 8 of Algorithm 1. This procedure can also provide a better average initial guess, which translates into a better average gain that improves the rate of convergence presented in Corollary 2, as shall be evidenced in the sequel. What makes the proposed procedure an attractive way to initialize Algorithm 1 is that edge-UEs can now fairly obtain their receive combining vectors, even with the strong preference given to center-UEs in the computation of $P_{r(t)}$. This makes the parallel (PARL) rKA-based RZF scheme more robust in practical regimes and, consequently, enhances its convergence towards the desired solutions. One can additionally conjecture that the proposed initialization, which can be seen as a hybrid between the KA and the rKA, is not strictly necessary for UEs with high values of $P_{r(t)}$, for the simple reason that they are more likely to be drawn as $r(t)$. As a consequence, an optimization problem could be formulated to decide which of the $K$ rKAs have to be force-initialized. Note, however, that the gain provided by such an optimization is merely one iteration per rKA, and solving it may add considerable, undesired computational complexity. This optimization procedure was therefore not considered in this work.

⁶Recall that the authors of [5] discuss the rKA as an approach to directly obtain $\mathbf{s}$, motivated by low-mobility scenarios. Herein, we analyze a more generic case, in which obtaining the combining/precoding matrices individually is of paramount importance.

Algorithm 1 Parallel (PARL) approach of rKA to estimate the RZF receive combining matrix

1: Input: Estimated channel matrix $\mathbf{G} \in \mathbb{C}^{M\times K}$, inverse level of SNR $\xi$, number of UEs $K$, and number of rKA iterations $T_{\text{rKA}}$.
2: Initialization: For each rKA, specify the state vectors $\mathbf{u}_t \in \mathbb{C}^M$ and $\mathbf{z}_t \in \mathbb{C}^K$ with $\mathbf{u}_0 = \mathbf{0}_M$ and $\mathbf{z}_0 = \mathbf{0}_K$; let $\mathbf{D}_{\text{RZF}} \in \mathbb{C}^{K\times K}$ be the factor such that $\mathbf{V}_{\text{RZF}} = \mathbf{G}\mathbf{D}_{\text{RZF}}$.
3: Procedure:
4: for $i \leftarrow 1$ to $K$ do
5:   Compute the canonical basis vector $\mathbf{e}_i \in \mathbb{C}^K$, where $[\mathbf{e}_i]_i = 1$ and $[\mathbf{e}_i]_j = 0$, $\forall j \neq i$.
6:   for $t \leftarrow 0$ to $T_{\text{rKA}} - 1$ do
7:     if $t = 0$ then
8:       Pick the $i$th row of $\mathbf{G}^H$, $\mathbf{g}_{r(t)}^H \in \mathbb{C}^{1\times M}$, with $r(t) = i$.
9:     else
10:      Pick the $r(t)$th row of $\mathbf{G}^H$, $\mathbf{g}_{r(t)}^H \in \mathbb{C}^{1\times M}$, with $r(t) \in \{1, 2, \ldots, K\}$ drawn according to $\mathbf{p}$, where $P_{r(t)} = \dfrac{\|\mathbf{g}_{r(t)}\|_2^2 + \xi}{\|\mathbf{G}\|_F^2 + K\xi}$.
11:    end if
12:    Compute the residual $\eta_t = \dfrac{[\mathbf{e}_i]_{r(t)} - \langle \mathbf{g}_{r(t)}, \mathbf{u}_t\rangle - \xi z^t_{r(t)}}{\|\mathbf{g}_{r(t)}\|_2^2 + \xi}$.
13:    Update $\mathbf{u}_{t+1} = \mathbf{u}_t + \eta_t\mathbf{g}_{r(t)}$.
14:    Update $z^{t+1}_{r(t)} = z^t_{r(t)} + \eta_t$ and keep the other positions: $z^{t+1}_j = z^t_j$, $\forall j \neq r(t)$.
15:  end for
16:  Update $[\mathbf{D}_{\text{RZF}}]_{:,i} = \mathbf{z}_{T_{\text{rKA}}}$.
17: end for
18: Output: $\mathbf{V}_{\text{RZF}} = \mathbf{G}\mathbf{D}_{\text{RZF}}$.
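Algorithm 1 can be sketched in pure Python as follows. This is a minimal, non-authoritative illustration, assuming $\mathcal{N}_{\mathbb{C}}(0,1)$ channel entries, hypothetical sizes $M = 16$, $K = 4$, $\xi = 1$, $T_{\text{rKA}} = 1000$, and the reading of step 10 given by $P_r = (\|\mathbf{g}_r\|_2^2 + \xi)/(\|\mathbf{G}\|_F^2 + K\xi)$; the function names `parl_rka_rzf` and `rzf_reference` are illustrative. Each of the $K$ rKAs starts from zero, is forced onto its own row at $t = 0$ (step 8), and the result is compared with the closed-form $\mathbf{G}(\mathbf{G}^H\mathbf{G} + \xi\mathbf{I}_K)^{-1}$.

```python
import math
import random

def parl_rka_rzf(G, xi, T, rng):
    """Sketch of Algorithm 1. G: M x K list of rows. Returns an estimate of G (G^H G + xi I)^{-1}."""
    M, K = len(G), len(G[0])
    g = [[G[m][r] for m in range(M)] for r in range(K)]         # g[r] = column r of G
    nrm = [sum(abs(x) ** 2 for x in col) for col in g]           # ||g_r||^2
    p = [(n + xi) / (sum(nrm) + K * xi) for n in nrm]            # sampling distribution (step 10)
    D = [[0j] * K for _ in range(K)]
    for i in range(K):                                           # i-th rKA (0-based)
        u, z = [0j] * M, [0j] * K                                # zero initialization per rKA
        for t in range(T):
            r = i if t == 0 else rng.choices(range(K), weights=p)[0]   # steps 7-10
            inner = sum(g[r][m].conjugate() * u[m] for m in range(M))  # g_r^H u_t
            eta = ((1.0 if r == i else 0.0) - inner - xi * z[r]) / (nrm[r] + xi)  # step 12
            u = [u[m] + eta * g[r][m] for m in range(M)]         # step 13
            z[r] += eta                                          # step 14
        for j in range(K):
            D[j][i] = z[j]                                       # step 16: column i of D_RZF
    return [[sum(G[m][j] * D[j][i] for j in range(K)) for i in range(K)] for m in range(M)]

def rzf_reference(G, xi):
    """Closed-form G (G^H G + xi I)^{-1} via Gauss-Jordan elimination on [A | I]."""
    M, K = len(G), len(G[0])
    A = [[sum(G[m][a].conjugate() * G[m][b] for m in range(M)) + (xi if a == b else 0.0)
          for b in range(K)] + [1.0 if c == a else 0.0 for c in range(K)] for a in range(K)]
    for c in range(K):
        piv = max(range(c, K), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(K):
            if r != c:
                A[r] = [x - A[r][c] * y for x, y in zip(A[r], A[c])]
    Ainv = [row[K:] for row in A]
    return [[sum(G[m][j] * Ainv[j][i] for j in range(K)) for i in range(K)] for m in range(M)]

rng = random.Random(5)
M, K, xi = 16, 4, 1.0                                            # hypothetical sizes
G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(K)]
     for _ in range(M)]
V = parl_rka_rzf(G, xi, T=1000, rng=rng)
V_ref = rzf_reference(G, xi)
num = sum(abs(V[m][k] - V_ref[m][k]) ** 2 for m in range(M) for k in range(K))
den = sum(abs(V_ref[m][k]) ** 2 for m in range(M) for k in range(K))
rel_err = math.sqrt(num / den)
print(f"relative error after 1000 iterations: {rel_err:.2e}")
```

A side effect worth noting: since step 13 adds $\eta_t\mathbf{g}_{r(t)}$ to $\mathbf{u}$ while step 14 adds $\eta_t$ to $z_{r(t)}$, the identity $\mathbf{u}_t = \mathbf{G}\mathbf{z}_t$ holds at every iteration, which is consistent with the output $\mathbf{V}_{\text{RZF}} = \mathbf{G}\mathbf{D}_{\text{RZF}}$ in step 18.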

The convergence performance of Algorithm 1 is characterized in Theorem 1. Building on this result, Remarks 1 and 2 comment on bounds of the average gain. Recall that the bounds on $\kappa_{\mathcal{X}}(\mathbf{B}^H)$ are inversely related to the corresponding bounds on the number of rKA iterations. These remarks are therefore important for the computational evaluation of the rKA-based RZF schemes.

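Theorem 1. Consider the $k$th UD SLE $\mathbf{B}^H\mathbf{c} = \mathbf{e}_k$, where $\mathbf{B}^H \in \mathbb{C}^{K\times(M+K)}$ is the matrix of constant coefficients, $\mathbf{c} \in \mathbb{C}^{M+K}$ is the vector of unknowns, and $\mathbf{e}_k \in \mathbb{C}^K$ is the canonical basis vector. When the rKA is applied to solve this system, as done in Algorithm 1, it converges towards the optimal solution $\mathbf{c}^\star$ with the average error bounded as

$$\mathbb{E}\{\|\mathbf{c}_t - \mathbf{c}^\star\|_2^2\} \leq \left(1 - \kappa_{\mathcal{X}}(\mathbf{B}^H)\right)^t \|\mathbf{c}^{(1)} - \mathbf{c}^\star\|_2^2, \tag{17}$$

with

$$\kappa_{\mathcal{X}}(\mathbf{B}^H) = \frac{\lambda_{\min}(\mathbf{B}^H\mathbf{B})}{\|\mathbf{B}\|_F^2} \overset{(a)}{=} \frac{\lambda_{\min}(\mathbf{G}^H\mathbf{G}) + \xi}{\|\mathbf{G}\|_F^2 + K\xi}, \tag{18}$$

where in (a), $\mathbf{B}$ is replaced by its definition stated in (7). Note that $\mathbf{c}^{(1)}$ can be seen as the initial solution of the rKA, which also provides an average gain based on the KA:

$$\kappa^{(1)}_{\mathcal{X}}(\mathbf{B}^H) = \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\|\mathbf{b}_k^H\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{b}_k\|_2^2\,\|\boldsymbol{\vartheta}\|_2^2}, \tag{19}$$

which strongly depends on the row index $k$. Hence, the gap $\mathbf{c}^{(1)} - \mathbf{c}^\star$ can be made smaller on average than that obtained by adopting a zero or any other random vector as the initial solution.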

Proof. The proof is given in Appendix B.

Remark 1 (Average Gain Bounds: Generic Correlated Covariance Matrices with Gaussian Estimated Channel Matrix). If the elements of $\mathbf{G}$ are distributed as $\mathcal{N}_{\mathbb{C}}(0,1)$⁷, the eigenvalues of $\mathbf{G}$ are upper limited by 1; if all of them equal 1, we have $\|\mathbf{G}\|_F^2 = K$. More generally, since $\|\mathbf{G}\|_F^2 = \operatorname{tr}(\mathbf{G}^H\mathbf{G}) \geq K\lambda_{\min}(\mathbf{G}^H\mathbf{G})$, the average gain stated in (18) is bounded as follows [5]:

$$\frac{\lambda_{\min}(\mathbf{G}^H\mathbf{G})}{\|\mathbf{G}\|_F^2} \leq \frac{\lambda_{\min}(\mathbf{G}^H\mathbf{G}) + \xi}{\|\mathbf{G}\|_F^2 + K\xi} \leq \frac{1}{K}, \tag{20}$$

where $\xi \geq 0$. From the above inequality, one can presume that the higher $\xi$, the faster the convergence of Algorithm 1, since $\kappa_{\mathcal{X}}(\mathbf{B}^H)$ also becomes higher. Consequently, the ZF case, i.e., $\xi = 0$, is expected to converge more slowly than the rKA emulation of the RZF scheme⁸. In addition, note that edge-UEs have a better average gain than center-UEs; namely, even though edge-UEs are rarely selected by the randomized approach, they reach the solution more quickly than center-UEs.

⁷For the canonical channel estimators analyzed in the numerical results, this distribution does not hold, since the estimated powers are certainly smaller than the true ones.
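The effect of $\xi$ on the average gain (18), and hence on the worst-case number of iterations implied by (17), can be worked out with simple arithmetic. The sketch below assumes a hypothetical spectrum summary, $\lambda_{\min}(\mathbf{G}^H\mathbf{G}) = 40$ and $\|\mathbf{G}\|_F^2 = 1000$ with $K = 10$, and a target contraction $\epsilon = 10^{-2}$ of the expected squared error; these numbers are illustrative, not results from the paper.

```python
import math

def average_gain(lam_min, fro2, K, xi):
    """kappa_X(B^H) in (18): (lambda_min(G^H G) + xi) / (||G||_F^2 + K xi)."""
    return (lam_min + xi) / (fro2 + K * xi)

def worst_case_iterations(kappa, eps):
    """Smallest t such that (1 - kappa)^t <= eps in the bound (17)."""
    return math.ceil(math.log(eps) / math.log(1.0 - kappa))

K, lam_min, fro2 = 10, 40.0, 1000.0       # hypothetical values (fro2 >= K * lam_min)
xis = [0.0, 1.0, 10.0, 100.0]
gains = [average_gain(lam_min, fro2, K, xi) for xi in xis]
iters = [worst_case_iterations(g, 1e-2) for g in gains]
for xi, g, t in zip(xis, gains, iters):
    print(f"xi = {xi:6.1f}: kappa = {g:.4f}, iterations for 1e-2: {t}")
```

The gain grows with $\xi$ but never exceeds $1/K = 0.1$, in line with (20), while the iteration count falls from 113 at $\xi = 0$ (the ZF case) as $\xi$ increases.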

Remark 2 (Average Gain Bounds: Uncorrelated Rayleigh Fading with Gaussian Estimated Channel Matrix). When uncorrelated Rayleigh fading is considered, i.e., $\mathbf{R}_k = \beta_k\mathbf{I}_M$, where $\beta_k$ is the average large-scale fading coefficient of UE $k$, and the elements of $\mathbf{G}$ are distributed as $\mathcal{N}_{\mathbb{C}}(0,1)$, the limits of $\kappa_{\mathcal{X}}(\mathbf{B}^H)$ can be strictly established based on random matrix theory. This is done by solving the characteristic polynomial of $\mathbf{G}^H\mathbf{G}$, finding its smallest root, and then applying the statistical characteristics of $\mathbf{G}$, yielding [5]:

$$\kappa_{\mathcal{X}}(\mathbf{B}^H) \geq \frac{\lambda_{\min}(\mathbf{G}^H\mathbf{G})}{\|\mathbf{G}\|_F^2} = \frac{(\sqrt{M} - \sqrt{K})^2}{MK} = \frac{\left(1 - \sqrt{K/M}\right)^2}{K}, \tag{21}$$

where $K/M$ is the UE-BS antenna ratio, also known as the loading factor in M-MIMO. From this result, the number of rKA iterations is directly tied to the value of the loading factor. Interestingly, as $K/M$ approaches 1, the average gain tends to zero. This means that the number of rKA iterations needed to satisfy the worst-case convergence condition goes to infinity, i.e., convergence can be considered unattainable in that case. Keep in mind, however, that the UE-BS antenna ratio in M-MIMO is typically $K/M \in [0, 0.5]$ [1], [2].
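To make (21) concrete, the following sketch evaluates its right-hand side for $M = 100$ and the three values of $K$ in Table II, and converts each gain into the worst-case number of iterations for which $(1-\kappa)^t \leq 0.1$ in (17). The tolerance is an illustrative assumption; these counts bound the contraction of the expected squared error and are not the SE-based iteration counts reported in Section VI-B.

```python
import math

def gain_lower_bound(M, K):
    """Right-hand side of (21): (sqrt(M) - sqrt(K))^2 / (M K) = (1 - sqrt(K/M))^2 / K."""
    return (math.sqrt(M) - math.sqrt(K)) ** 2 / (M * K)

def worst_case_iterations(kappa, eps):
    """Smallest t with (1 - kappa)^t <= eps, using the contraction factor in (17)."""
    return math.ceil(math.log(eps) / math.log(1.0 - kappa))

M = 100                                                       # BS antennas (Table II)
bounds = {K: gain_lower_bound(M, K) for K in (10, 30, 50)}    # loading factors 0.1, 0.3, 0.5
iters = {K: worst_case_iterations(kap, 0.1) for K, kap in bounds.items()}
for K in bounds:
    print(f"K/M = {K / M:.1f}: kappa >= {bounds[K]:.6f}, worst-case iterations = {iters[K]}")
```

For $K/M = 0.1$ the bound is about 0.0468, and it drops by more than an order of magnitude at $K/M = 0.5$; at $K = M$ it is exactly zero, matching the remark that convergence becomes unattainable.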

C. Linear Operator rKA-Based RZF Scheme

The following key property of rKA has to be introduced so that we can present the second rKA-based

method for obtaining the RZF receive combining matrix.

Corollary 3. Consider any SLE in its trivial form $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m\times n}$ is the matrix of constant coefficients, $\mathbf{x} \in \mathbb{C}^n$ is the vector of unknowns, and $\mathbf{b} \in \mathbb{C}^m$ is the vector of known offset coefficients, with $m$ and $n$ the numbers of equations and unknowns, respectively. One can solve this SLE via the rKA with an initial guess $\mathbf{x}_0$, yielding at the $t$th rKA iteration an approximate solution $\mathbf{x}_t$. With $\mathbf{x}_0 = \mathbf{0}$, as in Algorithms 1 and 2, this estimate can be expressed in terms of a linear operator (LO) realized over $\mathbf{A}$ as $\mathbf{x}_t = \mathbf{L}_t(\mathbf{A})\mathbf{b}$, where $\mathbf{L}_t(\mathbf{A}) \in \mathbb{C}^{n\times m}$ depends only on $\mathbf{A}$ and on the randomness of the rKA, and not on $\mathbf{b}$.

⁸The results obtained for the rKA emulating the RZF scheme can easily be extended to the ZF case; however, since ZF has a worse rate of convergence than RZF, the expected computational gains are smaller for the former scheme.

Proof. The proof follows [5, p. 20].
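The linearity stated in Corollary 3 can be checked numerically. In the sketch below, a hypothetical $6\times 3$ system is iterated along one fixed realization of the random row sequence (drawn with probabilities proportional to the squared row norms, which depend on $\mathbf{A}$ only). Starting from $\mathbf{x}_0 = \mathbf{0}$, the iterate for $\alpha\mathbf{b}_1 + \mathbf{b}_2$ equals $\alpha\mathbf{x}_t(\mathbf{b}_1) + \mathbf{x}_t(\mathbf{b}_2)$, and the columns of $\mathbf{L}_t(\mathbf{A})$ are obtained by running the same iterations on the canonical basis vectors. Sizes, seed, and names are illustrative.

```python
import random

def rka_iterate(A, b, rows):
    """Kaczmarz updates on A x = b along a given row sequence, starting from x0 = 0."""
    n = len(A[0])
    x = [0j] * n
    for r in rows:
        a = A[r]
        res = (b[r] - sum(a[j] * x[j] for j in range(n))) / sum(abs(v) ** 2 for v in a)
        x = [x[j] + res * a[j].conjugate() for j in range(n)]   # project onto {x : a x = b_r}
    return x

rng = random.Random(11)
m, n = 6, 3                                                 # hypothetical system size
A = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(m)]
weights = [sum(abs(v) ** 2 for v in row) for row in A]      # row probabilities depend on A only
rows = rng.choices(range(m), weights=weights, k=40)         # one realization of the randomness

b1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
b2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
alpha = 2.0 - 0.5j
lhs = rka_iterate(A, [alpha * p + q for p, q in zip(b1, b2)], rows)
rhs = [alpha * p + q for p, q in zip(rka_iterate(A, b1, rows), rka_iterate(A, b2, rows))]

# column j of L_t(A) is the iterate obtained for b = e_j
Lt_cols = [rka_iterate(A, [1.0 if i == j else 0.0 for i in range(m)], rows) for j in range(m)]
via_operator = [sum(Lt_cols[j][i] * b1[j] for j in range(m)) for i in range(n)]
direct = rka_iterate(A, b1, rows)
print(max(abs(p - q) for p, q in zip(lhs, rhs)))
```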

As a result of the consistent OD SLE $\mathbf{B}\mathbf{s} = \mathbf{y}_0$ and Corollary 3, it is easy to see that $\mathbf{s}_t = \mathbf{L}_t(\mathbf{B})\mathbf{y}_0$ is similar to the signal estimation expression in (5), with the proviso that $\mathbf{L}_t(\mathbf{B})$ is a $K\times(M+K)$ matrix, whereas $\mathbf{V}^H$ is $K\times M$; the same holds for $\mathbf{y}_0$ and $\mathbf{y}$. It follows that the first $M$ columns of the LO approach the Hermitian of the RZF receive combining matrix for a suitable number of rKA iterations. This is the concept behind Algorithm 2, but the proposed procedure was conceived to estimate the LO over $\mathbf{B}^H$, based on $\mathbf{B}^H\mathbf{s}' = \mathbf{y}'_0$, where $\mathbf{s}' = \mathbf{B}\mathbf{s} \in \mathbb{C}^{M+K}$ and $\mathbf{y}'_0 = \mathbf{B}^H\mathbf{y}_0 \in \mathbb{C}^K$. In this case, the control of $\mathbf{p}$ depends not on $M+K$ rows but on the $K$ rows by which the UEs are represented. Again, this is desirable, since large-scale effects negatively impact the distribution range of $\mathbf{p}$. To reduce this harmful influence, the first $K$ iterations are used to initialize the rKA, as done in step 7 of Algorithm 2.

Theorem 2 establishes the convergence performance of Algorithm 2; its proof is quite similar to that of Theorem 1. Remarks 1 and 2 also apply here.

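Theorem 2. Let $\mathbf{B}^H\mathbf{s}' = \mathbf{y}'_0$ be a UD SLE with $\mathbf{B}^H \in \mathbb{C}^{K\times(M+K)}$ denoting the matrix of constant coefficients, $\mathbf{s}' \in \mathbb{C}^{M+K}$ the vector of unknowns, and $\mathbf{y}'_0 \in \mathbb{C}^K$ the vector of known consistent estimates of the received signal. As can be seen from Algorithm 2, this SLE can be solved through the rKA. The convergence of this procedure is bounded in terms of the average error between the solution obtained at rKA iteration $t$ and the optimal $(\mathbf{s}')^\star$:

$$\mathbb{E}\{\|(\mathbf{s}')_t - (\mathbf{s}')^\star\|_2^2\} \leq \left(1 - \kappa_{\mathcal{X}}(\mathbf{B}^H)\right)^t \|(\mathbf{s}')^{(K)} - (\mathbf{s}')^\star\|_2^2, \tag{22}$$

with

$$\kappa_{\mathcal{X}}(\mathbf{B}^H) = \frac{\lambda_{\min}(\mathbf{B}^H\mathbf{B})}{\|\mathbf{B}\|_F^2}. \tag{23}$$

The initial solution $(\mathbf{s}')^{(K)}$ also carries an average gain, which is

$$\kappa^{(K)}_{\mathcal{X}}(\mathbf{B}^H) = \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\|\mathbf{b}_k^H\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{b}_k\|_2^2\,\|\boldsymbol{\vartheta}\|_2^2}. \tag{24}$$

The above average gain can be better than that provided by any other arbitrary initial solution.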

V. COMPUTATIONAL COMPLEXITY ANALYSIS

Since the canonical and proposed rKA-based schemes have now been described, we turn our attention to a fair way of analyzing and comparing their computational complexities. This investigation helps answer whether the rKA-based strategies pave the way for computationally light signal processing schemes. Our main goal is to define functions that describe the computational cost of the signal processing schemes in terms of matrix-matrix multiplication and matrix inversion, together with the corresponding vector operations. This strategy is motivated by the work in [2, App. B, p. 558], where the authors give a very detailed framework for the costs of these elementary matrix/vector operations. To emphasize, complex multiplications/divisions are taken into account in the complexity functions, while additions and subtractions are neglected on the premise that the latter are much less demanding than the former from a hardware point of view. With this in mind, the examination of the computational cost is divided into the UL and DL phases.

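Algorithm 2 Linear Operator (LO) approach of rKA to estimate the RZF receive combining matrix

1: Input: Estimated channel matrix $\mathbf{G} \in \mathbb{C}^{M\times K}$, inverse level of SNR $\xi$, number of UEs $K$, and number of rKA iterations $T_{\text{rKA}}$.
2: Initialization: Define the linear operator $\mathbf{L}_t(\mathbf{B}^H) \in \mathbb{C}^{(M+K)\times K}$ with $\mathbf{L}_0(\mathbf{B}^H) = \mathbf{0}_{(M+K)\times K}$.
3: Obtain $\mathbf{B} = [\mathbf{G}; \sqrt{\xi}\mathbf{I}_K]$.
4: Procedure:
5: for $t \leftarrow 0$ to $T_{\text{rKA}} - 1$ do
6:   if $t < K$ then
7:     Pick the $r(t)$th row of $\mathbf{B}^H$, $\mathbf{b}_{r(t)}^H \in \mathbb{C}^{1\times(M+K)}$, with $r(t) = t + 1$, so that rows $1, \ldots, K$ are visited in order.
8:   else
9:     Pick the $r(t)$th row of $\mathbf{B}^H$, $\mathbf{b}_{r(t)}^H \in \mathbb{C}^{1\times(M+K)}$, with $r(t) \in \{1, 2, \ldots, K\}$ drawn according to $\mathbf{p}$, where $P_{r(t)} = \dfrac{\|\mathbf{b}_{r(t)}\|_2^2}{\|\mathbf{B}\|_F^2}$.
10:  end if
11:  Compute the canonical basis vector $\mathbf{e}_{r(t)} \in \mathbb{C}^K$, where $[\mathbf{e}_{r(t)}]_{r(t)} = 1$ and $[\mathbf{e}_{r(t)}]_j = 0$, $\forall j \neq r(t)$.
12:  Update $\mathbf{L}_{t+1}(\mathbf{B}^H) = \left(\mathbf{I}_{M+K} - \dfrac{\mathbf{b}_{r(t)}\mathbf{b}_{r(t)}^H}{\|\mathbf{b}_{r(t)}\|_2^2}\right)\mathbf{L}_t(\mathbf{B}^H) + \dfrac{\mathbf{b}_{r(t)}\mathbf{e}_{r(t)}^T}{\|\mathbf{b}_{r(t)}\|_2^2}$.
13: end for
14: Output: $\mathbf{V}_{\text{RZF}} = [\mathbf{L}_{T_{\text{rKA}}}(\mathbf{B}^H)]_{1:M,:}$.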
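A compact pure-Python sketch of Algorithm 2 follows. It is an illustration under the same hypothetical setting used for Algorithm 1 ($M = 16$, $K = 4$, $\xi = 1$, $T_{\text{rKA}} = 1000$, $\mathcal{N}_{\mathbb{C}}(0,1)$ entries); the names `lo_rka_rzf` and `rzf_reference` are illustrative. The update in step 12 is applied element-wise as $\mathbf{L}_{t+1} = \mathbf{L}_t + \mathbf{b}_{r}(\mathbf{e}_{r}^T - \mathbf{b}_{r}^H\mathbf{L}_t)/\|\mathbf{b}_{r}\|_2^2$, which is an algebraic rearrangement of the same expression.

```python
import math
import random

def lo_rka_rzf(G, xi, T, rng):
    """Sketch of Algorithm 2: iterate L_t ((M+K) x K) and return its first M rows."""
    M, K = len(G), len(G[0])
    # b[r] = r-th column of B = [G; sqrt(xi) I_K], so b[r]^H is the r-th row of B^H
    b = [[G[m][r] for m in range(M)] + [math.sqrt(xi) if j == r else 0.0 for j in range(K)]
         for r in range(K)]
    nrm = [sum(abs(v) ** 2 for v in col) for col in b]           # ||b_r||^2 = ||g_r||^2 + xi
    L = [[0j] * K for _ in range(M + K)]                          # L_0 = 0
    for t in range(T):
        r = t if t < K else rng.choices(range(K), weights=nrm)[0]   # steps 6-9 (0-based rows)
        br = b[r]
        w = [sum(br[i].conjugate() * L[i][c] for i in range(M + K)) for c in range(K)]  # b_r^H L_t
        for i in range(M + K):
            for c in range(K):
                L[i][c] += br[i] * ((1.0 if c == r else 0.0) - w[c]) / nrm[r]   # step 12
    return [row[:] for row in L[:M]]                              # step 14

def rzf_reference(G, xi):
    """Closed-form G (G^H G + xi I)^{-1} via Gauss-Jordan elimination on [A | I]."""
    M, K = len(G), len(G[0])
    A = [[sum(G[m][a].conjugate() * G[m][b] for m in range(M)) + (xi if a == b else 0.0)
          for b in range(K)] + [1.0 if c == a else 0.0 for c in range(K)] for a in range(K)]
    for c in range(K):
        piv = max(range(c, K), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(K):
            if r != c:
                A[r] = [x - A[r][c] * y for x, y in zip(A[r], A[c])]
    Ainv = [row[K:] for row in A]
    return [[sum(G[m][j] * Ainv[j][i] for j in range(K)) for i in range(K)] for m in range(M)]

rng = random.Random(5)
M, K, xi = 16, 4, 1.0                                             # hypothetical sizes
G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(K)]
     for _ in range(M)]
V = lo_rka_rzf(G, xi, T=1000, rng=rng)
V_ref = rzf_reference(G, xi)
num = sum(abs(V[m][k] - V_ref[m][k]) ** 2 for m in range(M) for k in range(K))
den = sum(abs(V_ref[m][k]) ** 2 for m in range(M) for k in range(K))
rel_err = math.sqrt(num / den)
print(f"relative error after 1000 iterations: {rel_err:.2e}")
```

Each column of $\mathbf{L}_t$ is a Kaczmarz iterate for $\mathbf{B}^H\mathbf{x} = \mathbf{e}_c$ that shares the row sequence with the other columns, which is why the whole operator is updated at once; the price is the $(M+K)\times K$ update per iteration, consistent with Algorithm 2 being the costliest scheme in Table I.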

A. Computational Cost: Uplink

Since its inception, the idea behind the PARL rKA-based RZF scheme has been to distribute the computation of the receive combining matrix over $K$ low-complexity hardware units, i.e., each unit has simple, cheap, dedicated hardware responsible for obtaining, through the rKA, the receive combining vector of its respective UE. From Algorithm 1, most of the computational load of each dedicated unit is related to the computation of the sampling distribution $\mathbf{p}$ in step 10 and of the residual $\eta_t$ in step 12. In particular, the computational cost assigned to $\mathbf{p}$ is that of knowing all the sample probabilities $P_{r(t)}$, so that generating the $r(t)$s amounts to drawing random numbers according to $\mathbf{p}$. This generation can be done in a distributed or centralized way, as already mentioned. The former refers to a scenario in which each $k$th hardware unit computes its own $\mathbf{p}$ and thus independently generates its values of $r(t)$, while the latter denotes a scenario in which a central hardware point computes $\mathbf{p}$ and then shares the drawn $r(t)$s with the $K$ hardware units, which are thereby freed from computing $\mathbf{p}$. Since each hardware unit has to compute its own $\mathbf{p}$, the distributed fashion is the most computationally heavy. The centralized method, however, can lead to a substantial loss of time, representing a fraction of the available coherence interval, due to communication with the central hardware point. We should therefore expect a performance impact when adopting the centralized way, especially in overcrowded scenarios where the loading factor is close to the upper limit defined here as $K/M = 0.5$. A computational comparison between both ways of generating $r(t)$ values is carried out in Section VI-D.

Another key discussion point for the PARL rKA-based RZF scheme is related to the signal recovery problem described in (5). Since each $k$th unit actually computes $\mathbf{d}_k$ and not $\mathbf{v}_k$ in Algorithm 1, it is appropriate to perform the following computations in a central hardware point to obtain the signal estimate at a given communication instant:

$$\mathbf{V}_{\text{PARL}} = \mathbf{G}\mathbf{D}, \tag{25a}$$

$$\mathbf{s} = (\mathbf{V}_{\text{PARL}})^H\mathbf{y}, \tag{25b}$$

where (25a) is computed only once and takes $MK^2$ complex operations, while the signal recovery in (25b) takes $MK$ and has to be computed for each of the $\tau_{ul}$ instants. A centralized fashion is considered for these computations because the time spent obtaining the information needed to perform them in a distributed or centralized way appears to be approximately equal. A brief note on the computational complexity of Algorithm 2 is now in order. The linear operator (LO) rKA-based RZF scheme has the computation of $\mathbf{p}$ in step 9 and the update of the LO in step 12 as its main hardware loads. Unlike the PARL approach, Algorithm 2 is essentially co-processed (centralized).

Table I summarizes the complexity functions obtained for the rKA-based schemes emulating the classical RZF strategy, together with the canonical results from [2]; both were obtained with the computational framework provided by that work. Recall that this framework takes into account only complex multiplications and divisions from elementary matrix-matrix (and vector-vector) operations. The Reception column contains the computational cost associated with the signal estimation in (5). Some remarks can be made regarding these results. The computation of the receive combining matrix in Algorithm 1 appears to be the lightest, but this value strongly depends on the number of rKA iterations, $T_{\text{rKA}}$. The PARL rKA-based RZF scheme, however, shows the greatest reception complexity and, when the distributed fashion is considered, the calculation of the sampling distribution adds a considerable extra computational floor. The LO procedure (Algorithm 2) is by far the most costly in computational terms and is therefore not considered in the following analysis. It is important to emphasize, however, that this procedure is a useful tool when designing networks that aim to apply rKA-based techniques. This is corroborated by its rate of convergence being fairly similar to that of the PARL rKA-based RZF scheme, as observed in the simulation results, and by the fact that simulation tools such as Matlab have powerful built-in functions for executing matrix operations.

TABLE I

COMPUTATIONAL COMPLEXITY FOR RECEIVE COMBINING SCHEMES PER COHERENCE INTERVAL.

SchemeReceive combining matrix Reception Sampling distribution

Multiplications Divisions Multiplications Multiplications

RZF 3K2M

2+ 3KM

3K τulMK –

ZF 3K2M

3K τulMK –

PARL rKA-based RZF

(Algorithm 1)2MTKA – ⋄τul(MK) +MK2 ◦2MK

LO rKA-based RZF

(Algorithm 2)(K3 + 2K2M + 2K2 +KM2 +

3KM + 2K +M2 + 2M)TKA

– τulMK 2K2 + 2MK

⋄ Reception for Algorithm 1 results from a centralized computational approach (see (25b)). ◦ PARL sampling distribution can be

obtained by either a centralized way (computational cost is ignored) or a distributed way (2M for each of the K hardware units).
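To compare the two rKA-based rows of Table I for a concrete configuration, the entries can be evaluated directly. This sketch takes the Table I formulas at face value and plugs in $M = 100$, $K = 10$ (Table II), $T_{\text{rKA}} = 93$ (the 10% figure quoted in Section VI-B for LS estimation, uncorrelated channels, and $K/M = 0.1$), and a hypothetical $\tau_{ul} = 95$; the function names are illustrative, and divisions are ignored as in the table.

```python
def parl_uplink_cost(M, K, T, tau_ul, distributed=True):
    """Complex multiplications per coherence interval for Algorithm 1, read off Table I:
    2 M T (combining), tau_ul M K + M K^2 (reception via (25a)-(25b)), and
    2 M K for the sampling distribution when it is computed in a distributed way."""
    combining = 2 * M * T
    reception = tau_ul * M * K + M * K ** 2
    sampling = 2 * M * K if distributed else 0
    return combining + reception + sampling

def lo_uplink_cost(M, K, T, tau_ul):
    """Same bookkeeping for Algorithm 2 (LO rKA-based RZF), again read off Table I."""
    combining = (K**3 + 2*K**2*M + 2*K**2 + K*M**2 + 3*K*M + 2*K + M**2 + 2*M) * T
    return combining + tau_ul * M * K + 2 * K**2 + 2 * M * K

# M, K from Table II; T = 93 from Section VI-B; tau_ul = 95 is a hypothetical choice
M, K, T, tau_ul = 100, 10, 93, 95
costs = {
    "PARL (distributed p)": parl_uplink_cost(M, K, T, tau_ul, distributed=True),
    "PARL (centralized p)": parl_uplink_cost(M, K, T, tau_ul, distributed=False),
    "LO": lo_uplink_cost(M, K, T, tau_ul),
}
for name, c in costs.items():
    print(f"{name}: {c} complex multiplications")
```

Under these assumptions the LO scheme needs roughly 100 times more multiplications than the PARL scheme, which illustrates why it is set aside in the following analysis.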

B. Computational Cost: Downlink

The computational complexity of the DL phase consists of two components: (a) the transmit precoding matrix computation, specified in (8), and (b) the transmission stage, i.e., the computational burden of computing $\mathbf{x}$ (the precoded signal) defined in Section II-B. In the special case provided by UL-DL duality, the computational cost is the same for all schemes, both canonical and rKA-based. For the latter, recall that the receive combining matrix is already available at any given symbol period inside $\tau_c$, as given by the outputs of Algorithms 1 (see also (25a)) and 2. The computation of the precoding matrices consumes $MK$ complex operations, whereas the calculation of the precoded signals takes $\tau_{dl}MK$.

VI. NUMERICAL RESULTS

Let us now consider a square-network layout⁹ to gather some qualitative results on the applicability of the Kaczmarz methodology in an M-MIMO scenario. A single-cell case with the parameters listed in Table II is considered throughout the simulations. The cell covers an area of $0.25\times 0.25$ km², with a BS at the center equipped with $M$ antennas spatially multiplexing $K$ UEs. The UEs are uniformly distributed inside the cell area with a minimum distance of 35 m to the BS. The average large-scale fading representing the pathloss of UE $k$ is evaluated as $\beta_k = \Gamma - 10\alpha\log_{10}(d_k)$ [dB], where $d_k$ is the Euclidean distance (in meters) between UE $k$ and the BS, $\Gamma$ is the pathloss constant term at a reference distance of 1 m, and $\alpha$ is the pathloss exponent. Unless otherwise stated, the values $\Gamma = -35.3$ dB and $\alpha = 3.76$ are used, based on a non-line-of-sight dense urban scenario [1], [2].
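The pathloss model above already explains much of the disparity in $P_{r(t)}$ discussed later. The sketch below evaluates $\beta_k$ at the minimum distance (35 m) and at a cell corner ($125\sqrt{2} \approx 176.8$ m), and drops UEs uniformly in the square cell. As a rough, hypothetical proxy for the sampling distribution, it assumes $\|\mathbf{g}_k\|_2^2 \approx M\beta_k$ (uncorrelated fading, perfect CSI, no shadowing) and ignores $\xi$; it is not the simulation setup that produced Fig. 1.

```python
import math
import random

GAMMA_DB, ALPHA = -35.3, 3.76        # values quoted in the text

def pathloss_db(d):
    """beta_k = Gamma - 10 alpha log10(d_k) [dB], with d_k in meters."""
    return GAMMA_DB - 10 * ALPHA * math.log10(d)

def drop_ues(K, side=250.0, d_min=35.0, rng=None):
    """Uniform UE positions in a side x side square with the BS at the center, at least d_min away."""
    rng = rng or random.Random()
    ds = []
    while len(ds) < K:
        x, y = rng.uniform(-side / 2, side / 2), rng.uniform(-side / 2, side / 2)
        d = math.hypot(x, y)
        if d >= d_min:
            ds.append(d)
    return ds

near, corner = pathloss_db(35.0), pathloss_db(125.0 * math.sqrt(2))
ds = drop_ues(10, rng=random.Random(0))
beta_lin = [10 ** (pathloss_db(d) / 10) for d in ds]
p_approx = [b / sum(beta_lin) for b in beta_lin]   # crude proxy for P_r (xi and shadowing ignored)
print(f"beta at 35 m: {near:.2f} dB, at the corner: {corner:.2f} dB")
print(f"max/min proxy probability ratio: {max(p_approx) / min(p_approx):.1f}")
```

The roughly 26 dB gap between a UE at 35 m and one at the corner means the former is several hundred times more likely to be drawn under this proxy, before shadowing is even considered.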

TABLE II

PARAMETERS CONSIDERING A SQUARE-NETWORK LAYOUT FOR A SINGLE-CELL M-MIMO SCENARIO.

| Parameter | Value |
|---|---|
| Single-cell area | 0.25 × 0.25 km² |
| BS antennas ($M$) | 100 |
| Number of UEs ($K$) | 10; 30; 50 |
| Bandwidth | 20 MHz |
| Receiver noise power | −91 dBm |
| UL transmit power per UE | 20 dBm |
| Antenna correlation factor ($r$) | 0.5 |
| Coherence interval ($\tau_c$) | 200 symbols |
| Pilot length ($\tau_p$) | $K$ symbols |
| Pathloss exponent ($\alpha$) | 1.00; 3.76; 4.00 |
| Pathloss constant term ($\Gamma$) | −35.3 dB |
| Shadowing standard deviation ($\sigma$) | 4 dB |
| Channel estimation | LS and MMSE |

9The simulation settings considered here are motivated by the numerical results realized in [2].

Canonically, the covariance matrix model for uncorrelated Rayleigh fading is $\mathbf{R}_k = \beta_k 10^{f/10}\mathbf{I}_M$, where $f \sim \mathcal{N}(0, \sigma^2)$ represents the large-scale fading (shadowing) with zero mean and standard deviation $\sigma$, which is the same for all elements of the array [1], [13]. Motivated by the results obtained in [11], we also assume a spatial correlation model composed of the exponential correlation model for a uniform linear array (ULA) with large-scale fading variations over the array [14], [15], [16]. The exponential correlation model accounts for spatial antenna correlation, while the shadowing fluctuations represent the power variability across the antenna elements caused by the irregular distribution of the scatterers in the environment. This last effect is empirically justified by the conclusions in [14]. That being said, the $(m,n)$th element of the covariance matrix of the $k$th UE under this spatially correlated channel is defined as $[\mathbf{R}_k]_{m,n} = \beta_k r^{|n-m|} e^{i(n-m)\theta_k} 10^{(f_m+f_n)/20}$, where $m, n \in \{1, \ldots, M\}$, $r \in [0, 1]$ is the antenna correlation factor, and $\theta_k$ is the angle of arrival of the $k$th UE, which is uniformly distributed in $[-\pi, +\pi)$ for a horizontal ULA. The variables $f_1, \ldots, f_m, \ldots, f_M \sim \mathcal{N}(0, \sigma^2)$ are independent and identically distributed and form the random fluctuations of the large-scale fading, with zero mean and standard deviation $\sigma$. We define moderate spatial correlation for this model as the scenario in which $r = 0.5$ and $\sigma = 4$ dB, as considered in [4]. This moderate spatial correlation setting is used extensively in the simulations below.
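The correlated covariance model can be generated element by element as written above. The following sketch builds $\mathbf{R}_k$ for a hypothetical 8-antenna ULA with $r = 0.5$, $\sigma = 4$ dB, and a normalized $\beta_k = 1$, and checks two structural properties that follow from the formula: the matrix is Hermitian, and its diagonal entries $\beta_k 10^{f_m/10}$ are real and positive. The function name and seed are illustrative.

```python
import cmath
import math
import random

def correlated_covariance(M, beta, r, theta, sigma_db, rng):
    """[R]_{m,n} = beta r^{|n-m|} e^{i(n-m)theta} 10^{(f_m+f_n)/20}, f_m ~ N(0, sigma^2) i.i.d. (dB)."""
    f = [rng.gauss(0.0, sigma_db) for _ in range(M)]
    return [[beta * r ** abs(n - m) * cmath.exp(1j * (n - m) * theta) * 10 ** ((f[m] + f[n]) / 20)
             for n in range(M)] for m in range(M)]

rng = random.Random(2)
M, beta = 8, 1.0                         # hypothetical array size; beta normalized to 1
theta = rng.uniform(-math.pi, math.pi)   # angle of arrival, uniform over [-pi, pi)
R = correlated_covariance(M, beta, r=0.5, theta=theta, sigma_db=4.0, rng=rng)

hermitian = all(abs(R[m][n] - R[n][m].conjugate()) < 1e-12 for m in range(M) for n in range(M))
diag_ok = all(abs(R[m][m].imag) < 1e-12 and R[m][m].real > 0 for m in range(M))
print(hermitian, diag_ok)
```

Replacing $f_1, \ldots, f_M$ by a single common value $f$ and setting $r = 0$ recovers the uncorrelated model $\beta_k 10^{f/10}\mathbf{I}_M$.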

As we consider a single cell with mutually orthogonal pilots, the channel estimation process is only noise-limited. Some important observations about the covariance matrices of the channel estimates can then be made to clarify the evaluation carried out in this section. One effect of channel estimation on the eigenvalues of the true channel's covariance matrix is generally a reduction of their magnitudes, since every estimate contains some error [2]. This error is larger for the least-squares (LS) estimator than for the minimum mean-squared error (MMSE) estimator, the difference being due to whether prior statistical information is used. Another known key result is that the stronger the eigenvalue associated with a given eigendirection, the easier it becomes to estimate the channel along this spatial direction [2]. Note that a higher eigenvalue effectively translates into a higher SNR, which explains the better channel estimate. Since spatial correlation affects the eigenvalues of the channel covariance matrix, with a few of them carrying a large fraction of the power, as shown in [11] for the considered model, spatially correlated channels are amenable to better estimation. For strong spatial correlation, fewer eigenvalues have considerable power, which further enhances the channel estimation along the spatial directions that are most significant for the link between the BS antennas and a UE. However, when the structure of the channel is not duly exploited, as in the case of the LS estimator, the scaling of the channel estimate is difficult to define correctly, and thus only the channel direction can be estimated decently [11].

A. Impacts Detrimental to Sample Probability

As can be seen from step 10 of Algorithm 1, the sample probability quantifies the power of a given UE relative to the whole system. $P_{r(t)}$ is therefore mainly related to the power of the estimated channels, which in our model also accounts for pathloss and shadowing. One can thus expect this value to be very sensitive to large- and small-scale fading and to the channel estimation procedure, and even to be affected by spatial correlation¹⁰; some of these effects are illustrated below.

Fig. 1. CDFs of the sample probability $P_{r(t)}$ defined in step 10 of Algorithm 1, averaged over independent small-scale fading realizations with $K/M = 0.1$, $r = 0.5$, and $\sigma = 4$ dB. The probabilities were also generated for two different values of $\alpha$ and for the LS/MMSE channel estimators. The true channel case indicates perfect CSI knowledge at the BS.

By treating $P_{r(t)}$ as a random variable, Fig. 1 shows the cumulative distribution functions (CDFs) of the average sample probability, specified in step 10 of Algorithm 1, for the PARL rKA-based RZF scheme. Clearly, $P_{r(t)}$ is strongly influenced by pathloss, since the CDF curves shift to the left as $\alpha$ grows. This result reaffirms the well-known fact that edge-UEs are condemned to receive less power than center-UEs; hence, the former also have smaller values of $P_{r(t)}$, which become increasingly distant from those of center-UEs as $\alpha$ increases. Channel estimation has a noticeable impact as well. The curves for the MMSE estimator lie further to the left for $\alpha = 4$, showing that this estimator can yield smaller values of $P_{r(t)}$ than LS. This is a direct consequence of the fact that the MMSE channel estimator works well under low SNR. When the MMSE estimator is used, the spatially correlated curves are slightly shifted to the right relative to the uncorrelated case, which can be explained by the improved channel estimation enabled by the moderate level of spatial correlation. Recall that this positive result is only partly obtained for the LS estimator, and therefore the difficulty of obtaining a reasonable estimate of the power scaling factor becomes more evident in this case. In summary, edge-UEs may have very low values of $P_{r(t)}$ compared to center-UEs in scenarios of practical interest. This hinders the selection of the equations associated with edge-UEs in step 10 of Algorithm 1, preventing the rKA's update schedule from working correctly. As a result, the genuine rKA has restricted performance in this more practical context.

¹⁰The values of $M$ and $K$ influence $\mathbf{p}$ as well.

To illustrate the consequences of a poorly scaled sampling distribution on performance, Fig. 2 shows, as a function of the number of rKA iterations and for different channel estimates, the average UL SE per UE achieved by the equivalent PARL rKA-based RZF scheme secondarily conceived in [5] and by the scheme proposed here in Algorithm 1. We stress that the former does not deal with the practical effect visualized in Fig. 1, whereas our proposed methodology accounts for these harmful differences in $P_{r(t)}$ between center- and edge-UEs through a fair forced initialization process. The proposed scheme clearly outperforms its counterpart [5], since it always converges faster to the reference bounds given by the RZF SE. Moreover, the average UL SEs obtained agree with the results of [11], where the authors adopted the same spatial correlation model. The superior performance of spatially correlated channels stems from the fact that the spatial correlation arising from antenna correlation and the unequal contribution of the antennas is, on average, beneficial to the SE bound defined in (1), following the effects on channel estimation explained above. In addition, keep in mind that the use-and-then-forget technique used to define (1) is very sensitive to the assurance of channel hardening, which in turn is dramatically affected by spatial correlation, as reported in [2].

Fig. 2. Average UL SE per UE as a function of $T_{\text{rKA}}$ for the canonical RZF combining and its emulations by the PARL rKA-based schemes. The first PARL algorithm follows the ideas of [5] and has no initialization (abbreviated as init.) procedure; the other, proposed (abbreviated as prop.) by us in Algorithm 1, has a hybrid initialization that guarantees the convergence of all receive combining vectors. Three different channel estimates are considered: (a) the idealistic case in which the true channel is known at the receiver, (b) LS channel estimates, and (c) MMSE channel estimates. The scenario under evaluation consists of $K/M = 0.1$, $r = 0.5$, and $\sigma = 4$ dB.

In conclusion, large-scale effects have a noticeable impact on performance, and accounting for them in the design of rKA-based schemes for M-MIMO yields better results. We emphasize that the improvements achieved by the proposed algorithm do not imply any increase in computational complexity compared to the original idea, and our proposal is also appealing because of its broader applicability. Some further inferences can be drawn from these results. For instance, when the true channel is known at the receiver, more rKA iterations seem necessary to follow the SE bounds than with MMSE channel estimates, which in turn need more rKA iterations than the worst channel estimation case, given by LS channel estimates. This and other effects are better characterized in the sequel.

B. Rate of Convergence: Number of Iterations Required

Our concern now is to define a notion of convergence based on the number of rKA iterations

required to reach a predefined error limit by considering the canonical RZF SE as the true value.

This will give us more insight into how the algorithm converges and will also help us evaluate its computational complexity in the sequel. The percentage gap between the SEs of the canonical RZF scheme and of its emulation performed through Algorithm 1 is shown in Fig. 3 as a function of the average number of rKA iterations for uncorrelated and moderately correlated scenarios, different channel estimates, and different loading factor values. Starting with the latter, the figure shows that better convergence is obtained for smaller values of K/M, as anticipated by Remark 2. This is mainly due to the expansion of the solution subspace of the SLE as K/M grows, which increases the interference among UEs. Fig. 3 then reveals a puzzling result: the better the CSI known at the receiver, the worse the algorithm convergence. The reason stems from Remark 1, where it was observed that a larger value of ξ tends to help the algorithm convergence. In fact, the better the channel estimation, the better the power of the estimated channels is characterized, which in turn lowers the average value of ξ. Estimating the channels with a more naive approach therefore improves the convergence of Algorithm 1; however, one should bear in mind that the achievable average SE is also diminished. We also note that better convergence is obtained for the uncorrelated case than for moderately spatially correlated channels. To explain this, we again return to Remark 1, where we observe that the average gain is severely impacted by $\lambda_{\min}(\mathbf{G}^H\mathbf{G})$. Since spatial correlation decreases the fraction of eigenvalues that carries most of the power, it imposes smaller values of $\lambda_{\min}(\mathbf{G}^H\mathbf{G})$, thus worsening the convergence of the algorithm. The penalty brought by the considered spatial correlation model is detailed in the next subsection. Overall, the effects indicated here are consistent with the theory developed in this paper and with the results discussed in [5], although examined here from a more comprehensive point of view.
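To make the roles of ξ and of the loading factor concrete, the Python sketch below evaluates the convergence factor κ = λ_min(BᴴB)/‖B‖²_F, which is the average gain obtained in (27) of Appendix B, for an i.i.d. Rayleigh channel G with M = 100. Remark 1 is not reproduced in this section, so the stacked form B = [G; √ξ I_K], for which λ_min(BᴴB) = λ_min(GᴴG) + ξ, is an assumption made here for illustration, as are the helper names `rka_rate` and `bound_iterations`. The printed iteration count comes from the standard randomized Kaczmarz bound, under which the mean squared error contracts roughly by (1 − κ) per iteration; it is not the empirical count plotted in Fig. 3.

```python
import numpy as np

def rka_rate(G: np.ndarray, xi: float) -> float:
    """Convergence factor kappa = lambda_min(B^H B) / ||B||_F^2.

    Assumption (illustration only): B = [G; sqrt(xi) I_K], so that
    B^H B = G^H G + xi I_K.
    """
    K = G.shape[1]
    B = np.vstack([G, np.sqrt(xi) * np.eye(K)])
    lam_min = np.linalg.eigvalsh(B.conj().T @ B)[0]  # eigvalsh sorts ascending
    return lam_min / np.linalg.norm(B, "fro") ** 2

def bound_iterations(kappa: float, eps: float = 1e-2) -> int:
    """Smallest t with (1 - kappa)^t <= eps, i.e., the bound-based iteration count."""
    return int(np.ceil(np.log(eps) / np.log1p(-kappa)))

rng = np.random.default_rng(0)
M = 100
for K in (10, 30, 50):  # loading factors K/M = 0.1, 0.3, 0.5
    G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    for xi in (0.1, 1.0, 10.0):
        kappa = rka_rate(G, xi)
        print(f"K/M={K / M:.1f}  xi={xi:5.1f}  kappa={kappa:.2e}  "
              f"bound T={bound_iterations(kappa)}")
```

Under this assumed form, κ is nondecreasing in ξ because ‖G‖²_F ≥ K·λ_min(GᴴG), and it drops quickly as K/M grows, which mirrors the trends described above.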

Recalling that the BS is equipped with simple, cheap hardware, we now focus on the average number of rKA iterations necessary to achieve a given error tolerance in the most realistic implementation scenario, in which LS channel estimation is most likely to be employed owing to its computational simplicity [11], [14]¹¹. For loading factors equal to 0.1, 0.3, and 0.5, respectively, the average numbers of rKA iterations needed to reach an error of 10% are roughly 93, 589, and 1799 for uncorrelated channels, and 95, 655, and 1983 for moderately correlated channels. An error of 1%, in turn, demands 293, 1815, and 4960 iterations for uncorrelated channels, and 333, 1903, and 5062 for moderately correlated channels, both in the same order with respect to the loading factors¹².

¹¹ The MMSE estimator has been used so far only to provide insight, as the ideal case of channel estimation. It goes, however, against the adopted paradigm, which considers a BS equipped with simple hardware that is unable to estimate the second moments of the channel responses.

Fig. 3. Percentage gap between the UL SEs of the canonical RZF scheme and its corresponding emulation performed through the PARL rKA-based scheme (Algorithm 1) as a function of the average number of rKA iterations with M = 100, r = 0.5, and σ = 4 dB.
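Footnote 12 explains that the iteration counts above were read off the curves of Fig. 3 by linear interpolation between neighboring points spaced 100 iterations apart and then rounded up. A minimal Python version of that reading step is sketched below; the function name `iterations_to_reach` and the sample curve are hypothetical and do not reproduce the data of Fig. 3.

```python
import math

def iterations_to_reach(iters, gaps, target):
    """First (interpolated, rounded-up) iteration count at which a
    non-increasing gap curve falls to `target` percent.

    iters: increasing iteration counts of the sampled points (e.g., 0, 100, 200, ...).
    gaps:  percentage SE gap measured at each of those points.
    """
    if gaps[0] <= target:
        return math.ceil(iters[0])
    for i in range(len(iters) - 1):
        t0, t1 = iters[i], iters[i + 1]
        g0, g1 = gaps[i], gaps[i + 1]
        if g0 >= target >= g1:
            # Linear interpolation between the two neighboring points.
            t = t0 + (g0 - target) * (t1 - t0) / (g0 - g1)
            return math.ceil(t)  # rounded up to the nearest integer
    raise ValueError("target gap is not reached within the sampled range")

# Hypothetical curve sampled every 100 iterations (not data from Fig. 3).
iters = [0, 100, 200, 300, 400]
gaps = [100.0, 40.0, 12.0, 5.0, 0.8]
print(iterations_to_reach(iters, gaps, 10.0))  # 10% error bound
print(iterations_to_reach(iters, gaps, 1.0))   # 1% error bound
```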

C. Rate of Convergence: Spatial Correlation Effects

Before moving on to the complexity analysis, we set out to better understand how the considered spatial correlation model impacts the algorithm convergence. To this end, the results of Fig. 3 were extended¹³, as can be seen in Fig. 4, by considering both the r and σ dimensions for K/M = 0.1. When r varies and σ = 0 dB, although there are small convergence improvements in very particular regions, the general behavior indicates that antenna spatial correlation is detrimental to the performance of the PARL rKA-based RZF scheme. The same result is obtained in the opposite case, in which the spatial correlation arising from environmental disorder is considered (σ varies and r = 0), thus confirming the previous findings. Fig. 4 also corroborates and extends the observation that better channel estimates deteriorate the convergence capacity of Algorithm 1. Again, these spatial correlation effects stem, strictly speaking, from the changes in the eigenstructure of G induced by the degree of spatial correlation, which depends on the values of r and σ (see Remark 1).

¹² These values were obtained through linear interpolation between the points of the curves in Fig. 3, with an interval of 100 average iterations between neighboring points. We also highlight that the reported values were rounded up to the nearest integer.

¹³ For the sake of clarity, we have omitted the true channel case, but we stress that the obtained responses are still valid in this context.

(a) Percentage gap as a function of $T_{\mathrm{rKA}}$ and r with σ = 0 dB. (b) Percentage gap as a function of $T_{\mathrm{rKA}}$ and σ with r = 0.

Fig. 4. Percentage gap between the SEs of the canonical RZF scheme and its corresponding emulation performed through the PARL rKA-based scheme (Algorithm 1) as a function of the average number of rKA iterations ($T_{\mathrm{rKA}}$) with K/M = 0.1. The graph also considers the variation of the gap along (a) the antenna correlation r dimension and (b) the shadowing standard deviation σ dimension.
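The eigenstructure argument can be illustrated numerically. The Python sketch below assumes the exponential correlation model of [16] with a real coefficient r, i.i.d. Rayleigh small-scale fading, and no shadowing (σ = 0 dB); the helpers `exp_corr` and `mean_lambda_min` are hypothetical names. It shows that a larger r lowers both λ_min(R) and the average λ_min(GᴴG), the quantity that Remark 1 ties to the average gain of Algorithm 1.

```python
import numpy as np

def exp_corr(M: int, r: float) -> np.ndarray:
    """Exponential correlation matrix [R]_{ij} = r^{|i-j|} (real r assumed)."""
    idx = np.arange(M)
    return r ** np.abs(idx[:, None] - idx[None, :])

def mean_lambda_min(M: int, K: int, r: float, trials: int = 200, seed: int = 0) -> float:
    """Average lambda_min(G^H G), with each column of G drawn as R^{1/2} h_k, h_k ~ CN(0, I_M)."""
    rng = np.random.default_rng(seed)
    w, V = np.linalg.eigh(exp_corr(M, r))
    R_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # symmetric square root
    vals = []
    for _ in range(trials):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        G = R_half @ H
        vals.append(np.linalg.eigvalsh(G.conj().T @ G)[0])  # smallest eigenvalue
    return float(np.mean(vals))

M, K = 100, 10  # K/M = 0.1, as in Fig. 4
for r in (0.0, 0.5, 0.9):
    lam_R = np.linalg.eigvalsh(exp_corr(M, r))[0]
    print(f"r={r}: lambda_min(R)={lam_R:.3f}, "
          f"mean lambda_min(G^H G)={mean_lambda_min(M, K, r):.1f}")
```

Because the same seed is used for every r, the comparison across r values is paired: only the correlation matrix changes between runs.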

D. Computational Complexity: Canonical RZF Scheme vs PARL rKA-Based RZF Scheme

Using the average numbers of rKA iterations obtained in Section VI-B for the scenario of interest mentioned above, which comprises moderately spatially correlated channels and LS channel estimation, we are now in a position to analyze the computational cost of both the canonical RZF scheme and the PARL rKA-based RZF scheme. It is equally important to recall that Algorithm 1 can be implemented in either a distributed or a centralized way. Since the computational complexities for the DL are the same as those for the UL (see Section V), Table III captures the trade-off between the tolerated error in average performance and the average computational cost (CC), based solely on Table I (UL regime), for both ways of implementing the PARL rKA-based RZF scheme, with the canonical RZF scheme as the cost reference. Table III also reports an auxiliary metric, the reference ratio (RR), defined as the relative difference in percent between the CC of the proposed and canonical schemes, taking the classical strategy as the reference value, i.e., $\mathrm{RR} = 100 \times (\mathrm{CC}_{\text{proposed}} - \mathrm{CC}_{\text{canonical}})/\mathrm{CC}_{\text{canonical}}$. Since positive values of RR denote a computational loss of Algorithm 1, negative values are sought to determine whether Kaczmarz's methodology is a strong candidate for reducing the canonical computational complexity, which was declared as one of the main goals of this paper. Importantly, the results in Table III show that substantial computational gains are obtained for an error bound of 10% with K/M = 0.3 (moderately crowded) and K/M = 0.5 (overcrowded), under both channel types. Note that the price to pay for these computational benefits is an expected 10% decrease in average performance. On the other hand, when Algorithm 1 is required to operate more accurately, that is, when the error bound is reduced to 1%, no computational gains are observed in favor of the PARL rKA-based RZF scheme. Algorithm 1 can therefore markedly reduce the computational complexity of the classical RZF scheme, but it cannot closely match the canonical achievable SE while still providing an efficient balance between performance and computational complexity. Lastly, one can observe that the centralized way of executing the proposed scheme is computationally lighter than the distributed approach. We stress, however, that centralization causes some expected performance losses that are not taken into account here.

TABLE III

AVERAGE COMPUTATIONAL COMPLEXITY COST AND REFERENCE RATIOS FOR THE CANONICAL AND PROPOSED SCHEMES.

| K/M | Canonical RZF: CC | 10%, distributed: average CC | 10%, distributed: RR [%] | 10%, centralized: average CC | 10%, centralized: RR [%] | 1%, distributed: average CC | 1%, distributed: RR [%] | 1%, centralized: average CC | 1%, centralized: RR [%] |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 16840 | 20600/21000 | 22.33/24.70 | 18600/19000 | 10.45/12.83 | 60600/68600 | 259.86/307.36 | 58600/66600 | 247.98/295.49 |
| 0.3 | 148520 | 123800/137000 | −16.64/−7.76 | 117800/131000 | −20.68/−11.80 | 369000/386600 | 148.45/160.30 | 363000/380600 | 144.41/156.26 |
| 0.5 | 424200 | 369800/406600 | −12.82/−4.15 | 359800/396600 | −15.18/−6.51 | 1002000/1022400 | 136.21/141.02 | 992000/1012400 | 133.85/138.66 |

Columns 3 to 10 refer to the PARL rKA-based RZF scheme under error bounds of 10% and 1%; each entry is given as uncorr./corr. CC: Computational Cost; RR: Reference Ratio.
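The RR entries of Table III can be recomputed from the CC columns with RR = 100 × (CC_proposed − CC_canonical)/CC_canonical, so positive values flag a cost increase of the rKA-based scheme. The short Python check below uses a few entries copied from the table; `reference_ratio` is a hypothetical helper name.

```python
def reference_ratio(cc_proposed: float, cc_canonical: float) -> float:
    """Relative CC difference in percent, with the canonical RZF scheme as reference.
    Positive values mean the rKA-based scheme is more expensive."""
    return 100.0 * (cc_proposed - cc_canonical) / cc_canonical

# Canonical RZF CC for each loading factor K/M (Table III).
canonical = {0.1: 16840, 0.3: 148520, 0.5: 424200}

# (K/M, proposed CC, reported RR) triples copied from Table III.
entries = [
    (0.1, 20600, 22.33),     # 10%, distributed, uncorrelated
    (0.3, 123800, -16.64),   # 10%, distributed, uncorrelated
    (0.3, 131000, -11.80),   # 10%, centralized, correlated
    (0.5, 1002000, 136.21),  # 1%, distributed, uncorrelated
]

for km, cc, reported in entries:
    rr = reference_ratio(cc, canonical[km])
    print(f"K/M = {km}: RR = {rr:.2f}% (table: {reported}%)")
```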

VII. CONCLUSIONS

In this paper, we revisited the problem of computational relaxation for canonical signal processing schemes, building on the recently proposed strategy that adopts premises based on the KA/rKA as a relevant attempt to avoid any prior statistical knowledge of the channels. As a first contribution, we re-interpreted the original proposal in order to derive the procedure for obtaining the combining/precoding matrices through rKA, thus generalizing the paradigm to scenarios with higher mobility and seeking new insights. Some modifications to the original rKA-based schemes were then proposed, motivated by a more pragmatic finding shown here: large-scale effects are expected to degrade the algorithm performance. Specifically, we proposed a hybrid initialization approach for the rKA-based schemes that numerically outperformed the methods available in the literature, since it ensures a faster average convergence of rKA under uncorrelated and correlated fading scenarios, whatever channel estimation procedure is used. Subsequently, a computational complexity framework for the two different rKA-based RZF schemes was derived in terms of matrix-to-matrix (vector-to-vector) multiplications/divisions. On the one hand, we numerically showed that, if a performance loss of 10% is acceptable, the most promising rKA-based scheme can substantially reduce the computational complexity of the canonical RZF scheme in moderately crowded and overcrowded scenarios in which a BS with simple and cheap hardware is employed. On the other hand, the proposed scheme cannot match the performance of the canonical RZF strategy with a computationally efficient solution. Finally, a theoretical characterization of the convergence function with respect to the different system parameters would help to better understand the applicability of the proposed rKA-based schemes. This is left as a potential avenue for future research.

APPENDIX A: USEFUL RESULTS

The following definitions consider an SLE in its canonical form $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m\times n}$ is the matrix of constant coefficients, $\mathbf{x} \in \mathbb{C}^n$ is the vector of unknowns, and $\mathbf{b} \in \mathbb{C}^m$ is the vector of known offset coefficients. The $z$-th row of $\mathbf{A}$ is denoted as $\mathbf{a}_z^H$, and the subspace spanned by $\{\mathbf{a}_z\}_{z=1}^{m}$ (i.e., the column space of $\mathbf{A}^H$) is denoted as $\mathcal{X} \subset \mathbb{C}^n$. This SLE is solved via rKA, in which the random row selection of $\mathbf{A}$ is denoted by a random variable $R \in \{1, 2, \ldots, m\}$, and the specific row picked at iteration $t$ is $z \in \{1, 2, \ldots, m\}$. It is assumed that row $z$ is sampled with a generic probability $P_z \in [0, 1]$, whereby a probability vector can be defined as $\mathbf{p} = (P_1, \ldots, P_m) \in \mathbb{R}^m_+$. Note that $\sum_{z=1}^{m} P_z = 1$.
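For readers who want to experiment with this setup, a minimal randomized Kaczmarz solver is sketched below in Python. It is a generic textbook rKA, not Algorithm 1: by default it samples row z with probability P_z = ‖a_z‖²₂/‖A‖²_F (the row-norm choice of Strohmer and Vershynin [9]), but any probability vector p can be passed. The function name `rka_solve` and its parameters are hypothetical.

```python
import numpy as np

def rka_solve(A, b, iters=20000, seed=0, probs=None):
    """Randomized Kaczmarz for a consistent system A x = b (complex allowed).

    Row z is drawn with probability probs[z]; by default
    P_z = ||a_z||^2 / ||A||_F^2. Each step projects x onto the
    hyperplane {x : A[z] @ x = b[z]}.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)
    if probs is None:
        probs = row_norms2 / row_norms2.sum()
    x = np.zeros(n, dtype=np.result_type(A, b, np.complex128))
    for z in rng.choice(m, size=iters, p=probs):
        residual = b[z] - A[z] @ x
        # Orthogonal projection onto the z-th hyperplane.
        x += (residual / row_norms2[z]) * A[z].conj()
    return x

rng = np.random.default_rng(1)
m, n = 30, 10
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true
x_hat = rka_solve(A, b)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Passing `probs=np.full(m, 1 / m)` switches to uniform row sampling, which is another valid instance of the generic probability vector p defined above.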

Definition 1. Random Rank-1 Projection Operator. The random rank-1 projection operator of a randomly chosen row $R$ is [5]: $\mathbf{P}_R := \mathbf{a}_R\mathbf{a}_R^H/\|\mathbf{a}_R\|_2^2$, where $\mathbf{P}_R$ is an $n \times n$ matrix.

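Definition 2. Average Random Rank-1 Projection Operator. From Definition 1, it is possible to specify the average projection operator as [5]: $\bar{\mathbf{P}} := \mathbb{E}\{\mathbf{P}_R\} = \sum_{z=1}^{m} P_z\,\mathbf{a}_z\mathbf{a}_z^H/\|\mathbf{a}_z\|_2^2$, where the expectation is w.r.t. $R$. Thus, $\bar{\mathbf{P}}$ is an $n \times n$ positive-semidefinite matrix with $\operatorname{tr}(\bar{\mathbf{P}}) = \sum_{z=1}^{m} P_z = 1$.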

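Definition 3. Average Gain. The average gain is a suitable metric that measures the exploitation provided by the average projection operator over $\mathcal{X}$ when seeking to obtain $\mathbf{x}$, given as [5]: $\kappa_{\mathcal{X}}(\mathbf{A},\mathbf{p}) := \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \boldsymbol{\vartheta}^H\bar{\mathbf{P}}\boldsymbol{\vartheta}/\|\boldsymbol{\vartheta}\|_2^2$, where one observes that $\kappa_{\mathcal{X}}$ depends on $\mathbf{A}$ and $\mathbf{p}$. Besides, it is possible to infer that $\kappa_{\mathcal{X}}(\mathbf{A},\mathbf{p}) \in [0, 1/\min\{m,n\}]$. This last result is a consequence of $\operatorname{tr}(\bar{\mathbf{P}}) = 1$ and of the fact that the dimension of the subspace $\mathcal{X}$ is generally determined by $\min\{m,n\}$.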
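Definitions 2 and 3 can be checked numerically for a small matrix A with full column rank. The Python sketch below (hypothetical helper names `average_projection` and `average_gain`) builds P̄ for uniform and row-norm sampling, confirms tr(P̄) = 1, and evaluates κ_X. Since A is assumed to have full column rank with m > n, X is all of Cⁿ, so κ_X reduces to the smallest eigenvalue of P̄ and cannot exceed 1/min{m,n}. For row-norm sampling, P̄ also equals AᴴA/‖A‖²_F.

```python
import numpy as np

def average_projection(A, probs):
    """P_bar = sum_z P_z a_z a_z^H / ||a_z||^2, where a_z^H is the z-th row of A."""
    n = A.shape[1]
    P = np.zeros((n, n), dtype=complex)
    for z, p in enumerate(probs):
        a = A[z].conj()  # a_z, so that a_z^H = A[z]
        P += p * np.outer(a, a.conj()) / np.vdot(a, a).real
    return P

def average_gain(A, probs):
    """kappa_X for full-column-rank A: X = C^n, so the minimum Rayleigh
    quotient of P_bar is its smallest eigenvalue."""
    return np.linalg.eigvalsh(average_projection(A, probs))[0]

rng = np.random.default_rng(2)
m, n = 40, 8
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
norms2 = np.sum(np.abs(A) ** 2, axis=1)
for name, p in [("uniform", np.full(m, 1 / m)), ("row-norm", norms2 / norms2.sum())]:
    P = average_projection(A, p)
    print(f"{name}: tr(P_bar)={np.trace(P).real:.6f}, "
          f"kappa={average_gain(A, p):.4f}, 1/min(m,n)={1 / min(m, n):.4f}")
```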

APPENDIX B: PROOF OF THEOREM 1

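Using Definition 2 and (11), the average projection operator of $\mathbf{B}^H$ is [5]

$$\bar{\mathbf{P}} = \sum_{r(t)=1}^{m} \frac{\|\mathbf{b}_{r(t)}\|_2^2}{\|\mathbf{B}\|_F^2}\,\frac{\mathbf{b}_{r(t)}\mathbf{b}_{r(t)}^H}{\|\mathbf{b}_{r(t)}\|_2^2} = \frac{\mathbf{B}\mathbf{B}^H}{\|\mathbf{B}\|_F^2}. \tag{26}$$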

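From Definition 3 and (26), the average gain provided by $\mathbf{B}^H$ can be obtained as [5]

$$\kappa_{\mathcal{X}}(\mathbf{B}^H) = \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\|\mathbf{B}^H\boldsymbol{\vartheta}\|_2^2}{\|\mathbf{B}^H\|_F^2\,\|\boldsymbol{\vartheta}\|_2^2} \overset{(a)}{=} \min_{\boldsymbol{\varepsilon}\in\mathbb{C}^K,\,\boldsymbol{\varepsilon}\neq\mathbf{0}} \frac{\|\mathbf{B}^H\mathbf{B}\boldsymbol{\varepsilon}\|_2^2}{\|\mathbf{B}^H\|_F^2\,\|\mathbf{B}\boldsymbol{\varepsilon}\|_2^2} = \frac{\lambda_{\min}(\mathbf{B}^H\mathbf{B})}{\|\mathbf{B}\|_F^2}, \tag{27}$$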

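where in (a) we used the fact that any vector in $\mathcal{X} \subset \mathbb{C}^{M+K}$, which is the subspace generated by the columns of $\mathbf{B}$ (i.e., the conjugated rows of $\mathbf{B}^H$), can be written as $\mathbf{B}\boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \in \mathbb{C}^K$, as done in [5]. Finally, the projection operator of the first, deterministic iteration of Algorithm 1 follows from Definition 1 as $\mathbf{P}_k = \mathbf{b}_k\mathbf{b}_k^H/\|\mathbf{b}_k\|_2^2$. As $k$ is deterministic, $\bar{\mathbf{P}} = \mathbf{P}_k$ and then, completing the proof, the average gain provided by the first iteration is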

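$$\kappa^{1}_{\mathcal{X}}(\mathbf{B}^H) = \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{\boldsymbol{\vartheta}^H\bar{\mathbf{P}}\boldsymbol{\vartheta}}{\|\boldsymbol{\vartheta}\|_2^2} = \min_{\boldsymbol{\vartheta}\in\mathcal{X},\,\boldsymbol{\vartheta}\neq\mathbf{0}} \frac{|\mathbf{b}_k^H\boldsymbol{\vartheta}|^2}{\|\mathbf{b}_k\|_2^2\,\|\boldsymbol{\vartheta}\|_2^2}. \tag{28}$$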
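A quick numerical check of (26) and (27) is sketched below in Python. Because the construction of B from the main text is not reproduced in this appendix, the code assumes, purely for illustration, the stacked form B = [G; √ξ I_K] with an i.i.d. Rayleigh G; the identity in (27) itself holds for any B with full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, xi = 64, 8, 0.5
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
B = np.vstack([G, np.sqrt(xi) * np.eye(K)])  # hypothetical stacked form, (M+K) x K
fro2 = np.linalg.norm(B, "fro") ** 2

# Closed form (27).
kappa_closed = np.linalg.eigvalsh(B.conj().T @ B)[0] / fro2

# Direct evaluation from (26): P_bar = B B^H / ||B||_F^2. On X = range(B) the
# Rayleigh quotient is minimized by the smallest of its K nonzero eigenvalues.
P_bar = B @ B.conj().T / fro2
eig = np.linalg.eigvalsh(P_bar)  # ascending; the last K entries are nonzero
kappa_direct = eig[-K]

# Random directions theta = B eps in X never fall below the minimum.
eps = rng.standard_normal((K, 1000)) + 1j * rng.standard_normal((K, 1000))
theta = B @ eps
quot = (np.sum(np.abs(B.conj().T @ theta) ** 2, axis=0)
        / (fro2 * np.sum(np.abs(theta) ** 2, axis=0)))

print(kappa_closed, kappa_direct, quot.min())
```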

REFERENCES

[1] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge, UK: Cambridge University Press, 2016.

[2] E. Björnson, J. Hoydis, and L. Sanguinetti, Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency. Delft, The Netherlands: Now Publishers, 2017.

[3] A. Kammoun, A. Müller, E. Björnson, and M. Debbah, "Linear Precoding Based on Polynomial Expansion: Large-Scale Multi-Cell MIMO Systems," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 861–875, 2014.

[4] E. Björnson, L. Sanguinetti, and M. Debbah, "Massive MIMO with Imperfect Channel Covariance Information," in 2016 50th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 2016, pp. 974–978.

[5] M. N. Boroujerdi, S. Haghighatshoar, and G. Caire, "Low-Complexity Statistically Robust Precoder/Detector Computation for Massive MIMO Systems," IEEE Transactions on Wireless Communications, vol. 17, no. 10, pp. 6516–6530, 2018.

[6] S. Kaczmarz, "Angenäherte Auflösung von Systemen linearer Gleichungen," Bulletin International de l'Académie Polonaise des Sciences et des Lettres. Classe des Sciences Mathématiques et Naturelles. Série A, Sciences Mathématiques, vol. 35, pp. 355–357, 1937.

[7] D. Needell, N. Srebro, and R. Ward, "Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz Algorithm," Advances in Neural Information Processing Systems, pp. 1017–1025, 2014.

[8] Y. Chi and Y. M. Lu, "Kaczmarz Method for Solving Quadratic Equations," IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1183–1187, 2016.

[9] T. Strohmer and R. Vershynin, A Randomized Solver for Linear Systems with Exponential Convergence. Springer, Berlin, Heidelberg, 2006, vol. 4110.

[10] V. Croisfelt Rodrigues, J. C. Marinello Filho, and T. Abrão, "Kaczmarz Precoding and Detection for Massive MIMO Systems," accepted at the IEEE Wireless Communications and Networking Conference, Marrakech, Morocco, 2019.

[11] E. Björnson, J. Hoydis, and L. Sanguinetti, "Massive MIMO Has Unlimited Capacity," IEEE Transactions on Wireless Communications, vol. 17, no. 1, pp. 574–590, 2018.

[12] L. Dai, M. Soltanalian, and K. Pelckmans, "On the Randomized Kaczmarz Algorithm," IEEE Signal Processing Letters, vol. 21, no. 3, pp. 330–333, 2014.

[13] T. L. Marzetta, "Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas," IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590–3600, 2010.

[14] X. Gao, O. Edfors, F. Tufvesson, and E. G. Larsson, "Massive MIMO in Real Propagation Environments: Do All Antennas Contribute Equally?" IEEE Transactions on Communications, vol. 63, no. 11, pp. 3917–3928, 2015.

[15] X. Gao, O. Edfors, F. Rusek, and F. Tufvesson, "Massive MIMO Performance Evaluation Based on Measured Propagation Data," IEEE Transactions on Wireless Communications, vol. 14, no. 7, pp. 3899–3911, 2015.

[16] S. Loyka, "Channel Capacity of MIMO Architecture Using the Exponential Correlation Matrix," IEEE Communications Letters, vol. 5, no. 9, pp. 369–371, 2001.
