spatial dct-based channel estimation in multi-antenna multi-cell … · 2015-01-12 · 1 spatial...

1

Spatial DCT-Based Channel Estimation inMulti-Antenna Multi-Cell Interference Channels

Maha Alodeh, Student Member, IEEE, Symeon Chatzinotas, Senior Member, IEEE,Bjorn Ottersten, Fellow Member, IEEE

Abstract—This work addresses channel estimation in multipleantenna multicell interference-limited networks. Channel stateinformation (CSI) acquisition is vital for interference mitigation.Wireless networks often suffer from multicell interference, whichcan be mitigated by deploying beamforming to spatially directthe transmissions. The accuracy of the estimated CSI playsan important role in designing accurate beamformers that cancontrol the amount of interference created from simultaneousspatial transmissions to mobile users. Therefore, a new techniquebased on the structure of the spatial covariance matrix and thediscrete cosine transform (DCT) is proposed to enhance channelestimation in the presence of interference. Bayesian estimationand Least Squares estimation frameworks are introduced byutilizing the DCT to separate the overlapping spatial paths thatcreate the interference. The spatial domain is thus exploited tomitigate the contamination which is able to discriminate acrossinterfering users. Gains over conventional channel estimationtechniques are presented in our simulations which are also validfor a small number of antennas.Index Terms—Channel estimation, training sequence contamina-tion, discrete cosine transform, second order statistics.

I. INTRODUCTION

Interference is the most critical factor for designing andscaling wireless networks, as it leads to the spectrum scarcity-congestion problem. Moreover, the design paradigm for cel-lular networks has been shifted from partial frequency reuseto full frequency reuse enhancing the spectrum utilization, andthus making the problem of interference more acute. Therefore,the design of future networks will require collaborating basestations to jointly serve their users or to smartly mitigatethe interference. This can be enabled by exchanging datainformation and channel state information (CSI). These designshave received much attention in the literature, but their maindrawback is the requirement of backhaul exchange for users’data, which requires major upgrades of current infrastructureespecially when CSI changes rapidly [1]- [3]. In an effort totackle the interference issue, without data sharing over thebackhaul network, various coordination techniques has beenproposed to handle the limited data exchange scenarios; for

Copyright (c) 2014 IEEE. Personal use of this material is permitted.However, permission to use this material for any other purposes must beobtained from the IEEE by sending a request to [email protected] Alodeh, Symeon Chantzinotas and Bjorn Ottersten are with In-terdisciplinary Centre for Security Reliability and Trust (SnT) at theUniversity of Luxembourg, Luxembourg. E-mails:{ [email protected],symeon.chatzinotas @uni.lu and [email protected]}. B. Ottersten isalso with Royal Institute of Technology (KTH), Stockholm, Sweden. E-mail{[email protected]}.This work is supported by Fond National de la Recherche Luxembourg (FNR)projects, project ID:4919957.

example the authors in [1] exploit the availability of CSI atbase stations to design precoding techniques that minimize theinterference created by BS transmissions by maximizing thesignal to leakage noise ratio (i.e. virtual signal to interferencenoise ratio), while the work in [3] investigates zero forcingbeamforming in a coordinated multicell environment. Deploy-ing these techniques requires accurate channel state informa-tion to design the suitable transmit and receive beamforming.

The CSI acquisition techniques can be categorized intofeedback and reciprocity techniques. In the feedback systems, atraining sequence is broadcasted by the BS which is measuredby users and a limited feedback link is considered from theusers to the base station. In [4]- [7] such a mode is considered.In [4], the authors have proved that in order to achieve fullmultiplexing gain in the MIMO downlink channel in the highsignal to noise ratio (SNR) regime, the required feedback rateper user grows linearly with the SNR (in dB). The main resultin [5] is that the extent of CSI feedback can be reduced byexploiting multi-user diversity. While in [6], it is shown thatnon-random vector quantizers can significantly increase theMIMO downlink throughput. In [5], the authors design a jointCSI quantization, beamforming and scheduling algorithm toattain optimal throughput scaling. Authors in [7] present aninvestigation of how many feedback bits per user are necessaryto maintain the optimal multiplexing gain in a K-cell MIMOinterference channel. In the second mode, the distinguishingfeature of such systems is the concept of reciprocity, wherethe uplink channel is utilized as an estimate of the downlinkchannel reducing the feedback requirements. This is one ofthe main advantages of a TDD architecture in low mobilityscenarios [8], as it eliminates the need for feedback, and jointuplink training combined with the reciprocity of the wirelessmedium are sufficient to estimate the desired CSI. TDD hasthe advantage over direct feedback since the users’ terminaldo not need to estimate their own channel. In TDD systems,the base station estimates the channel state information (CSI)based on uplink training sequences over the same frequencyband, and then uses it to generate the beamforming vector inthe downlink transmissions [9]- [11].

CSI is typically acquired by exploiting finite-length trainingsequences in the presence of inter-cell interference. There-fore, in a synchronous setting, the training sequences fromneighboring cells would contaminate each other. While in anasynchronous setting, the training sequences are contaminatedby the downlink data transmissions. Recently, the problem ofnon-orthogonality of training sequences has been thoroughly

2

investigated [14]- [20]. It is pointed out in [14] that trainingsequence contamination presents a huge challenge for perfor-mance and a robust precoding technique is proposed to handlethis kind of interference. Specifically, it is shown that trainingsequence contamination effects [16]- [17], [20]- [21] (i.e., thereuse of non-orthogonal training sequences across interferingcells) causes the interference rejection performance to quicklysaturate with the number of antennas, thereby undermining thevalue of MIMO systems in cellular networks [16].

The problem of training sequence contamination can occurin two different scenarios:

• Inter-cell training sequence contamination: presents themost common scenario in which the users in the samecell utilize orthogonal training sequences while this set isreused in other cells.

• Intra-cell training sequence contamination: most researchignores this kind of contamination as they assume notraining sequence reuse within the same cell and thetraining sequences are orthogonal. However, it is worthinvestigating the scenario in which the number of simul-taneously tackled transmissions in single cell is greaterthan number of training sequences [23].

Although we mention two types of contamination, handling thecontamination in both scenarios is the same. We focus on thefirst type of contamination in this work. Various contaminationmitigation techniques have been investigated in the literature; anovel asynchronous training sequence transmission is proposedin [15], where different time slots are allocated to the userswho utilize the same training sequence in different cells.The channel estimates are decontaminated from the downlinkinterference by adding additional antennas to estimate andcancel the interference. On the other hand, the effect of pilotcontamination is thoroughly studied in the literature and beencharacterized in [16]- [17] using large system analysis fordifferent channels. Training sequence allocation techniquesare proposed in [21]- [22], to find the optimal set of usersthat simultaneously utilizes the same training sequence. Theallocation schemes vary according to the considered scenario.The result in [21] depends on the channel power and strongestinterferer while the work in [22] relies on second orderstatistics to pick the users who have the lowest overlap in theirsubspaces.

In this paper, we tackle the problem of training sequencecontamination in correlated multiple user single input multipleoutput (MU-SIMO) in the uplink of TDD scenarios. Althoughthe multiple antennas at users’ terminal provide additionaldegrees of freedom, they present a source of additional com-plexity from practical perspective which is preferable to betackled at the BS. Moreover, the assumption of exploitingsingle antenna at the users’ terminal can be motivated bythe fact that multiplexing multiple streams per users is lessbeneficial than sending single stream exploiting receive com-bining techniques [13]. Based on this observation, we assumea uniform linear array (ULA) at the base station and exploit theembedded information in second order statistics about users’favorable and unfavorable directions.

From a different perspective, orthogonal transformations areexploited in the literature to virtually represent the informationin a different domain which simplifies the analysis. In [24],a channel modelling problem for a single user multiple inputmultiple output (SU-MIMO) is investigated using orthogonaltransformations to provide a geometric interpretation of thescattering environment. This virtual transformation reveals twoimportant aspects: the number of parallel channels and the levelof diversity and clarify their impact on capacity calculations.In this paper, we exploit the orthogonal transformations’ ca-pability of virtually converting the information into a differentdomain and based on this we propose several signal process-ing algorithms. One of the orthogonal transformations thathas appealing characteristics is the discrete cosine transform(DCT). DCT is utilized to compress the dimensions used forchannel estimation, thus facilitating interference mitigation atthe estimation process. The contributions of this paper can besummarized as

• Improving the contamination mitigation performance ofthe traditional Bayesian estimation (BE) at the estimationstep. The authors in [22] exploit BE to mitigate thecontamination and they analyzed its performance usinglarge system analysis. In this work, we combine BEwith DCT in different algorithms to decontaminate thetraining sequences at the estimation process. Moreover,a modification is employed on traditional BE to furtherimprove the contamination rejection from the target esti-mate. Numerically, the proposed algorithms are shown tooutperform the estimation algorithm proposed in [22].

• Traditionally, least squares estimation (LS) is not capableof discriminating the interference. We apply LS in theDCT domain to mitigate the contamination at the estima-tion process by exploiting DCT characteristics.

• Training sequence allocation techniques are proposed tofind the optimal set of users that utilize the same trainingsequences. In an effort to utilize the multiuser diver-sity concept as in [21]- [22], we propose joint trainingsequence allocation and DCT compression to combinethe benefit of both schemes. These allocation techniquesfurther suit the nature of enhanced LS and BE therebyoutperforming the traditional techniques.

A. Notation

The adopted notations in the paper are as follows: weuse uppercase and lowercase boldface to denote matrices andvectors. Specifically, IK denotes the K × K identity matrix.Let XT , X∗ and XH denote the transpose, conjugate, andconjugate transpose of a matrix X respectively. E refers tothe expectation, ‖ · ‖ denotes the Frobenius norm, and ‖ · ‖0denotes the zero norm. The Kronecker product of two matricesX and Y is denoted by X ⊗Y. The notation used for , isused for definitions. 1K×K , 0K×K are the matrices of all onesand zeros with size K ×K respectively, � denotes Hadamardproduct and [A(k, n)] is (k, n)th element of matrix A. Lettr(X), vec(X) denote the trace operation,and the columnvector obtained by stacking the columns of X. CN (a,R) isused to denote circularly symmetric complex Gaussian random

3

vectors, which has the mean a and the covariance matrix R.Finally, ∪ is the union of sets and f(x) denotes a function ofx.

II. SYSTEM MODEL

Our model consists of a network of C time-synchronizedcells with full spectrum reuse, each one of the cells serves Lusers. Estimation of flat block fading, narrow band channelsin the uplink is considered, and all the base stations areequipped with an M -element uniform linear array (ULA) ofantennas. We assume that the training sequences, of length τsymbols, used by single-antenna users in the same cell aremutually orthogonal. However, training sequences are reusedin a multicell environment from cell to cell. The trainingsequences used for estimating the user channels are denotedby si , [si1 . . . siτ ]T ∈ Cτ×1. The training sequencesymbols are normalized such that {|sij |2 = P

τ ,∀j ∈ τ},where P is the total training sequence power. Channel vectorsare assumed to be CM×1 Rayleigh fading with correlationdue to the finite multipath angle spread seen from the basestation side, the lth user’s channel towards cth BS is given byhlc ∼ CN (0, αlcRlc) ∈ CM×1, where αlc is the attenuationfrom the lth user to cth BS. We denote the channel covariancematrix Rlc ∈ CM×M as Rlc = E[hlch

Hlc ].

Considering the transmission of si sequence, the M × τsignal baseband symbols sampled at the cth target base stationis

Yc =∑i

∑∀l∈Ki

hlcsTi + Nc. (1)

where Ki is the set of users who use the training sequence si.Nc ∈ CM×τ is the spatially and temporally white complexadditive Gaussian noise (AWGN) with element-wise varianceσ2. We define a training matrix Si = si ⊗ IM , such thatSHi Si = τIM . Then, the received training signal at the targetbase station can be expressed as

yc = vec(Yc) =∑i

Si

C∑∀l∈Ki

hlc + nc (2)

where nc ∈ CMτ×1 = vec(N) is the sampled noise at thecth BS. Since the pilots are orthogonal sHi sj = 0, this makesSHi Sj = 0M×M . Due to orthogonality between differentsequences, the sampled signal resulted from the transmissionsof the ith training sequence at cth BS can be isolated fromother training sequences and can be expressed as

yc = Si∑∀l∈Ki

hlc + nc (3)

For the sake of simplicity and without lack of generalization,we drop the training sequence index and assume that a singletraining sequence is used over the network, which makes thebaseband signal sampled at cth base station as

yc = S

C∑l=1

hlc + nc (4)

where l = 1, . . . , C denotes the users transmitting the trainingsequence s. Furthermore, we assume that there is time syn-chronization in the system for coherent uplink transmissions.

A. Channel model

We consider a uniform linear array (ULA) whose responsevector can be expressed as

a(ω) = [1 e−jω . . . e−j(M−1)ω]T (5)

where ω = 2πd sin θλ , d is the antenna spacing at the base station,

λ is the signal wavelength and θ is angle of arrival of singlepath. The received signal at the base station can be expressedas a multipath model utilizing the response array vector as

hlc =

Q∑i=1

γia(ωi) (6)

where γi is complex random gain factor, θi is the angle ofthe arrival of the ith path, Q is the number of paths. Weadopt a generic Toeplitz correlation model as it is the suitablemodel for implementing the correlation from theoretical [25]and practical perspectives [29]. The two generic correlationtypes have a generic Toeplitz structure.

In order to analytically study the performance of the pro-posed technique, we deal with a simplified exponential cor-relation model due to its mathematical tractability [25]. Thecorrelation structure of Rlc can be formulated as following

1) Exponential Correlation [25]: It is known that exponen-tial correlation matrix is a special case of Toeplitz and it is oftenused for ULA system, and it has the following formulation

[R(i, j)] =

{ρ|i−j|, i > j

ρ|i−j|∗, i < j(7)

where ρ ∈ C, |ρ| ≤ 1. This kind of correlation is suitable fortheoretical analysis, and it will be used in the next sections tostudy the benefits of the proposed framework.

2) Practical Correlation [29]: In order to approximate thepractical correlation, the received signal at the base station canbe implemented as limited memoryless multipath model withsingle tap utilizing the response array vector as (6). For a mul-tipath scattering confined to a relatively small angular spreadseen from the base station. A general correlation structure canbe well approximated by

R ≈ DaBDHa

where σω = 2π dλσθ cos θ, Da = diag[a(ω)]. σθ is the standarddeviation of the angular spread.

B depends on the angular spread of the multipath compo-nents. The angular distribution is Gaussian ω ∈ N (0, σω), itcan be written as

[B(m,n)] ' exp(((m− n)

√3δω)2

2). (8)

For ω uniformly distributed over [−δω, δω], it has the fol-lowing structure

[B(m,n)] ' sin((m− n)δω)

(m− n)δω. (9)

and σω =√

3δω . We adopt the both correlation models:exponential and practical to test the efficiency of the proposedalgorithms. From the previous correlation expressions, it canbe argued that each user in the cell has a different covariance

4

matrix due to its position as the covariance matrix is a functionof the angular spread and its corresponding distribution.

The covariance information of the target users and in-terfering users can be acquired exploiting resource blockswhere the desired user and interference users are known to beassigned training sequences at different times. Alternatively,this information can be obtained using the knowledge of theapproximate users’ positions and the type of the angular spreadat BS side exploiting the correlation equations (8)-(9).

III. DCT FOR SPATIAL COMPRESSION

The optimal transform that decorrelates the signal isKarhunen-Loeve Transform (KLT) which requires significantcomputational resources [27]. Therefore, fixed transforms arepreferred in many applications. In contrast to data-dependenttransform KLT, DCT is a fixed transform that does not dependon the data structure and has excellent energy compactionproperties that perform very close to KLT. Although thereare several fixed transform techniques (i.e. FFT, DFT), DCTis more useful in the context of this paper since it hashigher compression efficiency in comparison with the othertechniques. DCT is a technique for converting a signal intoelementary frequency components. It transforms the signalfrom time domain to frequency domain. Most of the signalinformation tends to be concentrated in a few components ofthe DCT if the information is correlated [26]. Therefore, thesignal can be compressed by keeping the important frequenciesand truncating the least influential ones without impacting thequality of estimate. DCT’s qualities motivate its utilizationin time-domain estimation in OFDM-MIMO to reduce theborder effect owing to its capacity to reduce the high frequencycomponents in the transform domain [28]. In this work, we donot tackle the multicarrier OFDM, we utilize the DCT to handlethe interference at the estimation process in multiple antennamulticell systems.

The compression capability of DCT1 is depicted in Fig. 1 fora covariance matrix as in (7) with ρ = 0.9. The signal energyis condensed in the first few spatial frequencies, and it can benoticed that 93% of signal is compressed in the first four coef-ficients. The DCT capability compression helps in categorizingthe important and the unimportant spatial frequencies in orderto concentrate the contamination in fewer dimensions. In orderto evaluate the effectiveness of DCT compaction property, westudy the correlation models mentioned in the system modelsection.

In order to study numerically the compression capabilityof DCT, we note that R spans a set of eigenspaces whichdetermines the directions of the transmissions. Each eigenspaceacquires certain power, which is known as the eigenvalueof this space. In this direction, we use the metric of thestandard condition number (SCN), which is defined as the ratioof the maximum eigenvalue to the minimum eigenvalue anddenoted by χ(R), before and after DCT has been applied.The intuition behind using the SCN is to measure the dynamic

1See eq. (12)- (14) for more details about DCT.

range of eigenvalues which is representative of the compres-sion level. This can be demonstrated the reduction of SCN,since preserving a small number of strong eigenvalues meansthat the signal space can be effectively represented by fewereigenvectors. Thus, the same energy amount is contained inlower dimensional subspace which indicates a compression.The maximum and the minimum eigenvalues for exponentialcorrelation matrix R as in eq. (7) are 1

(1−ρ)2 and 1(1+ρ)2

respectively [34].

Lemma 1. [34]: The SCN of any exponential-form correlationmatrix R is thus given by

limM→∞

χ (R) =(1 + ρ

1− ρ)2

(10)

The SCN can be very large for highly correlated data (ρ→ 1).

0 1 2 3 4 5 6 7 8 90

0.2

0.4

0.6

0.8

Spatial frequency × π10

DCT

response

0 2 4 6 8 100

0.1

0.2

0.3

0.4

Antenna number

Tim

eresponse

Channel response

DCT response

Fig. 1. Comparison between the DCT spatial frequency response andspatial time response at the scenario of ρ = 0.9 in (7).

The DCT basis comprises of the eigenvectors of the follow-ing symmetric tri-diagonal matrix [33]

Qc =

1− ζ −ζ 0 . . . 0−ζ 1 0 . . . 0

.... . .

. . .. . .

......

. . . −ζ 1 −ζ0 . . . 0 −ζ 1− ζ

(11)

where ζ = ρ1+ρ2 . In matrix form, the eigenvectors are given

by

[UD(k, n)] = c[k] cos

((2n+ 1)kπ

2M

). (12)

As a result, the definition of DCT has the following interpre-tation

md[k] = c[k]

M−1∑n=0

m[n] cos

(π(2n+ 1)k

2M

), (13)

5

where m is the time domain vector and md is the DCTrepresentation of m and can be written as:

m[n] =

M−1∑k=0

c[k]md[k] cos

(π(2n+ 1)k

2M

), (14)

where k represents the spatial frequency coefficient, n denotesthe antenna number, and finally c[k] is defined as below:

c[k] =

√

1M , k = 0,√2M , otherwise.

A. DCT for Uniform Linear Array

In order to interpret the DCT channel response, we needto find the DCT for ULA response, which can be expressedas (15). From (15), the DCT for channel vector following theexpression in (6) can be implemented as (16). To verify thecompression efficiency of employing DCT, we study its impacton the change of eigenvalues of the exponential correlationmatrix.

Lemma 2. [34] The SCN of the exponential correlation matrixeq.(7) after the DCT is such that

limM→∞

χ(UDRUDH) = 1 + ρ. (17)

The change of SCN from 1+ρ1−ρ to 1+ρ before and after DCT

transformation explains the compressive nature of DCT.The correlation matrix can be re-implemented as Tlc =

E[mdlcm

dlc

H] = ADRlcA

HD , which is a 2-dimensional DCT

of the Rlc. Decontaminating the training sequences from theinterference requires taking into the account to the followingimportant points

• As most information is condensed in the certain frequen-cies, these frequencies should face the lowest possibleinterference in order to obtain an accurate estimate. Thespatial frequencies are functions of angles of arrival ofpilots at the BS, thus condensing these spatial frequenciesin certain bands makes the separation process easier. Thiskind of separation can be enhanced by prearranging thespatial frequencies through performing training sequenceallocation which is discussed later in section VII.

• The users who have the least common spatial characteris-tics should be assigned the same sequence; as they havethe minimum overlap in the spatial frequency domain.

The compression pattern of DCT is defined as a functionof the power distribution among the spatial frequencies. Suchdistributions depend on the nature of the angular spread atthe BS. To enable the spatial separation among the differentestimate, the pattern of compression should be highlighted.For this purpose, we adopt the correlation models of [8] toformulate generic rules about DCT compression for ULAsystems.

Lemma 3. The spatial DCT frequencies are concentrated atlow frequencies if angle of arrivals are close to zero. If thedirection of the arrivals are close to π

2 , the DCT componentsare condensed at high frequencies.

Proof. See Appendix.

The previous Lemma 3 confirms the intuition that the con-tamination is minimum when the important frequencies of theusers who allocate the same training sequence are concentratedat different spatial band. The compression efficiency enhanceswith increasing the number of antennas at the BS, which makesthe separation much easier.

Lemma 4. As M →∞, the DCT response is condensed in asingle spatial frequency component.

Proof. See Appendix.

In the next section, we utilize the DCT compression todeal with the problem of contaminated estimation in correlatedmultiuser environment.

IV. CHANNEL ESTIMATION WITH K- TRAINING SEQUENCEREUSE

Exploiting the ULA structure, we develop a new estimatorwith the aim of decontaminating the reused training sequencesover the network. Our estimators utilize the embedded infor-mation in the second order statistics of the channel vectors.The covariance matrices capture the embedded informationrelated to the distribution (mainly mean and spread) of themulti-path angles of arrival at the base station [31]. In [10]-[11], the authors focus on optimal training sequence designsand they exploit the covariance matrices of the desired channelsand interference. The optimal training sequences are developedwith adaptation to the statistics of the disturbance [10]. Anextension of [10] is proposed in [30], which proposes a moregeneral framework for the purpose of training sequence designin MIMO systems, which handles not only minimization ofchannel estimators MSE as an optimization metric, but also theoptimization of a final performance metric of interest related tothe use of the channel estimate in the communication system.However, the design of the training sequence does not have animpact on interference mitigation, as long as, we utilize fullyaligned training sequences. Here, we concentrate on designingan estimation technique that can achieve accurate results, byexploiting the spatial frequency to mitigate the interference inCSI estimation process.

A. Bayesian EstimationThe Bayesian Estimator (BE) is widely discussed in the

literature and is utilized in many applications [32]. The BEcoincides with the minimum mean square estimator in casehlc and yc are jointly Gaussian distributed random variables.The BE estimator can be formulated as [32]

Flc = GlcSH = Rlc

(( C∑l=1

Rlc

)+σn

2

τIM

)−1SH . (18)

The previous formulation provides insights about the nature ofthe training sequence contamination; the estimator is a functionof all the correlation matrices related to all users who utilizethe same sequence. Therefore, the spatial characteristics of allusers, who have the same sequence, influence on the accuracyof the estimations.

6

aDCT[k] =

√

1Me−jωi

M−12

sin M2ωi

sinωi2

, k = 0,√2M

sin((ωi+πkM

)M2

)

sin((ωi+πkM

) 12

)ej

(ωi+πkM

)(M−1)

2+πkM +

√2M

sin((ωi−πkM )M2

)

sin((ωi−πkM ) 12

)ej

(ωi−πkM

)(M−1)

2−πkM , 0 ≤ k ≤M − 1.

(15)

hDCTlc [k] =

∑Pi=1

γi√

1M√P

e−jωiM−1

2sin M

2ωi

sinωi2

, k = 0,∑Pi=1

γi√

2M√P

(sin((ωi+

πkM

)M2

)

sin((ωi+πkM

) 12

)ej

(ωi+πkM

)(M−1)

2+πkM +

sin((ωi−πkM )M2

)

sin((ωi−πkM ) 12

)ej

(ωi−πkM

)(M−1)

2−πkM

), 0 ≤ k ≤M − 1.

(16)

1) Minimum Mean Square Error Performance : The con-sidered performance is the mean square error (MSE) of theproposed estimator, and can be expressed as [32]

Elc = Ehlc

{‖hlc − hlc‖2|hlc

}. (19)

The final formulation for MSE can be expressed as

Elc = tr

(Rlc −R2

lc

( C∑m=1

Rmc +σ2n

τIM

)−1). (20)

The upper bound of the MSE of the BE can be written as

EMIlc = tr

(Wlc

(C∆lc +

σ2n

τIM

)−1WH

lc

)(21)

where Rlc is decomposed as Wlc∆lcWHlc , and the subscript

MI refers to the “maximum interference scenario”. Thisscenario occurs when the set of users, who utilizes the sametraining sequence, has identical spatial second order statisticand attenuation towards certain BS.

The lower bound of the MSE of the BE is as:

ENIlc = tr

(Rlc

(Rlc +

σ2n

τIM

)−1)(22)

where superscript NI refers to the “no interference scenario”.This lower bound can be achieved when the users span distinctsubspaces and this condition should be satisfied {WlcWlj =0,∀j 6= c}.

Therefore, the overlap in these subspaces will degrade theestimate. A new look to the problem will be handled throughthe DCT framework in the next sections. As the work inthis paper aims at minimizing the estimation errors to reducethe contamination, we do not consider beamforming which ishandled in [14]. Taking this into the account, we only considerconventional beamforming techniques:• Coordinated Beamforming (CB) requires the estimation of

the intefering channels. The contamination occurs whenthe same BS assigns the same sequence to estimate theserved user channel as well as interfering channels and/orother BSs use the same training to estimate their users’channel or the corresponding interfering channels. In thisscenario, the contamination can be utilized to estimatethe interfering channel for the user that utilizes the sametraining sequence.

• Maximum ratio transmission (MRT), which requires theestimation of the desired user channel. The contamination

happens when the base stations utilize the same trainingsequence for their users.

B. Least Square Estimation

Least square estimation can be utilized in the scenarioswhen the information about the second order statistics is notavailable. Hence, if the received signal is modeled as (2), a leastsquare (LS) estimator for the desired channel can be formulatedas

hlslc = SHyc. (23)

The conventional estimator suffers from a lack of orthogonalitybetween the desired and interfering training sequences, aneffect known as training sequence contamination [14], [16]-[17]. In particular, when the same training sequence is reusedin all cells, the estimated channel can be expressed as

hlslc = hlc +

C∑m6=c

hmc +SHN

τ. (24)

As it appears in (24), the interfering channels have strongimpact and leak directly into the desired channel estimate.The estimation performance is then limited by the signal tointerfering ratio at the base station, which consequently limitsthe ability to design effective beamforming solutions.

1) Mean Square Error Performance: The MSE for LScan derived as (19), and the closed form expression can beformulated as

ELSlc = tr

( C∑m=1,m 6=l

Rmc

). (25)

From (25), it can be noted that MSE is a linear function of theinvolved correlation matrices traces, and it grows linearly withthe number of users who utilizes the same training sequence.

V. ESTIMATION TECHNIQUES USING DCT COMPRESSION

A. Important Frequencies Determination

In order to decontaminate the channel estimate, we need todetermine the important frequencies that should be extractedfrom the estimated channel SHyc. To determine these impor-tant frequencies, we need to solve the following quadratic form

ω∗i = arg maxω∗i

aD(ωi)RlcaHD (ωi). (26)

The solution lies in finding the eigenvectors related to themaximum eigenvalues. Therefore, the concentration of the

7

important spatial frequencies depend on the angular spreadat the BS. In order to separate the contamination from the

required estimate, we need to determine the number of theimportant frequencies. It should be noted that the number ofconsidered spatial frequency depends on the tradeoff betweenthe loss resulted from contamination and DCT lossy compres-sion. We define the vector qlc ∈ {0, 1}M×1, which identifiesthe unimportant and important frequencies of hlc as 0 and 1respectively. A definition of the compression ratio of can bestated as

η =‖qlc‖0M

. (27)

In the next section, integrated DCT-LS and DCT-BE frame-works are proposed to deal with the problem of contaminatedestimation in correlated multiuser environment.

B. DCT Based Bayesian Estimation (DBE)

As the received signal is a combination of all users’ channelwho utilize the training sequence, the DCT of the receivedsignal contains the spatial frequencies related to these chan-nels compressed in certain components. Therefore, a splittingtechnique in DCT domain is applicable if we know the mostimportant frequencies related to the required estimate. Toreduce the impact of contamination, we can extract thesefrequencies and replace the least important frequencies bytheir average values, which are close to zero. This can beexplained by Fig.(2)-a (the upper figure), it depicts the splittingcapability of DCT by distinguishing the important informationof the involved estimate, which is clearly condensed at highand low frequencies. For the first estimate (the black one), theinformation is compressed in the low spatial frequency com-ponents while the high spatial frequencies do not hold muchinformation. It can be noticed that these spatial frequencies arecontaminated by other channels (the blue and the pink ones).This contamination can be tackled by removing these spatialfrequencies and equating them to zero.

0 5 10 15 200

2

4

6

8


DCT

ofSHyc

h1

cont. from h2

cont. from h3

0 5 10 15 200

5

10

15


DCT

ofR

1 2 1cSHyc

R121 h1

cont. from R121 h2

cont. from R121 h3

Fig. 2. A comparison of DCT response employing before and aftermultiplication by R

12lc ‘cont’ denotes contamination.

The new estimation technique should take into considerationthe covariance information and DCT compression capability toboost the performance of BE. The new estimation procedurecan be described as following

A1. DCT based Bayesian Estimation (DBE)

• At user’s side: Send the training sequence s.• At BS side:

1) Modify the covariance matrices in Glc using Rlc, which canbe formulated as follows:

UD = UD �(qlc ⊗ 11×M

)UDRlcU

HD = UDRlcU

HD

2) Find yc = SHyc, extract the important information related tothe estimate yc = (UDyc)� qlc.

3) Take the inverse DCT of yc as UHD yc = yfc .

4) Find hlc = Glcyfc .

C. DCT Based Least Squares Estimation (DLS)In comparison with typical LS, in which the estimation

results in direct summation of all channel (24), the DCTBased LS (DLS) requires information about the importantfrequency set for each estimated channel. The estimation canbe summarized as follows

A2. DCT based Least Square Estimation (DLS)

• At the user’s terminals: Send the training sequence s• At the BS’s side:

1) Employ typical LS on the received signal, find hlslc .2) Employ DCT on hlslc , extract the important DCT frequencies

related to estimate by hlslc = (UDhlslc)� qlc.

3) Take the inverse DCT of the extracted version UHD hlslc .

This estimation technique can be combined with a trainingsequence allocation algorithm to make the most importantspatial frequencies distinct which simplifies the separation ofthese frequencies. The DLS can be used to estimate the channelfor MRT beamforming since it only requires the statisticalinformation of the target channel.

The major difference between DBE and DLS is that DBEdecontaminates the training sequences using two steps: zeroingthe unimportant frequencies, and the BE step to remove thecontamination from the important frequencies. BE and DBErequire the covariance acquisition of the direct and all interfer-ing links. Another complexity source is the matrix inversion.Thereby, their usage in large scale MIMO systems is limitedand constrained to the limited size problems. On the otherhand, DLS only requires the covariance knowledge of thetarget channel. It has less complexity due to the absence ofa matrix inversion step and the acquisition of the interferinglinks’ covariance matrices.

VI. MODIFIED SPATIAL ESTIMATION

To enhance the performance of BE, DBE and DLS, theimportant frequencies should be more distinguishable. Thiscan be obtained by compressing them into smallest possiblenumber of components to separate them more effectively.Therefore, a modification can be proposed to handle this issueusing the available correlation information and DCT.

The concept of increasing the correlation of the channelof interest is proposed in this section. This makes the targetchannel more ill-conditioned and thereby the contamination

8

more separable. Therefore, to estimate hlc we can increaseits correlation by multiplying the received signal yc in (4)by R

i2

lcSH , which makes the correlation of the target estimate

Ri+1lc and the correlation of contamination R

i2

lcRmcRi2

lc.The effect of implementing such step for i = 1 is depicted

in Fig.(2)-b. In the considered scenario, θs = {0◦, 15◦, 35◦}are the lower limit of the angular spread for each channelrespectively, ∆θ = 20◦ is the angular spread. It can be notedthat after employing this step, the impact of contamination inthe DCT domain is limited and does not have any influentialcontribution to the estimate and can be removed easily usingDCT. Taking into the account this property, we propose twoestimation techniques as follows

0 1 2 3 4 5 6 7−14

−12

−10

−8

−6

−4

−2

i

mse[dB]

Scenario 1

Scenario 2

Scenario3

Fig. 3. The normalized MSE vs. Ri2lc at different correlation scenarios,

K = 3, ∆θ = 20◦. Scenario 1:θs = [0◦, 20◦, 40◦] , Scenario 2:θs =

[0◦, 15◦, 30◦], Scenario 3:θ = [0◦, 10◦, 203irc]

A. Modified BE Based Estimation (MBE)

The multiplication by Ri2

lc increases the correlation of thetarget estimate, the modified correlation can be decomposed asWlc∆

i+1lc WH

lc . This makes eigenvalues of the target estimatemore distinct since the stronger eigenvalue becomes moreeffective and the opposite holds for the weaker ones. Therefore,the channel becomes more ill-conditioned and more separable.However, the impact of this multiplication is unknown withrespect to the interfering subspaces and heuristic approach isproposed to gain the benefit of it when it is applicable. Afterthe multiplication, the received signal can be written as

y′

c = γlcRi2

lcSHyc = γlcR

i2

lcSHS

C∑m=1

hlc + Ri2

lcSHnc (28)

where γlc is designed to keep the signal power fixed ‖y′c‖2 =‖yc‖2. The modified BE can be derived in a similar fashionto traditional BE (19)-(23) in the original manuscript butwith assuming that correlation matrices Qmc = R

i2

lcRmcRi2

lc.To utilize the enhanced compression capability, multiply thereceived signal by R

i2

lcSH and then employ the following filter

to estimate the time domain hlc2. This makes the formulation

of the final estimation at the BS as

Pilc = γlcR1+ i

2lc

(Ri2lc

( C∑m=1

Rmc

)Ri2lc +

σ2n

τRilc

)−1

Ri2lcS

H (29)

The total MSE of estimating hlc can be evaluated as

εilc = Ehlc

{tr(

(hlc −Pilcyc)(hlc −Pi

lcyc)H)}

(30)

and finally the MSE can have the following closed formexpression

εilc = tr

(Rlc − γ2

lcRi+2lc

(Ri2lc

( C∑m=1

Rmc)Ri2lc +

σ2n

τRilc

)−1). (31)

To study the impact of this multiplication on the MSE ofdifferent MBE is studied in Fig.(3), in which the MSE isdepicted with respect to i, where i = 0 refers to typical BE. Itcan be concluded at scenario 2 and 3 that this multiplicationreduces the MSE, which motivates the utilization of this stepat the estimation process. On contrary to the two scenarios,the first one shows a contradicting performance. This can beexplained that without the multiplication, it is impossible toseparate the channel estimations due to the complete overlapamong the users’ subspaces. This factor passively enhancesin case of this multiplication due to the noise amplificationfactors. Therefore, an adaption between the typical BE andmodified BE is required to get the joint benefits of the bothtechniques. Moreover, it can be noted that the best variant (theith power) of Rlc from the MSE perspective is the first oneand it is sufficient to be used in the estimation.

In order to optimize the system performance, we can adaptthe estimation strategy based on the involved users who utilizethe same training sequence. The adaptive strategy Jl can beexpressed as

Jl =

{BE , EBElc ≤ EMBE

lc

MBE , EBElc ≥ EMBElc .

(32)

The selection procedure is implemented at the system instal-lation level and is performed once for each channel. There-fore, the users who utilize the same training sequence canhave different estimation techniques. It can be noted that theimportant frequencies of the target estimate do not changewith the multiplication of R

12

lc, since the eigenvectors of theestimate are not change by this multiplication. However, thesubspaces spanned by interfering signals change due to themodified covariance matrices Rlm = R

12

lcRlmR12

lc. However,there is no mathematical explanation clarifies the relationbetween the eigenvectors of Rlm and Rlm. This makes theselection of the most appropriate estimation technique hard,thus the adaptation based on MSE is the solution for thisissue. The adaptation depends on the subspaces spanned bythe target and interfering signals related to traditional andmodified techniques. Moreover, the multiplication by othervariants can lead to excessive contamination correlation with

2It is derived in a similar fashion to Bayesian estimation technique usingdifferent correlation set Rm

lc = R2lc, the estimated channel for the cth user

and Rmlm = R

12lcRlmR

12lc.

9

target subspaces, this makes the contamination subspaces spanthe same subspaces as the target estimate which actually makesthe contamination mitigation much harder.

B. Modified DCT Bayesian Estimation (MDBE)

The concept of integrating the DCT with modified BE canbe adopted to enhance the estimation quality, the modifiedDCT BE can be described as following

A3. Modified DCT Bayesian estimation (MDBE)

• Multiply R12lcS

H by the received signal yc and get glc.• Employ the same steps (1)-(4) as DBE at glc to glc. It should be

noted that Glc should be replaced Plc.• To obtain the estimate hlc = glc.

The adaptation idea can be exploited as in (32) to enhance theperformance of DCT based estimation to optimally select theestimation technique from DBE or MDBE.

C. Modified DCT LS Estimation (MDLS)The joint utilization of the DCT and LS within the modified

framework is proposed to boost the quality of LS estimate,and can be summarized as follows

A4. Modified DCT LS estimation (MDLS)

• Multiply R12lcS

H by the received signal yc and get glc.• Employ the same steps (1)-(4) as DLS at glc to glc.

• To obtain the estimate hlc = R− 1

2lc glc.

rrelation matrix structure Rlc = Wlc∆lcWHlc = Wlc∆lcWlc,

where ∆lc ∈ Rr×r+ ,Wlc ∈ CM×r are the matrices that containthe positive eigenvalues and the eigenvectors that are associatedwith the positive eigenvalues. This enables the implementationof the proposed algorithms in the case of ill-conditionedchannels. The usage of DLS and MDLS can be adapted like(32) to enhance the quality of estimation.

VII. JOINT SPATIAL ESTIMATION AND TRAININGSEQUENCES ALLOCATION

Several training sequence assignment techniques are inves-tigated in the recent literature [21]- [22], in order to selectthe optimal set of users that can utilize the same trainingsequence simultaneously. These allocation techniques mimicthe previously proposed scheduling algorithms in multiuserMIMO scenarios [5] as they try to assign the same trainingsequence to the users who have distinct subspaces. The trainingsequence assignment strategies inherit the concept of multiuserdiversity, which depends on the pool of the involved users, asa consequence, the system performance is degraded for lownumber of users. In this section, an heuristic approach assignsthe available training sequences to required users’ channelestimates is proposed as

A. Training Sequence Allocation for BE related techniques

Define the set of users who utilizing the training sequencesl by U(sl), and the set of user who adopt the modified BEestimation as UM (sl) ⊂ U(sl) and the set of users whose

channel are estimated by typical BE UB(sl) ⊂ U(sl) such thatUM (sl) ∪ UB(sl) = U(sl) and the set of all users U . The setof all users who have not assigned a training sequence yet isdefined as M. We define a measuring function as (33), theallocation algorithm is described as

A5. Greedy Training Sequence Allocation Algorithm for (BE/MBE)estimation

• Start from any training sequence, without any loss of generalitywe start with s1.

• Determine the training sequence reuse factor K.• Define the set of users who should allocate the same training

sequence as U(sl).• Pick random user u ∈M = U − ∪l−1

n=1U(sn)• Step 1: set U(sl) = u• Step 2: if |U(sl)| ≤ K

1) for users= M(1) to M(|M|)

u∗ = minu

(min

uM∈ME(UB(sl),UM (sl) ∪ {uM})

, minuB∈M

E(UB(sl) ∪ {uB},UM (sl)))

2) M =M/{u∗}3) Estimation technique selection for u∗ is done according to{

MDBE, E(UB(sl),UM (sl) ∪ {u∗}) ≤ E(UB(sl) ∪ {u∗},UM (sl))

DBE, E(UB(sl),UM (sl) ∪ {u∗}) ≤ E(UB(sl) ∪ {u∗},UM (sl))

4) U(sl) = U(sl) ∪ {u∗}5) go to step 2.

This algorithm aims at minimizing the estimation error intwo steps: the first step is summarized by allocating trainingsequences to the set of users whose spatial signatures havethe most distinctive characteristics. The performance improveswith the number of user as it becomes more likely to findusers with distinct second order statistics to be assigned thesame training sequence. The second step is the selection ofthe optimal estimation technique whether it is the typical orthe modified BE.

B. Training Sequence Allocation for DLS/MDLS Techniques

By taking a look at (25), it can be noted the MSE dependson the trace of the interfering covariance matrices of the usersusing the same sequences.

Theorem 1. For typical LS, employing any training sequenceallocation algorithm does not reduce the MSE performance.

Proof. It can be proved from (25) that MSE performancedepends on the traces of all involved covariance matricesand it does not depend on the signal space spanned by eachcorrelation matrices. Therefore any set of correlation matriceshas the same trace will result in the same MSE.

However, this theorem does not apply for DCT based LSestimation. Since the MSE metric of LS does not provide usan indication about the spatial separation among the subspacesand the spatial frequencies. Since the correlation matricescapture the information about subspaces, the intersection ofthese subspaces results in interference. Therefore, to measurethe spatial separation between the lth and mth users towards

10

E(UB(sl),UM (s1)) =∑

∀c∈UM (sl)

tr

(Rlc −R

32lc

( ∑∀m∈UM (sl)

R12lcRmcR

12lc +

σ2n

τRlc + ζIM

)−1)

(33)

+∑

∀c∈UB(sl)

tr

(Rlc −R2

lc

( ∑∀m∈UB(sl)

Rmc +σ2n

τIM)−1

).

the cth BS, we define the following metric:

δclm =tr(RlcRmc

)tr(Rlc)tr(Rmc)

(34)

where 0 ≤ δclm ≤ 1. When δclm is close to 1, the users spanhighly overlapped subspaces, but when δclm is close to 0, theusers span a highly separated subspaces. The spatial separationbetween the mth user in cth and interference subspaces of allinterfering users at the estimation process can be written as

δcl =tr(Rlc

∑Cm=1,m 6=l Rmc

)tr(Rlc)tr(

∑Cm=1,m6=l Rmc)

. (35)

Define the set of users who utilizing the training sequencesl by U(sl), the set of all users U and the set of all users whohave not assigned a training sequence yet asM. We define thefollowing metric as

δ(U(sl)) =∑

l∈U(sl)

tr(Rlc

∑m∈U(sl),m 6=l Rmc

)tr(Rlc

)tr(∑

m∈U(sl),m 6=c Rmc

) .The algorithm can be summarized in (A6).

A6. Greedy Training Sequence Allocation Algorithm for DLS/MDLS

• Start from any training sequence. Without any loss of generality,we start with s1.

• Determine the training sequence reuse factor K.• Define the set of users who should allocate the same training

sequence as U(sl)• Pick random user u ∈M = U − ∪l−1

n=1U(sn)• Step 1: set U(sl) = u• Step 2: if |U(sl)| ≤ K for users M(1) to M(|M|)

1) u∗ = minimizeu∈M

δ(U(sl) ∪ {u})2) M =M/{u∗}3) U(sl) = U(sl) ∪ {u∗} go to Step 1.

However, the performance of DLS and MDLS can be opti-mized using training sequences allocation algorithm. This canbe explained by their dependency on the spatial content of theusers who utilize the same training sequence. Unfortunately,as theorem 1 states, the MSE metric of LS is incapable ofaddressing such a separation. On the other hand, the MSEof BE metric offers such a quality so we can optimize theperformance of DLS and MDLS by allocating the trainingsequences as in previous greedy algorithms.

VIII. NUMERICAL RESULTS

In the previous sections, several algorithms were developedto decontaminate the training sequences in multiantenna mul-ticell interference channels. In this section, the performanceof these algorithms are evaluated and compared. Monte Carlosimulations are performed to examine the efficiency of the

LS Least Square Estimator (23).BE Bayesian Estimator (18).DBE DCT Bayesian Estimator (A1).MBE Modified Bayesian Estimator (29).DLS DCT Least Square Estimator (A2)MDBE Modified DCT Bayesian Estimator (A3).MDLS Modified DCT Least Square Estimator (A4).ABE-MBE Adaptive BE-MBE (32).ADBE-MDBE Adaptive DBE-MBE . . .PA Training sequence Allocation A5-

A6.WPA Without training sequence allocation . . .

TABLE ITHE ACRONYMS USED IN THE SIMULATION SECTION FOR THE ADOPTED

ALGORITHM AND THEIR CORRESPONDING EQUATIONS.

proposed algorithms. To assess the system performance, weneed to compare it with the state of the art techniques as typicalBayesian estimator and least square estimator. The consideredalgorithms and the corresponding equations are summarized intable I.

We consider a multi-cell network where the users are alldistributed around the base stations. We denote the ratio ofthe power of the direct link αcc over interfering link αlc withβlc = αcc

αlc. To test the validity of the proposed algorithms,

we deal with estimation at interference limited scenario, weconsider βlc = 1,∀l, c, we drop the index β for the ease ofnotation. We adopt the model of a cluster of synchronizedand hexagonally shaped cells. We assume that the trainingsequence P = 0dB. The used metric for evaluating the systemperformance is normalized sum mean square error and can beexpressed as:

ε = 10 log10

(∑Cc=1 ‖hlc − hlc‖2∑c

c=1 ‖hlc‖2). (36)

In this section, we adopt the correlation model in II-A2 θskdenotes the lower limit of the angular spread for each user withrespect to the kth BS. ∆θ is the span of the angular spread andit is assumed to be the same for all users. ∆oθ determines theamount of the overlap between the angular spread of differentusers. The compression ratio η is selected optimally using fullsearch to minimize η = arg min

η∈(0,1]ε and this is considered

along the simulations section unless mentioned otherwise.

A. Accurate Second order statistics

Fig. (4) illustrates the comparison among the different esti-mation techniques with respect to the number of antennas. Itcan be noted that MSE performance reduces monotonicallywith respect to the number of antennas at BS. It can be

11

10 15 20 25 30 35 40 45 50−25

−20

−15

−10

−5

0

5

Number of antennas

MSE

(dB)

LS

DLS

MDLS

ABE-MBE

ADBE-MDBE

Fig. 4. The normalized MSE ε vs. number of antennas, the consideredscenario C = 2, K = 2, β = 1, ∆θ = 20◦, θs1 = {10◦, 25◦},θs2 = {20◦, 35◦}, ∆oθ = 5◦, and P = 0.

concluded that the LS has the worst performance in comparisonwith the other techniques. This can be explained by thefact that LS estimation just removes the impact of trainingsequence without introducing any processing to the aggregateof the received signals. DLS and MDLS are introduced toexploit the correlation information of the target estimate andincorporate the concept of DCT to get the important spatialfrequency related to each estimate. It can be inferred thatDLS and MDLS outperform the typical LS for all antennasscenario. In this figure, we also depict the comparison betweenthe BE estimation techniques. The proposed techniques DBEand MDBE overcome the ABE-MBE. Intuitively, the adaptiveMBE-BE performs better than BE and also this applies toDCT based technique, so the figures in this section displaythe comparison between the adaptive proposed schemes andthe typical ones.

0.3 0.4 0.5 0.6 0.7 0.8 0.9−20

−15

−10

−5

0

5

MSE

(dB)

η

DLS

MDLS

BE

MDBE

Fig. 5. The normalized MSE ε vs. η, the considered scenario C = 2,K = 2, β = 1, P = 0, θs1 = {10◦, 30◦}, θs2 = {20◦, 35◦}∆θ = 20 and ∆oθ = 5◦.

In Fig. (5), we study the performance of different pro-posed DCT based estimation techniques with respect to the

compression ratio η. It can be noted that at high η, a largepercentage of spatial frequencies are utilized. Therefore, thecontamination is still contained in the influential frequencies ofeach estimate, which makes it hard to get an accurate estimate.On the other hand, when η is low, a small percentage ofspatial frequencies are utilized which also removes a part ofthe useful signal to get an accurate channel estimate. Fig. (5)shows that for the DCT Based LS estimation (DLS, MDLS)the optimal η is lower than DCT Based Bayesian estimation(DBE, MDBE). This can be justified by the fact that typical BEestimation techniques, by their construction, have the capabilityof handling the interference at estimation in contrast to LSbased estimation. This makes the DCT compression step forDLS and MDLS more valuable as it adds the capability ofmitigating the contamination. While for DBE and MDBE, itenhances the capability of mitigating the interference, whichmakes it unnecessary to have lower values of η.

2 3 4 5−20

−15

−10

−5

0

5

10

15

Training sequence reuse factor K

MSE

(dB)

LS

DLS

MDLS

ABE-MBE

ADBE-MDBE

Fig. 6. The normalized MSE vs training sequence reuse factorsM = 20,P = 0dB, β = 1, θs1 = {10◦, 25◦, 40◦, 55◦, 70◦},θs2 = {20◦, 35◦, 50◦, 65◦, 80◦}, θs3 = {0◦, 15◦, 30◦, 45◦, 60◦},θs4 = {35◦, 50◦, 65◦, 70◦, 85◦}, θs5 = {85◦, 70◦, 55◦, 40◦, 25◦},∆θ = 20, and ∆oθ = 5◦.

Fig. (6) illustrates the MSE performance with respect totraining sequence reuse factor. It is anticipated that increasingthe training sequence reuse over the network increases theinterference levels at the estimation, which makes it harderto obtain an accurate estimate of the required channel. Thisfigure plots the comparison of different proposed estimationalgorithms. It can be noted that there is a gap between ADBE-MDBE and ABE-MBE at low training sequence reuse factors,but this gap reduces with increasing the training sequence reusefactor. This can hamper the implementation of ADBE-MDBEat high training sequence reuse factor. However, the simulatedscenarios consider interference limited case namely β = 1.MDLS performs closely to ABE-MBE for all training sequencereuse factors. Comparing MDLS with DLS, it can be seen thatMDLS shows an enhanced performance over DLS, and theboth proposed techniques outperform the typical LS for alltraining sequence reuse values.

Fig. (7) displays the comparison of different estimation tech-niques with the respect to the angular spread overlap ∆oθ. The

12

considered scenario. Intuitively, higher overlap of the angularspread leads to higher training sequence contamination. It canbe noticed that DLS and MDLS converge to LS in the scenarioof complete angular spread overlap. While for BE techniques,it can be viewed that ADLS-MDLS has the same performanceas ABE-MBE in case of complete angular spread overlap. Forthe scenario of distinct subspaces ∆oθ = 0◦, the typical BEand MBE and consequently their adaptive schemes performbetter than ADBE-MDBE because there is no contaminationat this scenario, and the DCT compression is meaningless. The

0 5 10 15 20−16

−14

−12

−10

−8

−6

−4

−2

0

2

Angular Overlap ∆θo

MSE

(dB)

LS

DLS

MDLS

ABE-MBE

ADBE-MDBE

Fig. 7. The normalized MSE vs angular overlap K = 2, C = 2,P =0dB, β = 1, θs1 = {10◦, 30◦}, θs2 = {70◦, 85◦}, M = 10.

efficiency of employing training sequence allocation algorithmsis depicted in Fig. (8). We want to allocate 4 training sequencesto 8 users in the two cells. The comparison of each estimationtechnique without and with training sequence allocation isplotted and it can be inferred that the system performance is en-hanced if training sequence allocation algorithm is employed.This can be explained by the fact that selecting the which areassigned the same training sequence enhances the chances ofbeing them naturally separable. Therefore, the contaminationis reduced by employing these algorithms.

It is observed from the simulations that adapting the perfor-mance of different BE techniques as ABE-MBE and ADBE-MDBE achieves a better performance than employing justBE or MBE. Simulations have shown that the frequency ofusing MBE (or MDBE) versus BE (or DBE) in the adaptivealgorithm is 75% versus 25% in the scenario of C = 2, K = 2,∆oθ = 10◦ and ∆θ = 25◦. This observation shows that themodified algorithm outperforms the original ones in majorityof channel realizations.

B. The impact of inaccurate second order statistics

In the previous figures, we study the performance of theproposed techniques assuming accurate covariance acquisitionat all BSs. Assuming inaccuracies and estimation errors inthe covariance acquisition step may affect the performanceof the suggested methods. The impact of these inaccuraciesis depicted in Fig. (9), which plots the relation betweenthe MSE and the uncertainty. It can be noted that typical

10 15 20 25 30 35 40 45 50−30

−25

−20

−15

−10

−5

0

Number of antennas

MSE

(dB)

DLS-WPA

MDLS-WPA

ABE-MBE-WPA

ADBE-MDBE-WPA

DLS-PA

MDLS-PA

ABE-MBE-PA

ADBE-MDBE-PA

Fig. 8. The normalized MSE vs number of antennas M = 20,P =0dB, β = 1, θs1 = {10◦, 40◦}, θs2 = {40◦, 60◦},∆θ = 20, and∆oθ = 5◦.

LS is not affected by the uncertainty, this is intuitive sincethe estimation process does not depend on the covarianceinformation. This fact also applies on the MSE assuming DLSestimation. Although the important spatial frequencies deter-mination depends on the covariance information, the estimatordesign is independent from covariance information at BSs.Moreover, it can be concluded that the BE based techniquesare more sensitive to covariance errors as they are functions ofthe covariance matrices of the involved users,the inaccuraciesaffect the contamination rejection at different BSs. These sys-tems’ MSE increase with respect to the amount of covariancesinaccuracies. Finally, since MDLS is modified version of DLSand LS, which is based on increasing the covariance of thetarget channels, it is expected that the covariance inaccuraciesdegrade the estimation performance at different BSs.

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45−25

−20

−15

−10

−5

0

uncertainty

MSE

(dB)

LS

DLS

MDLS

ABE−MBE

ADBE−MDBE


The impact of inaccuracies considering the overlap in angu-lar spread users’ subspaces is depicted in Fig. (10). In compari-

13

son with Fig. (9), the performance of the proposed algorithms isstudied. It can be noted that the proposed techniques are moresensitive to uncertainties which can be translated into higherMSE at all. As conclusion, to protect the system from theseuncertainties, training sequence allocation should be adaptedto take into consideration these uncertainties in their designs.

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45−18

−16

−14

−12

−10

−8

−6

−4

uncertainty

MSE

(dB)

MDLS

ABE−MBE

ADBE−MDBE


IX. CONCLUSION

In this work, we studied the interference during the channelestimation and its impact on the system performance in multi-cell multiantenna networks. We investigated the performance ofa Bayesian estimation and a least square estimation frameworkand formulated the lower bound and the upper bound ofmean square error for such estimator. We proposed modifiedtechniques to enhance the estimation accuracy by introducingthe DCT, thereby transforming the problem into a differentdomain. This allowed the development of a new interferencemitigation algorithm by compressing the spatial frequencies.It enabled enhanced estimation utilizing the DCT compressingcapability to reduce the overlap in the interfering subspaces andboost the separation in the spatial domain. We incorporated thisconcept with training sequence allocation to assign differenttraining sequences to reduce the interference in the overlappingarea of the DCT subspaces. The performance of proposedalgorithms was studied and compared to current state of theart techniques. From the simulation results, it can be concludedthat the proposed algorithms provide considerable gains overthe conventional Bayesian and least squares estimation tech-niques.

APPENDIXProof of Lemma 3: This can be proven by taking the

expectation of Eθ[hdlc] at each spatial frequency k taking intothe account the different distribution of the arrival angles atthe BS. For uniform angular distribution on [θ1, θ2]

Eθ[hdlc[k]] =

∫ θ2

θ1

hdlc[k]dFθ =1

θ2 − θ1

∫ θ2

θ1

hdlc[k]dθ

Since the term ωi = c sin(θi) exist in all k and make theintegration term non integrable. For k = 0, the integration canbe replaced by another integration,

Eω[hdlc[0]] =1

c(θ2 − θ1)

∫ ω2

ω1

hdlc[0]

√1−

(ωc

)2dω

≤ 1

c(θ2 − θ1)

∫ ω2

ω1

hdlc[0]dω (37)

For k = 0, If the angular spread angles close to zero, theprevious term has a constant value, while if the angular spreadis close to π

2 the previous integration is close to zero. For highfrequencies, using the distributive property of integration, wecan solve the related to integration by setting x1 = c sin θ− kπ

M

and x2 = c sin θ + kπM as:

Eω[hdlc[k]] =1

θ2 − θ1

∫ c sin θ2− kπM

c sin θ1− kπMhdlc[k]

√1− (

x1 + kπM

c)2dx1

+1

θ2 − θ1

∫ c sin θ2+kπM

c sin θ1+kπM

hdlc[k]

√1− (

x2 − kπM

c)2dx2.

For angle of arrival close to zero, this integration will be closeto zero. While for angle of arrival close to π

2 , this term willresult into considerable constant, which proves the lemma.

Proof of Lemma 4: If we assume θs is close to zero, findingthe expectation for small angles close to zero at different spatialfrequencies as

c(k) = limM→∞

Eω[|a[k]|] (38)

It can be found using (39)-(43) that the zero frequencyconverges to constant while the rest frequencies converge tozero. If θs is close to 90◦, all the frequencies will converge tozero, except for the highest spatial frequency.

REFERENCES[1] E. Bjornson, R. Zakhour, D. Gesbert and B. Ottersten,“Cooperative Multicell Pre-

coding: Rate Region Characterization and Distributed Strategies with Instantaneousand Statistical CSI,” IEEE Transactions on Signal Processing, vol. 58, no. 8, pp.4298-4310, August 2010.

[2] S. Chatzinotas, M. A. Imran, C. Tzaras, Capacity Limits in Cooperative CellularSystems. Cooperative Wireless Communications, Auerbach Publications, Taylor andFrancis Group, 2009.

[3] J. Zhang, and A. Jeffrey, “Adaptive spatial intercell interference cancellation inmulticell wireless networks,”IEEE Journal on Selected Areas in Communications,vol. 28, no. 9, pp. 1457-1467, December 2010.

[4] N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Transactionson Information Theory, vol. 52, no. 11, pp. 5045-5060, November 2006.

[5] T. Yoo, N. Jindal, and A. Goldsmith, “Multi-antenna broadcast channels with limitedfeedback and user selection,” IEEE Journal on Selected Areas in Communications,vol. 25, no. 7, pp. 1478-1491, September 2007.

[6] A. Ashikhmin and R. Gopalan, “Grassmannian packings for efficient quantization inMIMO broadcast systems,” IEEE International Symposium on Information Theory(ISIT), July 2007.

[7] N. Lee and W. Shin, “Adaptive Feedback Scheme on K-Cell MISO InterferingBroadcast Channel with Limited Feedback,” IEEE Transactions on Wireless Com-munications, vol. 10, no. 2 pp. 401-406, February 2011.

[8] P. Zetterberg,“Experimental Investigation of TDD Reciprocity-Based Zero-ForcingTransmit Precoding,” EURASIP Journal on Advances in Signal Processing, Volume2011, Article ID 137541, 2011.

[9] F.A. Dietrich and W. Utschick, “Pilot-Assisted Channel Estimation Based on Second-Order Statistics,” IEEE Transactions on Signal Processing, vol. 53, no. 3 , pp. 1178- 1193, March 2005.

[10] E. Bjornson and B. Ottersten, “A Framework for Training-based Estimation inArbitrarily Correlated Rician MIMO Channels with Rician Disturbance,” IEEETransactions on Signal Processing, vol. 58, no. 3, pp. 1807-1820, March 2010.

[11] M. Biguesh, and A. B. Gershman, “Training-based MIMO channel estimation: astudy of estimator tradeoffs and optimal training signals,” IEEE Transactions onSignal Processing, vol. 54, no. 3, pp. 884-893, March 2005.

14

c(k) =

0.5λj2πd∆θ

(log 1−e

j2πdθ2λ

1−ej2πdθ1λ

− log e−j2πdθ2

λ (−1+e−j2πdθ2

λ )

e−j2πdθ1


λ )

)+ 1, ,M = odd, k = 0,

0.5λj2πd∆θ

(log 1−e

j2πdθ2λ

1−ej2πdθ1λ

− log e−j2πdθ2


λ )

e−j2πdθ1


λ )

),M = even, k = 0,

0 k 6= 0.

(39)

Eω[√M |aDCT

(0)|] =

λ2πd∆θ

( M−12∑

m=1

sinmθ2 − sinmθ1

m+

2πd

λ(θ2 − θ1)

),M = odd, k = 0,

λ2πd∆θ

M2∑

m=1

sinmθ2 − sinmθ1

m,M = even, k = 0.

(40)

∞∑m=0

sinmθ

m= θ + 0.5j

(log

(1− ejθ

)− log

(e−jθ(−1 + ejθ)

))(41)

aDCT(k) =

((sin((ωi + πk

M )M2 )

sin((ωi + πkM ) 1

2 )

)2

+

(sin((ωi − πk

M )M2 )

sin((ωi − πkM ) 1

2 )

)2

+2sin((ωi + πk

M )M2 )

sin((ωi + πkM ) 1

2 )

sin((ωi − πkM )M2 )

sin((ωi − πkM ) 1

2 )cos(π(M + 1)k

M

)) 12

(42)

aDCT(k) =

√

2

√(sin((ωi+

πkM )M2 )

sin((ωi+πkM ) 1

2 )− sin((ωi−πkM )M2 )

sin((ωi−πkM ) 12 )

)2≈ 0 ,M = odd, k 6= 0,

√2

√(sin((ωi+

πkM )M2 )

sin((ωi+πkM ) 1

2 )+

sin((ωi−πkM )M2 )

sin((ωi−πkM ) 12 )

)2≈ 0 ,M = even, k 6= 0.

(43)

[12] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of basestation antennas,” IEEE Transactions Wireless Communications, vol. 9, no. 11, pp.3590-3600, Nov. 2010.

[13] E. Bjornson, M. Kountouris, M. Bengtsson, and B. Ottersten,“Receive Combiningvs. Multistream Multiplexing in Downlink Systems with Multi-antenna Users,”IEEETransactions on Signal Processing, vol. 61 , no. 13, pp. 3431 - 3446 ,July 2013.

[14] J. Jose, A. Ashikhmin, T. L. Marzetta and S. Vishwanath, “Pilot Contamination andPrecoding in Multi-Cell TDD Systems,” IEEE Transaction Wireless Communications,vol. 10, no. 8, pp. 2640 - 2651, Aug. 2011.

[15] K. Appaiah, A. Ashikhmin and T. L. Marzetta, “Pilot Contamination Reductionin Multi-user TDD Systems,” IEEE International conference on Communications(ICC), May 2010.

[16] H. Q. Ngo, T. L. Marzetta, and E. G. Larsson, “Analysis of the pilot contaminationeffect in very large multicell multiuser MIMO systems for physical channel mod-els,” IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP), May 2011.

[17] B. Gopalakrishnan and N. Jindal, “An analysis of pilot contamination on multi-user MIMO cellular systems with many antennas,” IEEE International Workshop onSignal Processing Advances in Wireless Communications (SPAWC), June 2011.

[18] H. Q. Ngo and E. G. Larsson, “EVD-based channel estimation in multicell multiuserMIMO systems with very large antenna arrays,” IEEE International Conference onAcoustics, Speech and Signal Processing (ICASSP), March 2012.

[19] Y. Li, Y.-H. Nam, B. L. Ng and J. Zhang, “A Non-asymptotic Throughput forMassive MIMO Cellular Uplink with Pilot Reuse,” IEEE Global Conference onCommunication (Globecom), December 2012.

[20] N. Krishnan, R. D. Yates, and N. B. Mandayam, “Cellular Systems with ManyAntennas: Large System Analysis under Pilot Contamination,” Allerton Conferenceon Communications, Computing and Control, October, 2012.

[21] H. H. Brunner, M. H. Castaneda, J. A. Nossek, “How Much Training is Needed forInterference Coordination in Cellular Networks,” ITG Workshop on Smart Antennas(WSA), March 2012.

[22] H. Yin, D. Gesbert, M. Filippou, and Y. Liu, “A Coordinated Approach to ChannelEstimation in Large-scale Multiple-antenna Systems,” IEEE Journal in selected Areasin Communications, vol. 31 , no. 2, pp. 264 - 273, February 2013.

[23] M. Alodeh, S. Chatzinotas and B. Ottersten, “Joint Channel Estimation andPilot Allocation in Underlay Cognitive MISO Networks,” invited paper to IEEEInternational Wireless Communications and Mobile Computing (IWCMC), Cyprus,August 2014.

[24] A. Sayeed, “ Deconstructing Multiantenna Fading channel,”IEEE Transactions onSignal Processing, vol. 50 , no. 10 , pp. 2563- 2579, October 2002.

[25] C. Martin, and B. Ottersten, “Asymptotic eigenvalue distributions and capacity forMIMO channels under correlated fading,” IEEE Transactions on Signal Processing,vol. 3, no. 4, pp.1350 - 1359, July 2004.

[26] N. Ahmad, T. Natarajan and K. R. Rao, “Discrete cosine transform,” IEEETransactions on Computers, vol. 100, no. 1, pp. 90-93, 1974.

[27] H. Chen and B. Zeng, “New Transforms Tightly Bounded by DCT and KLT,”IEEELetters on Signal Processing, vol. 19, no. 6, pp. 344 - 347, June 2012.

[28] Y.-H. Yehia and S.-G. Chen, “DCT-based channel estimation for OFDM systems,”IEEE International Conference on Communications (ICC), June 2004.

[29] P. Zetterberg and B. Ottersten, “The spectrum efficiency of a base station antennaarray for spatially selective transmission,” IEEE Transactions on Vehicular Technol-ogy, vol. 44, no. 3, pp. 57-69, August 1995.

[30] D. Katselis, C. R. Rojas, M. Bengtsson et. al, “Training Sequence Design for MIMOChannels: An Application-Oriented Approach,” EURASIP Journal on WirelessCommunications and Networking,2013 2013:245.

[31] A. Scherb and K. Kammeyer, “Bayesian channel estimation for doubly correlatedMIMO systems,” IEEE Workshop on Smart Antennas (WSA), March 2007.

[32] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory.Englewood Cliffs, NJ: Prentice Hall, 1993.

[33] A. N. Akansu and R. A. Haddad, Multiresolution Signal Decomposition: Transforms,Subbands, and Wavelets.Academic Press, Inc. 1992.

[34] F. Beaufays,“Transform-domain Adaptive Filters: An Analytical Approach,” IEEETransactions on Signal Processing, vol. 43, no. 2, pp. 422-431, February 1995.

[35] M. Alodeh, S. Chatzinotas and B. Ottersten, “Spatial DCT-Based Least SquareEstimation n Multi-antenna Multi-cell Interference Channels,” IEEE InternationalConference on Communications (ICC), Sydney, Australia , June 2014.

spatial dct-based channel estimation in multi-antenna multi-cell … · 2015-01-12 · 1 spatial...

Documents