06_vtc_habendorf


On Ordering Optimization for MIMO Systems with Decentralized Receivers

Rene Habendorf and Gerhard Fettweis

Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Germany
{habendorf,fettweis}@ifn.et.tu-dresden.de
http://www.ifn.et.tu-dresden.de/MNS

Abstract: This paper addresses Tomlinson-Harashima Precoding (THP) for the downlink of multiuser systems, where the transmitter is equipped with multiple antennas and each decentralized receiver has a single antenna. Although the performance of THP strongly depends on the ordering of the precoded symbols, current THP approaches apply data-independent sorting algorithms. The contribution of this paper is the extension of THP with an algorithm that exploits the knowledge of the data symbols to compute the optimum stream ordering for precoding.

I. INTRODUCTION

In systems with decentralized, non-cooperative mobile devices, space division multiple access (SDMA) algorithms have to be performed at the base station, which enables joint processing of all transmit and receive signals.

Hence, Multiuser Detection (MUD) is suitable for the uplink, whereas for the downlink, which is the capacity bottleneck in various multimedia applications, Multiuser Transmission (MUT) can be applied if channel state information is available at the transmitter. In time division duplex (TDD) systems this can be achieved by exploiting channel reciprocity, provided that the channel coherence time is large compared to one transmission interval. In frequency division duplex (FDD) schemes, however, the channel estimates obtained at the receiver have to be sent back to the transmitter via some backward channel.

In this paper we focus on the nonlinear Tomlinson-Harashima Precoding (THP), which was initially proposed for the equalization of SISO channels in [1], [2] and has been applied to matrix channels in [3]. In [4], joint spatio-temporal THP was proposed. Nonlinear constellation shaping techniques taking the equivalent decision regions into account have been investigated in [5], [6], [7], and [8].

The performance of THP strongly depends on the ordering of the precoded symbols, which corresponds to a permutation of the rows of the channel matrix. In recent years, different data-independent approaches for computing an improved ordering of the channel matrix have been investigated. Our proposed method exploits the data symbols, which are perfectly known to the transmitter, to compute a precoded signal with minimum transmit power enhancement. To obtain the optimum ordering, a tree search algorithm is applied.

The performance of the proposed scheme is investigated numerically in both uncoded and coded scenarios.

II. SYSTEM MODEL

We consider downlink transmission from a base station using $N_T$ transmit antennas to $N_R$ non-cooperative mobile stations (MS), each equipped with a single receive antenna. Throughout the paper we restrict ourselves to the case of $N_T \ge N_R$. The independent zero-mean complex Gaussian channel taps $h_{q,k}$ from transmit antenna $k$ to the receive antenna of MS $q$ with unit variance $E\{|h_{q,k}|^2\} = 1$ are arranged in the flat block fading channel matrix $H \in \mathbb{C}^{N_R \times N_T}$. With $s = [s_1, \ldots, s_{N_T}]^T$ and $r = [r_1, \ldots, r_{N_R}]^T$ denoting the vectors of transmit and receive symbols, respectively, the transmission equation for burst interval $k$ reads as

$r(k) = H(k)\, s(k) + \eta(k)$. (1)

At the receivers, the signal is distorted by the noise vector $\eta \in \mathbb{C}^{N_R \times 1}$ of circularly symmetric complex Gaussian i.i.d. samples with covariance matrix $E\{\eta \eta^H\} = \sigma_\eta^2 I_{N_R}$. Throughout this paper, $(\cdot)^T$ and $(\cdot)^H$ denote matrix transposition and Hermitian transposition, respectively. Further, $I_N$ denotes the $N \times N$ identity matrix.

III. MULTIUSER TRANSMISSION

Figure 1 shows the equivalent time-discrete system model used for precoding the vector $d = [d_1, \ldots, d_{N_R}]^T$ of $M$-ary QAM symbols taken from the odd-complex integer grid $S = \{a + jb \mid a, b \in \{\pm 1, \pm 3, \ldots, \pm(\sqrt{M} - 1)\}\}$. Similar to decision feedback equalization, the interference is canceled by a feedback (FB) filter after transforming it into a causal form by a feedforward (FF) filter. To allow a fair comparison between different precoding schemes, a fixed transmit power constraint $E_{Tx}$ is incorporated by the scaling factor

$\beta = \sqrt{E_{Tx} / (s^H s)}$. (2)

Fig. 1. Downlink transmission model for Tomlinson-Harashima Precoding (THP) with decentralized receivers. [Block diagram with the blocks FB, $G$, FF and $H$ and the signals $d$ and $x$.]



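Both the feedforward and the feedback filter can be obtained by an LQ decomposition $H = LQ$ of the channel matrix $H \in \mathbb{C}^{N_R \times N_T}$ into a unitary matrix $Q \in \mathbb{C}^{N_R \times N_T}$ (orthonormal rows, $Q Q^H = I_{N_R}$) and a lower triangular matrix $L \in \mathbb{C}^{N_R \times N_R}$.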

With the feedforward filter $FF = Q^H$ and the scaling matrix $G = \mathrm{diag}\{L_{1,1}^{-1}, \ldots, L_{N_R,N_R}^{-1}\}$, the effective system matrix

$\tilde{L} = H \cdot FF \cdot G = L Q\, Q^H G = L G$ (3)

is lower triangular with a unit main diagonal. With the feedback matrix $FB = \tilde{L} - I$, the described structure performs linear pre-equalization, which increases the transmit power significantly. The modulo device used in THP reduces each element of the pre-equalized signal into the fundamental Voronoi region $V = \{a + jb \mid a, b \in [-\tau/2, \tau/2)\}$ of the $M$-ary QAM modulo lattice by adding integer multiples of $\tau = 2\sqrt{M}$ to its I and Q components, respectively. To compensate for that manipulation, the receiver has to perform the same modulo operation, which turns it into a nonlinear device. Before the modulo operation, the received signal has to be multiplied by $\beta^{-1}$ to ensure unit gain transmission. Nonlinear constellation shaping techniques taking the equivalent decision regions into account [5], [6], [7], [8] are not in the focus of this paper.
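The precoding chain described above (LQ decomposition, feedback loop with modulo reduction, $G$, FF and the scaling of (2)) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names `lq_rows`, `mod_tau` and `thp_zf`, the Gram-Schmidt realization of the LQ decomposition, the example channel and the noiseless receiver are all assumptions made for the example.

```python
import math

def lq_rows(H):
    """LQ decomposition H = L Q via Gram-Schmidt on the rows of H (assumed helper)."""
    n_r = len(H)
    L = [[0j] * n_r for _ in range(n_r)]
    Q = []
    for i in range(n_r):
        v = list(H[i])
        for j in range(i):
            # L[i][j] = H_i Q_j^H, the projection onto the j-th orthonormal row
            L[i][j] = sum(a * b.conjugate() for a, b in zip(H[i], Q[j]))
            v = [a - L[i][j] * b for a, b in zip(v, Q[j])]
        norm = math.sqrt(sum(abs(a) ** 2 for a in v))
        L[i][i] = complex(norm)
        Q.append([a / norm for a in v])
    return L, Q

def mod_tau(z, tau):
    """Reduce real and imaginary part into [-tau/2, tau/2)."""
    red = lambda v: v - tau * math.floor(v / tau + 0.5)
    return complex(red(z.real), red(z.imag))

def thp_zf(H, d, M, e_tx=1.0):
    """Unsorted zero-forcing THP: returns scaled transmit vector, beta and tau."""
    tau = 2 * math.sqrt(M)
    L, Q = lq_rows(H)
    n_r, n_t = len(H), len(H[0])
    x = []
    for i in range(n_r):
        # feedback filter: row i of (L G - I) applied to the symbols precoded so far
        interference = sum(L[i][j] / L[j][j] * x[j] for j in range(i))
        x.append(mod_tau(d[i] - interference, tau))
    # s = FF G x = Q^H G x
    s = [sum(Q[k][t].conjugate() * x[k] / L[k][k] for k in range(n_r))
         for t in range(n_t)]
    beta = math.sqrt(e_tx / sum(abs(v) ** 2 for v in s))   # scaling factor of (2)
    return [beta * v for v in s], beta, tau

# Hypothetical 3x3 channel and 16-QAM data symbols (illustration only)
H = [[1 + 0.5j, 0.3 - 0.2j, -0.4 + 0.1j],
     [0.2 - 0.7j, 0.9 + 0.1j, 0.5 + 0.5j],
     [-0.6 + 0.2j, 0.1 - 0.8j, 1.1 + 0j]]
d = [1 + 3j, -3 - 1j, 3 - 1j]
s, beta, tau = thp_zf(H, d, M=16)
r = [sum(h * v for h, v in zip(row, s)) for row in H]     # noiseless reception
d_hat = [mod_tau(v / beta, tau) for v in r]               # gain control + modulo
```

In the noiseless case each receiver recovers its own data symbol after the gain correction and the modulo operation, without any cooperation between the receivers.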

A. MMSE Channel Extension

In [4] the FF and FB filters for THP are derived based on a minimum mean square error (MMSE) criterion by dropping the zero-forcing constraint of THP-ZF, which requires all interference to be canceled completely. The computation of the THP-MMSE filters requires a matrix inverse for each row of the channel matrix, which becomes quite complex for large $N_R$. The computational complexity is reduced in [9] by incorporating the Cholesky factorization algorithm. In this paper, however, we apply the LQ decomposition algorithm to the channel matrix extended by a scaled identity matrix of size $N_R \times N_R$,

$\bar{H} = [H \;\; \sqrt{\zeta}\, I_{N_R}] \in \mathbb{C}^{N_R \times (N_T + N_R)}$, (4)

to compute FF and FB, which delivers results equivalent to those obtained in [4]. The scaling factor $\zeta = N_R \sigma_\eta^2 / E_{Tx}$ is determined by the variance of the noise and the transmit energy.
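A small sketch of the channel extension may help to see why an LQ decomposition of the extended matrix yields a regularized solution. It assumes that the identity block is scaled by the square root of the factor $N_R \sigma^2 / E_{Tx}$, so that the Gram matrix of the extended channel equals $H H^H$ plus that factor times the identity; the function names and the numbers are hypothetical.

```python
import math

def extend_channel(H, sigma2, e_tx):
    """Append a scaled N_R x N_R identity to H (assumed form of the MMSE extension)."""
    n_r = len(H)
    zeta = n_r * sigma2 / e_tx            # noise variance relative to transmit energy
    root = complex(math.sqrt(zeta))
    H_ext = [list(row) + [root if i == k else 0j for k in range(n_r)]
             for i, row in enumerate(H)]
    return H_ext, zeta

def gram(A):
    """Return A A^H for a matrix given as a list of rows."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in A] for ri in A]

# Hypothetical 2x3 channel, noise variance 0.1, transmit energy 4
H = [[1 + 1j, 2 - 1j, 0.5j],
     [0.3 + 0j, -1 + 2j, 1 - 1j]]
H_ext, zeta = extend_channel(H, sigma2=0.1, e_tx=4.0)
G_plain, G_ext = gram(H), gram(H_ext)     # G_ext = G_plain + zeta * I
```

Since $L L^H$ of the extended matrix equals this regularized Gram matrix, the same LQ routine can serve both the ZF and the MMSE variant; only the input matrix changes.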

B. Stream Ordering

The diagonal scaling matrix $G$ adjusts the transmit power such that the SNRs of all symbols of all users are equal and inversely proportional to

$\mathrm{trace}(G^H G) = \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}$. (5)

The sum in (5) can be influenced by reordering the rows of $H$ during the LQ decomposition algorithm according to a certain permutation $P \in \mathbb{N}^{N_R \times 1}$ ($1 \le P_i \le N_R$, $P_i \ne P_j$ for $i \ne j$), which results in a modified order of precoding [10]. To optimize the SNR without exploiting the knowledge of the transmitted symbols, the optimum permutation is given as

$P_{opt} = \arg\min_P \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}$, (6)

where $L$ is given by the LQ decomposition of the reordered channel matrix

$\tilde{H} = H_{P,1..N_T}$. (7)

If we replace (6) by the suboptimum criterion

$P_{subopt} = \arg\max_P \min \{L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2\}$, (8)

which maximizes the minimum entry in $\mathrm{diag}(L)$, the V-BLAST algorithm [11] can be used to find a quite satisfactory permutation [10], [12]. As an alternative to the V-BLAST algorithm, which requires multiple calculations of the pseudo inverse of the matrix $H$, the suboptimum heuristic sorted QR decomposition (SQRD) algorithm in [13] can be modified to compute a sorted LQ decomposition (SLQD) of $H$. In the remainder of this paper, both schemes are referred to as THP-VBLAST and THP-SLQD, respectively.
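For small $N_R$ the two data-independent criteria can be evaluated by exhaustive search, which makes their definitions concrete. The sketch below is only an illustration under stated assumptions: it uses Gram-Schmidt on the reordered rows to obtain the squared diagonal of $L$, it enumerates all permutations instead of running V-BLAST or SLQD, and the names `lq_diag_sq` and `best_orderings` as well as the example matrix are hypothetical.

```python
import itertools
import math

def lq_diag_sq(H):
    """Squared main diagonal of L in H = L Q, via Gram-Schmidt on the rows."""
    basis, diag_sq = [], []
    for row in H:
        v = list(row)
        for q in basis:
            c = sum(a * b.conjugate() for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        n2 = sum(abs(a) ** 2 for a in v)      # residual energy = L_ii^2
        diag_sq.append(n2)
        basis.append([a / math.sqrt(n2) for a in v])
    return diag_sq

def best_orderings(H):
    """Exhaustive search for the sum criterion (6) and the max-min criterion (8)."""
    perms = list(itertools.permutations(range(len(H))))
    reorder = lambda p: [H[i] for i in p]
    sum_cost = lambda p: sum(1.0 / l2 for l2 in lq_diag_sq(reorder(p)))
    min_diag = lambda p: min(lq_diag_sq(reorder(p)))
    return min(perms, key=sum_cost), max(perms, key=min_diag)

# Hypothetical real-valued 2x2 channel: rows [1, 0] and [1, 1]
H = [[1.0, 0.0], [1.0, 1.0]]
p_sum, p_maxmin = best_orderings(H)
costs = {p: sum(1.0 / l2 for l2 in lq_diag_sq([H[i] for i in p]))
         for p in itertools.permutations(range(2))}
```

For this example the ordering (0, 1) gives a sum of 2, whereas (1, 0) gives 2.5, so both criteria select the same permutation here; in general the two criteria need not agree, and the exhaustive search grows with $N_R!$.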

C. Tree Search Tomlinson-Harashima Precoding

Although the data symbols are perfectly known to the transmitter, the sorting algorithms described above do not exploit this knowledge. In the following, an algorithm is proposed that jointly performs the decomposition of the channel matrix and the successive interference cancelation, and therefore directly computes the transmit signal $s$. Since the unitary FF filter does not increase the signal power, the optimum data-dependent ordering $P_{opt}$ is given by

$P_{opt} = \arg\min_P x^H G^H G\, x$. (9)
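The step from the transmit power to the quadratic form in (9) can be written out explicitly. The following short derivation assumes the unscaled transmit vector $s = FF \cdot G\, x = Q^H G x$ and $Q Q^H = I_{N_R}$, as defined in Section III:

```latex
s^H s = (Q^H G x)^H (Q^H G x)
      = x^H G^H Q Q^H G\, x
      = x^H G^H G\, x
      = \sum_{i=1}^{N_R} \frac{|x_i|^2}{L_{i,i}^2}
```

Both the precoded symbols $x_i$ and the diagonal entries $L_{i,i}$ depend on the permutation, which is why the data-dependent optimum can differ from the ordering obtained from (6).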

To compute the optimum ordering efficiently, the proposed algorithm performs a depth-first tree search to iteratively evaluate all possible permutations $P$. During the first dive the SQRD solution $x_{SQRD}$ is computed to start with an initial candidate having a reasonably good metric

$\gamma = x_{SQRD}^H G_{SQRD}^H G_{SQRD}\, x_{SQRD}$ (10)

according to the optimization criterion given by equation (9). To this end, the row with the minimum norm is selected in each layer $\lambda = 1, \ldots, N_R$ out of the set of the remaining rows of the decomposed matrix.

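The next permutation is evaluated by selecting the alternative row according to the SQRD permutation in layer $\lambda = N_R - 1$ at the end of the SQRD branch. Since the permutation change occurred near the end of the tree structure, the results of the first $N_R - 2$ layers of the previously computed (SQRD) branch can be reused, which significantly reduces the computational complexity compared to an evaluation of all $N_R!$ permuted LQ decompositions. To this end, a partial copy of $\bar{Q} = GQ$, the vector of the squared main diagonal elements of $L$, $\mathrm{norm} = [L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2]$, and the permutation vector $P$ have to be stored for each layer, which is denoted by the superscript $(\cdot)^{[\lambda]}$ in Algorithm 1.

Algorithm 1 Tree Search Tomlinson-Harashima Precoding
Input: $H \in \mathbb{C}^{N_R \times n}$, $d \in \mathbb{C}^{N_R \times 1}$
Output: preprocessed vector $s \in \mathbb{C}^{N_T \times 1}$

    $\gamma^{[0]} := 0$; $\bar{Q}^{[0]} := H$; $\lambda := 1$
    $P_k^{[0]} := k$ for $k = 1, \ldots, N_R$
    $\mathrm{norm}_k^{[0]} := H_{k,1..n} (H_{k,1..n})^H$ for $k = 1, \ldots, N_R$
    while ($\lambda \ge 1$) {
        if (just stepped down into new layer) {
            alt := 0                                  // reset alternative counter
        } else {
            if (alt $< N_R - \lambda$) alt := alt + 1
            else {
                $\lambda := \lambda - 1$              // step one layer up
                continue
            }
        }
        $P_{1..N_R}^{[\lambda]} := P_{1..N_R}^{[\lambda-1]}$      // copy previous ordering
        if (first dive) {
            idx := $\arg\min_{i=\lambda..N_R} \mathrm{norm}_i^{[\lambda-1]}$      // compute SLQD
        } else {
            idx := alt + $\lambda$                    // select next alternative
        }
        swap $P_\lambda^{[\lambda]}$ and $P_{idx}^{[\lambda]}$
        $\bar{Q}_{\lambda,1..n}^{[\lambda]} := \bar{Q}_{idx,1..n}^{[\lambda-1]} / \mathrm{norm}_{idx}^{[\lambda-1]}$
        for ($k := \lambda + 1, \ldots, N_R$) {
            if ($k$ = idx) $k' := \lambda$
            else $k' := k$
            $j := P_k^{[\lambda]}$
            $\tilde{L}_{j,\lambda} := \bar{Q}_{k',1..n}^{[\lambda-1]} (\bar{Q}_{\lambda,1..n}^{[\lambda]})^H$      // compute projections
        }
        $j := P_\lambda^{[\lambda]}$
        $x_\lambda := \mathrm{mod}_\tau \{ d_j - \tilde{L}_{j,1..\lambda-1}\, x_{1..\lambda-1} \}$
        $\gamma^{[\lambda]} := \gamma^{[\lambda-1]} + |x_\lambda|^2 / \mathrm{norm}_{idx}^{[\lambda-1]}$
        if ($\gamma^{[\lambda]} < \gamma$ or first dive) {
            if ($\lambda = N_R$) {
                $\gamma := \gamma^{[\lambda]}$        // found new candidate
                $s := \sum_{k=1}^{N_R} (\bar{Q}_{k,1..N_T}^{[k]})^H x_k$
                $\lambda := \lambda - 1$              // step one layer up
            } else {
                for ($k := \lambda + 1, \ldots, N_R$) {
                    if ($k$ = idx) $k' := \lambda$
                    else $k' := k$
                    $j := P_k^{[\lambda]}$
                    // orthogonalize
                    $\bar{Q}_{k,1..n}^{[\lambda]} := \bar{Q}_{k',1..n}^{[\lambda-1]} - \tilde{L}_{j,\lambda}\, \bar{Q}_{idx,1..n}^{[\lambda-1]}$
                    // update row norms
                    $\mathrm{norm}_k^{[\lambda]} := \mathrm{norm}_{k'}^{[\lambda-1]} - |\tilde{L}_{j,\lambda}|^2\, \mathrm{norm}_{idx}^{[\lambda-1]}$
                }
                $\lambda := \lambda + 1$              // step one layer down
            }
        } else {
            $\lambda := \lambda - 1$                  // step one layer up
        }
    }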

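Since in each layer $\lambda$ the $\lambda$th column of the FB matrix $\tilde{L}$ is generated, the interference cancelation for the $\lambda$th data symbol $d_\lambda$ can be performed. With the obtained sequence $x_{1..\lambda}^{[\lambda]}$ the current metric is given by

$\gamma^{[\lambda]} = \sum_{i=1}^{\lambda} |x_i^{[\lambda]}|^2\, |G_{i,i}^{[\lambda]}|^2$. (11)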

Throughout this paper, $(\cdot)^{[\lambda]}$ denotes the representation of the respective vector or matrix according to the $\lambda$th layer of the TS-THP algorithm. Further, $Q_{k,a..b}$ denotes the submatrix built by the elements from column $a$ to column $b$ of the $k$th row of matrix $Q$.

If $\gamma^{[\lambda]}$ exceeds the current optimum metric $\gamma$, the current branch is closed and the next alternative permutation of layer $\lambda$ is processed. Hence, an initial candidate with a reasonably good metric reduces the number of computed nodes. After having evaluated all alternatives in a certain layer $\lambda$, the algorithm searches for available alternatives at the previous layers $\lambda - 1, \ldots, 1$. Each time the last layer $\lambda = N_R$ is reached and $\gamma^{[N_R]} < \gamma$, a new candidate has been found.

If all alternatives in layer $\lambda = 1$ have been evaluated, the algorithm exits with the current candidate being the optimum precoded vector according to the optimization criterion in (9).
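The search strategy described above can be illustrated with a compact recursive sketch. It is a simplification and not a transcription of Algorithm 1: recursion replaces the iterative layer bookkeeping, the candidates of every layer (not only of the first dive) are tried in order of increasing residual row norm, and all names (`tree_search`, `precode_fixed_order`, `inner`, `mod_tau`) and the example data are hypothetical.

```python
import itertools
import math

def mod_tau(z, tau):
    red = lambda v: v - tau * math.floor(v / tau + 0.5)
    return complex(red(z.real), red(z.imag))

def inner(a, b):
    """Row-vector product a b^H."""
    return sum(p * q.conjugate() for p, q in zip(a, b))

def precode_fixed_order(H, d, tau, order):
    """Unscaled THP transmit vector and its power for one given precoding order."""
    resid = {j: list(H[j]) for j in order}     # rows orthogonalized so far
    interf = {j: 0j for j in order}            # accumulated feedback terms
    s, power = [0j] * len(H[0]), 0.0
    for pos, u in enumerate(order):
        n2 = inner(resid[u], resid[u]).real    # L_ii^2 of this layer
        x = mod_tau(d[u] - interf[u], tau)
        power += abs(x) ** 2 / n2
        s = [st + vt.conjugate() * x / n2 for st, vt in zip(s, resid[u])]
        for j in order[pos + 1:]:
            c = inner(resid[j], resid[u]) / n2
            resid[j] = [a - c * b for a, b in zip(resid[j], resid[u])]
            interf[j] += c * x
    return s, power

def tree_search(H, d, tau):
    """Depth-first search over all orderings with pruning on the partial power."""
    best = {"power": math.inf, "order": None}
    visited = 0

    def dive(resid, interf, order, power):
        nonlocal visited
        for u in sorted(resid, key=lambda j: inner(resid[j], resid[j]).real):
            visited += 1
            n2 = inner(resid[u], resid[u]).real
            x = mod_tau(d[u] - interf[u], tau)
            p = power + abs(x) ** 2 / n2
            if p >= best["power"]:
                continue                        # close this branch
            if len(resid) == 1:
                best["power"], best["order"] = p, order + [u]
                continue
            nxt_r, nxt_i = {}, {}
            for j in resid:
                if j != u:
                    c = inner(resid[j], resid[u]) / n2
                    nxt_r[j] = [a - c * b for a, b in zip(resid[j], resid[u])]
                    nxt_i[j] = interf[j] + c * x
            dive(nxt_r, nxt_i, order + [u], p)

    users = range(len(H))
    dive({j: list(H[j]) for j in users}, {j: 0j for j in users}, [], 0.0)
    return best["order"], best["power"], visited

# Hypothetical 3x3 channel and 16-QAM data (tau = 8)
H = [[1 + 0.5j, 0.3 - 0.2j, -0.4 + 0.1j],
     [0.2 - 0.7j, 0.9 + 0.1j, 0.5 + 0.5j],
     [-0.6 + 0.2j, 0.1 - 0.8j, 1.1 + 0j]]
d = [1 + 3j, -3 - 1j, 3 - 1j]
order, power, visited = tree_search(H, d, 8.0)
brute = min(precode_fixed_order(H, d, 8.0, list(p))[1]
            for p in itertools.permutations(range(3)))
```

The pruning only discards branches whose partial power already reaches the best complete candidate, so the search returns the same minimum as the exhaustive evaluation while sharing the first layers between orderings with a common prefix.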

IV. SIMULATION RESULTS

The performance of the proposed Tree Search Tomlinson-Harashima Precoding (THP-TS) algorithm as well as the conventional THP and schemes with data-independent ordering criteria are evaluated numerically. The base station, equipped with $N_T = 4$ antennas, transmits 16-QAM symbols to the $N_R = 4$ single-antenna mobile stations. The channel is independently fading from burst to burst and assumed to be perfectly known to the transmitter. Figures 2 and 3 show the uncoded bit error rate for the THP-ZF and the THP-MMSE approaches, respectively, over the ratio of the average received energy per information bit $E_{bRx} = E_{Tx} / \log_2(M)$ to the one-sided noise power spectral density $N_0$.

Both figures reveal that the unsorted THP is significantly outperformed by the suboptimum THP-SLQD scheme. The VBLAST approach shows further slight performance improvements. For the THP-TS-DI scheme, Algorithm 1 has been adapted to compute the optimum data-independent ordering according to equation (6). Although the VBLAST ordering is suboptimum, it shows the same performance as the optimum data-independent ordering obtained by THP-TS-DI. This might change for an increasing number of mobile stations and transmit antennas at the base station.

The proposed THP-TS algorithm shows superior performance especially for medium to high signal-to-noise ratios (SNRs), since it takes the knowledge of the transmitted data symbols into account. For low SNRs, however, this performance gain shrinks considerably, which motivates the investigation of a coded system as presented in Section IV-A.



Fig. 2. 4x4 MIMO transmission: comparison of different ordering criteria for THP-ZF and 16-QAM. [Plot: uncoded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-ZF, THP-ZF-SLQD, THP-ZF-VBLAST, THP-ZF-TS-DI, THP-ZF-TS.]

Fig. 3. 4x4 MIMO transmission: comparison of different ordering criteria for THP-MMSE and 16-QAM. [Plot: uncoded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-MMSE, THP-MMSE-SLQD, THP-MMSE-VBLAST, THP-MMSE-TS-DI, THP-MMSE-TS.]

Figure 4 shows the transmit power requirements of various THP schemes before applying (2) for different values $N = N_T = N_R$. The results are normalized to the corresponding THP-ZF scheme; all THP-MMSE schemes are evaluated for $E_{bRx}/N_0 = 10$ dB. The transmit power can be reduced significantly by applying ordered decomposition techniques. VBLAST performs only slightly better than the SLQD approach, whereas the proposed THP-TS scheme shows superior behavior especially for large $N$. The MMSE schemes outperform the corresponding ZF schemes by approximately 2 dB.

The ratio of the number of nodes computed by the search tree of the proposed algorithm to the number of nodes evaluated by a decomposition of all $N!$ permutations of the channel matrix $H$ is shown in figure 5. An upper bound is given by the reduction of nodes due to the tree structure alone; further improvements are obtained by the exit criterion related to the metric $\gamma$. While the THP-TS schemes achieve no complexity reduction for a 2x2 MIMO system, the number of evaluated nodes is reduced significantly for increasing $N$, as anticipated.

Fig. 4. Transmit power for NxN MIMO precoding. [Plot: $10 \log_{10}(P_s / P_{s,THP-ZF})$ in dB versus $N = 2, \ldots, 8$; curves: THP-ZF, THP-ZF-VBLAST, THP-ZF-SLQD, THP-ZF-TS, THP-MMSE-TS, THP-MMSE, THP-MMSE-SLQD/VBLAST.]

Fig. 5. Ratio of nodes computed by THP-TS over the number of nodes evaluated by a decomposition of all $N!$ permutations of $H$ for NxN MIMO precoding. [Plot: ratio versus $N = 2, \ldots, 8$; curves: TS (upper bound), THP-ZF-TS, THP-MMSE-TS.]
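The saving that comes from the tree structure alone can be estimated by simple counting. The sketch below rests on an assumed counting convention that is not stated in the text: one node per layer and per distinct ordering prefix for the tree, and $N$ layers for each of the $N!$ permutations for the exhaustive decomposition. The function names are hypothetical.

```python
import math

def tree_nodes(n):
    """Nodes of the full search tree: N!/(N - layer)! distinct prefixes per layer."""
    return sum(math.factorial(n) // math.factorial(n - layer)
               for layer in range(1, n + 1))

def exhaustive_nodes(n):
    """N layers for each of the N! separately decomposed permutations."""
    return n * math.factorial(n)

def tree_ratio(n):
    """Ratio of tree nodes to exhaustively evaluated nodes (no metric-based pruning)."""
    return tree_nodes(n) / exhaustive_nodes(n)

ratios = {n: tree_ratio(n) for n in range(2, 9)}
```

Under this convention the ratio equals 1 for $N = 2$, which agrees with the statement that a 2x2 system sees no reduction, and it decreases monotonically as $N$ grows; the pruning by the metric reduces the node count further and is not captured here.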

A. Applying Channel Coding

In order to evaluate the performance of the proposed algorithm in a coded system, we apply a systematic parallel concatenated convolutional code with feedback polynomial $13_{oct}$ and feedforward polynomial $15_{oct}$. Each of the $N_R = 4$ single-antenna mobile stations receives one 16-QAM data stream transmitted by the base station equipped with $N_T = 4$ antennas. The independently encoded blocks with a length of 4000 bits per user at a code rate of 3/4 are transmitted over independent channel realizations. Hence, the size of each user's random interleaver is 3000 bits. Each mobile station applies an automatic gain control to ensure unit gain transmission, followed by the modulo device and a turbo decoder to decode its intended data.

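Due to the modulo device and its resulting cyclic equivalent decision regions, the likelihood values for the iterative decoder are computed over an infinite sum [12]. The log likelihood ratio (LLR) of bit 0 (LSB) of a 16-QAM symbol with Gray mapping reads as

$\mathrm{LLR}(d_n^{(0)}) = \log \frac{P(d_n^{(0)} = 1)}{P(d_n^{(0)} = -1)}$ (12)

$= \log \frac{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma_\eta^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma_\eta^2}\right) \right]}{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma_\eta^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma_\eta^2}\right) \right]}$

$= \max^*_{k=-\infty \ldots \infty} \max^*\left(-\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma_\eta^2}, -\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma_\eta^2}\right) - \max^*_{k=-\infty \ldots \infty} \max^*\left(-\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma_\eta^2}, -\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma_\eta^2}\right)$,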

where $\tau = 2\sqrt{M}$ denotes the shortest distance between two equivalent points due to the modulo operation applied at the receiver and $d_{R,n}$ denotes the real component of the $n$th symbol at the detector. The computation of the LLRs for the remaining bits of $d_n$ is straightforward. The $\max^*$ operation is given by

$\max^*(a, b) = \log(\exp(a) + \exp(b))$ (13)
$= \max(a, b) + \log(1 + \exp(-|a - b|))$.

In a practical implementation it is reasonable to approximate the infinite sum in (12) by a sum over the central Voronoi region and its first two neighbours ($-1 \le k \le 1$).
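The truncated LLR computation can be sketched as follows. The code is an illustration under assumptions: the sign conventions follow the reconstruction of (12) with the points at +1 and +3 in the numerator, `sigma2` stands for the noise variance appearing in the denominators of the exponents, and the names `max_star` and `llr_bit0` are hypothetical.

```python
import math

def max_star(a, b):
    """Jacobian logarithm of (13): log(exp(a) + exp(b)) in a numerically stable form."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def llr_bit0(d_r, sigma2, tau=8.0, k_range=(-1, 0, 1)):
    """LLR of bit 0 from the real part d_r, truncated to the given modulo shifts k."""
    def accumulate(offsets):
        # exponents of all equivalent points d_r + k*tau + offset
        terms = [-(d_r + k * tau + o) ** 2 / (2.0 * sigma2)
                 for k in k_range for o in offsets]
        acc = terms[0]
        for t in terms[1:]:
            acc = max_star(acc, t)
        return acc
    # numerator: points at +1 and +3; denominator: points at -1 and -3
    return accumulate((-1, -3)) - accumulate((1, 3))

# Hypothetical detector output and noise variance
llr_example = llr_bit0(0.7, sigma2=0.5)
```

Nesting `max_star` over all terms reproduces the logarithm of the truncated sum exactly, so the only approximation in this sketch is the restriction to the three shifts $k \in \{-1, 0, 1\}$; a received value close to +1 or +3 gives a positive LLR, and the sign flips for negative real parts.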

Fig. 6. Bit error rate for a coded 4x4 MIMO system with rate 3/4 (PCCC) using 16-QAM modulation and 5 turbo decoder iterations. [Plot: coded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-ZF, THP-MMSE, THP-ZF-SLQD, THP-MMSE-SLQD, THP-ZF-VBLAST, THP-MMSE-VBLAST, THP-ZF-TS, THP-MMSE-TS, ergodic capacity.]

The simulation results are given in figure 6. The THP-TS schemes outperform the corresponding SLQD and VBLAST schemes by approximately 0.5 dB, whereas a performance gain of approximately 1.7 dB can be observed for the MMSE schemes over the corresponding ZF approaches. The dashed line shows the ergodic sum-capacity for a 4x4 MIMO system. The THP-MMSE-TS scheme performs within 4 dB of this capacity, which seems acceptable given the non-cooperative receivers, the limited block lengths and the independently encoded data streams. A tradeoff between fairness among the different users on the one side and rate and power loading techniques on the other side would further increase the performance of the proposed schemes.

V. CONCLUSIONS

A tree-structured algorithm which reduces the complexity of computing an optimum data-dependent matrix decomposition for Tomlinson-Harashima Precoding has been derived in this paper. The algorithm combines matrix decomposition and the feedback structure of the precoder to optimize the matrix permutation. The proposed scheme outperforms data-independent sorting approaches in terms of bit error performance.

REFERENCES

[1] H. Harashima and H. Miyakawa, "Matched-Transmission Technique for Channels with Intersymbol Interference," IEEE Trans. Commun., vol. COM-20, pp. 774-780, Aug. 1972.

[2] M. Tomlinson, "New Automatic Equaliser Employing Modulo Arithmetic," IEE Electr. Lett., vol. 7, no. 5,6, pp. 138-139, Mar. 1971.

[3] R. F. H. Fischer, C. Windpassinger, A. Lampe, and J. B. Huber, "Space-Time Transmission using Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 139-147.

[4] M. Joham, J. Bremer, and W. Utschick, "MMSE Approaches to Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 387-394.

[5] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication," in Proc. 41st Allerton Conf. on Communication, Control, and Computing 2003, Monticello, IL, USA, Oct. 2003.

[6] R. F. H. Fischer and C. Windpassinger, "Even-Integer Interference Precoding for Broadcast Channels," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 395-402.

[7] S. Shi and M. Schubert, "Precoding and Power Loading for Multi-Antenna Broadcast Channels," in Proc. 38th Conf. on Information Sciences and Systems (CISS'04), Princeton, USA, Mar. 2004.

[8] R. Habendorf, R. Irmer, W. Rave, and G. Fettweis, "Nonlinear Multiuser Precoding for Non-Connected Decision Regions," in Proc. IEEE Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC'05), New York City, USA, June 2005, pp. 535-539.

[9] D. Schmidt, M. Joham, F. A. Dietrich, K. Kusume, and W. Utschick, "Complexity Reduction for MMSE Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Workshop on Smart Antennas (WSA'05), Duisburg, Germany, Apr. 2005.

[10] C. Windpassinger, T. Vencel, and R. F. H. Fischer, "Precoding and Loading for BLAST-like Systems," in Proc. IEEE Int. Conf. on Communications (ICC'03), Anchorage, Alaska, USA, May 2003.

[11] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified Processing for Wireless Communication at High Spectral Efficiency," IEEE J. Select. Areas Commun., JSAC-17, pp. 1841-1852, Nov. 1999.

[12] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication-Part II: Perturbation," IEEE Trans. Commun., vol. 53, no. 3, pp. 537-544, Mar. 2005.

[13] D. Wübben, J. Rinas, R. Böhnke, V. Kühn, and K. Kammeyer, "Efficient Algorithm for Detecting Layered Space-Time Codes," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 399-405.
