8/8/2019 06_VTC_habendorf
On Ordering Optimization for MIMO Systems with
Decentralized Receivers
Rene Habendorf and Gerhard Fettweis
Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Germany
{habendorf,fettweis}@ifn.et.tu-dresden.de
http://www.ifn.et.tu-dresden.de/MNS
Abstract: This paper addresses Tomlinson-Harashima Precoding (THP) for the downlink of multiuser systems, where the transmitter is equipped with multiple antennas and each decentralized receiver has a single antenna. Although the performance of THP strongly depends on the ordering of the precoded symbols, current THP approaches apply data-independent sorting algorithms. The contribution of this paper is the extension of THP with an algorithm exploiting the knowledge of the data symbols to compute the optimum stream ordering for precoding.
I. INTRODUCTION
In systems with decentralized non-cooperative mobile devices, space division multiple access (SDMA) algorithms have to be performed at the base station, which enables joint processing of all transmit and receive signals.
Hence Multiuser Detection (MUD) is suitable for the uplink, whereas for the downlink, which is the capacity bottleneck in various multimedia applications, Multiuser Transmission (MUT) can be applied if channel state information is available at the transmitter. In time division duplex (TDD) systems this can be achieved by exploiting the channel reciprocity, provided that the channel coherence time is large compared to one transmission interval. In frequency division duplex (FDD) schemes, however, the channel estimates obtained at the receiver have to be sent back to the transmitter via a backward channel.
In this paper we focus on the nonlinear Tomlinson-
Harashima Precoding (THP) which was initially proposed
for the equalization of SISO channels in [1],[2] and has
been applied to matrix channels in [3]. In [4], joint spatio-
temporal THP was proposed. Nonlinear constellation shaping
techniques taking the equivalent decision regions into account
have been investigated in [5], [6], [7], and [8].
The performance of THP strongly depends on the ordering of the precoded symbols, which corresponds to a permutation of the rows of the channel matrix. In recent years, different data-independent approaches for computing an improved ordering of the channel matrix have been investigated. Our proposed method exploits the data symbols, which are perfectly known to the transmitter, to compute a precoded signal with minimum transmit power enhancement. To obtain the optimum ordering, a tree search algorithm is applied.
The performance of the proposed scheme is investigated
numerically in both uncoded and coded scenarios.
II. SYSTEM MODEL
We consider downlink transmission from a base station using $N_T$ transmit antennas to $N_R$ non-cooperative mobile stations (MS), each equipped with a single receive antenna. Throughout the paper we restrict ourselves to the case of $N_T \geq N_R$. The independent zero-mean complex Gaussian channel taps $h_{q,k}$ from transmit antenna $k$ to the receive antenna of MS $q$ with unit variance $E\{|h_{q,k}|^2\} = 1$ are arranged in the flat block fading channel matrix $H \in \mathbb{C}^{N_R \times N_T}$. With $s = [s_1, \ldots, s_{N_T}]^T$ and $r = [r_1, \ldots, r_{N_R}]^T$ denoting the vectors of transmit and receive symbols, respectively, the transmission equation for burst interval $k$ reads as
$$r(k) = H(k)\, s(k) + \eta(k). \tag{1}$$
At the receivers, the signal is distorted by the noise vector $\eta \in \mathbb{C}^{N_R \times 1}$ of circularly symmetric complex Gaussian i.i.d. samples with covariance matrix $E\{\eta \eta^H\} = \sigma^2 I_{N_R}$. Throughout this paper, $(\cdot)^T$ and $(\cdot)^H$ denote matrix transposition and Hermitian transposition, respectively. Further, $I_N$ denotes the $N \times N$ identity matrix.
III. MULTIUSER TRANSMISSION
Figure 1 shows the equivalent time-discrete system model used for precoding the vector $d = [d_1, \ldots, d_{N_R}]^T$ of $M$-ary QAM symbols taken from the odd-complex integer grid $S = \{a + jb \mid a, b \in \{\pm 1, \pm 3, \ldots, \pm(\sqrt{M} - 1)\}\}$. Similar to decision feedback equalization, the interference is canceled by a feedback (FB) filter after it has been transformed into a causal form by a feedforward (FF) filter.
[Figure 1: block diagram with the data symbols $d_1, \ldots, d_{N_R}$, the precoded signal $x$, and the blocks FB, $G$, FF and $H$]
Fig. 1. Downlink transmission model for Tomlinson-Harashima Precoding (THP) with decentralized receivers.
To ensure a fair comparison between different precoding schemes, a fixed transmit power constraint $E_{Tx}$ is incorporated by the scaling factor
$$\beta = \sqrt{\frac{E_{Tx}}{s^H s}}. \tag{2}$$
Both the feedforward and the feedback filter can be obtained by an LQ decomposition of the channel matrix $H \in \mathbb{C}^{N_R \times N_T}$ into a unitary matrix $Q \in \mathbb{C}^{N_R \times N_T}$ (with $Q Q^H = I_{N_R}$) and a lower triangular matrix $L \in \mathbb{C}^{N_R \times N_R}$.
With the feedforward filter $\mathrm{FF} = Q^H$ and the scaling matrix $G = \mathrm{diag}\{L_{1,1}^{-1}, \ldots, L_{N_R,N_R}^{-1}\}$, the effective system matrix
$$\tilde{L} = H \cdot \mathrm{FF} \cdot G = L\, Q\, Q^H G = L\, G \tag{3}$$
is lower triangular with a unit main diagonal. With the feedback matrix $\mathrm{FB} = \tilde{L} - I$, the described structure performs linear pre-equalization, which increases the transmit power significantly. The modulo device used in THP reduces each element of the pre-equalized signal into the fundamental Voronoi region $V = \{a + jb \mid a, b \in [-\tau/2, \tau/2)\}$ of the $M$-ary QAM modulo lattice by adding integer multiples of $\tau = 2\sqrt{M}$ to its I and Q component, respectively. To compensate for that manipulation, the receiver has to perform the same modulo operation, which turns it into a nonlinear device. In addition, the received signal has to be multiplied by $\beta^{-1}$ to ensure unit gain transmission. Nonlinear constellation shaping techniques taking the equivalent decision regions into account [5], [6], [7], [8] are not in the focus of this paper.
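The structure described above can be illustrated by a minimal Python sketch, assuming a noise-free channel and $\beta = 1$. It is not the authors' implementation: the helper names `lq_rows`, `mod_tau` and `thp_zf` as well as the example matrix are hypothetical, and the LQ decomposition is computed by a plain Gram-Schmidt procedure on the rows of $H$.

```python
import math

def lq_rows(H):
    """Gram-Schmidt on the rows of H; returns (L, Q) with H = L Q and Q Q^H = I."""
    n_r = len(H)
    L = [[0j] * n_r for _ in range(n_r)]
    Q = []
    for i, row in enumerate(H):
        v = list(row)
        for j, q in enumerate(Q):
            L[i][j] = sum(a * b.conjugate() for a, b in zip(v, q))
            v = [a - L[i][j] * b for a, b in zip(v, q)]
        L[i][i] = complex(math.sqrt(sum(abs(a) ** 2 for a in v)))
        Q.append([a / L[i][i] for a in v])
    return L, Q

def mod_tau(z, tau):
    """Reduce the I and Q component of z into [-tau/2, tau/2)."""
    def wrap(u):
        return u - tau * math.floor(u / tau + 0.5)
    return complex(wrap(z.real), wrap(z.imag))

def thp_zf(H, d, M):
    """Zero-forcing THP: returns the transmit vector s and the precoded vector x."""
    tau = 2 * math.sqrt(M)
    L, Q = lq_rows(H)
    n_r, n_t = len(H), len(H[0])
    x = []
    for i in range(n_r):
        # feedback filter: entries of the unit-diagonal matrix L G are L[i][j] / L[j][j]
        interference = sum(L[i][j] / L[j][j] * x[j] for j in range(i))
        x.append(mod_tau(d[i] - interference, tau))
    # feedforward filter and scaling: s = Q^H G x
    s = [sum(Q[i][k].conjugate() * x[i] / L[i][i] for i in range(n_r))
         for k in range(n_t)]
    return s, x

# Hypothetical 3x3 channel and 16-QAM data symbols, for illustration only
H_demo = [[1 + 1j, 0.5 - 0.2j, 0.3j],
          [0.2 - 0.7j, -1.1 + 0.4j, 0.6 + 0j],
          [-0.4 + 0.1j, 0.8 + 0.9j, 1.2 - 0.3j]]
d_demo = [3 - 1j, -1 + 3j, 1 + 1j]
s_demo, x_demo = thp_zf(H_demo, d_demo, M=16)
# noise-free reception followed by the receiver-side modulo device
r_demo = [sum(h * sk for h, sk in zip(row, s_demo)) for row in H_demo]
d_hat = [mod_tau(r, 2 * math.sqrt(16)) for r in r_demo]
```

In this noise-free setting `d_hat` reproduces `d_demo` up to rounding errors, because each received sample differs from its data symbol only by integer multiples of $\tau$ in the I and Q component.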
A. MMSE Channel Extension
In [4] the FF and FB filters for THP are derived based on a minimum mean square error (MMSE) criterion by dropping the zero-forcing constraint of THP-ZF, which requires all interference to be canceled completely. The computation of the THP-MMSE filters requires a matrix inverse for each row of the channel matrix, which becomes quite complex for large $N_R$. The computational complexity is reduced in [9] by incorporating the Cholesky factorization algorithm. In this paper, however, we apply the LQ decomposition algorithm to the channel matrix extended by a scaled identity matrix of size $N_R \times N_R$,
$$\tilde{H} = \left[\, H \quad \sqrt{\zeta}\, I_{N_R} \,\right] \in \mathbb{C}^{N_R \times (N_T + N_R)}, \tag{4}$$
to compute FF and FB, which delivers results equivalent to those obtained in [4]. The scaling factor $\zeta = N_R \sigma^2 / E_{Tx}$ is determined by the variance of the noise and the transmit energy.
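The link between the extended LQ decomposition and the Cholesky-based approach of [9] can be made plausible in one line. The following is a sketch under assumptions: the extended matrix is taken as $\tilde{H} = [H \;\; \sqrt{\zeta} I_{N_R}]$ (the symbol $\zeta$ and the square root are assumed here), and its LQ decomposition is written as $\tilde{H} = L \tilde{Q}$ with $\tilde{Q} \tilde{Q}^H = I_{N_R}$.

```latex
\tilde{H}\tilde{H}^H
  = H H^H + \zeta\, I_{N_R}
  = L\, \tilde{Q} \tilde{Q}^H L^H
  = L L^H
```

Under these assumptions, $L$ is a Cholesky factor of the regularized matrix $H H^H + \zeta I_{N_R}$, i.e. the identity block takes over the role of the noise-dependent regularization term.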
B. Stream Ordering
The diagonal scaling matrix $G$ adjusts the transmit power such that the SNRs of all symbols of all users are equal and inversely proportional to
$$\mathrm{trace}\left(G^H G\right) = \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}. \tag{5}$$
The sum in (5) can be influenced by reordering the rows of $H$ during the LQ decomposition algorithm according to a certain permutation $P \in \mathbb{N}^{N_R \times 1}$ ($1 \leq P_i \leq N_R$, $P_i \neq P_j$ for $i \neq j$), which results in a modified order of precoding [10]. To optimize the SNR without exploiting the knowledge of the transmitted symbols, the optimum permutation is given as
$$P_{opt} = \arg\min_{P} \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}, \tag{6}$$
where $L$ is given by the LQ decomposition of the reordered channel matrix
$$H' = H_{P,1..N_T}. \tag{7}$$
If we replace (6) by the suboptimum criterion
$$P_{subopt} = \arg\max_{P} \min \{L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2\}, \tag{8}$$
which maximizes the minimum entry in $\mathrm{diag}(L)$, the V-BLAST algorithm [11] can be used to find a quite satisfactory permutation [10], [12]. As an alternative to the V-BLAST algorithm, which requires multiple calculations of the pseudo-inverse of the matrix $H$, the suboptimum heuristic sorted QR decomposition (SQRD) algorithm in [13] can be modified to compute a sorted LQ decomposition (SLQD) of $H$. In the remainder of this paper, the two schemes are referred to as THP-VBLAST and THP-SLQD, respectively.
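To make the two data-independent criteria concrete, the following sketch evaluates (6) and (8) by exhaustive search over all row orderings of a small channel matrix. It implements neither V-BLAST nor SQRD. The function names and the example matrix are hypothetical, and the squared diagonal entries of $L$ are obtained by Gram-Schmidt on the permuted rows.

```python
import itertools
import math

def squared_diag_of_L(H, order):
    """Squared main-diagonal entries of L when the rows of H are processed in `order`."""
    basis, diag_sq = [], []
    for idx in order:
        v = list(H[idx])
        for q in basis:
            c = sum(a * b.conjugate() for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        n2 = sum(abs(a) ** 2 for a in v)
        diag_sq.append(n2)
        basis.append([a / math.sqrt(n2) for a in v])
    return diag_sq

def best_orderings(H):
    """Exhaustive search: ordering minimizing (6) and ordering maximizing (8)."""
    perms = list(itertools.permutations(range(len(H))))
    p_opt = min(perms, key=lambda p: sum(1.0 / v for v in squared_diag_of_L(H, p)))
    p_maxmin = max(perms, key=lambda p: min(squared_diag_of_L(H, p)))
    return p_opt, p_maxmin

# Hypothetical 3x3 channel, for illustration only
H_demo = [[1 + 1j, 0.5 - 0.2j, 0.3j],
          [0.2 - 0.7j, -1.1 + 0.4j, 0.6 + 0j],
          [-0.4 + 0.1j, 0.8 + 0.9j, 1.2 - 0.3j]]
p_opt, p_maxmin = best_orderings(H_demo)
```

The product of the squared diagonal entries is the same for every ordering, since it equals $\det(H H^H)$. The ordering therefore only redistributes the values $L_{i,i}^2$, which is why balancing them, as in (8), tends to keep the sum in (5) small.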
C. Tree Search Tomlinson-Harashima Precoding
Although the data symbols are perfectly known to the transmitter, the sorting algorithms described above do not exploit this knowledge. In the following, an algorithm is proposed that jointly performs the decomposition of the channel matrix and the successive interference cancelation, and therefore directly computes the transmit signal $s$. Since the unitary FF filter does not increase the signal power, the optimum data-dependent ordering $P_{opt}$ is given by
$$P_{opt} = \arg\min_{P}\; x^H G^H G\, x. \tag{9}$$
To compute the optimum ordering efficiently, the proposed algorithm performs a depth-first tree search to iteratively evaluate all possible permutations $P$. During the first dive, the SQRD solution $x_{SQRD}$ is computed to start with an initial candidate having a reasonably good metric
$$\Lambda = x_{SQRD}^H\, G_{SQRD}^H\, G_{SQRD}\, x_{SQRD} \tag{10}$$
according to the optimization criterion given by equation (9). To this end, the row with the minimum norm is selected in each layer $\ell = 1, \ldots, N_R$ out of the set of the remaining rows of the decomposed matrix.
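The next permutation is evaluated by selecting the alternative row according to the SQRD permutation in layer $\ell = N_R - 1$ at the end of the SQRD branch. Since the permutation change occurred near the end of the tree structure, the results of the first $N_R - 2$ layers of the previously computed (SQRD) branch can be reused, which significantly reduces the computational complexity compared to an evaluation of all $N_R!$ permuted LQ decompositions. Therefore, a partial copy of $\tilde{Q} = GQ$, the vector of the squared main diagonal elements of $L$, $\mathrm{norm} = [L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2]$, and the permutation vector $P$ have to be stored for each layer, which is denoted by the superscript $(\cdot)^{[\ell]}$ in Algorithm 1.

Algorithm 1 Tree Search Tomlinson-Harashima Precoding
Input: $H \in \mathbb{C}^{N_R \times n}$, $d \in \mathbb{C}^{N_R \times 1}$
Output: preprocessed vector $s \in \mathbb{C}^{N_T \times 1}$
$\Lambda^{[0]} := 0$; $Q^{[0]} := H$; $\ell := 1$
$\forall (k = 1, \ldots, N_R)$: $P^{[0]}_k := k$
$\forall (k = 1, \ldots, N_R)$: $\mathrm{norm}^{[0]}_k := H_{k,1..n} (H_{k,1..n})^H$
while ($\ell \geq 1$) {
    if (just stepped down into new layer) {
        $\mathrm{alt}_\ell := 0$                                  (reset alternative counter)
    } else {
        if ($\mathrm{alt}_\ell < N_R - \ell$) $\mathrm{alt}_\ell := \mathrm{alt}_\ell + 1$
        else {
            $\ell := \ell - 1$                                    (step one layer up)
            continue
        }
    }
    $P^{[\ell]}_{1..N_R} := P^{[\ell-1]}_{1..N_R}$                (copy previous ordering)
    if (first dive) {
        $\mathrm{idx} := \arg\min_{i = \ell..N_R} \mathrm{norm}^{[\ell-1]}_i$      (compute SLQD)
    } else {
        $\mathrm{idx} := \mathrm{alt}_\ell + \ell$                (select next alternative)
    }
    swap $P^{[\ell]}_\ell$ and $P^{[\ell]}_{\mathrm{idx}}$
    $Q^{[\ell]}_{\ell,1..n} := Q^{[\ell-1]}_{\mathrm{idx},1..n} \,/\, \mathrm{norm}^{[\ell-1]}_{\mathrm{idx}}$
    for ($k := \ell + 1, \ldots, N_R$) {
        if ($k \neq \mathrm{idx}$) $k' := k$ else $k' := \ell$
        $j := P^{[\ell]}_k$
        $\tilde{L}_{j,\ell} := Q^{[\ell-1]}_{k',1..n} (Q^{[\ell]}_{\ell,1..n})^H$      (compute projections)
    }
    $j := P^{[\ell]}_\ell$
    $x_\ell := \mathrm{mod}\{ d_j - \tilde{L}_{j,1..\ell-1}\, x_{1..\ell-1} \}$
    $\Lambda^{[\ell]} := \Lambda^{[\ell-1]} + |x_\ell|^2 \,/\, \mathrm{norm}^{[\ell-1]}_{\mathrm{idx}}$
    if ($\Lambda^{[\ell]} < \Lambda$ or first dive) {
        if ($\ell = N_R$) {
            $\Lambda := \Lambda^{[\ell]}$                         (found new candidate)
            $s := \sum_{k=1}^{N_R} (Q^{[k]}_{k,1..N_T})^H x_k$
            $\ell := \ell - 1$                                    (step one layer up)
        } else {
            for ($k := \ell + 1, \ldots, N_R$) {
                if ($k \neq \mathrm{idx}$) $k' := k$ else $k' := \ell$
                $j := P^{[\ell]}_k$
                (orthogonalize)
                $Q^{[\ell]}_{k,1..n} := Q^{[\ell-1]}_{k',1..n} - \tilde{L}_{j,\ell}\, Q^{[\ell-1]}_{\mathrm{idx},1..n}$
                (update row norms)
                $\mathrm{norm}^{[\ell]}_k := \mathrm{norm}^{[\ell-1]}_{k'} - |\tilde{L}_{j,\ell}|^2\, \mathrm{norm}^{[\ell-1]}_{\mathrm{idx}}$
            }
            $\ell := \ell + 1$                                    (step one layer down)
        }
    } else {
        $\ell := \ell - 1$                                        (step one layer up)
    }
}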
Since in each layer $\ell$ the $\ell$th column of the FB matrix $\tilde{L}$ is generated, the interference cancelation for the $\ell$th data symbol $d_\ell$ can be performed. With the obtained sequence $x^{[\ell]}_{1..\ell}$ the current metric is given by
$$\Lambda^{[\ell]} = \sum_{i=1}^{\ell} \left| x_i^{[\ell]} \right|^2 \left| G_{i,i}^{[\ell]} \right|^2. \tag{11}$$
Throughout this paper, $(\cdot)^{[\ell]}$ denotes the representation of the respective vector or matrix according to the $\ell$th layer of the TS-THP algorithm. Further, $Q_{k,a..b}$ denotes the submatrix built by the elements from column $a$ to column $b$ of the $k$th row of matrix $Q$.
If $\Lambda^{[\ell]}$ exceeds the current optimum metric $\Lambda$, the current branch is closed and the next alternative permutation of layer $\ell$ is processed. Hence an initial candidate with a reasonably good metric reduces the number of computed nodes. After all alternatives in a certain layer $\ell$ have been evaluated, the algorithm searches for available alternatives at the previous layers $1, \ldots, \ell - 1$. Each time the last layer $\ell = N_R$ is reached and $\Lambda^{[N_R]} < \Lambda$, a new candidate has been found.
If all alternatives in layer $\ell = 1$ have been evaluated, the algorithm exits with the current candidate being the optimum precoded vector according to the optimization criterion in (9).
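The search principle can be sketched compactly in Python. This is a simplified recursive variant and not a transcription of Algorithm 1: it does not keep the per-layer storage of the iterative formulation, it orthogonalizes the remaining rows before the pruning test, and all function names as well as the example data are hypothetical. As in the text, the first dive picks the row with the smallest residual norm in each layer, and a branch is closed as soon as its partial metric is not smaller than the metric of the best candidate found so far.

```python
import itertools
import math

def mod_tau(z, tau):
    wrap = lambda u: u - tau * math.floor(u / tau + 0.5)
    return complex(wrap(z.real), wrap(z.imag))

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

def layer_step(resid, lt, x, k, d, tau):
    """Precode user k next; returns x_k, metric increment, row of G*Q, updated state."""
    n2 = inner(resid[k], resid[k]).real       # squared diagonal entry of L
    gq = [a / n2 for a in resid[k]]           # row of G*Q
    xk = mod_tau(d[k] - sum(l * xi for l, xi in zip(lt[k], x)), tau)
    new_resid, new_lt = {}, {}
    for m in resid:
        if m != k:
            l_mk = inner(resid[m], gq)        # entry of the unit-diagonal matrix L*G
            new_resid[m] = [a - l_mk * b for a, b in zip(resid[m], resid[k])]
            new_lt[m] = lt[m] + [l_mk]
    return xk, abs(xk) ** 2 / n2, gq, new_resid, new_lt

def tree_search_thp(H, d, M):
    """Depth-first search over precoding orders; returns (metric, order, s)."""
    n_r, n_t = len(H), len(H[0])
    tau = 2 * math.sqrt(M)
    best = {"metric": math.inf, "order": None, "s": None}

    def dive(order, resid, lt, x, rows, metric):
        if not resid:                         # last layer reached: new candidate
            s = [sum(q[c].conjugate() * xi for q, xi in zip(rows, x))
                 for c in range(n_t)]
            best.update(metric=metric, order=tuple(order), s=s)
            return
        # smallest residual norm first, so the first dive follows the sorted order
        for k in sorted(resid, key=lambda m: inner(resid[m], resid[m]).real):
            xk, inc, gq, new_resid, new_lt = layer_step(resid, lt, x, k, d, tau)
            if metric + inc < best["metric"]:  # otherwise close this branch
                dive(order + [k], new_resid, new_lt, x + [xk], rows + [gq], metric + inc)

    dive([], {k: list(H[k]) for k in range(n_r)}, {k: [] for k in range(n_r)}, [], [], 0.0)
    return best["metric"], best["order"], best["s"]

def metric_of_order(H, d, M, order):
    """Reference: metric x^H G^H G x for one fixed precoding order, without pruning."""
    tau = 2 * math.sqrt(M)
    resid, lt = {k: list(H[k]) for k in order}, {k: [] for k in order}
    x, metric = [], 0.0
    for k in order:
        xk, inc, _, resid, lt = layer_step(resid, lt, x, k, d, tau)
        x.append(xk)
        metric += inc
    return metric

# Hypothetical 3x3 channel and 16-QAM data symbols, for illustration only
H_demo = [[1 + 1j, 0.5 - 0.2j, 0.3j],
          [0.2 - 0.7j, -1.1 + 0.4j, 0.6 + 0j],
          [-0.4 + 0.1j, 0.8 + 0.9j, 1.2 - 0.3j]]
d_demo = [3 - 1j, -1 + 3j, 1 + 1j]
metric, order, s = tree_search_thp(H_demo, d_demo, 16)
exhaustive = min(metric_of_order(H_demo, d_demo, 16, p)
                 for p in itertools.permutations(range(len(H_demo))))
```

Because the partial metric can only grow from layer to layer, closing a branch never discards a better solution, so `metric` coincides with the exhaustive minimum `exhaustive` while typically visiting fewer nodes.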
IV. SIMULATION RESULTS
The performance of the proposed Tree Search Tomlinson-Harashima Precoding (THP-TS) algorithm, of the conventional THP, and of schemes with data-independent ordering criteria is evaluated numerically. The base station, equipped with $N_T = 4$ antennas, transmits 16-QAM symbols to the $N_R = 4$ single-antenna mobile stations. The channel is independently fading from burst to burst and assumed to be perfectly known to the transmitter. Figures 2 and 3 show the uncoded bit error rate for the THP-ZF and the THP-MMSE approaches, respectively, over the ratio of the average received energy per information bit $E_{bRx} = E_{Tx}/\log_2(M)$ to the one-sided noise power spectral density $N_0$.
Both figures reveal that the unsorted THP is outperformed significantly by the suboptimum THP-SLQD scheme. The VBLAST approach shows further slight performance improvements. For the THP-TS-DI scheme, Algorithm 1 has been adapted to compute the optimum data-independent ordering according to equation (6). Although the VBLAST ordering is a suboptimum one, it shows the same performance as the data-independent optimum ordering obtained by THP-TS-DI. This might change for an increasing number of mobile stations as well as transmit antennas at the base station.
The proposed THP-TS algorithm shows superior performance, especially for medium to high signal-to-noise ratios (SNRs), since it takes the knowledge of the transmitted data symbols into account. For low SNRs, however, this performance gain shrinks dramatically, which motivates the investigation of a coded system as presented in the following section.
[Figure 2: uncoded bit error rate ($10^{-4}$ to $10^{0}$) versus $E_{bRx}/N_0$ [dB] (0 to 30 dB); curves: THP-ZF, THP-ZF-SLQD, THP-ZF-VBLAST, THP-ZF-TS-DI, THP-ZF-TS]
Fig. 2. 4x4 MIMO transmission: comparison of different ordering criteria for THP-ZF and 16-QAM
[Figure 3: uncoded bit error rate ($10^{-4}$ to $10^{0}$) versus $E_{bRx}/N_0$ [dB] (0 to 30 dB); curves: THP-MMSE, THP-MMSE-SLQD, THP-MMSE-VBLAST, THP-MMSE-TS-DI, THP-MMSE-TS]
Fig. 3. 4x4 MIMO transmission: comparison of different ordering criteria for THP-MMSE and 16-QAM
In figure 4 the transmit power requirements of various THP schemes before applying (2) are shown for different values $N = N_T = N_R$. The results are normalized to the corresponding THP-ZF scheme; all THP-MMSE schemes are evaluated for $E_{bRx}/N_0 = 10$ dB. The transmit power can be reduced significantly by applying ordered decomposition techniques. The VBLAST approach performs only slightly better than the SLQD approach, whereas the proposed THP-TS scheme shows superior behavior, especially for large $N$. The MMSE schemes outperform the corresponding ZF schemes by approximately 2 dB.
[Figure 4: $10 \log_{10}(P_s / P_{s,\text{THP-ZF}})$ [dB] (-5 to 0 dB) versus $N$ (2 to 8); curves: THP-ZF, THP-ZF-VBLAST, THP-ZF-SLQD, THP-ZF-TS, THP-MMSE-TS, THP-MMSE, THP-MMSE-SLQD/VBLAST]
Fig. 4. Transmit power for NxN MIMO precoding
The ratio of nodes computed by the search tree of the proposed algorithm over the number of nodes evaluated by a decomposition of all $N!$ permutations of the channel matrix $H$ is shown in figure 5. An upper bound is given by the reduction of nodes due to the tree structure; further improvements can be obtained by the exit criterion related to the metric $\Lambda$. While the THP-TS schemes achieve no complexity reduction for a 2x2 MIMO system, the number of evaluated nodes can be reduced significantly for increasing $N$, as anticipated.
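One plausible way to quantify the saving due to the tree structure alone is sketched below. The counting convention is an assumption, not taken from the paper: a node is one processed layer, the full permutation tree holds $N!/(N-\ell)!$ nodes in layer $\ell$, and decomposing all $N!$ permutations separately costs $N$ layers each. The function names are hypothetical.

```python
import math

def tree_nodes(n):
    """Nodes of the full permutation tree: layer l holds n!/(n-l)! nodes."""
    return sum(math.factorial(n) // math.factorial(n - l) for l in range(1, n + 1))

def exhaustive_nodes(n):
    """n layers for each of the n! separately decomposed permutations."""
    return n * math.factorial(n)

def upper_bound_ratio(n):
    """Node ratio without any metric-based pruning, under the counting assumed above."""
    return tree_nodes(n) / exhaustive_nodes(n)

ratios = {n: upper_bound_ratio(n) for n in range(2, 9)}
```

Under this convention the ratio equals 1 for $N = 2$, which agrees with the statement that a 2x2 system allows no complexity reduction, and it decreases as $N$ grows.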
[Figure 5: ratio of computed nodes (0.2 to 1) versus $N$ (2 to 8); curves: TS (upper bound), THP-ZF-TS, THP-MMSE-TS]
Fig. 5. Ratio of nodes computed by THP-TS over the number of nodes evaluated by a decomposition of all N! permutations of H for NxN MIMO precoding
A. Applying Channel Coding
In order to evaluate the performance of the proposed algorithm in a coded system, we apply a systematic parallel concatenated convolutional code with feedback polynomial $13_{oct}$ and feedforward polynomial $15_{oct}$. Each of the $N_R = 4$ single-antenna mobile stations receives one 16-QAM data stream transmitted by the base station equipped with $N_T = 4$ antennas. The independently encoded blocks with a length of 4000 bits per user at a code rate of 3/4 are transmitted over independent channel realizations. Hence the size of each user's random interleaver is 3000 bits. Each mobile station applies an automatic gain control to ensure unit gain transmission
followed by the modulo device and a turbo decoder to decode
its intended data.
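Due to the modulo device and its resulting cyclic equivalent decision regions, the likelihood values for the iterative decoder are computed over an infinite sum [12]. The log-likelihood ratio (LLR) of bit 0 (LSB) of a 16-QAM symbol with Gray mapping reads as
$$\begin{aligned}
\mathrm{LLR}(d_n^{(0)}) &= \log \frac{P(d_n^{(0)} = 1)}{P(d_n^{(0)} = -1)} \\
&= \log \frac{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma^2}\right) \right]}{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma^2}\right) \right]} \\
&= \max^*_{k=-\infty \ldots \infty} \max^*\left( -\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma^2},\, -\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma^2} \right) \\
&\quad - \max^*_{k=-\infty \ldots \infty} \max^*\left( -\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma^2},\, -\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma^2} \right),
\end{aligned} \tag{12}$$
where $\tau = 2\sqrt{M}$ denotes the shortest distance between two equivalent points due to the modulo operation applied at the receiver and $d_{R,n}$ denotes the real component of the $n$th symbol at the detector. The computation of the LLRs for the remaining bits of $d_n$ is straightforward. The $\max^*$ operation is given by
$$\begin{aligned}
\max^*(a, b) &= \log\left(\exp(a) + \exp(b)\right) \\
&= \max(a, b) + \log\left(1 + \exp(-|a - b|)\right).
\end{aligned} \tag{13}$$
In a practical implementation it is reasonable to approximate the infinite sum in (12) by a sum over the central Voronoi region and its first two neighbours ($-1 \leq k \leq 1$).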
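The truncated LLR computation can be sketched as follows. The sketch assumes 16-QAM with $\tau = 8$, the sign convention that bit 0 separates the points $\{+1, +3\}$ from $\{-1, -3\}$ on the real axis, and a parameter `sigma2` playing the role of $\sigma^2$ in (12). The function names are hypothetical.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)) in a numerically stable form, see (13)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def llr_bit0(d_r, sigma2, k_range=(-1, 0, 1)):
    """LLR of bit 0 for the real component d_r, summing over the periods in k_range."""
    tau = 2 * math.sqrt(16)                   # modulo period for 16-QAM

    def accumulate(points):
        total = None
        for k in k_range:
            for p in points:
                term = -((d_r + k * tau - p) ** 2) / (2 * sigma2)
                total = term if total is None else max_star(total, term)
        return total

    # numerator hypotheses +1, +3 versus denominator hypotheses -1, -3
    return accumulate((1, 3)) - accumulate((-1, -3))
```

Applying `max_star` repeatedly accumulates the logarithm of a sum of exponentials without evaluating large or vanishing exponentials directly. With the symmetric range $-1 \leq k \leq 1$, the result is an odd function of `d_r`.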
[Figure 6: coded bit error rate ($10^{-4}$ to $10^{0}$) versus $E_{bRx}/N_0$ [dB] (0 to 12 dB); curves: THP-ZF, THP-MMSE, THP-ZF-SLQD, THP-MMSE-SLQD, THP-ZF-VBLAST, THP-MMSE-VBLAST, THP-ZF-TS, THP-MMSE-TS, ergodic capacity]
Fig. 6. Bit error rate for a coded 4x4 MIMO system with rate 3/4 (PCCC) using 16-QAM modulation and 5 turbo decoder iterations
The simulation results are given in figure 6. The THP-TS
schemes outperform the corresponding SLQD and VBLAST
schemes by approximately 0.5 dB whereas a performance gain
of approximately 1.7 dB for the MMSE schemes over the
corresponding ZF approaches can be observed. The dashed
line shows the ergodic sum-capacity for a 4x4 MIMO system.
The THP-MMSE-TS scheme performs within 4 dB of this capacity, which seems to be acceptable given the non-cooperative receivers, the limited block lengths, and the independently encoded data streams. A tradeoff between fairness
among the different users on the one side and rate and power
loading techniques on the other side would further increase
the performance of the proposed schemes.
V. CONCLUSIONS
A tree-structured algorithm which reduces the complexity of computing an optimum data-dependent matrix decomposition for Tomlinson-Harashima Precoding has been derived in this paper. The algorithm combines matrix decomposition and the feedback structure of the precoder to optimize the matrix permutation. The proposed scheme outperforms data-independent sorting approaches in terms of bit error performance.
REFERENCES
[1] H. Harashima and H. Miyakawa, "Matched-Transmission Technique for Channels with Intersymbol Interference," IEEE Trans. Commun., vol. COM-20, pp. 774-780, Aug. 1972.
[2] M. Tomlinson, "New Automatic Equaliser Employing Modulo Arithmetic," IEE Electr. Lett., vol. 7, no. 5,6, pp. 138-139, Mar. 1971.
[3] R. F. H. Fischer, C. Windpassinger, A. Lampe, and J. B. Huber, "Space-Time Transmission using Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 139-147.
[4] M. Joham, J. Bremer, and W. Utschick, "MMSE Approaches to Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 387-394.
[5] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication," in Proc. 41st Allerton Conf. on Communication, Control, and Computing 2003, Monticello, IL, USA, Oct. 2003.
[6] R. F. H. Fischer and C. Windpassinger, "Even-Integer Interference Precoding for Broadcast Channels," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 395-402.
[7] S. Shi and M. Schubert, "Precoding and Power Loading for Multi-Antenna Broadcast Channels," in Proc. 38th Conf. on Information Sciences and Systems (CISS'04), Princeton, USA, Mar. 2004.
[8] R. Habendorf, R. Irmer, W. Rave, and G. Fettweis, "Nonlinear Multiuser Precoding for Non-Connected Decision Regions," in Proc. IEEE Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC'05), New York City, USA, June 2005, pp. 535-539.
[9] D. Schmidt, M. Joham, F. A. Dietrich, K. Kusume, and W. Utschick, "Complexity Reduction for MMSE Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Workshop on Smart Antennas (WSA'05), Duisburg, Germany, Apr. 2005.
[10] C. Windpassinger, T. Vencel, and R. F. H. Fischer, "Precoding and Loading for BLAST-like Systems," in Proc. IEEE Int. Conf. on Communications (ICC'03), Anchorage, Alaska, USA, May 2003.
[11] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified Processing for Wireless Communication at High Spectral Efficiency," IEEE J. Select. Areas Commun., JSAC-17, pp. 1841-1852, Nov. 1999.
[12] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication-Part II: Perturbation," IEEE Trans. Commun., vol. 53, no. 3, pp. 537-544, Mar. 2005.
[13] D. Wübben, J. Rinas, R. Böhnke, V. Kühn, and K. Kammeyer, "Efficient Algorithm for Detecting Layered Space-Time Codes," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 399-405.