06_vtc_habendorf

Upload: bhoopsharma

Post on 10-Apr-2018


  • 8/8/2019 06_VTC_habendorf

    1/5

    On Ordering Optimization for MIMO Systems with

    Decentralized Receivers

    Rene Habendorf and Gerhard Fettweis

    Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Germany
    {habendorf,fettweis}@ifn.et.tu-dresden.de
    http://www.ifn.et.tu-dresden.de/MNS

    Abstract: This paper addresses Tomlinson-Harashima Precoding (THP) for the downlink of multiuser systems, where the transmitter is equipped with multiple antennas and each decentralized receiver has a single antenna. Although the performance of THP strongly depends on the ordering of the precoded symbols, current THP approaches apply data independent sorting algorithms. The contribution of this paper is the extension of THP with an algorithm exploiting the knowledge of the data symbols to compute the optimum stream ordering for precoding.

    I. INTRODUCTION

    In systems with decentralized non-cooperative mobile de-

    vices space division multiple access (SDMA) algorithms have

    to be performed at the base station which enables a joint

    processing of all transmit and receive signals.

    Hence Multiuser Detection (MUD) is suitable for the uplink

    whereas for the downlink being the capacity bottleneck in var-

    ious multimedia applications, Multiuser Transmission (MUT)

    can be applied if channel state information is available at the

    transmitter. In time division duplex (TDD) systems this can

    be achieved due to the channel reciprocity and if the channel

    coherence time is large compared to one transmission interval.

    However, in frequency division duplex (FDD) schemes, the

    channel estimates obtained at the receiver have to be sent back

    to the transmitter via some backward channel.

    In this paper we focus on the nonlinear Tomlinson-

    Harashima Precoding (THP) which was initially proposed

    for the equalization of SISO channels in [1],[2] and has

    been applied to matrix channels in [3]. In [4], joint spatio-

    temporal THP was proposed. Nonlinear constellation shaping

    techniques taking the equivalent decision regions into account

    have been investigated in [5], [6], [7], and [8].

    The performance of THP strongly depends on the ordering

    of the precoded symbols which corresponds to a permutation of the rows of the channel matrix. In recent years, different data independent approaches for computing an improved

    ordering of the channel matrix have been investigated. Our

    proposed method exploits the data symbols which are perfectly

    known to the transmitter to compute a precoded signal with

    minimum transmit power enhancement. To obtain the optimum

    ordering, a tree search algorithm is applied.

    The performance of the proposed scheme is investigated

    numerically in both uncoded and coded scenarios.

    II. SYSTEM MODEL

    We consider downlink transmission from a base station

    using $N_T$ transmit antennas to $N_R$ non-cooperative mobile stations (MS), each equipped with a single receive antenna. Throughout the paper we restrict ourselves to the case of $N_T \geq N_R$. The independent zero mean complex Gaussian channel taps $h_{q,k}$ from transmit antenna $k$ to the receive antenna of MS $q$ with unit variance $E\{|h_{q,k}|^2\} = 1$ are arranged in the flat block fading channel matrix $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$.

    With $\mathbf{s} = [s_1, \ldots, s_{N_T}]^T$ and $\mathbf{r} = [r_1, \ldots, r_{N_R}]^T$ denoting the vectors of transmit and receive symbols, respectively, the transmission equation for burst interval $k$ reads as

    $$\mathbf{r}(k) = \mathbf{H}(k)\,\mathbf{s}(k) + \boldsymbol{\eta}(k). \tag{1}$$

    At the receivers, the signal is distorted by the noise vector $\boldsymbol{\eta} \in \mathbb{C}^{N_R \times 1}$ of circularly symmetric complex Gaussian i.i.d. samples with covariance matrix $E\{\boldsymbol{\eta}\boldsymbol{\eta}^H\} = \sigma_\eta^2 \mathbf{I}_{N_R}$. Throughout this paper, $(\cdot)^T$ and $(\cdot)^H$ denote matrix transposition and Hermitian transposition, respectively. Further, $\mathbf{I}_N$ denotes the $N \times N$ identity matrix.

    III. MULTIUSER TRANSMISSION

    Figure 1 shows the equivalent time discrete system model used for precoding the vector $\mathbf{d} = [d_1, \ldots, d_{N_R}]^T$ of $M$-ary QAM symbols taken from the odd-complex integer grid $\mathcal{S} = \{a + jb \mid a, b \in \{\pm 1, \pm 3, \ldots, \pm(\sqrt{M} - 1)\}\}$. Similar to decision feedback equalization the interference is canceled out by a feedback (FB) filter after transforming it into a causal form by a feedforward (FF) filter. To allow a fair comparison between different precoding schemes, a fixed transmit power constraint $E_{Tx}$ is incorporated by the scaling factor

    $$\beta = \sqrt{\frac{E_{Tx}}{\mathbf{s}^H \mathbf{s}}}. \tag{2}$$

    Fig. 1. Downlink transmission model for Tomlinson-Harashima Precoding (THP) with decentralized receivers. [Block diagram; block labels: FB, FF, G, H.]


    Both the feedforward and the feedback filter can be obtained by an LQ decomposition of the channel matrix $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$ into a unitary matrix $\mathbf{Q} \in \mathbb{C}^{N_R \times N_T}$ and a lower triangular matrix $\mathbf{L} \in \mathbb{C}^{N_R \times N_R}$.

    With the feedforward filter $\mathbf{FF} = \mathbf{Q}^H$ and the scaling matrix $\mathbf{G} = \mathrm{diag}\{L_{1,1}^{-1}, \ldots, L_{N_R,N_R}^{-1}\}$ the effective system matrix

    $$\tilde{\mathbf{L}} = \mathbf{H} \cdot \mathbf{FF} \cdot \mathbf{G} = \mathbf{L}\mathbf{Q} \cdot \mathbf{Q}^H \cdot \mathbf{G} = \mathbf{L}\mathbf{G} \tag{3}$$

    is lower triangular having a unit main diagonal. With the feedback matrix $\mathbf{FB} = \tilde{\mathbf{L}} - \mathbf{I}$ the described structure performs linear pre-equalization which increases transmit power significantly. The modulo device used in THP reduces each element of the pre-equalized signal into the fundamental Voronoi region $\mathcal{V} = \{a + jb \mid a, b \in [-\tau/2, \tau/2)\}$ of the $M$-ary QAM modulo lattice by adding integer multiples of $\tau = 2\sqrt{M}$ to its I and Q component, respectively. To compensate for that manipulation, the receiver has to perform the same modulo operation which transforms it into a nonlinear device. Obviously, the received signal has to be multiplied with $\beta^{-1}$ to ensure unit gain transmission. Nonlinear constellation shaping techniques taking the equivalent decision regions into account [5], [6], [7], [8] are not in the focus of this paper.
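The zero-forcing THP chain described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `lq`, `modulo` and `thp_zf` are hypothetical, the LQ decomposition is obtained from a QR decomposition of the Hermitian transpose, and the channel is noiseless so that the receiver-side modulo recovers the data exactly.

```python
import numpy as np

def lq(H):
    """LQ decomposition H = L @ Q (L lower triangular, Q with orthonormal rows)."""
    Qh, Rh = np.linalg.qr(H.conj().T)      # reduced QR of H^H
    return Rh.conj().T, Qh.conj().T

def modulo(z, tau):
    """Reduce real and imaginary parts into [-tau/2, tau/2)."""
    wrap = lambda v: v - tau * np.floor(v / tau + 0.5)
    return wrap(z.real) + 1j * wrap(z.imag)

def thp_zf(H, d, M, E_tx=1.0):
    """Zero-forcing THP for one symbol vector d (hypothetical helper)."""
    tau = 2 * np.sqrt(M)
    L, Q = lq(H)
    G = np.diag(1.0 / np.diag(L))          # scaling matrix
    L_eff = L @ G                          # effective matrix with unit main diagonal
    x = np.zeros(len(d), dtype=complex)
    for i in range(len(d)):                # feedback loop with modulo device
        x[i] = modulo(d[i] - L_eff[i, :i] @ x[:i], tau)
    s = Q.conj().T @ (G @ x)               # feedforward filter FF = Q^H
    beta = np.sqrt(E_tx / np.real(s.conj() @ s))
    return beta * s, beta, tau

# Noiseless 4x4 example with 16-QAM symbols from the odd-integer grid.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
d = rng.choice([-3, -1, 1, 3], size=4) + 1j * rng.choice([-3, -1, 1, 3], size=4)
s_tx, beta, tau = thp_zf(H, d, M=16)
d_hat = modulo(H @ s_tx / beta, tau)       # receiver: gain control, then modulo
```

Without noise, `d_hat` should coincide with `d`, because the received vector differs from the data only by integer multiples of the modulo constant.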

    A. MMSE Channel Extension

    In [4] the FF and FB filters for THP are derived based on a minimum mean square error (MMSE) criterion by dropping the Zero Forcing constraint (THP-ZF) to completely cancel out all interference. The computation of the THP-MMSE filters requires a matrix inverse for each row of the channel matrix, which becomes quite complex for large $N_R$. The computational complexity is reduced in [9] by incorporating the Cholesky factorization algorithm. However, in this paper we apply the LQ decomposition algorithm to the channel matrix extended by a scaled identity matrix of size $[N_R \times N_R]$

    $$\bar{\mathbf{H}} = \left[\mathbf{H} \;\; \sqrt{\zeta}\,\mathbf{I}_{N_R}\right] \in \mathbb{C}^{N_R \times (N_T + N_R)} \tag{4}$$

    to compute FF and FB, which delivers equivalent results to those obtained in [4]. The scaling factor $\zeta = N_R \sigma_\eta^2 / E_{Tx}$ is determined by the variance of the noise and the transmit energy.
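The effect of the channel extension can be checked numerically. The sketch below assumes that the identity block is scaled by the square root of the factor $N_R \sigma^2 / E_{Tx}$ (an assumption made here so that the Gram matrix of the extended channel equals the regularized Gram matrix known from MMSE designs); the function names and the chosen noise variance are hypothetical.

```python
import numpy as np

def lq(H):
    """LQ decomposition H = L @ Q via a reduced QR decomposition of H^H."""
    Qh, Rh = np.linalg.qr(H.conj().T)
    return Rh.conj().T, Qh.conj().T

def extend_channel(H, sigma2, E_tx):
    """Append a scaled identity block to the channel matrix (assumed sqrt scaling)."""
    n_r = H.shape[0]
    zeta = n_r * sigma2 / E_tx
    return np.hstack([H, np.sqrt(zeta) * np.eye(n_r)]), zeta

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
H_ext, zeta = extend_channel(H, sigma2=0.1, E_tx=1.0)
L, Q = lq(H_ext)                       # Q now has N_T + N_R columns
gram_ext = L @ L.conj().T              # equals H_ext @ H_ext^H
gram_reg = H @ H.conj().T + zeta * np.eye(4)
```

Under this assumption `gram_ext` and `gram_reg` agree, which is the regularization an MMSE design introduces; only the first $N_T$ columns of `Q` would be used to form the transmit signal.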

    B. Stream Ordering

    The diagonal scaling matrix $\mathbf{G}$ adjusts the transmit power such that the SNRs of all symbols of all users are equal and inversely proportional to

    $$\mathrm{trace}\left(\mathbf{G}^H \mathbf{G}\right) = \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}. \tag{5}$$

    The sum in (5) can be influenced by reordering the rows of $\mathbf{H}$ during the LQ decomposition algorithm according to a certain permutation $\mathbf{P} \in \mathbb{N}^{N_R \times 1}$ ($1 \leq P_i \leq N_R$, $P_i \neq P_j$ for $i \neq j$) which results in a modified order of precoding [10]. To optimize the SNR without exploiting the knowledge of the transmitted symbols, the optimum permutation is given as

    $$\mathbf{P}_{opt} = \arg\min_{\mathbf{P}} \sum_{i=1}^{N_R} \frac{1}{L_{i,i}^2}, \tag{6}$$

    where $\mathbf{L}$ is given by the LQ decomposition of the reordered channel matrix

    $$\tilde{\mathbf{H}} = \mathbf{H}_{\mathbf{P},1..N_T}. \tag{7}$$

    If we write (6) as the suboptimum criterion

    $$\mathbf{P}_{subopt} = \arg\max_{\mathbf{P}} \min \{L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2\} \tag{8}$$

    which maximizes the minimum entry in $\mathrm{diag}(\mathbf{L})$, the V-BLAST algorithm [11] can be used to find a quite satisfactory permutation [10], [12]. Alternatively to the V-BLAST algorithm requiring multiple calculations of the pseudo inverse of the matrix $\mathbf{H}$, the suboptimum heuristic sorted QR decomposition (SQRD) algorithm in [13] can be modified to compute a sorted LQ decomposition (SLQD) of $\mathbf{H}$. In the remainder of this paper, both schemes are referred to as THP-VBLAST and THP-SLQD, respectively.
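For small $N_R$ the two data independent criteria can be evaluated by exhaustive search, which is useful as a reference when testing faster heuristics. The sketch below is such a brute-force reference only; it is neither the V-BLAST nor the SLQD algorithm, and the function names are hypothetical.

```python
import itertools
import numpy as np

def diag_sq(H, perm):
    """Squared main diagonal of L for the row-permuted channel H[perm]."""
    R = np.linalg.qr(H[list(perm)].conj().T, mode="r")   # L = R^H
    return np.abs(np.diag(R)) ** 2

def best_orderings(H):
    """Exhaustive search for the min-sum and the max-min ordering."""
    perms = list(itertools.permutations(range(H.shape[0])))
    p_opt = min(perms, key=lambda p: np.sum(1.0 / diag_sq(H, p)))    # sum of 1/L_ii^2
    p_maxmin = max(perms, key=lambda p: np.min(diag_sq(H, p)))       # largest minimum L_ii^2
    return p_opt, p_maxmin

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
p_opt, p_maxmin = best_orderings(H)
```

Here `perm[0]` is the index of the user precoded first. The cost grows with $N_R!$, which is why sorting heuristics are used in practice.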

    C. Tree Search Tomlinson-Harashima Precoding

    Although the data symbols are perfectly known to the transmitter, the sorting algorithms described above do not exploit this knowledge. In the following, an algorithm is proposed that jointly performs the decomposition of the channel matrix and the successive interference cancelation, and therefore directly computes the transmit signal $\mathbf{s}$. Since the unitary FF filter does not increase the signal power the optimum data dependent ordering $\mathbf{P}_{opt}$ is given by

    $$\mathbf{P}_{opt} = \arg\min_{\mathbf{P}} \mathbf{x}^H \mathbf{G}^H \mathbf{G}\, \mathbf{x}. \tag{9}$$

    To compute the optimum ordering efficiently, the proposed algorithm performs a depth-first tree search to iteratively evaluate all possible permutations $\mathbf{P}$. During the first dive the SQRD solution $\mathbf{x}_{SQRD}$ is computed to start with an initial candidate having a reasonably good metric

    $$\mu = \mathbf{x}_{SQRD}^H \mathbf{G}_{SQRD}^H \mathbf{G}_{SQRD}\, \mathbf{x}_{SQRD} \tag{10}$$

    according to the optimization criterion given by equation (9). Therefore, the row with the minimum norm is selected in each layer $\lambda = 1, \ldots, N_R$ out of the set of the remaining rows of the decomposed matrix.

    The next permutation is evaluated by selecting the alternative row according to the SQRD permutation in layer $\lambda = N_R - 1$ at the end of the SQRD-branch. Since the permutation change occurred near the end of the tree structure, the results of the first $N_R - 2$ layers of the previously computed (SQRD) branch can be reused which reduces the computational complexity compared to an evaluation of all $N_R!$ permuted LQ decompositions significantly. Therefore, a partial copy of $\tilde{\mathbf{Q}} = \mathbf{G}\mathbf{Q}$, the vector of the squared main diagonal elements of $\mathbf{L}$: $\mathbf{norm} = [L_{1,1}^2, L_{2,2}^2, \ldots, L_{N_R,N_R}^2]$,


    Algorithm 1 Tree Search Tomlinson-Harashima Precoding

    Input: H ∈ C^(NR×n), d ∈ C^(NR×1)
    Output: preprocessed vector s ∈ C^(NT×1)

    μ^[0] := 0;  Q^[0] := H;  λ := 1
    (k = 1,..,NR)  P^[0]_k := k
    (k = 1,..,NR)  norm^[0]_k := H_{k,1..n} (H_{k,1..n})^H
    while (λ ≥ 1) {
        if (just stepped down into new layer) {
            alt_λ := 0                                    // reset alternative counter
        } else {
            if (alt_λ < NR - λ) alt_λ := alt_λ + 1
            else { λ := λ - 1; continue }                 // step one layer up
        }
        P^[λ]_{1..NR} := P^[λ-1]_{1..NR}                  // copy previous ordering
        if (first dive) {
            idx := arg min_{i=λ..NR} norm^[λ-1]_i         // compute SLQD
        } else {
            idx := alt_λ + λ                              // select next alternative
        }
        swap P^[λ]_λ and P^[λ]_idx
        Q^[λ]_{λ,1..n} := Q^[λ-1]_{idx,1..n} / norm^[λ-1]_idx
        for (k := λ + 1,..,NR) {
            if (k ≠ idx) k' := k else k' := λ
            j := P^[λ]_k
            L_{j,λ} := Q^[λ-1]_{k',1..n} (Q^[λ]_{λ,1..n})^H    // compute projections
        }
        j := P^[λ]_λ
        x_λ := mod{ d_j - L_{j,1..λ-1} x_{1..λ-1} }
        μ^[λ] := μ^[λ-1] + |x_λ|^2 / norm^[λ-1]_idx
        if (μ^[λ] < μ or first dive) {
            if (λ = NR) {
                μ := μ^[λ]                                // found new candidate
                s := Σ_{k=1..NR} (Q^[k]_{k,1..NT})^H x_k
                λ := λ - 1                                // step one layer up
            } else {
                for (k := λ + 1,..,NR) {
                    if (k ≠ idx) k' := k else k' := λ
                    j := P^[λ]_k
                    // orthogonalize
                    Q^[λ]_{k,1..n} := Q^[λ-1]_{k',1..n} - L_{j,λ} Q^[λ-1]_{idx,1..n}
                    // update row norms
                    norm^[λ]_k := norm^[λ-1]_{k'} - |L_{j,λ}|^2 norm^[λ-1]_idx
                }
                λ := λ + 1                                // step one layer down
            }
        } else {
            λ := λ - 1                                    // step one layer up
        }
    }

    and the permutation vector $\mathbf{P}$ have to be stored for each layer, which is denoted by the superscript $(\cdot)^{[\lambda]}$ in Algorithm 1.

    Since in each layer $\lambda$ the $\lambda$th column of the FB matrix $\tilde{\mathbf{L}}$ is generated, the interference cancelation for the $\lambda$th data symbol $d_\lambda$ can be performed. With the obtained sequence $\mathbf{x}^{[\lambda]}_{1..\lambda}$ the current metric is given by

    $$\mu^{[\lambda]} = \sum_{i=1}^{\lambda} \left|x_i^{[\lambda]}\right|^2 \left|G_{i,i}^{[\lambda]}\right|^2. \tag{11}$$

    Throughout this paper, $(\cdot)^{[\lambda]}$ denotes the representation of the respective vector or matrix according to the $\lambda$th layer of the TS-THP algorithm. Further, $\mathbf{Q}_{k,a..b}$ denotes the submatrix built by the elements from column $a$ to column $b$ of the $k$th row of matrix $\mathbf{Q}$.

    If $\mu^{[\lambda]}$ exceeds the current optimum metric $\mu$, the current branch is closed and the next alternative permutation of layer $\lambda$ is processed. Hence an initial candidate with a reasonably good metric reduces the number of computed nodes. After having evaluated all alternatives in a certain layer $\lambda$ the algorithm searches for available alternatives at the previous layers $1, \ldots, \lambda - 1$. Each time the last layer $\lambda = N_R$ is reached and $\mu^{[N_R]} < \mu$ a new candidate has been found.

    If all alternatives in layer $\lambda = 1$ have been evaluated, the algorithm exits with the current candidate being the optimum precoded vector according to the optimization criterion in (9).
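The search described above can be illustrated with a compact recursive sketch. It is not a transcription of Algorithm 1: it covers the zero-forcing case only, uses recursion instead of the per-layer bookkeeping, and visits the alternatives of each layer in order of increasing residual row norm (an assumption made here so that the first dive follows the sorted heuristic). All function names are hypothetical. A brute-force evaluation of every permutation is included as a cross-check.

```python
import itertools
import numpy as np

def modulo(z, tau):
    """Reduce real and imaginary parts into [-tau/2, tau/2)."""
    wrap = lambda v: v - tau * np.floor(v / tau + 0.5)
    return wrap(z.real) + 1j * wrap(z.imag)

def metric_fixed_order(H, d, tau, perm):
    """Reference metric x^H G^H G x for one given precoding order."""
    L = np.linalg.qr(H[list(perm)].conj().T, mode="r").conj().T
    L_eff = L / np.diag(L)                     # divide column j by L[j, j]
    x = np.zeros(len(perm), dtype=complex)
    for i, user in enumerate(perm):
        x[i] = modulo(d[user] - L_eff[i, :i] @ x[:i], tau)
    return np.sum(np.abs(x) ** 2 / np.abs(np.diag(L)) ** 2)

def ts_thp(H, d, tau):
    """Depth-first search over precoding orders with metric-based pruning."""
    best = {"metric": np.inf, "perm": None, "s": None}

    def dive(users, resid, s_part, metric, perm):
        if not users:                          # last layer reached: new candidate
            best.update(metric=metric, perm=perm, s=s_part)
            return
        norms = np.sum(np.abs(resid) ** 2, axis=1)
        for idx in np.argsort(norms):          # smallest residual norm first
            j, q = users[idx], resid[idx]
            x = modulo(d[j] - H[j] @ s_part, tau)      # interference cancelation
            m = metric + abs(x) ** 2 / norms[idx]
            if m >= best["metric"]:
                continue                       # close this branch
            rest = np.delete(resid, idx, axis=0)
            proj = rest @ q.conj() / norms[idx]        # projection coefficients
            dive(users[:idx] + users[idx + 1:],
                 rest - np.outer(proj, q),             # orthogonalize remaining rows
                 s_part + q.conj() * x / norms[idx],   # accumulate transmit signal
                 m, perm + [j])

    dive(list(range(H.shape[0])), H.astype(complex),
         np.zeros(H.shape[1], dtype=complex), 0.0, [])
    return best

rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
d = rng.choice([-3, -1, 1, 3], size=4) + 1j * rng.choice([-3, -1, 1, 3], size=4)
tau = 2 * np.sqrt(16)
res = ts_thp(H, d, tau)
brute = min(metric_fixed_order(H, d, tau, p)
            for p in itertools.permutations(range(4)))
```

Because a branch is closed only when its partial metric already reaches the best complete metric found so far, the result `res["metric"]` should match the brute-force minimum `brute`, while typically visiting fewer nodes.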

    IV. SIMULATION RESULTS

    The performance of the proposed Tree Search Tomlinson-

    Harashima Precoding (THP-TS) algorithm as well as the

    conventional THP and schemes with data independent ordering

    criteria are evaluated numerically. The base station, equipped

    with $N_T = 4$ antennas, transmits 16-QAM symbols to the $N_R = 4$ single antenna mobile stations. The channel is independently fading from burst to burst and assumed to be perfectly known to the transmitter. Figures 2 and 3 show the uncoded bit error rate for the THP-ZF and the THP-MMSE approaches over the ratio of average received energy per information bit $E_{bRx} = E_{Tx}/\log_2(M)$ to the one-sided noise power spectral density $N_0$, respectively.

    Both figures reveal that the unsorted THP is outperformed

    by the suboptimum THP-SLQD scheme significantly. The

    VBLAST approach shows further slight performance improve-

    ments. For the THP-TS-DI scheme algorithm 1 has been

    adapted to compute the optimum data-independent ordering

    according to equation (6). Although the VBLAST ordering is

    a suboptimum one it shows the same performance as the data

    independent optimum ordering obtained by THP-TS-DI. This might change for an increasing number of mobile stations as

    well as transmit-antennas at the base station.

    The proposed THP-TS algorithm shows superior perfor-

    mance especially for medium to high signal to noise ratios

    (SNRs) since it takes the knowledge of the transmitted data

    symbols into account. However, for low SNRs this increase in

    performance reduces dramatically which motivates the inves-

    tigation of a coded system as it is presented in the following

    section.


    Fig. 2. 4x4 MIMO transmission: comparison of different ordering criteria for THP-ZF and 16-QAM. [Plot: uncoded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-ZF, THP-ZF-SLQD, THP-ZF-VBLAST, THP-ZF-TS-DI, THP-ZF-TS.]

    Fig. 3. 4x4 MIMO transmission: comparison of different ordering criteria for THP-MMSE and 16-QAM. [Plot: uncoded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-MMSE, THP-MMSE-SLQD, THP-MMSE-VBLAST, THP-MMSE-TS-DI, THP-MMSE-TS.]

    In figure 4 the transmit power requirements of various THP schemes before applying (2) are shown for different values $N = N_T = N_R$. The results are normalized to the corresponding THP-ZF scheme; all THP-MMSE schemes are evaluated for $E_{bRx}/N_0 = 10$ dB. The transmit power can be reduced significantly by applying ordered decomposition techniques. The VBLAST approach performs only slightly better than the SLQD approach whereas the proposed THP-TS scheme shows superior behavior especially for large $N$. The MMSE schemes outperform the corresponding ZF schemes by approximately 2 dB.

    Fig. 4. Transmit power for NxN MIMO precoding. [Plot: $10\log_{10}(P_s/P_{s,\mathrm{THP\text{-}ZF}})$ in dB versus $N$ for $N = 2, \ldots, 8$; curves: THP-ZF, THP-ZF-VBLAST, THP-ZF-SLQD, THP-ZF-TS, THP-MMSE-TS, THP-MMSE, THP-MMSE-SLQD/VBLAST.]

    The ratio of nodes computed by the search tree of the proposed algorithm over the number of nodes evaluated by a decomposition of all $N!$ permutations of the channel matrix $\mathbf{H}$ is shown in figure 5. An upper bound is given by the reduction of nodes due to the tree structure; further improvements can be obtained by the exit criterion related to the metric $\mu$. While the THP-TS schemes achieve no complexity reduction for a 2x2 MIMO system, the number of evaluated nodes can be reduced significantly for increasing $N$ as anticipated.
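One plausible way to reproduce the tree-structure bound is to count one node per processed layer. Under this counting convention (an assumption, since the text does not spell it out), decomposing all $N!$ permutations separately costs $N \cdot N!$ nodes, whereas the full tree shares common prefixes and has $\sum_{\lambda=1}^{N} N!/(N-\lambda)!$ nodes. The function name below is hypothetical.

```python
from math import factorial

def tree_node_ratio(n):
    """Nodes of the full permutation tree over nodes of n! separate decompositions."""
    tree_nodes = sum(factorial(n) // factorial(n - lam) for lam in range(1, n + 1))
    separate_nodes = n * factorial(n)
    return tree_nodes / separate_nodes

ratios = {n: tree_node_ratio(n) for n in range(2, 9)}
```

With this convention the ratio equals 1 for $N = 2$, consistent with the statement that no complexity reduction is achieved for a 2x2 system, and it decreases as $N$ grows.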

    Fig. 5. Ratio of nodes computed by THP-TS over the number of nodes evaluated by a decomposition of all N! permutations of H for NxN MIMO precoding. [Plot: ratio versus $N$ for $N = 2, \ldots, 8$; curves: TS (upper bound), THP-ZF-TS, THP-MMSE-TS.]

    A. Applying Channel Coding

    In order to evaluate the performance of the proposed algorithm in a coded system, we apply a systematic parallel concatenated convolutional code with feedback polynomial $13_{oct}$ and feedforward polynomial $15_{oct}$. Each of the $N_R = 4$ single antenna mobile stations receives one 16-QAM data stream transmitted by the base station equipped with $N_T = 4$ antennas. The independently encoded blocks with a length of 4000 bits per user at a code rate of 3/4 are transmitted over independent channel realizations. Hence the size of each user's random interleaver is 3000 bits. Each mobile station applies an automatic gain control to ensure unit gain transmission


    followed by the modulo device and a turbo decoder to decode

    its intended data.

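    Due to the modulo device and its resulting cyclic equivalent decision regions the likelihood values for the iterative decoder are computed over an infinite sum [12]. The log likelihood ratio (LLR) of bit 0 (LSB) of a 16-QAM symbol with Gray mapping reads as

    $$\begin{aligned}
    \mathrm{LLR}(d_n^{(0)}) &= \log \frac{P(d_n^{(0)} = 1)}{P(d_n^{(0)} = -1)} \\
    &= \log \frac{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma_\eta^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma_\eta^2}\right) \right]}{\sum_{k=-\infty}^{+\infty} \left[ \exp\left(-\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma_\eta^2}\right) + \exp\left(-\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma_\eta^2}\right) \right]} \\
    &= \max^*_{k=-\infty \ldots \infty} \, \max^*\left( -\frac{(d_{R,n} + k\tau - 1)^2}{2\sigma_\eta^2}, -\frac{(d_{R,n} + k\tau - 3)^2}{2\sigma_\eta^2} \right) \\
    &\quad - \max^*_{k=-\infty \ldots \infty} \, \max^*\left( -\frac{(d_{R,n} + k\tau + 1)^2}{2\sigma_\eta^2}, -\frac{(d_{R,n} + k\tau + 3)^2}{2\sigma_\eta^2} \right),
    \end{aligned} \tag{12}$$

    where $\tau = 2\sqrt{M}$ denotes the shortest distance between two equivalent points due to the modulo operation applied at the receiver and $d_{R,n}$ denotes the real component of the $n$th symbol at the detector. The computation of the LLRs for the remaining bits of $d_n$ is straightforward. The $\max^*$ operation is given by

    $$\max^*(a, b) = \log\left(\exp(a) + \exp(b)\right) = \max(a, b) + \log\left(1 + \exp(-|a - b|)\right). \tag{13}$$

    In a practical implementation it is reasonable to approximate the infinite sum in (12) by a sum over the central Voronoi region and its first two neighbours ($-1 \leq k \leq 1$).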
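The truncated LLR computation can be sketched as follows. The sketch assumes the reading of (12) in which the numerator collects the candidates with real part +1 and +3 and the denominator those with real part -1 and -3, uses $\tau = 8$ for 16-QAM, and treats `sigma2` as the noise variance parameter appearing in the exponent; the function names are hypothetical.

```python
import numpy as np

def max_star(a, b):
    """Jacobian logarithm: log(exp(a) + exp(b)), evaluated in a stable form."""
    return np.maximum(a, b) + np.log1p(np.exp(-np.abs(a - b)))

def llr_bit0_16qam(d_re, sigma2, K=1):
    """LLR of bit 0 from the real part d_re, summing over k = -K, ..., K."""
    tau = 8.0                                  # 2 * sqrt(16) for 16-QAM
    num, den = -np.inf, -np.inf
    for k in range(-K, K + 1):
        for a in (1.0, 3.0):
            num = max_star(num, -(d_re + k * tau - a) ** 2 / (2 * sigma2))
            den = max_star(den, -(d_re + k * tau + a) ** 2 / (2 * sigma2))
    return num - den

llr_pos = llr_bit0_16qam(1.0, sigma2=0.5)      # received value near +1
llr_neg = llr_bit0_16qam(-1.0, sigma2=0.5)     # received value near -1
llr_wide = llr_bit0_16qam(0.7, sigma2=0.5, K=5)
llr_narrow = llr_bit0_16qam(0.7, sigma2=0.5, K=1)
```

Comparing `llr_wide` with `llr_narrow` gives a feeling for how little the terms beyond the first neighbours contribute at moderate noise levels.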

    Fig. 6. Bit error rate for a coded 4x4 MIMO system with rate 3/4 (PCCC) using 16-QAM modulation and 5 turbo decoder iterations. [Plot: coded bit error rate versus $E_{bRx}/N_0$ in dB; curves: THP-ZF-SLQD, THP-MMSE-SLQD, THP-ZF-VBLAST, THP-MMSE-VBLAST, THP-ZF-TS, THP-MMSE-TS, ergodic capacity, THP-MMSE, THP-ZF.]

    The simulation results are given in figure 6. The THP-TS

    schemes outperform the corresponding SLQD and VBLAST

    schemes by approximately 0.5 dB whereas a performance gain

    of approximately 1.7 dB for the MMSE schemes over the

    corresponding ZF approaches can be observed. The dashed

    line shows the ergodic sum-capacity for a 4x4 MIMO system.

    The THP-MMSE-TS scheme performs within 4 dB from

    this capacity which seems to be acceptable due to the non-

    cooperative receivers, the limited block lengths and inde-

    pendently encoded data streams. A tradeoff between fairness

    among the different users on the one side and rate and power

    loading techniques on the other side would further increase

    the performance of the proposed schemes.

    V. CONCLUSIONS

    A tree-structured algorithm which reduces the complexity of

    computing an optimum data dependent matrix decomposition

    for Tomlinson-Harashima Precoding has been derived in this

    paper. The algorithm combines matrix decomposition and the

    feedback structure of the precoder to optimize the matrix per-

    mutation. The proposed scheme outperforms data independent

    sorting approaches in terms of bit error performance.

    REFERENCES

    [1] H. Harashima and H. Miyakawa, "Matched-Transmission Technique for Channels with Intersymbol Interference," IEEE Trans. Commun., vol. COM-20, pp. 774-780, Aug. 1972.

    [2] M. Tomlinson, "New Automatic Equaliser Employing Modulo Arithmetic," IEE Electr. Lett., vol. 7, no. 5,6, pp. 138-139, Mar. 1971.

    [3] R. F. H. Fischer, C. Windpassinger, A. Lampe, and J. B. Huber, "Space-Time Transmission using Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 139-147.

    [4] M. Joham, J. Bremer, and W. Utschick, "MMSE Approaches to Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 387-394.

    [5] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication," in Proc. 41st Allerton Conf. on Communication, Control, and Computing 2003, Monticello, IL, USA, Oct. 2003.

    [6] R. F. H. Fischer and C. Windpassinger, "Even-Integer Interference Precoding for Broadcast Channels," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 395-402.

    [7] S. Shi and M. Schubert, "Precoding and Power Loading for Multi-Antenna Broadcast Channels," in Proc. 38th Conf. on Information Sciences and Systems (CISS'04), Princeton, USA, Mar. 2004.

    [8] R. Habendorf, R. Irmer, W. Rave, and G. Fettweis, "Nonlinear Multiuser Precoding for Non-Connected Decision Regions," in Proc. IEEE Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC'05), New York City, USA, June 2005, pp. 535-539.

    [9] D. Schmidt, M. Joham, F. A. Dietrich, K. Kusume, and W. Utschick, "Complexity Reduction for MMSE Multiuser Spatio-Temporal Tomlinson-Harashima Precoding," in Proc. Int. ITG Workshop on Smart Antennas (WSA'05), Duisburg, Germany, Apr. 2005.

    [10] C. Windpassinger, T. Vencel, and R. F. H. Fischer, "Precoding and Loading for BLAST-like Systems," in Proc. IEEE Int. Conf. on Communications (ICC'03), Anchorage, Alaska, USA, May 2003.

    [11] G. Foschini, G. Golden, R. Valenzuela, and P. Wolniansky, "Simplified Processing for Wireless Communication at High Spectral Efficiency," IEEE J. Select. Areas Commun., JSAC-17, pp. 1841-1852, Nov. 1999.

    [12] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communication-Part II: Perturbation," IEEE Trans. Commun., vol. 53, no. 3, pp. 537-544, Mar. 2005.

    [13] D. Wübben, J. Rinas, R. Böhnke, V. Kühn, and K. Kammeyer, "Efficient Algorithm for Detecting Layered Space-Time Codes," in Proc. Int. ITG Conf. on Source and Channel Coding (SCC'02), Berlin, Germany, Jan. 2002, pp. 399-405.