
7/29/2019 Design4CSTM_jrnl


Design of Coded Space-Time Modulation

Zhiyuan Wu and Xiaofeng Wang

Abstract

In this paper, we study the coded space-time modulation for multiple-input multiple-output wireless

channels. For simpler design and flexible rate-versus-performance tradeoff, conventional encoders are

used before a linear space-time modulator. A joint iterative receiver based on the turbo principle is

assumed, which precludes the use of Tarokh's design criteria for space-time codes. Using the extrinsic

information transfer charts, design criteria that concern both the data rate and error performance are

developed. These criteria are much easier to apply than Tarokh's well-known criteria. It is shown

that the use of outer encoders significantly simplifies the design of linear space-time coding/modulation.

Based on the new design criteria, an optimal space-time linear dispersion modulation scheme is

presented. In addition, the tradeoff between constellation size and symbol rate for a given overall data rate

is discussed. Simulation results are provided to verify the new design criteria and to demonstrate the

merits of the proposed coded space-time modulation.

Index Terms

MIMO, space-time coding, EXIT chart, linear dispersion codes.

I. INTRODUCTION

Recently, there has been a major research thrust of developing multiple-input multiple-output

(MIMO) transmission schemes to exploit the increased capacity of multiple-antenna wireless

channels [1][2]. As a result, numerous MIMO transmission schemes (e.g., [3]-[13]) have been

developed. Most existing designs fall into two categories: performance-oriented

schemes that exploit spatial diversity, such as space-time trellis codes (STTCs)

[3], space-time block codes (STBCs) [4][5] and space-time turbo trellis codes (ST Turbo TCs)

[6]-[8], and rate-oriented schemes that capitalize on the MIMO fading channel capacity, such as

Bell Labs layered space-time (BLAST) architectures [9]-[11] and linear dispersion codes (LDCs)

October 26, 2006 DRAFT



[12][13]. However, many of these existing space-time (ST) codes suffer from design difficulty,

performance loss and/or high decoding complexity.

With recent progress in MIMO transmission, it has been recognized that the idea of the

powerful Turbo codes, first proposed in [14], can also be applied in MIMO systems to achieve

near-capacity performance. Both parallel and serial concatenated schemes have been proposed.

In parallel concatenated systems [6]-[8], the information bit stream is passed through two or

more encoders with different permutations and then punctured and multiplexed at the transmit

antennas. In serial concatenated systems such as [15], some form of outer encoding is applied

before space-time coding/modulation. Such a serial concatenation is often preferable due to its

simpler design and greater flexibility in rate-versus-performance tradeoff. In fact, any space-time

transmission scheme can be considered as an outer encoder serially concatenated before an inner

space-time mapper or modulator that maps a number of input symbols onto a space-time matrix

before transmission. For example, an ST Turbo TC can be viewed as an outer turbo channel

encoder serially concatenated before an inner V-BLAST ST modulator [10].

To emphasize the modulation flavor of the inner space-time process when applied after outer

encoding, we will call such a serially concatenated system coded space-time modulation

(CSTM). Often, conventional encoders such as convolutional codes, trellis-coded modulation

(TCM), and turbo codes designed for single-input single-output (SISO) channels can be used to

provide extra redundancy and to simplify the design of the inner space-time modulator. In order

to decouple the correlation between outer encoding and inner space-time modulation, interleaving

is often applied to the encoded bits or symbols. Such a concatenated coding system possesses

many advantages of both the conventional codes and the inner space-time modulation. On the

one hand, conventional outer codes can provide large coding gain and time diversity; on the

other hand, space-time coding/modulation provides guaranteed spatial diversity gain to combat

fading. Together, they enable a variety of design targets in performance, bandwidth efficiency,

complexity, and tradeoffs among them.

In a CSTM system, the use of an interleaver makes it impractical to evaluate the coding gain and




diversity gain of the overall system based on Tarokh's criteria. Furthermore, Tarokh's criteria are

developed based on maximum-likelihood (ML) decoding which is too complex to be practical

for a concatenated system. At best, a joint iterative receiver based on the turbo principle can be

used [15]. In such a case, Tarokh's criteria no longer apply either to the overall concatenated

system or to the inner space-time modulation alone. Hence, it is important to study the design

of the inner space-time modulator when used in a concatenated system.

Although any existing space-time code can be a potential candidate for the inner space-time

modulation, a particularly desirable choice is linear dispersion (LD) codes. This is because they

subsume many existing block codes as special cases, allow suboptimal linear receivers with

greatly reduced complexity, and provide a flexible rate-versus-performance tradeoff [12]. Hence,

in this paper, using the idea of the extrinsic information transfer (EXIT) chart pioneered by S. ten

Brink [16][17], we consider the design of the inner LD space-time modulator when concatenated

with an outer code under the assumption of a joint iterative (turbo) receiver. Although the EXIT

chart technique developed for SISO systems has been used in the study of some specific MIMO

systems such as [18], it cannot be directly used for more general cases. Unlike in SISO systems,

the outputs of the inner ST modulator often have different statistics. By extending the existing

EXIT chart technique to MIMO transmission systems, it is shown that the inner space-time

modulator shall a) maximize the average mutual information between a bit and the received

signal and b) minimize the pairwise error probability of the codeword pairs that differ at only

one symbol within a modulation block. Criterion a) is similar to those used in the existing high-

rate schemes while b) is unique for CSTM. These two criteria concern the channel capacity

and performance, respectively, and together reflect a joint optimization of both data rate and

error rate. It is worth noting that criterion b) is much simpler to apply than Tarokh's criteria,

since the latter require an optimization over all the possible codeword pairs. Based on the

proposed criteria, an optimal LD space-time modulator can be obtained. Several design examples

are provided to demonstrate the merits of the proposed CSTM scheme and to verify the design

criteria as well.




The rest of the paper is organized as follows. In section II, preliminaries of this research

including the system model and the joint iterative receiver are introduced. In section III, we

extend the existing EXIT chart to MIMO transmission systems. In section IV, we propose design

criteria for the inner ST modulation in a CSTM system, provide design examples, and discuss

the design of constellation versus symbol rate for a given data rate. In section V, simulation

results are presented. Finally, conclusions are drawn in section VI.

II. PRELIMINARIES

A. System Model

In this study, a block fading channel model is assumed where the channel remains constant within

one modulation block but may change from block to block. That is, the channel is not necessarily

constant within a coding frame which often consists of a large number of modulation blocks.

Furthermore, the channel is assumed to be a Rayleigh flat fading channel with $N_t$ transmit and $N_r$ receive antennas. Let us denote the complex gain from transmit antenna $n$ to receive antenna $m$ by $h_{mn}$ and collect these gains to form an $N_r \times N_t$ channel matrix $\mathbf{H} = [h_{mn}]$, known perfectly to the receiver but unknown to the transmitter. The entries of $\mathbf{H}$ are assumed to be independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian random variables with zero mean and unit variance.

The CSTM under investigation is a serial concatenation of an outer encoder and an inner

ST modulator as shown in Fig. 1(a), which subsumes many MIMO transmission schemes as its

special cases. In a CSTM system, the information bits are first encoded, shuffled by an interleaver

and then mapped into symbols. After that, the symbol stream is parsed into blocks of length L.

A symbol vector associated with one modulation block is denoted by $\mathbf{x} = [x_1, x_2, \ldots, x_L]^T$ with $x_i \in \{\Omega_m \mid m = 0, 1, \ldots, 2^Q - 1,\ Q \ge 1\}$ (i.e., a complex constellation of size $2^Q$, such as $2^Q$-QAM). The average symbol energy is assumed to be 1, i.e., $\frac{1}{2^Q}\sum_{m=0}^{2^Q-1} |\Omega_m|^2 = 1$. Each block

of symbols will be mapped by the inner ST modulator to a matrix of size $N_t \times T$ and

then transmitted over the Nt transmit antennas over T channel uses. The system model in Fig.

1(a) is often called bit-interleaved coded modulation (BICM) [19][20]. Another CSTM scheme




under consideration is to apply a constellation mapper right after the outer encoder and before

a symbol-level interleaver, which will be called symbol-interleaved coded modulation (SICM).

As mentioned before, we consider LD ST modulation for various reasons. An LD ST modulator

is defined by its $L$ dispersion matrices $\mathbf{M}_i = [\mathbf{m}_{i1}, \mathbf{m}_{i2}, \ldots, \mathbf{m}_{iT}]$, each of size $N_t \times T$, and the corresponding output matrix of one modulation block is given by

$$\mathbf{X} = \sum_{i=1}^{L} \mathbf{M}_i x_i \qquad (1)$$

With a constellation of size $2^Q$, the data rate of the inner space-time modulator is $R_m = Q L/T$ bits per channel use and the data rate of the overall concatenated system is $R = R_c R_m$ bits per channel use, where $R_c \le 1$ is the coding rate of the outer encoder.

Hence, one can adjust symbol rate L/T, constellation size Q, and coding rate Rc to meet

different requirements on data rate and performance. Since the inner ST modulation is linear,

suboptimal linear receivers can be used for demodulation [12]. It can also be observed that the

space-time mapping schemes used in the existing layered space-time architectures, e.g., [9]-[11],

are LD modulation. Hence the proposed CSTM with LD ST modulation subsumes existing

layered space-time schemes as special cases.
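The mapping in (1) and the rate expressions above can be illustrated with a minimal NumPy sketch. The function names `ld_modulate` and `modulation_rate`, the toy single-entry dispersion matrices, and the QPSK labeling are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ld_modulate(dispersion, symbols):
    """Return X = sum_i M_i * x_i for one modulation block, as in (1)."""
    dispersion = np.asarray(dispersion, dtype=complex)  # shape (L, Nt, T)
    symbols = np.asarray(symbols, dtype=complex)        # shape (L,)
    return np.tensordot(symbols, dispersion, axes=1)    # shape (Nt, T)

def modulation_rate(Q, L, T, Rc=1.0):
    """Return (R_m, R) in bits per channel use: R_m = Q*L/T and R = Rc*R_m."""
    Rm = Q * L / T
    return Rm, Rc * Rm

# Hypothetical toy example: Nt = T = 2 and L = 4 single-entry dispersion
# matrices, so each symbol occupies exactly one antenna/time slot.
Nt, T = 2, 2
basis = np.zeros((Nt * T, Nt, T))
for i in range(Nt * T):
    basis[i, i % Nt, i // Nt] = 1.0

qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)  # unit energy
x = qpsk[[0, 1, 2, 3]]
X = ld_modulate(basis, x)                      # Nt x T transmitted block
Rm, R = modulation_rate(Q=2, L=4, T=2, Rc=0.5)
```

With these toy matrices the block simply places the four symbols in the four antenna/time slots; any other set of $N_t \times T$ dispersion matrices can be passed in the same way.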

At the receiver, the received signals associated with one modulation block can be written as

$$\mathbf{Y} = \sqrt{P/N_t}\,\mathbf{H}\mathbf{X} + \mathbf{Z} = \sqrt{P/N_t}\,\mathbf{H}\sum_{i=1}^{L}\mathbf{M}_i x_i + \mathbf{Z} \qquad (2)$$

where $\mathbf{Y}$ is a complex matrix of size $N_r \times T$ whose $(m, n)$-th entry is the received signal at receive antenna $m$ and time instant $n$, $\mathbf{Z}$ is the additive white Gaussian noise (AWGN) matrix with i.i.d. circularly symmetric complex Gaussian elements of zero mean and variance $\sigma_z^2$, and $P$ is the average energy per channel use at each receive antenna. Let $\mathrm{vec}(\cdot)$ be the operator that forms a column vector by stacking the columns of a matrix and define $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$, $\mathbf{z} = \mathrm{vec}(\mathbf{Z})$, and $\mathbf{m}_i = \mathrm{vec}(\mathbf{M}_i)$. Then (2) can be rewritten as

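$$\mathbf{y} = \sqrt{P/N_t}\,\bar{\mathbf{H}}\mathbf{G}\mathbf{x} + \mathbf{z} = \sqrt{P/N_t}\,\tilde{\mathbf{H}}\mathbf{x} + \mathbf{z} \qquad (3)$$

where $\bar{\mathbf{H}} = \mathbf{I}_T \otimes \mathbf{H}$ with $\otimes$ as the Kronecker product operator, $\tilde{\mathbf{H}} = \bar{\mathbf{H}}\mathbf{G}$, and $\mathbf{G} = [\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_L]$ will be referred to as the modulation matrix. Since the average energy of the signal per channel use at a receive antenna is assumed to be $P$, we have $\mathrm{tr}(\mathbf{G}\mathbf{G}^H) = N_t T$. Denoting $\mathbf{h}_i = \bar{\mathbf{H}}\mathbf{m}_i$ as the $i$-th column vector of $\tilde{\mathbf{H}}$, the above equation can also be written as

$$\mathbf{y} = \sqrt{P/N_t}\sum_{i=1}^{L}\mathbf{h}_i x_i + \mathbf{z} \qquad (4)$$

B. Joint Iterative Demodulation/Decoding Receiver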

The computational complexity of an optimal ML-based receiver can be impractical for a

CSTM system due to the use of the interleaver between the inner ST modulator and the outer

encoder. At best, an iterative joint decoding and demodulation receiver, as illustrated in Fig.

1(b), can be employed. In the figure and the following discussion, variables with subscript 1

are associated with the inner ST demodulator and variables with subscript 2 are associated

with the outer decoder, and subscripts a (or A), e (or E) and d stand for a priori, extrinsic,

and a posteriori, respectively.

The joint demodulation and decoding is an iterative process. In each iteration, the extrinsic

information, which is the a posteriori information less the a priori information, is exchanged

between the two constituent components, the inner demodulator and the outer decoder. The

extrinsic information from one component is used as the a priori of the other component. After

a sufficient number of iterations, neither the inner ST demodulator nor the outer decoder can

benefit from the exchange of the extrinsic information any longer and this phenomenon is often

called convergence. Once convergence is reached, the outer decoder will perform hard decision

to generate the decoded information bits.

Normally, log-likelihood ratios (LLRs) are used in information exchange. For BICM, the bit LLRs can be calculated as $L^b = \ln[\Pr(b = 1)/\Pr(b = 0)]$. For SICM, since the symbols are taken from a constellation of size $2^Q$, each symbol has a $(2^Q - 1)$-tuple of LLRs whose $m$-th element is the logarithm of the ratio of the probability of the symbol taking value $\Omega_m$ over that of the symbol taking value $\Omega_0$, i.e., $L^s(x = \Omega_m) = \ln[\Pr(x = \Omega_m)/\Pr(x = \Omega_0)]$.
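The two LLR definitions above can be written down directly. The following sketch assumes the probabilities are available as plain arrays; the helper names are hypothetical and only illustrate the definitions, including how the $(2^Q - 1)$-tuple is converted back into a probability vector.

```python
import numpy as np

def bit_llr(p1):
    """Bit LLR L^b = ln[Pr(b=1)/Pr(b=0)] given Pr(b=1) = p1."""
    return np.log(p1) - np.log1p(-p1)

def symbol_llrs(probs):
    """Symbol LLR tuple ln[Pr(x=Omega_m)/Pr(x=Omega_0)] for m = 1..2^Q-1."""
    probs = np.asarray(probs, dtype=float)
    return np.log(probs[1:]) - np.log(probs[0])

def symbol_probs(llrs):
    """Invert symbol_llrs: recover the 2^Q probabilities from the tuple."""
    full = np.concatenate(([0.0], llrs))   # the LLR of Omega_0 is 0 by definition
    w = np.exp(full - full.max())          # subtract the max for numerical stability
    return w / w.sum()
```

Because the reference point $\Omega_0$ always has LLR zero, the tuple of length $2^Q - 1$ carries the same information as the full probability vector.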

Below, we consider the extrinsic LLRs for the inner ST demodulator and the outer decoder.

1) The extrinsic LLR of the inner ST demodulator

Given the a priori bit LLRs $\mathbf{L}^b_{a1} = [L^b_{a1,1}, L^b_{a1,2}, \ldots, L^b_{a1,LQ}]$ from the outer decoder at the last iteration and the corresponding channel observation $\mathbf{y}$, using the MAP criterion, the extrinsic LLR of the $j$-th bit in symbol $x_i$ can be calculated as

$$L^b_{e1,(i-1)Q+j} = \underbrace{\ln\frac{\Pr(b_{(i-1)Q+j} = 1 \mid \mathbf{y}, \mathbf{L}^b_{a1})}{\Pr(b_{(i-1)Q+j} = 0 \mid \mathbf{y}, \mathbf{L}^b_{a1})}}_{L^b_{d1,(i-1)Q+j}} - L^b_{a1,(i-1)Q+j} \qquad (5)$$

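Similarly, given the a priori symbol LLRs $\mathbf{L}^s_{a1} = [L^s_{a1,1}, L^s_{a1,2}, \ldots, L^s_{a1,L}]$ and the observation $\mathbf{y}$, one can compute the extrinsic LLR of symbol $x_i$ in the block using the MAP criterion as

$$\begin{aligned}
L^s_{e1,i}(\Omega_m) &= \underbrace{\ln\frac{\Pr(x_i = \Omega_m \mid \mathbf{y}, \mathbf{L}^s_{a1})}{\Pr(x_i = \Omega_0 \mid \mathbf{y}, \mathbf{L}^s_{a1})}}_{L^s_{d1,i}(\Omega_m)} - L^s_{a1,i}(\Omega_m) \\
&= \ln\frac{\sum_{\mathbf{x}_I \in \Psi} \exp\left(-\dfrac{\left\|\mathbf{y} - \sqrt{P/N_t}\,\tilde{\mathbf{H}}_I\mathbf{x}_I - \sqrt{P/N_t}\,\mathbf{h}_i\Omega_m\right\|^2}{\sigma_z^2} + \sum_{k \in \Phi} L^s_{a1,k}(x_k)\right)}{\sum_{\mathbf{x}_I \in \Psi} \exp\left(-\dfrac{\left\|\mathbf{y} - \sqrt{P/N_t}\,\tilde{\mathbf{H}}_I\mathbf{x}_I - \sqrt{P/N_t}\,\mathbf{h}_i\Omega_0\right\|^2}{\sigma_z^2} + \sum_{k \in \Phi} L^s_{a1,k}(x_k)\right)}
\end{aligned} \qquad (6)$$

where $\tilde{\mathbf{H}}_I$ is the matrix obtained by removing the $i$-th column from $\tilde{\mathbf{H}}$ in (3), $\mathbf{x}_I = [x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_L]^T$, $\Psi$ is the set of the $2^{Q(L-1)}$ possible column symbol vectors $\mathbf{x}_I$, and $\Phi$ is the set of indices of $\mathbf{x}_I$, i.e., $\Phi = \{k \mid 1 \le k \le L \text{ and } k \ne i\}$.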
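To make the structure of (6) concrete, the following sketch evaluates it by exhaustive enumeration over the other $L - 1$ symbols. It is a brute-force illustration under stated assumptions, not an efficient detector: the function name, the argument layout (a priori LLRs stored as an array with a zero first column), and the BPSK toy example are all hypothetical.

```python
import itertools
import numpy as np

def symbol_extrinsic_llrs(y, H_eff, constellation, La, i, P, Nt, sigma2):
    """Brute-force evaluation of (6) for symbol i (0-based index).

    y:      received vector
    H_eff:  equivalent channel whose columns are h_1..h_L
    La:     array (L, M) of a priori symbol LLRs, with La[:, 0] == 0
    Returns a length-M array whose entry m is the extrinsic LLR of Omega_m.
    """
    L = H_eff.shape[1]
    M = len(constellation)
    others = [k for k in range(L) if k != i]
    amp = np.sqrt(P / Nt)
    log_sums = np.full(M, -np.inf)
    for m in range(M):
        for idx in itertools.product(range(M), repeat=L - 1):
            x = np.empty(L, dtype=complex)
            x[i] = constellation[m]
            x[others] = constellation[list(idx)]
            # Gaussian likelihood exponent plus a priori LLRs of the other symbols
            metric = -np.linalg.norm(y - amp * H_eff @ x) ** 2 / sigma2
            metric += sum(La[k, s] for k, s in zip(others, idx))
            log_sums[m] = np.logaddexp(log_sums[m], metric)
    return log_sums - log_sums[0]

# Hypothetical toy case: BPSK (Omega_0 = +1, Omega_1 = -1), L = 2, identity channel.
bpsk = np.array([1.0, -1.0])
y = np.array([-1.0, 1.0])
out = symbol_extrinsic_llrs(y, np.eye(2), bpsk, np.zeros((2, 2)),
                            i=0, P=2.0, Nt=2, sigma2=1.0)
```

The a priori LLR of symbol $i$ itself never enters the metric, which is what makes the result extrinsic. The cost grows as $2^{QL}$, which is why suboptimal detectors are of interest.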

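Let $L^s_{a1,i}(\Omega_m) = \sum_{j \in \mathcal{J}_m} L^b_{a1,j}$, where $\mathcal{J}_m$ is the set of indices of the bits in $\Omega_m$ equal to 1. Then from (5) and (6), one can write

$$L^b_{e1,(i-1)Q+j} = \ln\frac{\sum_{\Omega_m^+ \in \Omega^{b_j=1}} \exp\left(L^s_{e1,i}(\Omega_m^+) + L^s_{a1,i}(\Omega_m^+)\right)}{\sum_{\Omega_m^- \in \Omega^{b_j=0}} \exp\left(L^s_{e1,i}(\Omega_m^-) + L^s_{a1,i}(\Omega_m^-)\right)} - L^b_{a1,(i-1)Q+j} \qquad (7)$$

where $\Omega^{b_j=1}$ is the set of constellation points whose $j$-th bit is equal to 1, while $\Omega^{b_j=0}$ is the set of constellation points whose $j$-th bit is equal to 0. From (7), the MAP inner ST detector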

for a BICM scheme can be implemented by adding a process following the symbol-level MAP




detector as defined in (6). This process calculates the extrinsic bit LLR using the output extrinsic

LLR of the corresponding symbol from the symbol-level detector and the a priori LLRs of the

bits in that symbol. This observation will be useful in the later development of this study. In addition to the optimal MAP algorithm, suboptimal linear methods with reduced complexity are also available, such as [21][22].
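The post-processing step described by (7), which turns symbol-level extrinsic LLRs into bit-level extrinsic LLRs, can be sketched as follows. The labeling is an assumption made only for illustration: bit $j$ of $\Omega_m$ is taken to be bit $j$ of the integer $m$, so that $\Omega_0$ carries the all-zero label. The function name is hypothetical.

```python
import numpy as np

def bit_extrinsic_llrs(Le_sym, La_bits):
    """Evaluate (7) for one symbol.

    Le_sym:  length-2^Q array, Le_sym[m] = extrinsic LLR of Omega_m (Le_sym[0] = 0)
    La_bits: length-Q array of a priori bit LLRs of that symbol
    Assumes bit j of Omega_m equals bit j of the integer m (illustrative labeling).
    """
    Le_sym = np.asarray(Le_sym, dtype=float)
    La_bits = np.asarray(La_bits, dtype=float)
    Q = len(La_bits)
    m = np.arange(2 ** Q)
    bits = (m[:, None] >> np.arange(Q)) & 1      # bits[m, j] = j-th bit of m
    La_sym = bits @ La_bits                      # a priori symbol LLRs from bit LLRs
    total = Le_sym + La_sym
    out = np.empty(Q)
    for j in range(Q):
        num = np.logaddexp.reduce(total[bits[:, j] == 1])
        den = np.logaddexp.reduce(total[bits[:, j] == 0])
        out[j] = num - den - La_bits[j]
    return out

# If the symbol extrinsic LLRs factor over the bits, the bit extrinsic LLRs
# are recovered exactly, whatever the a priori bit LLRs are.
Le = np.array([0.0, 0.7, -1.2, 0.7 - 1.2])
bit_out = bit_extrinsic_llrs(Le, La_bits=[0.3, 2.0])
```

The log-sum-exp form keeps the computation numerically stable when the LLRs are large in magnitude.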

2) The extrinsic LLR of the outer decoder

Unlike the inner ST demodulator, the outer decoder computes its extrinsic LLRs based only on the input

a priori LLRs from the inner ST demodulator in a CSTM system. The associated optimal MAP

algorithm has been developed by Bahl et al. in [23] and will not be described in this paper.

Moreover, other suboptimal algorithms are also available, such as [24][25].

III. EXIT CHART FOR MIMO TRANSMISSION

In this section, by means of the EXIT chart, we examine the convergence behavior of the

iterative demodulation/decoding procedure for a CSTM system as described in the last section.

In Fig. 2, a typical EXIT chart of the type of iterative receivers for a CSTM with two input

symbols per LD modulation block is given. The EXIT chart illustrates the trajectory of the

exchange of extrinsic information measured as the mutual information between the LLRs and

the bits in the corresponding symbols. For instance, the extrinsic information of the outer decoder

is $I_{E2} = I(x; L_{e2})$. It is important to notice that the extrinsic LLR outputs of the symbols in a modulation block may have different statistics, depending on the channel $\mathbf{H}$ and the dispersion matrices $\{\mathbf{M}_i\}$. For BICM, the statistics of extrinsic bit LLR outputs will also depend on the constellation pattern. Hence, different symbols may have different extrinsic information transfer functions and $I_{E2}$ is a function of the two a priori inputs, i.e., $I_{E2} = T(I'_{E1}, I''_{E1})$.

The following observations are useful for the analysis of the MIMO transmission system.

Observation 1: If the interleaver is random and the constraint length is sufficiently large, the

outer decoder yields the same amount of extrinsic information for all the symbols (or bits for

bit-level decoder) and the exact amount depends on the average a priori information of the

inputs.

Remarks: The above observation implies that $I_{E2}$ in the example shown in Fig. 2 is the same for all the outputs and is a function of $\bar{I}_{E1} = (I'_{E1} + I''_{E1})/2$. The reasons are as follows.

If the constraint length is large, the output extrinsic LLRs will depend, to roughly equal degree,

on the a priori information of a large number of its neighbors. Furthermore, if the interleaver is

random, these neighbors are evenly distributed among the input groups. Hence, different symbols

tend to have the same amount of extrinsic information. In addition, a specific neighbor (e.g., the

symbol immediately next to the symbol of concern at the decoder output) may be associated

with any of the input groups with equal probability. Consequently, the a priori information of

any neighbor is the average a priori information of all the input groups and, hence, in Fig. 2,

$I_{E2} = T\big((I'_{E1} + I''_{E1})/2\big)$.

To verify the above hypothesis, two BCJR/MAP decoders as depicted in Fig. 3 were set

up. For the decoder (a) in Fig. 3(a), two streams of independent a priori symbol LLRs with

mutual information I(x; La1) and I(x; La2), respectively, were sent to a BCJR/MAP decoder and

then two corresponding streams of extrinsic LLRs were calculated by the decoder. BCJR/MAP

decoder (b) in Fig. 3(b) had only one stream of a priori symbol LLRs with mutual information

I(x; La) = [I(x; La1) + I(x; La2)] /2 (8)

Since there is a one-to-one correspondence between the LLRs and the corresponding soft output $\hat{x}$ of the transmitted symbols measured as

$$\hat{x} = \frac{1}{1 + \sum_{m=1}^{2^Q-1}\exp(L(\Omega_m))}\sum_{m=0}^{2^Q-1}\Omega_m \exp(L(\Omega_m)) \qquad (9)$$

the mutual information between the symbols and the corresponding LLRs is equal to that between the symbols and the corresponding soft outputs $\hat{x}$, i.e., $I(x; L_e) = I(x; \hat{x}(L_e))$. Hence, for convenience, histograms of $\hat{x}$ were generated instead of using the extrinsic LLRs directly, to reflect the statistical nature of the corresponding extrinsic LLRs. In the simulation, a BPSK constellation and a convolutional code with coding rate 1/2 and constraint length 5 were used. Following the Gaussian consistency condition [17], the a priori LLRs were generated as symbols corrupted by zero-mean AWGN with different variances. The simulation results are shown in Fig. 4. From the figure, the histograms of the output extrinsic averages $\hat{x}$ of the two streams of input LLRs for decoder (a) coincide with each other and they also coincide with the histogram for decoder (b).
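The soft output in (9) is the posterior-mean symbol implied by an LLR tuple. A minimal sketch is given below; the function name and the BPSK ordering ($\Omega_0 = +1$, $\Omega_1 = -1$) used in the example are assumptions for illustration only.

```python
import numpy as np

def soft_symbol(llrs, constellation):
    """Soft output (9): mean symbol implied by the LLR tuple.

    llrs: length 2^Q - 1, entries L(Omega_m) for m = 1..2^Q-1
    constellation: length 2^Q, entries Omega_0..Omega_{2^Q-1}
    """
    constellation = np.asarray(constellation, dtype=complex)
    w = np.exp(np.concatenate(([0.0], llrs)))   # exp(L(Omega_0)) = 1
    return (w @ constellation) / w.sum()

# Hypothetical BPSK example: the soft output reduces to -tanh(L/2).
x_hat = soft_symbol([1.0], [1.0, -1.0])
```

For very large LLR magnitudes a log-domain normalization would be preferable; the direct form is kept here to mirror (9) term by term.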




In summary, the simulation results suggest that the extrinsic information output of the outer

BCJR/MAP decoder only depends on the average a priori information input.
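The a priori LLR generation used in the verification experiment above can be sketched for the BPSK case. The sketch assumes the usual consistent-Gaussian model, in which an LLR with standard deviation $\sigma$ has conditional mean $\pm\sigma^2/2$, and estimates the mutual information by a sample average that is valid for consistent LLRs of equiprobable bits. Function names, the seed, and the sample size are illustrative choices, not the paper's simulation settings.

```python
import numpy as np

def consistent_gaussian_llrs(bits, sigma, rng):
    """A priori LLRs L = (sigma^2/2) * s + sigma * n, with s = 2b - 1, n ~ N(0, 1)."""
    s = 2.0 * np.asarray(bits) - 1.0
    return 0.5 * sigma ** 2 * s + sigma * rng.standard_normal(s.shape)

def mutual_information(bits, llrs):
    """Estimate I(b; L) in bits as 1 - E[log2(1 + exp(-s * L))]."""
    s = 2.0 * np.asarray(bits) - 1.0
    return 1.0 - np.mean(np.logaddexp(0.0, -s * llrs)) / np.log(2.0)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=20000)
mi = [mutual_information(bits, consistent_gaussian_llrs(bits, sg, rng))
      for sg in (0.0, 1.0, 3.0, 10.0)]
```

Sweeping $\sigma$ moves the a priori mutual information from 0 to 1, which is how the horizontal axis of a decoder transfer curve is typically traversed.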

Observation 2: (a) The convergence point is where the average EXIT curve of the inner ST

demodulator and the EXIT curve of the outer decoder meet. (b) The a posteriori LLRs associated

with different symbols in a modulation block sent to the final decision device may have different

statistics determined by their own EXIT curves.

Remarks: This observation is a direct result of Observation 1. For instance, at the convergence point in Fig. 2, let us denote $I^c_{E2}$ as the value of the extrinsic information of the outer decoder, and $I'^{c}_{E1}$ and $I''^{c}_{E1}$ as the values of extrinsic information corresponding to the two symbols of the inner demodulator. We then have the following relationship

$$\begin{aligned}
I^c_{E2} &= T_{E2}\big((I'^{c}_{E1} + I''^{c}_{E1})/2\big) \\
I'^{c}_{E1} &= T'_{E1}(I^c_{E2}) \\
I''^{c}_{E1} &= T''_{E1}(I^c_{E2})
\end{aligned} \qquad (10)$$

where $I = T(x)$ indicates that $I$ is a function of $x$. Hence, the a posteriori LLRs of the symbols associated with the dashed curve and dash-dot curve at convergence are

$$\begin{aligned}
L'_{d2} &= L'^{c}_{e2} + L'^{c}_{e1} \\
L''_{d2} &= L''^{c}_{e2} + L''^{c}_{e1}
\end{aligned} \qquad (11)$$

where $L'^{c}_{e1}$ and $L''^{c}_{e1}$ correspond to $I'^{c}_{E1}$ and $I''^{c}_{E1}$ respectively, and both $L'^{c}_{e2}$ and $L''^{c}_{e2}$ correspond to $I^c_{E2}$. In (11), $L'^{c}_{e2}$ and $L''^{c}_{e2}$ have the same statistics as predicted by Observation 1, but $L'^{c}_{e1}$ and $L''^{c}_{e1}$ may have different statistics.
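The fixed-point relationship in (10) can be explored numerically by iterating the transfer functions. The sketch below uses purely hypothetical transfer curves (two nearly linear inner curves and an S-shaped outer curve, normalized to the range 0 to 1); they are not taken from the paper's figures and only illustrate how the convergence point and the per-symbol extrinsic information are obtained.

```python
import numpy as np

def convergence_point(T_E2, T_E1_list, iters=200, tol=1e-9):
    """Iterate (10): I_E2 <- T_E2(mean_k T_E1^(k)(I_E2)), starting from I_E2 = 0."""
    I_E2 = 0.0
    for _ in range(iters):
        I_E1 = np.array([T(I_E2) for T in T_E1_list])
        new = T_E2(I_E1.mean())
        if abs(new - I_E2) < tol:
            I_E2 = new
            break
        I_E2 = new
    return I_E2, np.array([T(I_E2) for T in T_E1_list])

# Hypothetical transfer curves, chosen only for illustration.
T1 = lambda a: 0.55 + 0.40 * a                                  # inner curve, symbol 1
T2 = lambda a: 0.45 + 0.30 * a                                  # inner curve, symbol 2
T_outer = lambda a: 1.0 / (1.0 + np.exp(-20.0 * (a - 0.5)))     # outer curve with a cliff

I_E2_c, I_E1_c = convergence_point(T_outer, [T1, T2])
```

In this toy setting the outer decoder sees only the average of the two inner curves, yet the two symbols end with different extrinsic information at convergence, which is the situation described in part (b) of Observation 2.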

Although the above observations were described for a symbol-level decoder such as a decoder

for TCM, they also apply to bit-level decoders if all the quantities associated with symbols in

the above observations are replaced by the corresponding quantities associated with bits. At this

point, we are ready to consider the design of inner ST modulation. As illustrated in Fig. 2, the

curve of a powerful conventional outer code often presents two flat plateaus at the two ends and

a sharp cliff in the middle [17]. In contrast, the EXIT curves of the inner ST demodulator

are close to straight lines due to its shorter encoding block length, which is L.




By Observation 1, to let the trajectory of the iterative receiver snake through the bottleneck in the middle and thus reach the second plateau region of the $I_{E2}$ curve, one can seek to maximize $\frac{1}{LQ} E\big(\sum_{i=1}^{L} I(x_i; L^s_{e1,i})\big)$ for SICM and $\frac{1}{LQ} E\big(\sum_{i=1}^{L}\sum_{j=1}^{Q} I(b_{(i-1)Q+j}; L^b_{e1,(i-1)Q+j})\big)$ for BICM, where the expectation $E(\cdot)$ is taken over the channel $\mathbf{H}$. As described in section II-B, the extrinsic LLR of a bit can be calculated using the extrinsic LLR of its associated symbol and the a priori LLRs of the other bits in the symbol. For a given constellation and a priori information, maximizing the sum extrinsic information of the bits in symbol $x$, i.e., $\sum_{j=1}^{Q} I(b_j; L^b_{e1,j})$, can be well approximated by maximizing the extrinsic information of the symbol $I(x; L^s_{e1})$. In summary, we seek to maximize $\frac{1}{LQ} E\big(\sum_{i=1}^{L} I(x_i; L^s_{e1,i})\big)$ for both BICM and SICM.

By Observation 2, one should minimize the outage probability of the a posteriori information (i.e., $I(x; L_{d2})$) for each symbol (or bit for BICM). Let $I_{E1}(a)$ denote the extrinsic information in the inner ST modulation block when the input $I_{A1} = a$ in the following discussion. From (11), when $I^c_{E2}$ is large, i.e., the convergence point is located at the second plateau of the outer decoder, $I^c_{E2}$ will change little regardless of the a priori information $I_{E1}$. Hence, minimizing the outage probability of the a posteriori information $I(x; L_{d2})$ can be approximated by minimizing the outage probability of the extrinsic information $I_{E1}$ at convergence, or equivalently, maximizing $\Pr(I_{E1}(I^c_{E2}) \ge \gamma)$ for a certain value $\gamma$. Noting that when the trajectory of the iterative receiver reaches the second plateau, $I^c_{E2} \approx Q$, we seek to maximize $\Pr(I_{E1}(Q) \ge \gamma)$ for any symbol. Again, for a given constellation, minimizing the outage probability of extrinsic information on bits for BICM can be approximated by minimizing the outage probability of the extrinsic information on symbols.

In summary, we have two optimization problems concerning the design of the inner ST modulator:

• maximizing the average extrinsic information per bit for any given a priori information, i.e., maximizing $I_{E1}(I_{A1}) = \frac{1}{LQ} E\big(\sum_{i=1}^{L} I(x_i; L^s_{e1,i})\big)$;

• minimizing the outage probability of the extrinsic information of a symbol in the modulation block at perfect a priori information, i.e., maximizing $\Pr(I_{E1}(Q) \ge \gamma)$.




IV. DESIGN OF INNER ST MODULATION

In this section, we consider the design of the inner ST modulator by solving the two

optimization problems as described in the last section.

A. Maximizing the Average Extrinsic Information Per Bit

To make the optimization problems tractable and independent of constellation, we assume

i.i.d. Gaussian inputs. The results can be used as design guidelines for practical input symbols

drawn from finite alphabets. In general, maximizing the average extrinsic information for any

given a priori information is difficult due to the unknown statistics of the a priori symbol LLRs.

However, noting that the EXIT curves of the inner ST demodulator are monotonic and close

to a straight line, we seek to maximize the average extrinsic information at the starting point,

i.e. IE1(0). This helps to ensure that the trajectory of the iterative receiver passes through the narrow

tunnel in the middle to reach the second plateau region of the EXIT chart.

Under the assumption of i.i.d. Gaussian inputs, the extrinsic information of the input symbol $x_i$ when there is no a priori information is

$$I(x_i; \mathbf{y} \mid \mathbf{H}) = \log\left(1 + \frac{P}{N_t}\,\mathbf{h}_i^H \mathbf{R}_i^{-1}\mathbf{h}_i\right) \qquad (12)$$

where $\mathbf{R}_i$ is the autocorrelation matrix of the interference and AWGN given by

$$\mathbf{R}_i = \frac{P}{N_t}\,\tilde{\mathbf{H}}_I\tilde{\mathbf{H}}_I^H + \sigma_z^2\mathbf{I} \qquad (13)$$

where $\tilde{\mathbf{H}}_I$ is defined in (6). Clearly, the mutual information given in (12) is a function of the channel $\mathbf{H}$. Hence, we seek to find the modulation matrix $\mathbf{G}$ to maximize

$$I_{E1}(0) = \frac{1}{LQ} E\left(\sum_{i=1}^{L} I(x_i; \mathbf{y})\right) \qquad (14)$$

where the expectation is taken with respect to channel $\mathbf{H}$. When the number of bits in a modulation block (i.e., $LQ$) is fixed, it is equivalent to maximizing $E\big(\sum_{i=1}^{L} I(x_i; \mathbf{y})\big)$, which is the ergodic sum capacity of channel $\mathbf{H}$ when the $L$ data streams are independently demodulated. This capacity will be called the uncooperated sum capacity, which is always smaller than or equal to the conventional cooperated sum capacity. Concerning the uncooperated sum capacity, we have the following theorem.




Theorem 1: The ergodic uncooperated sum capacity of channel $\mathbf{H}$ is achieved if and only if the modulation matrix $\mathbf{G}$ satisfies

$$\mathbf{G}\mathbf{G}^H = \mathbf{I}_{N_t T} \qquad (15)$$

Proof: See Appendix I.
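The quantities compared in this subsection can be evaluated numerically for a single channel realization. The sketch below computes the per-stream sum in (12)-(14) and the block-level cooperated mutual information for a unitary modulation matrix. It uses base-2 logarithms, a randomly drawn unitary $\mathbf{G}$, and hypothetical function names, all of which are illustrative assumptions; it does not reproduce the proof in Appendix I.

```python
import numpy as np

def uncooperated_sum(H, G, P, sigma2):
    """Sum over i of log2(1 + (P/Nt) h_i^H R_i^{-1} h_i), cf. (12)-(14), for one H."""
    Nr, Nt = H.shape
    T = G.shape[0] // Nt
    H_eff = np.kron(np.eye(T), H) @ G          # columns h_i = vec(H M_i)
    c = P / Nt
    total = 0.0
    for i in range(H_eff.shape[1]):
        h = H_eff[:, i]
        H_I = np.delete(H_eff, i, axis=1)      # remove the i-th column
        R = c * H_I @ H_I.conj().T + sigma2 * np.eye(Nr * T)
        total += np.log2(1.0 + c * np.real(h.conj() @ np.linalg.solve(R, h)))
    return total

def cooperated_sum(H, T, P, sigma2):
    """T * log2 det(I + P/(Nt sigma^2) H H^H) for one H and unitary G."""
    Nr, Nt = H.shape
    A = np.eye(Nr) + (P / (Nt * sigma2)) * H @ H.conj().T
    return T * np.linalg.slogdet(A)[1] / np.log(2.0)

rng = np.random.default_rng(1)
Nt = Nr = T = 2
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
A = rng.standard_normal((Nt * T, Nt * T)) + 1j * rng.standard_normal((Nt * T, Nt * T))
G, _ = np.linalg.qr(A)                         # a random unitary modulation matrix
u = uncooperated_sum(H, G, P=10.0, sigma2=1.0)
c = cooperated_sum(H, T, P=10.0, sigma2=1.0)
```

Averaging `u` over many channel draws gives a Monte Carlo estimate of the ergodic uncooperated sum capacity for a given $\mathbf{G}$, so different modulation matrices can be compared under the same power constraint.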

Noting that matrix $\mathbf{G}$ is of size $N_t T \times L$, equation (15) holds only if $L \ge N_t T$. For lower demodulation complexity, we will only consider $L = N_t T$ in the sequel. In this case, (15) implies that

$$\mathrm{tr}(\mathbf{M}_m^H\mathbf{M}_n) = \begin{cases} 1, & m = n \\ 0, & m \ne n \end{cases} \qquad (16)$$

Note that it can be shown that a matrix $\mathbf{G}$ satisfying (15) also maximizes the average mutual information for any given a priori information in SICM or BICM. It is also interesting to note that a modulation matrix $\mathbf{G}$ satisfying (15) also achieves the cooperated sum capacity given by [1][2]

$$C = E\left[\log\det\left(\mathbf{I}_{N_r} + \frac{P}{N_t\sigma_z^2}\,\mathbf{H}\mathbf{H}^H\right)\right] \qquad (17)$$

B. Minimizing the Outage Probability of the Extrinsic Information

Theorem 2: The probability $\Pr(I_{E1}(Q) \ge \gamma)$ is maximized only when $\mathbf{M}_i\mathbf{M}_i^H$ is full rank

with identical nonzero eigenvalues for all $i$.

Proof: See Appendix II.

Interestingly, the set of dispersion matrices satisfying the conditions in Theorem 2 also optimizes the pairwise error performance. With perfect feedback ($I_{A1} \approx Q$) for the symbols other than the symbol of concern, say $x_i$, the detection is based on the following observation obtained by perfectly cancelling interference from other symbols within the same block, i.e.,

$$\mathbf{y}_i = \mathbf{y} - \sqrt{P/N_t}\sum_{l \ne i}\mathbf{h}_l x_l = \sqrt{P/N_t}\,\mathbf{h}_i x_i + \mathbf{z} \qquad (18)$$

Since $\mathbf{h}_i = \mathrm{vec}(\mathbf{H}\mathbf{M}_i)$, the modified Euclidean distance between a pair of transmitted symbols differing at position $i$ in a modulation block is

$$d^2(x_i, x'_i \mid \mathbf{H}) = |x_i - x'_i|^2\sum_{m=1}^{N_r}\mathbf{h}_m^H\mathbf{M}_i\mathbf{M}_i^H\mathbf{h}_m \qquad (19)$$




where $\mathbf{h}_m^H$ is the $m$-th row vector of $\mathbf{H}$. Following a procedure similar to that of Tarokh et al. in [3], it can be readily found that, to minimize the error probability of $x_i$, one needs to maximize the rank of $\mathbf{M}_i$ and the product of the nonzero eigenvalues of $\mathbf{M}_i\mathbf{M}_i^H$, just like the Rank and Determinant criteria in [3]. When $\mathrm{tr}(\mathbf{M}_i\mathbf{M}_i^H)$ is fixed, the maximum coding gain is achieved when all the eigenvalues of $\mathbf{M}_i\mathbf{M}_i^H$ are equal.
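The step from the interference-free observation to the distance expression, and then to the eigenvalue condition, can be spelled out as follows. This is a sketch under standard assumptions (i.i.d. Rayleigh fading and a Chernoff-type bound in the style of [3]); $x'_i$ denotes the competing symbol, $\mathbf{h}_m^H$ the $m$-th row of $\mathbf{H}$, and $\lambda_n$ the eigenvalues of $\mathbf{M}_i\mathbf{M}_i^H$. The constant factor $P/N_t$ is carried explicitly here.

```latex
\begin{aligned}
\left\| \sqrt{P/N_t}\,\mathbf{h}_i (x_i - x'_i) \right\|^2
  &= \frac{P}{N_t}\,|x_i - x'_i|^2\,\bigl\|\mathrm{vec}(\mathbf{H}\mathbf{M}_i)\bigr\|^2 \\
  &= \frac{P}{N_t}\,|x_i - x'_i|^2\,\mathrm{tr}\!\left(\mathbf{H}\mathbf{M}_i\mathbf{M}_i^H\mathbf{H}^H\right) \\
  &= \frac{P}{N_t}\,|x_i - x'_i|^2 \sum_{m=1}^{N_r} \mathbf{h}_m^H \mathbf{M}_i\mathbf{M}_i^H \mathbf{h}_m , \\[4pt]
\Pr(x_i \to x'_i)
  &\le \prod_{n=1}^{N_t} \left( 1 + \lambda_n \,\frac{P\,|x_i - x'_i|^2}{4 N_t \sigma_z^2} \right)^{-N_r} .
\end{aligned}
```

For a fixed sum $\sum_n \lambda_n = \mathrm{tr}(\mathbf{M}_i\mathbf{M}_i^H)$, the product in the bound is largest, and hence the bound smallest, when all $\lambda_n$ are equal and nonzero, which is the condition stated above.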

In summary, we have the following two criteria for the design of the LD ST modulator.

Capacity Criterion: The symbol rate of the LD ST modulator must be $N_t$ symbols per channel use. Furthermore, the dispersion matrices shall be chosen such that their Frobenius norms are 1 and the trace of the Hermitian product of any pair of distinct dispersion matrices is 0.

Error-Performance Criterion: For the best error performance, the matrices $\mathbf{M}_i\mathbf{M}_i^H$ must be full rank with identical eigenvalues for every $i$.

If the modulation block length $T$ is $N_t$ or greater, full rank can be easily guaranteed, while if $T$ is less than $N_t$, full rank is impossible. The Error-Performance Criterion therefore also suggests that the minimum modulation block length shall be $N_t$.

C. Design Examples

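To demonstrate our design criteria, three inner ST modulation design examples are provided below. In all three schemes, $T = N_t$ and $L = N_t^2$. Let us denote

$$\mathbf{P} = \begin{bmatrix} \mathbf{0}_{1\times(N_t-1)} & 1 \\ \mathbf{I}_{N_t-1} & \mathbf{0}_{(N_t-1)\times 1} \end{bmatrix}, \qquad \mathbf{S} = \begin{bmatrix} 1 & \mathbf{0}_{1\times(N_t-1)} \\ \mathbf{0}_{(N_t-1)\times 1} & \mathbf{0}_{(N_t-1)\times(N_t-1)} \end{bmatrix}$$

and $\mathbf{F} = [f_{mn}]$ as the DFT matrix of size $N_t \times N_t$ with $(m, n)$-th entry $f_{mn} = \frac{1}{\sqrt{N_t}}\exp(2\pi j(m-1)(n-1)/N_t)$. The dispersion matrices of the three schemes are listed below.

Scheme 1: This is an optimal design. The associated dispersion matrices are

$$\mathbf{M}_{(k-1)N_t+i} = \mathrm{diag}[\mathbf{f}_k]\,\mathbf{P}^{(i-1)} \qquad (20)$$

for $k = 1, 2, \ldots, N_t$ and $i = 1, 2, \ldots, N_t$, where $\mathbf{f}_k$ denotes the $k$-th column vector of $\mathbf{F}$.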

Scheme 2: The dispersion matrices of the full-diversity and full-rate scheme [13] are

$$\mathbf{M}_{(k-1)N_t+i} = \alpha^{k-1}\,\mathrm{diag}[\mathbf{f}_k]\,\beta^{i-1}\,\mathbf{P}^{(i-1)} \qquad (21)$$

for $k = 1, 2, \ldots, N_t$ and $i = 1, 2, \ldots, N_t$, where $\alpha$ and $\beta$ are constellation dependent and are chosen to guarantee full diversity [13].

Scheme 3: The dispersion matrices of the threaded BLAST scheme [11] are

$$\mathbf{M}_{(k-1)N_t+i} = \mathbf{P}^{(k-1)+(i-1)}\,\mathbf{S}\,\mathbf{P}^{(i-1)} \qquad (22)$$

for $k = 1, 2, \ldots, N_t$ and $i = 1, 2, \ldots, N_t$.

It can be readily checked that although all the three schemes satisfy Capacity Criterion

and preserve the original MIMO channel capacity, only the first two schemes satisfy Error-

Performance Criterion. Without an outer encoder, Scheme 2 achieves full diversity gain with

appropriately chosen parameters and hence outperforms Scheme 1. However, in the presence of an outer encoder,

the two schemes should perform similarly.
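The two criteria can be checked numerically for Schemes 1 and 3. The sketch below builds the dispersion matrices from the definitions in this subsection and tests both conditions. It assumes a DFT matrix normalized by $1/\sqrt{N_t}$ and a non-negative exponent on the trailing shift matrix in Scheme 3; the function names are hypothetical. Scheme 2 is omitted because its constellation-dependent parameters are not specified here.

```python
import numpy as np

def shift_matrix(Nt):
    """Cyclic shift P = [[0, 1], [I, 0]] of size Nt x Nt."""
    return np.roll(np.eye(Nt), 1, axis=0)

def scheme1(Nt):
    """Dispersion matrices diag[f_k] P^(i-1), with T = Nt (assumed 1/sqrt(Nt) DFT scaling)."""
    idx = np.arange(Nt)
    F = np.exp(2j * np.pi * np.outer(idx, idx) / Nt) / np.sqrt(Nt)
    P = shift_matrix(Nt)
    return [np.diag(F[:, k]) @ np.linalg.matrix_power(P, i)
            for k in range(Nt) for i in range(Nt)]

def scheme3(Nt):
    """Dispersion matrices P^((k-1)+(i-1)) S P^(i-1) (threaded layering)."""
    P = shift_matrix(Nt)
    S = np.zeros((Nt, Nt))
    S[0, 0] = 1.0
    mp = np.linalg.matrix_power
    return [mp(P, k + i) @ S @ mp(P, i) for k in range(Nt) for i in range(Nt)]

def capacity_criterion(Ms, tol=1e-9):
    """True if tr(M_m^H M_n) = delta_mn, i.e. G^H G = I with G = [vec(M_1), ...]."""
    G = np.stack([M.reshape(-1, order="F") for M in Ms], axis=1)
    return np.allclose(G.conj().T @ G, np.eye(len(Ms)), atol=tol)

def error_performance_criterion(Ms, tol=1e-9):
    """True if every M_i M_i^H is full rank with identical eigenvalues."""
    for M in Ms:
        ev = np.linalg.eigvalsh(M @ M.conj().T)
        if ev.min() < tol or ev.max() - ev.min() > tol:
            return False
    return True
```

For example, `capacity_criterion(scheme3(2))` holds while `error_performance_criterion(scheme3(2))` does not, since each threaded dispersion matrix has a single nonzero entry and is therefore rank one.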

D. Trade-Off Between Constellation Size and Modulation Symbol Rate

For a given inner ST modulation rate $R_m = LQ/T$ in bits per channel use, there exists a trade-off between constellation size $Q$ and symbol rate $L/T$. From Theorem 1, $I_{E1}(0)$ is maximized only when $L/T \ge N_t$. In fact, it can also be shown that $I_{E1}(0)$ is monotonic with respect to $L/T$. Hence, for a given $R_m$, the minimal integer $Q$ satisfying $Q \ge R_m/N_t$ shall be selected. If $R_m < N_t Q$ with the chosen $Q$, a subset of dispersion matrices shall be selected from the optimal design.
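The selection rule above amounts to a few lines of arithmetic. The following sketch picks the smallest $Q$ and the number $L$ of dispersion matrices to keep; the function name is hypothetical, and the sketch assumes that $R_m T / Q$ is an integer, raising an error otherwise.

```python
import math

def choose_constellation(Rm, Nt, T):
    """Pick the smallest integer Q >= Rm/Nt, then L such that L*Q/T = Rm."""
    Q = max(1, math.ceil(Rm / Nt))
    L = Rm * T / Q
    if abs(L - round(L)) > 1e-9:
        raise ValueError("Rm*T/Q is not an integer; not handled by this sketch")
    return Q, int(round(L))

# Nt = T = 2: a target of 3 bits per channel use gives QPSK (Q = 2) with
# 3 of the 4 dispersion matrices, and 4 bits per channel use gives QPSK with all 4.
Q3, L3 = choose_constellation(3, Nt=2, T=2)
Q4, L4 = choose_constellation(4, Nt=2, T=2)
```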

To verify this design method, simulation was set up over a system with Nt = Nr = T = 2

under the Rayleigh ergodic flat fading channel. In this simulation, we compared the uncooperated

sum capacity of Scheme 1 which is an optimal design, Alamouti scheme [5] and an orthogonal

scheme by selecting a pair of orthogonal dispersion matrices from Scheme 1. The corresponding

numerical results are presented in Fig. 5 when the target modulation rate Rm is 3 and 4 bits

per channel use, respectively. When Rm = 3, the optimal design uses QPSK modulation and

chooses 3 out of the 4 possible dispersion matrices in (20). As can be seen from these figures, the

performance gap between the optimal scheme and the other two schemes widens

as Rm increases.

In summary, for a given inner ST modulation rate, one seeks to choose a constellation size

as small as possible until the symbol rate reaches Nt symbols per channel use.




V. SIMULATION RESULTS AND DISCUSSIONS

The EXIT characteristics of the three design examples as well as their error performance

will be compared in this section to demonstrate our design criteria. Instead of the prohibitively complex MAP

detection of (5) and (6), the soft-interference-cancellation minimum mean square error (SIC-MMSE)

detection algorithm [22] was used as the inner demodulator in the simulation.

In Fig. 6, the EXIT characteristics of the three schemes are compared under several fixed channel conditions. Here $E_b/N_0 = 0$ dB ($E_b$ is the transmitted power per bit), $N_t = N_r = 2$, a QPSK constellation is used, and the channel coefficient matrix is assumed to have the form $H = \begin{bmatrix} 1 & \alpha\cos\theta \\ 0 & \alpha\sin\theta \end{bmatrix}$. Results were obtained for various values of the magnitude and the angle, written here as $\alpha$ and $\theta$. In the EXIT charts, the magnitude determines the ending point of the transfer function of the inner ST demodulator, while the angle affects the starting point when the magnitude is fixed. As can be observed, all the schemes have similar starting points under the same channel condition, which is expected since they all satisfy the Capacity Criterion. The small variation in the starting points among the schemes is due to the use of a suboptimal demodulation algorithm (i.e., SIC-MMSE). However, since Scheme 3 does not comply with the Error-Performance Criterion, under most channel conditions it yields less extrinsic information than Schemes 1 and 2, i.e., its outage probability of the extrinsic information is larger. It can also be seen from the figure that the first two schemes significantly outperform the third scheme, particularly when the gain of the second transmit antenna is much smaller than that of the first transmit antenna.

We now consider fading channels. Fig. 7(a) and Fig. 7(b) give the cumulative distribution functions (CDFs) of the extrinsic information at the two extreme cases, $I_{E1}(0)$ and $I_{E1}(Q)$, respectively, under Rayleigh flat fading channels. In the figure, a channel is called a slow fading channel if it stays constant within a coding frame but changes independently from frame to frame, and a fast fading channel if it stays constant within a modulation block but changes independently from block to block. In the simulation, $E_b/N_0 = 3$ dB, $N_t = N_r = 2$, and a QPSK constellation were used. The results show that all three schemes have similar statistics for $I_{E1}(0)$ but significantly different statistics for $I_{E1}(Q)$. The results for $I_{E1}(Q)$ in Fig. 7(b) demonstrate that the first two schemes outperform Scheme 3 significantly under both slow and fast fading channels. Although Scheme 2 outperforms Scheme 1 in an uncoded system, their performance curves in a coded system are almost indistinguishable. For both $I_{E1}(0)$ and $I_{E1}(Q)$, the performance in a fast fading channel is significantly better than in a slow fading channel. This is expected because the ergodic capacity can be approached, and temporal diversity can be exploited by the outer encoder, only when the channel varies significantly within a coding frame.
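The two fading models defined above can be illustrated with a minimal sketch. It assumes i.i.d. unit-variance complex Gaussian channel entries; the function names `rayleigh_matrix` and `frame_channels` and their arguments are hypothetical and are not taken from the simulation setup of this paper.

```python
import random

def rayleigh_matrix(n_rx, n_tx, rng):
    """One N_r x N_t channel with i.i.d. CN(0, 1) entries (Rayleigh flat fading)."""
    s = 0.5 ** 0.5  # per-dimension standard deviation so that E|h|^2 = 1
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_tx)]
            for _ in range(n_rx)]

def frame_channels(blocks_per_frame, mode, n_rx=2, n_tx=2, rng=None):
    """Return the channel matrix seen by each modulation block of one coding frame."""
    rng = rng or random.Random()
    if mode == "slow":
        # constant within the coding frame; a new call draws a new frame
        h = rayleigh_matrix(n_rx, n_tx, rng)
        return [h] * blocks_per_frame
    if mode == "fast":
        # constant within a modulation block, independent from block to block
        return [rayleigh_matrix(n_rx, n_tx, rng) for _ in range(blocks_per_frame)]
    raise ValueError("mode must be 'slow' or 'fast'")

if __name__ == "__main__":
    rng = random.Random(0)
    print(frame_channels(2, "slow", rng=rng))
    print(frame_channels(2, "fast", rng=rng))
```

In the slow mode every block of a frame shares one realization, so the outer code sees no temporal diversity within the frame; in the fast mode each block contributes an independent realization.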

Finally, Fig. 8 compares the frame error rate (FER) of the three schemes under different Rayleigh ergodic flat fading channels. In the simulation, $N_t = N_r = T = 2$, a QPSK constellation, and 200 symbols per coding frame were assumed. A convolutional code with coding rate 1/2 and constraint length 4 is used as the outer encoder. Its generator polynomials are H1(D) = 04, H2(D) = 13. Consistent with the EXIT chart analysis, the first two schemes perform indistinguishably from each other but outperform the third scheme significantly after sufficient iterations. It is also clear that the three schemes perform similarly after the first iteration, since they all comply with the Capacity Criterion. Again, all of the schemes perform significantly better in a fast fading channel than in a slow fading channel.

VI. CONCLUSION

A coded space-time modulation scheme with a conventional outer encoder for MIMO wireless communications has been investigated. Using the EXIT chart technique, the design of the inner space-time modulator has been studied under the assumption of a joint iterative receiver. Two design criteria are derived that relate to the channel capacity and the error performance, respectively. To guarantee the convergence of the iterative receiver, the uncooperated sum capacity of the inner ST modulation must be maximized. Once convergence is achieved, the error performance is optimized by maximizing the rank and determinant of the dispersion matrix of each individual symbol. The latter Error-Performance Criterion is much easier to apply than the well-known Tarokh's Rank and Determinant Criteria in [3]. Together, the two proposed criteria allow a complete design of the system with respect to both the data rate and the error performance. Specifically, it is shown that for a given inner ST modulation rate in bits, the constellation size shall be minimized until the maximum symbol rate, i.e., $N_t$, is reached. The proposed design criteria have been verified by design examples and simulation results.

APPENDIX I

PROOF OF THEOREM 1

Under the assumption of i.i.d. Gaussian input symbols $\{x_i\}$, we have $R_G \triangleq E(Gxx^HG^H) = GG^H$. Since $R_G$ is nonnegative definite, it can be decomposed as $R_G = Q\Lambda Q^H$, where $Q$ is a unitary matrix and $\Lambda = \mathrm{diag}[\lambda_i]$, $i = 1, 2, \ldots, N_tT$. Since $R_G = Q\Lambda Q^H$ will have the same uncooperated sum capacity as $\Lambda$, we need only consider

$$R_G = GG^H = \Lambda. \tag{23}$$

For the energy constraint, we have $\sum_{i=1}^{N_tT} \lambda_i = N_tT$.

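By substituting (13) into (12), we obtain

$$I(x_i; y|H) = \log\left(1 + \frac{P}{N_t}\, h_i^H \left(\frac{P}{N_t} H_I H_I^H + \sigma_z^2 I\right)^{-1} h_i\right) \tag{24}$$

where $H_I$ is defined in (6). Further, substituting (4) and (23) into (24) and using the inverse of a small-rank adjustment of a matrix in [30], we can further write

$$I(x_i; y|H) = -\log\left(1 - \frac{P}{N_t}\,\lambda_i\, h_i^H R^{-1} h_i\right) \tag{25}$$

where $h_i$ is the $i$-th column vector of $H$ in (3) and

$$R = \frac{P}{N_t}\sum_{j=1}^{N_tT} \lambda_j h_j h_j^H + \sigma_z^2 I = \frac{P}{N_t}\, H\Lambda H^H + \sigma_z^2 I. \tag{26}$$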

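Hence, the uncooperated sum capacity $I_{\mathrm{sum}}$ can be expressed as

$$I_{\mathrm{sum}} = \max_{\Lambda}\; -E\left[\log \prod_{i=1}^{N_tT}\left(1 - \frac{P}{N_t}\,\lambda_i\, h_i^H R^{-1} h_i\right)\right]. \tag{27}$$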

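Noting that $1 - \frac{P}{N_t}\lambda_i h_i^H R^{-1} h_i$ is the $i$-th diagonal entry of the matrix $I - \frac{P}{N_t}\Lambda^{1/2}H^HR^{-1}H\Lambda^{1/2}$, we seek to minimize

$$J(\Lambda) = E\left[\log \Pi\left(I - \frac{P}{N_t}\,\Lambda^{1/2}H^HR^{-1}H\Lambda^{1/2}\right)\right] \tag{28}$$

where $\Pi(X)$ denotes the product of the diagonal entries of matrix $X$. Substituting (26) into (28) and using the inverse of a small-rank adjustment in [30], we have

$$J(\Lambda) = E\left[\log \Pi\left(\left(I + \frac{P}{N_t\sigma_z^2}\,\Lambda^{1/2}H^HH\Lambda^{1/2}\right)^{-1}\right)\right]. \tag{29}$$

It can be easily proven that $(\lambda_i - \lambda_j)\left(\frac{\partial J}{\partial \lambda_i} - \frac{\partial J}{\partial \lambda_j}\right) \ge 0$, $1 \le i, j \le N_tT$. This, with the fact that $J(\Lambda)$ is symmetric with respect to $\lambda_i$, $1 \le i \le N_tT$, shows that $J(\Lambda)$ is Schur-convex [27][28]. Hence, the minimizing $\Lambda$ must be a scalar multiple of $I_{N_tT}$. By the energy constraint $\mathrm{tr}(GG^H) = N_tT$, this is possible only if (15) is satisfied.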
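The step from (24) to (25) rests on the rank-one update of a matrix inverse. The following sketch checks numerically that the interference-only form and the full-covariance form give the same per-symbol rate. It is only a sanity check under stated assumptions: a toy case with two symbols and 2 x 2 matrices, logarithms to base 2, and hypothetical helper names (`inv2`, `quad_form`, `covariance`, `rates`); the scalar `c` stands for $P/N_t$.

```python
import math
import random

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(h, m):
    """Real part of h^H m h for a length-2 vector h."""
    mh = [m[r][0] * h[0] + m[r][1] * h[1] for r in range(2)]
    return (h[0].conjugate() * mh[0] + h[1].conjugate() * mh[1]).real

def covariance(cols, lam, c, noise_var, skip=None):
    """c * sum_j lam_j h_j h_j^H + noise_var * I, optionally leaving out column `skip`."""
    r = [[complex(noise_var), 0j], [0j, complex(noise_var)]]
    for j, h in enumerate(cols):
        if j == skip:
            continue
        for a in range(2):
            for b in range(2):
                r[a][b] += c * lam[j] * h[a] * h[b].conjugate()
    return r

def rates(cols, lam, c, noise_var, i):
    """Rate of symbol i in bits: interference-only form, then the form of (25)."""
    r_full = covariance(cols, lam, c, noise_var)          # R as in (26)
    r_int = covariance(cols, lam, c, noise_var, skip=i)   # R without symbol i
    sinr_form = math.log2(1 + c * lam[i] * quad_form(cols[i], inv2(r_int)))
    full_form = -math.log2(1 - c * lam[i] * quad_form(cols[i], inv2(r_full)))
    return sinr_form, full_form

if __name__ == "__main__":
    rng = random.Random(0)
    cols = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            for _ in range(2)]
    print(rates(cols, lam=[1.5, 0.5], c=1.0, noise_var=0.5, i=0))
```

The two returned values should agree to numerical precision for any column vectors and any nonnegative weights.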

APPENDIX II

PROOF OF THEOREM 2

If the a priori information is perfect, we can obtain from (6)

$$I_{E1}(Q) = \log\left(1 + \frac{P}{N_t\sigma_z^2}\, h_i^H h_i\right). \tag{30}$$

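From (30), maximizing $\Pr(I_{E1}(Q) \ge \epsilon)$ is equivalent to maximizing $\Pr(\|h_i\|^2 \ge \epsilon')$, where $\epsilon$ and $\epsilon'$ denote the corresponding thresholds. Noting that $h_i = Hm_i$, we have

$$\|h_i\|^2 = \sum_{m=1}^{N_r} h_m^H M_i M_i^H h_m \tag{31}$$

where $h_m$ is the $m$-th column vector of $H^H$. The following decomposition is assumed

$$M_iM_i^H = U\Lambda U^H \tag{32}$$

where $\Lambda = \mathrm{diag}[\lambda_n]$, $n = 1, 2, \ldots, N_t$. By Lemma 5 in [1], since $U^Hh_m$ has the same distribution as $h_m$, we need only consider $M_iM_i^H = \Lambda$. Then we have

$$\|h_i\|^2 = \sum_{n=1}^{N_t} \lambda_n \sum_{m=1}^{N_r} |h_{mn}|^2. \tag{33}$$

For the power constraint, we have $\sum_{n=1}^{N_t} \lambda_n = 1$.

By [29], since the random variables $\sum_{m=1}^{N_r} |h_{mn}|^2$ in (33) are identically chi-square distributed with $2N_r$ degrees of freedom, the probability $\Pr(\|h_i\|^2 \ge \epsilon')$ is maximized, i.e., the outage probability is minimized, if all the eigenvalues of $M_iM_i^H$ are equal. With $\mathrm{tr}(M_iM_i^H) = 1$, this implies that $\lambda_n = 1/N_t$, $1 \le n \le N_t$.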
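The effect of the eigenvalue spread in (33) can be illustrated with a small Monte Carlo sketch. It is not the simulation used in this paper: the SNR value, the threshold, the sample count, the base-2 logarithm and the function names `ending_point_samples` and `outage` are all assumptions made for illustration, and only a single low threshold is checked.

```python
import math
import random

def ending_point_samples(eigs, n_rx, snr, trials, rng):
    """Samples of log2(1 + snr * sum_n eig_n * sum_m |h_mn|^2), h_mn ~ CN(0, 1).

    eigs: eigenvalues of M_i M_i^H (summing to 1)
    snr:  stands for P / (N_t * sigma_z^2) in (30)
    """
    out = []
    for _ in range(trials):
        norm_sq = 0.0
        for lam in eigs:
            # sum over receive antennas of |h_mn|^2, each term with unit mean
            gain = sum(rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2
                       for _ in range(n_rx)) / 2
            norm_sq += lam * gain
        out.append(math.log2(1 + snr * norm_sq))
    return out

def outage(samples, threshold):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)

if __name__ == "__main__":
    rng = random.Random(2019)
    equal = ending_point_samples([0.5, 0.5], n_rx=2, snr=2.0, trials=20000, rng=rng)
    skewed = ending_point_samples([1.0, 0.0], n_rx=2, snr=2.0, trials=20000, rng=rng)
    print("outage, equal eigenvalues: ", outage(equal, 1.0))
    print("outage, skewed eigenvalues:", outage(skewed, 1.0))
```

With these assumed settings, the equal-eigenvalue case should show the smaller outage fraction, in line with the statement of Theorem 2 for $N_t = N_r = 2$.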




REFERENCES

[1] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecom., vol. 10, pp. 585-595, Nov. 1999.
[2] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, no. 3, pp. 311-335, 1998.
[3] V. Tarokh, N. Seshadri, and A. Calderbank, "Space-time codes for high data rate wireless communications: Performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, pp. 744-765, Mar. 1998.
[4] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Trans. Inform. Theory, vol. 45, pp. 1456-1467, July 1999.
[5] S. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[6] Y. Liu and M. Fitz, "Space-time turbo codes," 13th Annual Allerton Conf. on Commun., Control and Computing, Sep. 1999.
[7] D. Cui and A. M. Haimovich, "Design and performance of turbo space-time coded modulation," IEEE GLOBECOM'00, vol. 3, pp. 1627-1631, Nov. 2000.
[8] D. Tujkovic, "Recursive space-time trellis codes for turbo coded modulation," Proc. of GlobeCom 2000, San Francisco.
[9] G. J. Foschini, "Layered space-time architecture for wireless communication in fading environments when using multiple antennas," Bell Labs Tech. J., vol. 1, no. 2, pp. 41-59, 1996.
[10] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky, "Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture," Electron. Lett., vol. 35, pp. 14-16, Jan. 1999.
[11] H. El Gamal and A. R. Hammons Jr., "A new approach to layered space-time coding and signal processing," IEEE Trans. Inf. Theory, vol. 47, pp. 2321-2334, Sep. 2001.
[12] B. Hassibi and B. Hochwald, "High-rate codes that are linear in space and time," IEEE Trans. Inform. Theory, vol. 48, pp. 1804-1824, July 2002.
[13] X. Ma and G. B. Giannakis, "Full-diversity full-rate complex-field space-time coding," IEEE Trans. Signal Processing, vol. 51, no. 11, pp. 2917-2930, Nov. 2003.
[14] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error correcting coding and decoding: Turbo codes," in Proc. IEEE Int. Conf. Commun., vol. 2, pp. 1064-1070, Geneva, Switzerland, May 1993.
[15] B. M. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Trans. Comm., vol. 51, pp. 389-399, Mar. 2003.
[16] S. ten Brink, "Convergence of iterative decoding," Electron. Lett., vol. 35, no. 13, pp. 1117-1118, Jun. 1999.
[17] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, pp. 1727-1737, Oct. 2001.
[18] A. van Zelst, R. van Nee, and G. A. Awater, "Turbo-BLAST and its performance," in Proc. Vehicular Technology Conf., vol. 2, pp. 1282-1286, May 2001.
[19] X. Li and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding," in Proc. Int. Conf. Communications, pp. 858-862, June 1999.
[20] A. M. Tonello, "Space-time bit-interleaved coded modulation with an iterative decoding strategy," in Proc. Vehicular Technology Conf., pp. 473-478, Sept. 2000.
[21] X. Wang and H. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Comm., vol. 47, pp. 1046-1061, July 1999.




[22] M. Tüchler, A. Singer, and R. Koetter, "Minimum mean square error equalization using a priori information," IEEE Trans. Signal Processing, vol. 50, pp. 673-683, Mar. 2002.
[23] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. 20, no. 2, pp. 284-287, Mar. 1974.
[24] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Math. Comput., vol. 44, pp. 463-471, Apr. 1985.
[25] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications, Kluwer, 2000.
[26] L. Zheng and D. Tse, "Diversity and multiplexing: A fundamental tradeoff in multiple antenna channels," IEEE Trans. Inform. Theory, vol. 49, pp. 1073-1096, May 2003.
[27] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, Inc. (London) Ltd., 1979.
[28] H. Boche and E. A. Jorswieck, "On Schur-convexity of expectation of weighted sum of random variables with applications," Journal of Inequalities in Pure and Applied Mathematics, vol. 5, no. 2, Article 46, 2004.
[29] M. E. Bock, P. Diaconis, F. W. Huffer, and M. D. Perlman, "Inequalities for linear combinations of Gamma random variables," Canadian J. Statistics, vol. 15, pp. 387-395, 1987.
[30] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
[31] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York: Wiley, 1991.
[32] T. Rappaport, Wireless Communications: Principles and Practice, 2nd ed., Prentice Hall, 2001.
[33] J. Proakis, Digital Communications, 4th ed., New York: McGraw-Hill.




Fig. 1. System Block Diagram.

[Fig. 2 plot. Axes: $I_{E1}(I_{A1})$, where output $I_{E1}$ becomes input $I_{A2}$, and $I_{E2}(I_{A2})$, where output $I_{E2}$ becomes input $I_{A1}$; both run from 0 to 1. Curves: outer decoder, $I_{E1}$ of symbol group 1, $I_{E1}$ of symbol group 2, and the average of the two.]

Fig. 2. A typical EXIT chart of joint iterative demodulation/decoding receiver for QPSK constellation: two input symbols per

modulation block.

Fig. 3. BCJR/MAP decoding with different a priori LLR inputs: $I(L_a; x) = \frac{1}{2}[I(L_{a1}; x) + I(L_{a2}; x)]$.




(a) Case 1: var1 = 0.4, var2 = 9.0 and var = 1.4. (b) Case 2: var1 = 2.0, var2 = 4.0 and var = 2.7.

Fig. 4. Histograms of x for the BCJR/MAP decoder (a) and (b). x1, x2 and x are the soft outputs corresponding to Le1, Le2 in BCJR/MAP decoder (a) and Le in BCJR/MAP decoder (b), respectively. var1, var2 and var are the associated variances of the AWGN.

[Fig. 5 plots. Horizontal axis: $P/\sigma_z^2$ (dB), 0 to 12; vertical axis: uncooperated sum capacity (bits/channel use), 1 to 4.5. (a) Target $R_m = 3$ bits per channel use: Gaussian and QPSK curves for opt, Gaussian and 8PSK curves for orth1 and orth2. (b) Target $R_m = 4$ bits per channel use: Gaussian and QPSK curves for opt, Gaussian and 16QAM curves for orth1 and orth2.]

Fig. 5. Uncooperated sum capacity versus $P/\sigma_z^2$ for $N_t = N_r = T = 2$ under the Rayleigh ergodic flat fading channel. Dashed curves correspond to Gaussian inputs and solid lines correspond to specific constellations. "opt", "orth1" and "orth2" correspond to Scheme 1, the Alamouti scheme, and the orthogonal scheme (a subset of Scheme 1), respectively.




[Fig. 6 plots. Horizontal axis: $I_A$ (a priori info.), 0 to 2; vertical axis: average $I_E$ (extrinsic info.). (a) Magnitude = 0.2; (b) magnitude = 1.0. Each panel shows Schemes 1, 2 and 3 at angles of 10° and 90°.]

Fig. 6. The EXIT characteristics of the three schemes under various channels H.

[Fig. 7 plots. Vertical axis: CDF, 0 to 1. (a) Horizontal axis: $I_{E1}(0)$ (extrinsic info. without a priori), 1 to 2. (b) Horizontal axis: $I_{E1}(Q)$ (extrinsic info. with perfect a priori), 1.6 to 2. Each panel shows Schemes 1, 2 and 3 under slow and fast fading.]

Fig. 7. CDFs of $I_{E1}(0)$ and $I_{E1}(Q)$ for the three schemes under the different Rayleigh ergodic flat fading channels. "fast" indicates the fast fading channel and "slow" indicates the slow fading channel.




[Fig. 8 plot. Horizontal axis: $E_b/N_0$ (dB), 0 to 10; vertical axis: FER, $10^{-3}$ to $10^{0}$. Curves: Schemes 1, 2 and 3 at the 1st and 8th iterations, under slow and fast fading.]

Fig. 8. FER comparison of the three schemes under different Rayleigh ergodic flat fading channels. "fast" indicates the fast fading channel and "slow" indicates the slow fading channel.
