
2384 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

Modulation and Coding for Linear Gaussian Channels

G. David Forney, Jr., Fellow, IEEE, and Gottfried Ungerboeck, Fellow, IEEE

(Invited Paper)

Abstract—Shannon's determination of the capacity of the linear Gaussian channel has posed a magnificent challenge to succeeding generations of researchers. This paper surveys how this challenge has been met during the past half century. Orthogonal minimum-bandwidth modulation techniques and channel capacity are discussed. Binary coding techniques for low-signal-to-noise ratio (SNR) channels and nonbinary coding techniques for high-SNR channels are reviewed. Recent developments, which now allow capacity to be approached on any linear Gaussian channel, are surveyed. These new capacity-approaching techniques include turbo coding and decoding, multilevel coding, and combined coding/precoding for intersymbol-interference channels.

Index Terms—Binary coding, block codes, canonical channel, capacity, coding gain, convolutional codes, lattice codes, multicarrier modulation, nonbinary coding, orthogonal modulation, precoding, shaping gain, single-carrier modulation, trellis codes.

I. INTRODUCTION

THE problem of efficient data communication over linear Gaussian channels has driven much of communication and coding research in the half century since Shannon's original work [93], [94], where this problem was central. Shannon's most celebrated result was the explicit computation of the capacity of the additive Gaussian noise channel. His result has posed a magnificent challenge to succeeding generations of researchers. Only in the past decade can we say that methods of approaching capacity have been found for practically all linear Gaussian channels. This paper focuses on the modulation, coding, and equalization techniques that have proved to be most effective in meeting Shannon's challenge.

Approaching channel capacity takes different forms in different domains. There is a fundamental difference between the power-limited regime with low signal-to-noise ratio (SNR) and the high-SNR bandwidth-limited regime.

Early work focused on the power-limited regime. In this domain, binary codes suffice, and intersymbol interference (ISI) due to nonflat channel responses is rarely a serious problem. As early as the 1960's, sequential decoding [119] of binary convolutional codes [31] was shown to be an implementable method for achieving the cutoff rate $R_0$, which at low SNR is 3 dB away from the Shannon limit. Many leading communications engineers took $R_0$ [78], [118] to be the "practical capacity," and concluded that the problem of reaching capacity had effectively been solved. Nonetheless, only in the 1990's have coding schemes that reach or exceed $R_0$ actually been implemented, even for such important and effectively ideal additive white Gaussian noise (AWGN) channels as deep-space communication channels.

Manuscript received February 9, 1998; revised June 5, 1998. G. D. Forney, Jr. is with Motorola, Information Systems Group, Mansfield, MA 02048-1193 USA (e-mail: [email protected]). G. Ungerboeck was with IBM Research Division, Zurich Research Laboratory, CH-8803 Rueschlikon, Switzerland (e-mail: [email protected]). Publisher Item Identifier S 0018-9448(98)06314-7.

Meanwhile, in the bandwidth-limited regime there was essentially no practical progress beyond uncoded multilevel modulation until the invention of trellis-coded modulation in the mid-1970's and its widespread implementation in the 1980's [104]. An alternative route to capacity in this regime is via multilevel coding, which was also introduced during the 1970's [60], [75]. Trellis-coded modulation and multilevel coding are both based on the concept of set partitioning. In the late 1980's, constellation shaping was recognized as a separable part of the high-SNR coding problem, which yields a small but essential contribution to approaching capacity [48].

For channels with strongly frequency-dependent attenuation or noise characteristics, it had long been known that multicarrier modulation could in principle be used to achieve the power and rate allocations prescribed by "water pouring" (see, e.g., [51]). However, practical realizations of multicarrier modulation in combination with powerful codes have been achieved only in recent years [90]. Multicarrier techniques are particularly attractive for channels for which the capacity-achieving band consists of multiple disconnected frequency intervals, as may happen with narrowband interference.

When the capacity-achieving band is a single interval, methods for approaching capacity using serial transmission and nonlinear transmitter pre-equalization ("precoding") have been developed in the past decade [34]. With these methods, the received signal is apparently ISI-free, so ideal-channel decoding can be performed at the channel output. In addition, signal redundancy at the channel output is exploited for constellation shaping at the channel input. The performance achieved is equivalent to the performance that would be obtained if ISI could be perfectly canceled in the receiver with decision-feedback equalization (DFE). At high SNR, the effective SNR is approximately equal to the optimal effective SNR achieved by water pouring. The capacity of an ISI channel can therefore be approached as closely as the capacity of an ideal ISI-free channel.

Currently, the invention of turbo codes [13] and the rediscovery of low-density parity-check (LDPC) codes [50] have created tremendous excitement. These schemes operate successfully at rates well beyond the cutoff rate $R_0$, within tenths of a decibel of the Shannon limit, in both the low-SNR and high-SNR regimes. Whereas coding for rates below $R_0$ is a rather mature field, coding in the beyond-$R_0$ regime is in its early stages, and theoretical understanding is still weak. At $R_0$, it seems that a "phase transition" occurs from a static regime of regular code structures to a statistical regime of quasirandom codes, comparable to a phase transition between a solid and a liquid or gas.

In Section II, we define the general linear Gaussian channel and the ideal band-limited additive white Gaussian noise (AWGN) channel. We review how orthogonal modulation techniques convert a waveform channel to an equivalent ideal discrete-time AWGN channel. We discuss the Shannon limit for such channels, and emphasize the difference between the low-SNR and high-SNR regimes. We give baseline performance curves of uncoded $M$-ary PAM modulation, and evaluate the error probability as a function of the gap to capacity. We review union-bound performance analysis, which is a useful tool below $R_0$, but useless beyond $R_0$.

In Section III, we discuss the performance and complexity of binary block codes and convolutional codes for the low-SNR regime at rates below $R_0$. We also examine more powerful binary coding and decoding techniques such as sequential decoding, code concatenation with outer Reed–Solomon codes, turbo codes, and low-density parity-check codes.

In Section IV, we do the same for lattice and nonbinary trellis codes, which are the high-SNR analogs of binary block and convolutional codes, respectively. We also discuss more powerful schemes, such as multilevel coding with binary turbo codes [109].

Finally, in Section V, we consider the general linear Gaussian channel. We review water pouring and discuss approaching capacity with multicarrier modulation. For serial single-carrier transmission, we explain how to reduce the channel to an equivalent discrete-time channel without loss of optimality. We show that capacity can be approached if the intersymbol interference in this channel can be eliminated, and discuss various transmitter precoding techniques that have been developed to achieve this objective.

Appendix I discusses various forms of multicarrier modulation. In Appendix II, several uniformity properties that have proved to be helpful in the design and analysis of Euclidean-space codes are reviewed. Appendix III addresses discrete-time spectral factorization.

II. MODULATION AND CODING FOR THE IDEAL AWGN CHANNEL

In this section, we first define the general linear Gaussian channel and the ideal band-limited AWGN channel. We then review minimum-bandwidth orthogonal pulse amplitude modulation (PAM) for serial transmission of real and complex symbol sequences over the ideal AWGN channel. As is well known, orthogonal transmission at a rate of $1/T$ real (respectively, complex) symbols per second requires a minimum one-sided bandwidth of $1/2T$ Hz (respectively, $1/T$ Hz).

This can be accomplished in a variety of ways. Any such modulation technique converts the continuous-time AWGN channel without loss of optimality to an ideal discrete-time AWGN channel.

We then review the channel capacity of the ideal AWGN channel, and give capacity curves for equiprobable $M$-ary PAM ($M$-PAM) inputs. We emphasize the significant differences between the low-SNR and high-SNR regimes. As a fundamental figure of merit for coding, we use the normalized signal-to-noise ratio $\text{SNR}_{\text{norm}}$ defined in Section II-E, and compare it with the traditional figure of merit $E_b/N_0$. We propose that $E_b/N_0$ should be used only in the low-SNR regime. We give baseline performance curves for uncoded 2-PAM modulation as a function of $\text{SNR}_{\text{norm}}$ and $E_b/N_0$, and for $M$-PAM as a function of $\text{SNR}_{\text{norm}}$. Finally, we discuss union-bound performance analysis.

A. General Linear Gaussian Channel and Ideal AWGN Channel

The general linear Gaussian channel is a real waveform channel with input signal $s(t)$, channel impulse response $h(t)$, and additive Gaussian noise $n(t)$. The received signal is

$$r(t) = s(t) * h(t) + n(t) \qquad (2.1)$$

where "$*$" denotes convolution. The Fourier transform (spectral response) of $h(t)$ will be denoted by $H(f)$. The input signal is subject to a power constraint $P$, and the noise has one-sided power spectral density (p.s.d.) $N(f)$.

Since all waveforms are real and thus have Hermitian-symmetric spectra about $f = 0$, only positive frequencies need to be considered. We will follow this long-established tradition, although for analytic purposes it is often preferable to consider both positive and negative frequencies. All bandwidths and power spectral densities in this paper will therefore be "one-sided."

The ideal band-limited additive white Gaussian noise (AWGN) channel (for short, the ideal AWGN channel) is a linear Gaussian channel with flat (constant) $|H(f)|$ and $N(f)$ over a frequency band $B$ of one-sided bandwidth $W$. Signals can be transmitted only in the band $B$, which is not necessarily one continuous interval. The restriction to $B$ may be due either to the channel ($H(f) = 0$ for $f \notin B$) or to system constraints.

Without loss of generality, we may normalize $|H(f)|$ to $1$ within $B$, so that

$$r(t) = s(t) + n(t) \qquad (2.2)$$

where $n(t)$ is AWGN with one-sided p.s.d. $N_0$. The signal-to-noise ratio is then

$$\text{SNR} = \frac{P}{N_0 W} \qquad (2.3)$$

B. Real and Complex Pulse Amplitude Modulation

Let $\{x_k\}$ be a sequence of real modulation symbols representing digital data. Assume that $\{\phi_k(t)\}$ represents a set of real orthonormal waveforms; i.e., $\int \phi_k(t)\,\phi_{k'}(t)\,dt = \delta_{kk'}$, where the integral is over all time, and $\delta_{kk'}$ denotes the Kronecker delta function: $\delta_{kk'} = 1$ if $k = k'$, else $\delta_{kk'} = 0$. A PAM modulator transmits the continuous-time signal

$$s(t) = \sum_k x_k\,\phi_k(t) \qquad (2.4)$$

over an ideal AWGN channel. An optimal demodulator recovers a sequence $\{y_k\}$ of noisy estimates of the transmitted symbols $\{x_k\}$ by correlating the received signal $r(t) = s(t) + n(t)$ with the waveforms $\phi_k(t)$ (matched filtering)

$$y_k = \int r(t)\,\phi_k(t)\,dt = x_k + n_k \qquad (2.5)$$

By optimum detection theory [118], the sequence $\{y_k\}$ is a set of sufficient statistics about the sequence $\{x_k\}$; i.e., all information about $\{x_k\}$ that is contained in the continuous-time received signal $r(t)$ is condensed into the discrete-time sequence $\{y_k\}$. Moreover, the orthonormality of the $\phi_k(t)$ ensures that there is no intersymbol interference (ISI), and that the sequence $\{n_k\}$ is a set of independent and identically distributed (i.i.d.) Gaussian random noise variables with mean zero and variance $\sigma^2 = N_0/2$. The waveform channel is thus reduced to an equivalent discrete-time ideal AWGN channel.

For serial PAM transmission, the orthogonal waveforms are chosen to be time shifts of a pulse $p(t)$ by integer multiples of the modulation interval $T$; i.e., $\phi_k(t) = p(t - kT)$, so that the transmitted signal becomes

$$s(t) = \sum_k x_k\,p(t - kT) \qquad (2.6)$$

The orthonormality condition requires that

$$\int p(t - kT)\,p(t - k'T)\,dt = \delta_{kk'}$$

which is equivalent to

$$\int p(t)\,p(t - kT)\,dt = \delta_{k0} \qquad (2.7)$$

The frequency-domain equivalent to (2.7) is known as the Nyquist criterion for zero ISI, namely,

$$S_{|P|^2}(f) = \frac{1}{T}\sum_m \left|P\!\left(f - \frac{m}{T}\right)\right|^2 = 1 \quad \text{for all } f \qquad (2.8)$$

where $P(f)$ is the Fourier transform of $p(t)$ and $S_{|P|^2}(f)$ is the $1/T$-aliased spectrum of $|P(f)|^2$.

The Nyquist criterion imposes no restriction on the phase of $P(f)$. Because $S_{|P|^2}(f)$ is periodic with period $1/T$, and for real signals $p(t)$ and thus $|P(f)|^2$ are symmetric around $f = 0$, it suffices to verify (2.8) for the frequency interval $0 \le f \le 1/2T$. This implies that for each $f \in [0, 1/2T]$, $P(f - m/T) \ne 0$ for at least one $m$. It follows that serial ISI-free transmission of $1/T$ real symbols per second over a real channel requires a minimum one-sided bandwidth of $1/2T$ Hz, or 1/2 Hz per symbol dimension transmitted per second (1/2 Hz/dim/s).

For serial baseband transmission, the Nyquist criterion is met with minimum bandwidth by the brick-wall spectrum

$$|P(f)|^2 = \begin{cases} T, & 0 \le f \le 1/2T \\ 0, & \text{elsewhere} \end{cases} \qquad (2.9)$$

Solutions requiring more bandwidth include spectra with symmetric finite rolloffs at the band-edge frequencies $\pm 1/2T$, e.g., the well-known family of raised-cosine spectra [74]. Of course, the Nyquist criterion is also satisfied by the spectrum of the rectangular pulse $p(t) = 1/\sqrt{T}$ for $0 \le t < T$ ($p(t) = 0$ elsewhere), which corresponds to "unfiltered" modulation.

Serial passband transmission of real modulation symbols can be achieved with minimum bandwidth by $|P(f)|^2 = T$ for $f_c \le f \le f_c + 1/2T$, and zero elsewhere. This type of modulation may be referred to as carrierless single-sideband (CSSB) modulation; it is not often used. Serial passband transmission of complex modulation symbols by carrierless amplitude-phase (CAP) modulation or quadrature amplitude modulation (QAM) is usually preferred.

To transmit a complex signal over a real passband channel, one may use a complex signal with only positive-frequency components. Sending the real part thereof creates a negative-frequency image spectrum. In the receiver, the complex signal is recovered by suppressing the negative-frequency spectrum. Let $p(t)$ be as defined in (2.9), and let

$$\tilde p(t) = p(t)\,e^{j2\pi f_c t} = p_c(t) + j\,p_s(t)$$

where $f_c \ge 1/2T$, so that the Fourier transform of $\tilde p(t)$ is analytic, i.e., zero for $f < 0$. Then $p_s(t)$ is the Hilbert transform of $p_c(t)$, and the two pulses are said to form a Hilbert pair. The two pulses are orthogonal with energy $1/2$, and both have one-sided bandwidth $1/T$ Hz centered at $f_c$.

The transmit signals for minimum-bandwidth CAP and QAM are, respectively,

$$s(t) = \mathrm{Re}\!\left\{\sum_k x_k\,\tilde p(t - kT)\right\} \qquad (2.10)$$

and

$$s(t) = \mathrm{Re}\!\left\{\left[\sum_k x_k\,p(t - kT)\right]e^{j2\pi f_c t}\right\} \qquad (2.11)$$

Notice that QAM is equivalent to CAP with the symbols $x_k$ replaced by the rotated symbols $x_k e^{j2\pi f_c kT}$. If $f_c$ is an integer multiple of $1/T$, then there is no difference between CAP and QAM.

An advantage of CAP over CSSB is that the CAP spectrum does not need to be aligned with integer multiples of $1/2T$, as is the case for CSSB. CAP may be preferred over QAM when $f_c$ does not substantially exceed $1/T$. On the other hand, QAM is the modulation of choice when $f_c \gg 1/T$. For ISI-free transmission of complex (two-dimensional) signals at a rate of $1/T$ symbols per second, CAP and QAM require a minimum one-sided bandwidth of $1/T$ Hz, which again is 1/2 Hz/dim/s.

There exists literally an infinite variety of orthonormal modulation schemes with the same bandwidth efficiency. Consider general serial/parallel PAM modulation in which real $N$-vectors $x_k = (x_{k1}, x_{k2}, \ldots, x_{kN})$ are transmitted at a rate of $1/T$ vectors per second over an ideal AWGN channel. Let $p(t) = (p_1(t), p_2(t), \ldots, p_N(t))$ be an $N$-vector of real pulses. A general PAM modulator transmits

$$s(t) = \sum_k \langle x_k,\,p(t - kT)\rangle = \sum_k \sum_{n=1}^N x_{kn}\,p_n(t - kT) \qquad (2.12)$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product. If $\{p_n(t - kT)\}$ is a set of orthonormal functions, an optimum demodulator may recover noisy estimates of the $x_{kn}$ from $r(t)$ by matched filtering

$$y_{kn} = \int r(t)\,p_n(t - kT)\,dt = x_{kn} + n_{kn} \qquad (2.13)$$

The sequence $\{n_k\}$ is a sequence of i.i.d. Gaussian $N$-vectors, whose elements have variance $\sigma^2 = N_0/2$. Thus the AWGN waveform channel is converted to a discrete-time vector AWGN channel, which is equivalent to $N$ real discrete-time AWGN channels operating in parallel.

Let $P(f) = (P_1(f), P_2(f), \ldots, P_N(f))$ be the vector of the Fourier transforms of the elements of $p(t)$. The frequency-domain condition for orthonormality is the generalized Nyquist criterion [74]

$$S_{PP^\dagger}(f) = \frac{1}{T}\sum_m P^{\mathsf T}\!\left(f - \frac{m}{T}\right)P^{*}\!\left(f - \frac{m}{T}\right) = I_N \quad \text{for all } f \qquad (2.14)$$

In (2.14), $P^{*}(f)$ denotes the conjugate of $P(f)$, $S_{PP^\dagger}(f)$ is the $N \times N$ matrix of $1/T$-aliased cross-spectra, and $I_N$ is the $N \times N$ identity matrix. As with (2.8), it suffices that (2.14) holds for the frequency interval $0 \le f \le 1/2T$. Now consider the matrix

$$M(f) = \frac{1}{\sqrt{T}}\left[\cdots,\; P^{\mathsf T}\!\left(f + \tfrac{1}{T}\right),\; P^{\mathsf T}(f),\; P^{\mathsf T}\!\left(f - \tfrac{1}{T}\right),\; \cdots\right] \qquad (2.15)$$

whose columns are the aliased translates of $P(f)$. Then (2.14) can be written as $M(f)M^\dagger(f) = I_N$, which implies that $M(f)$ must have rank $N$ for all $f$. It follows that the total one-sided bandwidth for which some $P_n(f - m/T)$, $1 \le n \le N$, is nonzero must be at least $N/2T$ Hz. Because $N/T$ symbol dimensions are transmitted per second, again a minimum one-sided bandwidth of 1/2 Hz/dim/s is needed.

If $P(f)$ meets the minimum-bandwidth condition, then the nonzero columns of $M(f)$ must form an $N \times N$ unitary matrix $U(f)$ in order that $M(f)M^\dagger(f) = I_N$. Conversely, any set of unitary matrices $\{U(f),\, 0 \le f \le 1/2T\}$ defines a minimum-bandwidth orthonormal transmission scheme. The number of such schemes is therefore uncountably infinite.

Appendix I describes two currently popular multicarrier modulation schemes, namely, discrete multitone (DMT) and discrete wavelet multitone (DWMT) modulation.

The optimum design of transmitter and receiver filters for serial/parallel transmission over general noisy linear channels has been discussed in [28].

C. Capacity of the Ideal AWGN Channel

We have seen that all orthonormal modulation schemes with optimum matched-filter detection are equivalent to an ideal discrete-time AWGN channel with real input symbols $x_k$ and real output signals $y_k = x_k + n_k$. The i.i.d. Gaussian noise variables $n_k$ have zero mean and variance $\sigma^2 = N_0/2$.

Let the average input-symbol energy per dimension be $S_x = \mathrm{E}[x_k^2]$, and let the average energy per information bit be $E_b = S_x/\rho$, where $\rho$ is the code rate in bits per dimension (b/dim). Because $P = 2W S_x$ and $\sigma^2 = N_0/2$, the signal-to-noise ratios (SNR's) of the AWGN waveform channel and the discrete-time AWGN channel are identical:

$$\text{SNR} = \frac{P}{N_0 W} = \frac{S_x}{\sigma^2} \qquad (2.16)$$

The channel capacity is the maximum mutual information between channel input and output, which is obtained with a Gaussian distribution over the symbols $x_k$ with average power $S_x$. Shannon's most famous result states the capacity in bits per second as

$$C = W \log_2(1 + \text{SNR}) \quad \text{[b/s]} \qquad (2.17)$$

Equivalently, the capacity in bits per dimension is given by

$$C = \tfrac{1}{2}\log_2(1 + \text{SNR}) \quad \text{[b/dim]} \qquad (2.18)$$

Shannon [93], [94] showed that reliable transmission is possible for any rate $\rho < C$, and impossible if $\rho > C$. (Unless stated otherwise, in this paper capacity $C$ and code rates $\rho$ are given in bits per dimension.)

The capacity can be upper-bounded and approximated for low SNR by a linear function in SNR, and lower-bounded and approximated for high SNR by a logarithmic function in SNR, as follows:

$$C \le \frac{\log_2 e}{2}\,\text{SNR}, \qquad C \approx \frac{\log_2 e}{2}\,\text{SNR} \quad \text{for SNR} \ll 1 \qquad (2.19)$$

$$C \ge \tfrac{1}{2}\log_2 \text{SNR}, \qquad C \approx \tfrac{1}{2}\log_2 \text{SNR} \quad \text{for SNR} \gg 1 \qquad (2.20)$$

A finite set from which modulation symbols can be chosen is called a signal alphabet or signal constellation. An $M$-PAM constellation contains $M$ equidistant real symbols centered on the origin; i.e., $\{\pm d/2, \pm 3d/2, \ldots, \pm (M-1)d/2\}$, where $d$ is the minimum distance between symbols. For example, $\{-3, -1, +1, +3\}$ is a 4-PAM constellation with $d = 2$. If the symbols are equiprobable, then the average symbol energy is

$$S_x = \frac{d^2(M^2 - 1)}{12} \qquad (2.21)$$


Fig. 1. Capacity of the ideal AWGN channel with Gaussian inputs and with equiprobable $M$-PAM inputs.

Fig. 1 shows the capacity $C$ and the mutual information achieved with equiprobable $M$-PAM ("equiprobable $M$-PAM capacity") for several values of $M$, as a function of SNR. The $M$-PAM curves saturate because information cannot be sent at a rate higher than $\log_2 M$ b/dim.

D. Low-SNR and High-SNR Regimes

We see from Fig. 1 that in the low-SNR regime an equiprobable binary alphabet is nearly optimal. For SNR $\le 1$ (0 dB), the reduction in capacity is negligible.

In the high-SNR regime, the capacity of equiprobable $M$-PAM constellations asymptotically approaches a straight line parallel to the capacity of the AWGN channel. The asymptotic loss of $\pi e/6$ (1.53 dB) is due to using a uniform rather than a Gaussian distribution over the signal set. To achieve capacity, the use of powerful coding with equiprobable $M$-PAM signals is not enough. To obtain the remaining 1.53 dB, constellation-shaping techniques that produce a Gaussian-like distribution over an $M$-PAM constellation are required (see Section IV-B).

Thus coding techniques for the low-SNR and high-SNR regimes are quite different. In the low-SNR regime, binary codes are nearly optimal, and no constellation shaping is required. Good binary coding and decoding techniques have been known since the 1960's.

On the other hand, in the high-SNR regime, nonbinary signal constellations must be used. Very little progress in developing practical codes for this regime was made until the advent of trellis-coded modulation in the late 1970's and 1980's. To approach capacity, coding techniques must be supplemented with constellation-shaping techniques. Moreover, on bandwidth-limited channels, ISI is often a dominant impairment, and practical techniques for combined coding, shaping, and equalization are required to approach capacity. Most of the progress in these areas has been made only in the past decade.

For these reasons, we discuss coding for the low-SNR and high-SNR regimes separately in Sections III and IV, and coding for the general linear Gaussian channel in Section V.

E. Baseline Performance of Uncoded $M$-PAM and Normalized SNR

In an uncoded $M$-PAM system, $\rho = \log_2 M$ information bits are independently encoded into each $M$-PAM symbol transmitted. In the receiver, the optimum decoding rule is then to make independent symbol-by-symbol decisions. The probability that a Gaussian noise variable $n_k$ exceeds half of the distance $d$ between adjacent $M$-PAM symbols is $Q(d/2\sigma)$, where [74]

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy \qquad (2.22)$$

The error probability per symbol is given by $Q(d/2\sigma)$ for the two outer points, and by $2Q(d/2\sigma)$ for the $M - 2$ inner points, so the average error probability per symbol is

$$P_s(E) = \frac{2(M-1)}{M}\,Q\!\left(\frac{d}{2\sigma}\right) = \frac{2(M-1)}{M}\,Q\!\left(\sqrt{\frac{3\,\text{SNR}}{M^2 - 1}}\right) \qquad (2.23)$$

where $\text{SNR} = S_x/\sigma^2 = d^2(M^2 - 1)/(12\sigma^2)$ follows from (2.16) and (2.21). Thus $P_s(E)$ is a function only of $M$ and SNR. The circles in Fig. 1 indicate the values of SNR for which $P_s(E) = 10^{-6}$ is achieved.

The capacity formula (2.18) can be rewritten as $2^{2C} = 1 + \text{SNR}$. This suggests defining the normalized SNR as

$$\text{SNR}_{\text{norm}} = \frac{\text{SNR}}{2^{2\rho} - 1} \qquad (2.24)$$


Fig. 2. Bit-error probability $P_b(E)$ versus $E_b/N_0$ for uncoded 2-PAM, and symbol-error probability $P_s(E)$ versus $\text{SNR}_{\text{norm}}$ for uncoded $M$-PAM ($M$ large).

where $\rho$ is the actual data rate of a given modulation and coding scheme. For a capacity-achieving scheme, $\rho$ equals the channel capacity $C$ and $\text{SNR}_{\text{norm}} = 1$ (0 dB). If $\rho < C$, as will always be the case in practice, then $\text{SNR}_{\text{norm}} > 1$. The value of $\text{SNR}_{\text{norm}}$ thus signifies how far a system is operating from the Shannon limit (the "gap to capacity").

For uncoded $M$-PAM, from $\rho = \log_2 M$ and (2.24) one obtains

$$\text{SNR} = (2^{2\rho} - 1)\,\text{SNR}_{\text{norm}} = (M^2 - 1)\,\text{SNR}_{\text{norm}}$$

Therefore, the average error probability per symbol of uncoded $M$-PAM can be written as

$$P_s(E) = \frac{2(M-1)}{M}\,Q\!\left(\sqrt{3\,\text{SNR}_{\text{norm}}}\right) \approx 2\,Q\!\left(\sqrt{3\,\text{SNR}_{\text{norm}}}\right), \quad M \text{ large} \qquad (2.25)$$

Note that the baseline $M$-PAM performance curve of $P_s(E)$ versus $\text{SNR}_{\text{norm}}$ is nearly independent of $M$, if $M$ is large. This shows that $\text{SNR}_{\text{norm}}$ is appropriately normalized for rate in the high-SNR regime.

Because $\text{SNR} = 2\rho\,E_b/N_0$ by (2.16), the general relation between $\text{SNR}_{\text{norm}}$ and $E_b/N_0$ at a given rate $\rho$ (in bits per dimension) is given by

$$\frac{E_b}{N_0} = \frac{2^{2\rho} - 1}{2\rho}\,\text{SNR}_{\text{norm}} \qquad (2.26)$$

If $\rho = 1$, then $E_b/N_0 = (3/2)\,\text{SNR}_{\text{norm}}$, so the two figures of merit are equivalent up to a fixed factor. As $\rho \to 0$, $E_b/N_0 \to (\ln 2)\,\text{SNR}_{\text{norm}}$; as $\rho \to \infty$, the factor relating them grows without bound.

In the low-SNR regime, if bandwidth is truly unconstrained, then as bandwidth is increased to permit usage of powerful low-rate binary codes, both SNR and $\rho$ tend toward zero. In this power-limited regime, it is appropriate to use as a figure of merit the traditional ratio $E_b/N_0$.

From the general Shannon limit $\text{SNR}_{\text{norm}} \ge 1$, we obtain the Shannon limit on $E_b/N_0$ for a given rate $\rho$:

$$\frac{E_b}{N_0} \ge \frac{2^{2\rho} - 1}{2\rho} \qquad (2.27)$$

This lower bound decreases monotonically with $\rho$, and as $\rho$ approaches $0$, it approaches the ultimate Shannon limit

$$\frac{E_b}{N_0} \ge \ln 2 = 0.69 \quad (-1.59 \text{ dB}) \qquad (2.28)$$

However, if bandwidth is limited, then the Shannon limit on $E_b/N_0$ is higher. For example, if $\rho = 1/2$, then $E_b/N_0 \ge 1$ (0 dB).

Because $\text{SNR} = 3\,\text{SNR}_{\text{norm}} = 2E_b/N_0$ at $\rho = 1$, the error probability per symbol (or per bit) of uncoded 2-PAM may be expressed in two equivalent ways

$$P_b(E) = Q\!\left(\sqrt{3\,\text{SNR}_{\text{norm}}}\right) = Q\!\left(\sqrt{2E_b/N_0}\right) \qquad (2.29)$$

In this paper we will use $E_b/N_0$ in the low-SNR regime and $\text{SNR}_{\text{norm}}$ in the high-SNR regime as the fundamental figures of merit of uncoded and coded modulation schemes. This practice appears to be on its way to general adoption.

The "effective coding gain" of a coded modulation scheme is measured by the reduction in required $E_b/N_0$ or $\text{SNR}_{\text{norm}}$ to achieve a certain target error probability relative to a baseline uncoded scheme. In the low-SNR regime, the baseline will be taken as 2-PAM; in the high-SNR regime, the baseline will be taken as $M$-PAM ($M$ large).

Fig. 2 gives the probability of bit error for uncoded 2-PAM as a function of both $\text{SNR}_{\text{norm}}$ and $E_b/N_0$. Note that the axis for $E_b/N_0$ is shifted by a factor of $3/2$ (1.76 dB) relative to the axis for $\text{SNR}_{\text{norm}}$. At $P_b(E) \approx 10^{-6}$, the baseline uncoded binary modulation scheme operates about 12.5 dB away from the Shannon limit. Therefore, a coding gain of up to 12.5 dB in $E_b/N_0$ is in principle possible at this $P_b(E)$, provided that bandwidth can be expanded sufficiently to permit the use of powerful very-low-rate binary codes. If bandwidth can be expanded by a factor of only two, then with binary codes of rate $1/2$ a coding gain of up to about 10.8 dB can in principle be achieved.

Fig. 2 also depicts the probability of symbol error $P_s(E)$ for uncoded $M$-PAM as a function of $\text{SNR}_{\text{norm}}$ for large $M$ (in this case, the $E_b/N_0$ axis should be ignored). At $P_s(E) \approx 10^{-6}$, a baseline uncoded $M$-PAM modulation scheme operates about 9 dB away from the Shannon limit in the bandwidth-limited regime. Thus if bandwidth is a fixed, nonexpandable resource, a coding gain of up to about 9 dB in $\text{SNR}_{\text{norm}}$ is in principle possible at $P_s(E) \approx 10^{-6}$. This conclusion holds even at moderate constellation sizes $M$.

F. The Union Bound

The union bound is a useful tool for evaluating the performance of moderately powerful codes, although it breaks down for rates beyond the cutoff rate $R_0$.

The union bound is based on evaluating pairwise error probabilities between pairs of coded sequences. On an ideal real discrete-time AWGN channel, the received sequence $y = x + n$ is the sum of the transmitted coded sequence $x$ and an i.i.d. Gaussian sequence $n$ with mean $0$ and variance $\sigma^2$ per symbol. Because the likelihood $p(y \mid x)$ depends on $x$ only through the squared Euclidean distance $\|y - x\|^2$, maximum-likelihood (ML) decoding is equivalent to minimum-distance decoding.

Given two coded sequences $x$ and $x'$ that differ by the Euclidean distance $d(x, x')$, the probability that the received sequence $y$ will be closer to $x'$ than to $x$ is given by

$$\Pr\{x \to x'\} = Q\!\left(\frac{d(x, x')}{2\sigma}\right) \qquad (2.30)$$

The probability that $y$ will be closer to $x'$ than to $x$ for any $x' \ne x$, which is precisely the probability of error with ML decoding, is thus upper-bounded by

$$\Pr(E \mid x) \le \sum_{x' \ne x} Q\!\left(\frac{d(x, x')}{2\sigma}\right) \qquad (2.31)$$

The average probability of error over all coded sequences is upper-bounded by the union bound

$$\Pr(E) \le \sum_{d} A(d)\,Q\!\left(\frac{d}{2\sigma}\right) \qquad (2.32)$$

where $A(d)$ is the average number of coded sequences at distance $d$ from $x$.

The Gaussian error probability function $Q(x)$ decays exponentially as $e^{-x^2/2}$. Therefore, if $A(d)$ does not rise too rapidly with $d$, the union bound is dominated by its first term, which is called the union bound estimate (UBE)

$$\Pr(E) \approx A(d_{\min})\,Q\!\left(\frac{d_{\min}}{2\sigma}\right) \qquad (2.33)$$

where $d_{\min}$ is the minimum distance between coded sequences, and $A(d_{\min})$ is the average number of sequences at minimum distance $d_{\min}$ from a given coded sequence. The squared argument of the $Q$ function in the union bound estimate yields a first-order estimate of performance relative to an uncoded baseline, namely

$$\frac{d_{\min}^2}{4\sigma^2} = \frac{d_{\min}^2}{d_0^2}\cdot\frac{d_0^2}{4\sigma^2} \qquad (2.34)$$

where $d_0^2$ is the squared baseline distance at the same SNR, $\rho_0$ is the baseline rate, and $\rho$ is the actual rate.

In comparison to the baseline squared distance, there is a distance gain of a factor of $d_{\min}^2/d_0^2$. However, in comparison to the baseline rate there is also a redundancy loss. In the high-SNR regime, the redundancy loss is approximately $2^{-2r}$, where $r = \rho_0 - \rho$ is the redundancy of the code in b/dim. In the low-SNR regime, with 2-PAM signaling ($\rho_0 = 1$), the redundancy loss is simply a factor of $\rho$. The product of these factors gives the nominal (or asymptotic) coding gain, which is based solely on the argument of the $Q$ function in the UBE.

The effective coding gain is the difference between the signal-to-noise ratios required to achieve a certain target error rate with a coded modulation scheme and to obtain the same error rate with an uncoded scheme of the same rate. The effective coding gain is typically less than the nominal coding gain, because the error coefficient $A(d_{\min})$ is typically larger than the baseline error coefficient. Note that $A(d_{\min})$ depends on exactly what is being estimated, e.g., bit-error probability, block-error probability, etc., over which interval, per bit, per dimension, per block, etc. A rule of thumb based on the slope of the $Q(x)$ curve in the $10^{-5}$–$10^{-6}$ region is that every increase of a factor of two in the error coefficient costs about 0.2 dB in effective coding gain, provided that the error coefficient is not too large.

The union bound blows up even for optimal codes at the cutoff rate $R_0$; i.e., the corresponding error exponent goes to zero [51]. For low-SNR channels, the cutoff rate is 3 dB away from the Shannon limit; i.e., the cutoff rate limit is $E_b/N_0 = 2\ln 2 = 1.39$ (1.42 dB). For high-SNR channels, the cutoff rate corresponds to an SNR which is a factor of $4/e$ away from the Shannon limit [95]; i.e., the cutoff rate limit is $\text{SNR}_{\text{norm}} = 4/e = 1.47$ (1.68 dB).

Beyond $R_0$, codes cannot be analyzed by using the union bound estimate. Any apparent agreement between a union bound estimate and actual performance in the region between $R_0$ and $C$ must be regarded as fortuitous. Because the operational significance of minimum distance or nominal coding gain is that they give a good estimate of performance via the union bound, their significance in the beyond-$R_0$ regime is questionable.

III. BINARY CODES FOR POWER-LIMITED CHANNELS

In this section we discuss coding techniques for power-limited (low-SNR) ideal AWGN channels. As we have seen, low-rate binary codes are near-optimal in this regime, as long as soft decisions are used at the output. We develop a union bound estimate of probability of error per bit with maximum-likelihood (ML) decoding. We also show that the nominal coding gain of a rate-$k/n$ binary code with minimum Hamming distance $d$ over baseline 2-PAM is simply $\gamma_c = (k/n)\,d$.

Then we give the effective coding gains of known moderate-complexity block and convolutional codes versus the branch complexity of their minimal trellises, assuming ML decoding at rates below $R_0$. Convolutional codes are clearly superior in terms of coding gain versus trellis complexity.

Finally, we briefly discuss higher-performance codes, including convolutional codes with sequential decoding, concatenated codes with outer Reed–Solomon codes, and turbo codes and other capacity-approaching codes.

A. Optimality of Low-Rate Binary Linear Codes with Soft Decisions

We have seen in Fig. 1 that on an AWGN channel with SNR $\le 1$, the capacity with equiprobable binary signaling is negligibly less than the true capacity. Moreover, on symmetric channels such as the ideal AWGN channel, it has long been known that there is no reduction in capacity if one restricts one's attention to linear binary codes [31], [51].

As shown by (2.26), if the code rate $\rho$ is greater than zero, the figure of merit $E_b/N_0$ is lower-bounded by $(2^{2\rho} - 1)/2\rho$, which exceeds the ultimate Shannon limit of $\ln 2$ ($-1.59$ dB). If $\rho$ is small, then

$$\frac{2^{2\rho} - 1}{2\rho} \approx (\ln 2)(1 + \rho\ln 2) \qquad (3.1)$$

so the lower bound is approximately

$$10\log_{10}\frac{2^{2\rho} - 1}{2\rho} \approx -1.59 + 3.01\,\rho \quad \text{decibels} \qquad (3.2)$$

For example, the penalty to operate at rate $\rho = 1/4$ is about 0.77 dB, which is not insignificant. However, there may be a limit on bandwidth, which scales as $1/\rho$. Moreover, even if bandwidth is unlimited, there is usually a technology-dependent lower limit on the energy per transmitted symbol below which the performance of other system elements (e.g., tracking loops) degrades significantly.

On the other hand, whereas two-level quantization of the channel input costs little, two-level quantization of the channel output (hard decisions) costs of the order of 2–3 dB [118]. Optimized three-level quantization (hard decisions with erasures) costs only about half as much (1–1.5 dB) [39]. However, optimized uniform eight-level (3-bit) quantization (quantized soft decisions) costs only about 0.2 dB, and has often been used in practice [55].

B. The Union Bound and Nominal Coding Gain

An $(n, k, d)$ binary linear block code $C$ is a $k$-dimensional subspace of the $n$-dimensional binary vector space $\{0, 1\}^n$ such that the minimum Hamming distance between any two code $n$-tuples is $d$. Its size is $|C| = 2^k$.

For use on the Gaussian channel, code bits are usually mapped to real numbers via the standard 2-PAM map $\{0, 1\} \to \{+\alpha, -\alpha\}$. The Euclidean image $s(C)$ is then a subset of $2^k$ vertices of the $2^n$ vertices of an $n$-cube of side $2\alpha$ centered on the origin. If two code $n$-tuples $c$, $c'$ differ in $d_H(c, c')$ places, then the squared distance between their Euclidean images $s(c)$ and $s(c')$ is $4\alpha^2 d_H(c, c')$. Consequently, the minimum squared Euclidean distance between any two vectors in $s(C)$ is

$$d_{\min}^2 = 4\alpha^2 d \qquad (3.3)$$

Moreover, if $C$ has $A_d$ words of minimum weight $d$, then by linearity there are $A_d$ vectors at minimum squared distance from every vector in $s(C)$.

The union bound estimate (UBE) of the probability of a block decoding error is then

$$P_B(E) \approx A_d\,Q\!\left(\frac{d_{\min}}{2\sigma}\right) \qquad (3.4)$$

Because $\alpha^2 = (k/n)E_b$ and $\sigma^2 = N_0/2$, we may write the UBE as

$$P_B(E) \approx A_d\,Q\!\left(\sqrt{2\gamma_c\,\frac{E_b}{N_0}}\right) \qquad (3.5)$$

where the nominal coding gain of an $(n, k, d)$ binary linear code is defined as

$$\gamma_c = \frac{kd}{n} \qquad (3.6)$$

This is the product of a distance gain of a factor of $d$ with a redundancy loss of a factor of $k/n$. The uncoded 2-PAM baseline corresponds to a $(1, 1, 1)$ code, for which $\gamma_c = 1$ and $A_d = 1$.

For comparability with the uncoded baseline, it is appropriate to normalize by $k$ to obtain the probability of block decoding error per information bit (not the bit-error probability)

$$P_b(E) \approx \frac{A_d}{k}\,Q\!\left(\sqrt{2\gamma_c\,\frac{E_b}{N_0}}\right) \qquad (3.7)$$

in which the normalized error coefficient is $A_d/k$.

Graphically, a curve of the form of (3.7) may be obtained simply by moving the baseline curve $P_b(E) = Q(\sqrt{2E_b/N_0})$ of Fig. 2 to the left by $10\log_{10}\gamma_c$ decibels, and upward by a factor of $A_d/k$.

C. Biorthogonal Codes

Biorthogonal codes are asymptotically optimal codes for theideal AWGN channel, provided that bandwidth is unlimited.They illustrate both the usefulness and the limitations of unionbound estimates.

For any integer there exists a binarylinear code , e.g., a first-order Reed–Muller code, such thatits Euclidean image is a biorthogonal signal set in

-space (i.e., a set of orthogonal vectors and theirnegatives). Its nominal coding gain is

(3.8)

which goes to infinity as goes to infinity.Such a code consists of the all-zero word, the all-one word,

and codewords of weight . The unionbound on block error probability thus becomes

(3.9)


TABLE I
PARAMETERS AND CODING GAINS OF REED–MULLER CODES AND THE GOLAY CODE

where we have used the upper bound $Q(x) \le e^{-x^2/2}$. Thus if $E_b/N_0 > 2\ln 2$ (1.42 dB), or if $\text{SNR}_{\text{norm}} > 2$ (3 dB), then the union bound approaches zero exponentially with $m$. Thus the union bound shows that with biorthogonal codes it is possible to approach within 3 dB of the Shannon limit, which corresponds to the cutoff rate $R_0$ of the low-SNR AWGN channel.

A more refined upper bound (3.10) [118] shows that $P_B(E)$ approaches zero exponentially with $m$ if $E_b/N_0 > \ln 2$ ($-1.59$ dB), or if $\text{SNR}_{\text{norm}} > 1$ (0 dB). In other words, this bound shows that biorthogonal codes can achieve an arbitrarily small error probability for any $E_b/N_0$ above the ultimate Shannon limit. It also illustrates the limitations of the union bound, which blows up 3 dB away.

Of course, the code rate $\rho = (m+1)/2^m$ approaches zero rapidly as $m$ becomes large, so long biorthogonal codes can be used only if bandwidth is effectively unlimited. They can be decoded efficiently by fast Hadamard transform techniques [43], but even so the decoding complexity is proportional to $m\,2^m$, which increases exponentially with $m$.

A $(32, 6, 16)$ biorthogonal code decoded by a fast Hadamard transform ("Green machine") was used for an early deep-space mission (Mariner, 1969) [79].
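A minimal sketch of such a decoder, assuming the standard mapping of the $(2^m, m+1, 2^{m-1})$ biorthogonal code onto signed Hadamard rows; ML decoding then reduces to picking the largest-magnitude fast-Hadamard-transform coefficient.

```python
import numpy as np

def fht(y):
    """Fast (Walsh-)Hadamard transform, O(n log n) butterflies."""
    y = y.copy()
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            a, b = y[i:i + h].copy(), y[i + h:i + 2 * h].copy()
            y[i:i + h], y[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return y

def encode(bits, m):
    """(2^m, m+1) biorthogonal code: Hadamard row u, negated by the last bit."""
    u = int("".join(str(int(b)) for b in bits[:m]), 2)
    row = np.array([(-1) ** bin(u & t).count("1") for t in range(2 ** m)], float)
    return row if bits[m] == 0 else -row

def decode(y, m):
    """ML ('Green machine') decoding: largest-magnitude Hadamard coefficient."""
    c = fht(y)
    u = int(np.argmax(np.abs(c)))
    return [int(b) for b in format(u, f"0{m}b")] + [int(c[u] < 0)]

m = 5                                    # the (32, 6, 16) code, as on Mariner '69
rng = np.random.default_rng(1)
bits = [int(b) for b in rng.integers(0, 2, m + 1)]
y = encode(bits, m) + rng.normal(0, 1.0, 2 ** m)
print(decode(y, m) == bits)              # True with high probability
```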

D. Effective Coding Gains of Known Codes

For moderate codelengths, Bose–Chaudhuri–Hocquenghem (BCH) codes form a large and well-known class of binary linear block codes that often are the best known codes in terms of their parameters $(n, k, d)$. Unfortunately, not much is known about soft-decision ML decoding algorithms for such codes. The most efficient general methods are trellis-based methods—i.e., the code is represented by a trellis, and the Viterbi algorithm (VA) is used for ML decoding. There has been considerable recent work on efficient trellis representations of binary linear block codes, including BCH codes [107].

Reed–Muller (RM) codes are as good as BCH codes for lengths up to 32, and almost as good for length 64, in terms of the parameters $(n, k, d)$. Efficient trellis representations are known for all RM codes [43]. In fact, in terms of nominal coding gain $\gamma_c$ or effective coding gain versus trellis complexity, RM codes are often better than BCH codes [107]. Moreover, the number of minimum-weight codewords $A_d$ is known for all RM codes, but not for all BCH codes [77].

In Table I we give the parameters $(n, k, d)$, $A_d$, and $\gamma_c$ for RM codes with lengths up to 64, including biorthogonal codes, and also for the $(24, 12, 8)$ Golay code. We also give a parameter $s$ that measures trellis complexity (the log branch complexity). To estimate the effective coding gain from $\gamma_c$, we use the normalized error coefficient $A_d/k$, and apply the "0.2-dB loss per factor of two" rule, which is not very accurate for large $A_d/k$. We remind the reader that these estimated effective coding gains assume soft decisions, ML decoding, and the accuracy of the union bound estimate, and that the complexity parameter assumes trellis-based decoding. Moreover, complexity comparisons are always technology-dependent.

There also exist algebraic decoding algorithms for all of these codes, as well as for BCH codes. Algebraic error-correcting decoders that are based on hard decisions are not appropriate for the AWGN channel, since hard decisions cost 2 to 3 dB. For BCH and other algebraic block codes, there do exist efficient soft-decision decoding algorithms such as generalized minimum distance (GMD) decoding [39] and the Chase algorithms [19] that are capable of approaching ML performance; furthermore, the complexity of "one-pass" GMD decoding [12] is not much greater than that of error-correction only. However, we know of no published results showing that the performance–complexity tradeoff for moderate-complexity binary block codes with soft-decision algebraic decoding can be better than that of moderate-complexity binary convolutional codes with trellis-based decoding on AWGN channels.

TABLE II
PARAMETERS AND CODING GAINS OF SELECTED LOW-RATE CONVOLUTIONAL CODES

From this table it appears that the best tradeoff between effective coding gain and trellis complexity is obtained with low-rate biorthogonal codes, if bandwidth is not an issue. An effective coding gain of about 5 dB can be obtained with a long biorthogonal code of quite low code rate and very modest trellis branch complexity. The $(24, 12, 8)$ Golay code and the $(32, 16, 8)$ RM code, which are close cousins, can also obtain an effective coding gain of about 5 dB, with a somewhat greater branch complexity, but with a more comfortable code rate of $1/2$. Beyond these codes, complexity increases rapidly.

A similar analysis can be performed for binary linear convolutional codes. For a rate-$k/n$ binary linear convolutional code with free distance $d$, the nominal coding gain is again $\gamma_c = (k/n)\,d$. If $A_d$ is the number of weight-$d$ code sequences that start in a given $k$-input, $n$-output block, then the appropriate error coefficient to measure the probability of error event per information bit is again $A_d/k$.

Table II gives the parameters of the best known low-rate time-invariant binary linear convolutional codes for rates $1/2$, $1/3$, and $1/4$, with short constraint lengths $\nu$ [18]. The log branch complexity parameter equals $\nu + 1$. (For some combinations of parameters, superior time-varying codes are known [70].)

It is apparent from a comparison of Tables I and II that convolutional codes offer a much better performance/complexity tradeoff than block codes. It is possible to obtain more than 6 dB of effective coding gain with a 64-state or 256-state rate-$1/2$ or rate-$1/3$ convolutional code with a simple, regular trellis (VA) decoder. No block code approaches such performance with such modest complexity.

Consequently, for power-limited channels such as the deep-space channel, convolutional codes rather than block codes have been used almost from the earliest days [26]. Although this is due in part to the historical fact that efficient maximum-likelihood (ML) or near-ML soft-decision decoding algorithms were invented much earlier for convolutional codes, Tables I and II show that convolutional codes have an inherent performance/complexity advantage.
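The following sketch shows soft-decision Viterbi decoding for the classic 4-state rate-$1/2$ code with octal generators $(5, 7)$ and $d = 5$ — a smaller code than those in Table II, chosen purely to keep the example short; the noise level is hypothetical.

```python
import numpy as np

G = (0o5, 0o7)      # generators of the 4-state rate-1/2 code; free distance 5
NU = 2              # encoder memory

def encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << NU) | state
        out += [bin(reg & g).count("1") & 1 for g in G]
        state = reg >> 1
    return out

def viterbi(rx):
    """Soft-decision ML decoding; rx holds two noisy +/-1 values per info bit."""
    n_states = 1 << NU
    metric = np.full(n_states, -np.inf)
    metric[0] = 0.0
    paths = [[] for _ in range(n_states)]
    for k in range(0, len(rx), 2):
        new_m = np.full(n_states, -np.inf)
        new_p = [None] * n_states
        for s in range(n_states):
            if metric[s] == -np.inf:
                continue
            for b in (0, 1):
                reg = (b << NU) | s
                sym = [1 - 2 * (bin(reg & g).count("1") & 1) for g in G]
                m = metric[s] + sym[0] * rx[k] + sym[1] * rx[k + 1]
                ns = reg >> 1
                if m > new_m[ns]:
                    new_m[ns], new_p[ns] = m, paths[s] + [b]
        metric, paths = new_m, new_p
    return paths[int(np.argmax(metric))]

rng = np.random.default_rng(2)
bits = [int(b) for b in rng.integers(0, 2, 100)] + [0] * NU   # zero-tail termination
tx = np.array([1 - 2 * c for c in encode(bits)], float)       # 2-PAM mapping
rx = tx + rng.normal(0, 0.6, tx.shape)                        # AWGN
print(viterbi(rx)[:100] == bits[:100])                        # True w.h.p.
```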

E. Sequential Decoding

Sequential decoding of convolutional codes was perhaps the earliest near-ML decoding algorithm, and is still one of the few that can operate at the cutoff rate $R_0$. For this reason, sequential decoding was the first coding scheme used for the deep-space application (Pioneer, 1968) [26].

Although there exist many variants of sequential decoding, they have in common a sequential search through a code tree (not a trellis) for the ML code sequence. The code constraint length $\nu$ can be infinite in principle, although for synchronization purposes (whether via zero-tail termination, tail-biting, or self-resynchronization) it is usually chosen to be not too large (of the order of a few tens).

The amount of computation required to decode a given number of information bits is (highly) variable in sequential decoding, but is more or less independent of $\nu$. The distribution of computation follows a Pareto distribution, $\Pr\{C \ge N\} \approx K\,N^{-\alpha}$, where $\alpha$ is the Pareto exponent [63]. The Pareto exponent is equal to $1$ at the cutoff rate $R_0$, which is usually considered to be the practical limit of sequential decoding (because a Pareto distribution has a finite mean only if $\alpha > 1$). Thus on the low-SNR ideal AWGN channel, sequential decoding of low-rate binary linear convolutional codes can approach the Shannon limit within about 3 dB.

If $\nu$ is large enough, then sequential decoders rarely make errors; rather, they fail to decode due to excessive computation. This error-detection property is useful in many applications, such as deep-space telemetry.

Moreover, experiments have shown that bidirectional sequential decoding of moderate-length blocks can approximately double the Pareto exponent and thus significantly reduce the gap to capacity, even after giving effect to the rate loss due to zero-tail termination [65].

In view of these desirable properties, it is something of a mystery why sequential decoding has received so little practical attention during the past 30 years.

F. Concatenated Codes and RS Codes

Like the Viterbi algorithm and the Lempel–Ziv algorithm, concatenated codes were originally introduced to solve a theoretical problem [39], but have turned out to be useful for a variety of practical applications.

The basic idea is to use a moderate-strength "inner code" with an ML or near-ML decoding algorithm to achieve a moderate error rate like $10^{-2}$–$10^{-3}$ at a code rate as close to capacity as possible. Then a powerful algebraic "outer code" capable of correcting many errors with low redundancy is used to drive the error rate down to as low an error rate as may be desired. It was shown in [39] that the error rate could be made to decrease exponentially with blocklength at any rate less than capacity, while decoding complexity increases only polynomially.

Reed–Solomon (RS) codes are ideal for use as outer codes. (They are not suitable for use as inner codes because they are nonbinary.) RS codes are defined over finite fields $\mathbb{F}_q$ whose order $q$ is typically large (e.g., $q = 256$). An $(n, k, d = n - k + 1)$ RS code over $\mathbb{F}_q$ exists for every $n \le q$ and $k \le n$ [77]. The minimum distance $d = n - k + 1$ is as large as possible, in view of the Singleton bound [77].
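For concreteness, a minimal sketch of systematic RS encoding over $\mathbb{F}_{256}$; the field is generated here with the common primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (0x11d), which is an implementation assumption, not something specified in the paper.

```python
# Systematic Reed-Solomon encoding over GF(256), primitive polynomial 0x11d.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]            # wraparound to avoid modular reduction

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def rs_generator(nsym):
    """g(x) = (x - a^0)(x - a^1)...(x - a^(nsym-1)); '-' equals '+' in GF(2^8)."""
    g = [1]
    for i in range(nsym):
        q = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            q[j] ^= c                    # c * x
            q[j + 1] ^= gmul(c, EXP[i])  # c * a^i
        g = q
    return g

def rs_encode(msg, nsym):
    """Append nsym parity symbols: remainder of msg(x) x^nsym mod g(x)."""
    g = rs_generator(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gmul(g[j], coef)
    return list(msg) + rem[len(msg):]

cw = rs_encode(list(range(1, 240)), 16)  # a (255, 239, 17) code: corrects 8 errors
print(len(cw))                           # 255
```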

Very efficient polynomial-time algebraic decoding algorithms are known, not just for error correction but also for erasure-and-error correction and even for near-ML soft-decision decoding (i.e., generalized minimum-distance decoding [39]). RS error-correction algorithms are standard in VLSI libraries, and today are typically capable of decoding dozens of errors per block over $\mathbb{F}_{256}$ at data rates of tens of megabits per second (Mb/s) [98].

When used with interleaving, RS codes are also ideal burst-error correctors in the sense of requiring the least possible guard space [40].

For these reasons RS codes are used in a large variety of applications, including the NASA deep-space standard adopted in the 1970's [24]. RS codes are clearly the outstanding practical success story of the field of algebraic block coding.

G. Turbo Codes and Other Capacity-Approaching Codes

The invention of "turbo codes" [13] put a decisive end to the long-standing conjecture that the cutoff rate $R_0$ might represent the "practical capacity." Performance within tenths of a decibel of the Shannon limit is now routinely demonstrated with reasonable decoding complexity, albeit with large delay [61].

The original turbo code operates as follows. An information bit sequence is encoded in a simple (e.g., 16-state) recursive systematic rate-$1/2$ convolutional encoder to produce one check bit sequence. The same information bit sequence is permuted in a very long (e.g., $10^4$–$10^5$ bits) interleaver and then encoded in a second recursive systematic rate-$1/2$ convolutional encoder to produce a second check bit sequence. The information bit sequence and both check bit sequences are transmitted, so the code rate is $1/3$. (Puncturing can be used to raise the rate.)
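A structural sketch of this encoder, using a small 4-state recursive systematic constituent code for brevity (the original turbo code used 16-state constituents; all parameters here are illustrative):

```python
import numpy as np

def rsc_parity(bits, fb=0b111, ff=0b101, nu=2):
    """Parity stream of a recursive systematic convolutional encoder
    (4-state example: feedback 1 + D + D^2, feedforward 1 + D^2)."""
    state, out = 0, []
    for b in bits:
        s = int(b) ^ (bin(state & (fb >> 1)).count("1") & 1)   # feedback bit
        reg = (s << nu) | state
        out.append(bin(reg & ff).count("1") & 1)               # feedforward output
        state = reg >> 1
    return out

def turbo_encode(bits, perm):
    p1 = rsc_parity(bits)                       # first check sequence
    p2 = rsc_parity([bits[i] for i in perm])    # second, on interleaved bits
    return bits, p1, p2                         # systematic + 2 parity: rate 1/3

rng = np.random.default_rng(3)
k = 1000
bits = [int(b) for b in rng.integers(0, 2, k)]
perm = list(rng.permutation(k))                 # pseudorandom interleaver
sys_, par1, par2 = turbo_encode(bits, perm)
print(len(sys_) + len(par1) + len(par2))        # 3k transmitted bits
```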

Decoding is performed in an iterative fashion as follows. The received sequences corresponding to the information bit sequence and the first check bit sequence are decoded by a soft-decision decoder for the first convolutional code. The outputs of this decoder are a sequence of soft decisions for each bit of the information sequence. (This is done by some version of the forward–backward algorithm [5], [6].) These soft decisions are then used by a similar decoder for the second convolutional code, which hopefully produces still better soft decisions that can then be used by the first decoder for a new iteration. Decoding iterates in this way for 10 to 20 cycles, before hard decisions are finally made on the information bits.

Empirically, operation within 0.3–0.7 dB of the Shannon limit can be achieved at moderate error rates. Theoretical understanding of turbo codes is still weak. At low $E_b/N_0$, turbo codes appear to behave like random codes whose blocklength is comparable to the interleaver length [84]. At higher $E_b/N_0$, the performance of a turbo decoder is dominated by certain low-weight codewords, whose multiplicity is inversely proportional to the interleaver length (the so-called "error floor" effect) [11].

The turbo decoding algorithm is now understood to be the general sum-product (APP) decoding algorithm, with a particular update schedule that is well matched to the turbo-code structure [116]. However, little is known about the theoretical performance and convergence properties of this algorithm applied to turbo codes, or more generally to codes defined on graphs with cycles.

Many variants of the turbo coding idea have been studied. Different arrangements of compound codes have been investigated, including "serially concatenated" codes like those of [39], which mitigate the error floor effect [9]. The decoding algorithm has been refined in various ways. Overall, however, improvements have been minor.

A turbo coding scheme is now being standardized for future deep-space missions [29].

Even more recently, codes and decoding algorithms devised according to quite different principles have approached and even exceeded turbo-code performance. Notable among these are low-density parity-check (LDPC) codes, originally proposed by Gallager in the early 1960's [50], subsequently almost forgotten, and recently rediscovered by various authors [76], [96], [116]. Like turbo codes, these codes may be viewed as "codes defined on graphs" and may be decoded by similar iterative APP decoding algorithms [116]. Long LDPC codes can operate well beyond $R_0$; furthermore, they have no error floor and rarely make decoding errors, but rather simply fail to decode [76]. Theoretically, it has been shown that long LDPC codes can achieve rates up to capacity with ML decoding [50], [76]. Very recently, nonbinary LDPC codes have been devised that outperform turbo codes [27]. However, the parallel ("flooding") version of the iterative decoding algorithm that is usually used with these codes is computationally expensive, and their lengths are comparable to those of turbo codes.
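A minimal sketch of iterative message-passing decoding in the min-sum approximation — a common simplification of the sum-product/APP algorithm mentioned above. The demo parity-check matrix is the toy $(7, 4)$ Hamming code rather than a genuinely long, sparse LDPC matrix; everything here is illustrative.

```python
import numpy as np

def min_sum_decode(H, llr, iters=50):
    """Flooding min-sum decoding on the Tanner graph of H.
    llr: channel LLRs, positive values favoring bit 0."""
    m, n = H.shape
    rows = [np.nonzero(H[i])[0] for i in range(m)]
    M = np.zeros((m, n))                     # check-to-variable messages
    for _ in range(iters):
        total = llr + M.sum(axis=0)          # posterior LLRs
        hard = (total < 0).astype(int)
        if not ((H @ hard) % 2).any():       # all parity checks satisfied
            return hard
        for i in range(m):                   # update each check node
            v = rows[i]
            ext = total[v] - M[i, v]         # incoming variable-to-check messages
            sgn = np.prod(np.sign(ext)) * np.sign(ext)   # product of other signs
            mag = np.abs(ext)
            srt = np.argsort(mag)            # min magnitude over the *other* edges
            other_min = np.where(np.arange(len(v)) == srt[0], mag[srt[1]], mag[srt[0]])
            M[i, v] = sgn * other_min
    return (llr + M.sum(axis=0) < 0).astype(int)

# Toy demo: the (7,4) Hamming code (real LDPC codes are long and sparse).
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])
rng = np.random.default_rng(4)
sigma = 0.7
rx = 1.0 + rng.normal(0, sigma, 7)           # all-zero codeword sent as +1's
print(min_sum_decode(H, 2 * rx / sigma**2))  # expect [0 0 0 0 0 0 0]
```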

IV. NONBINARY CODES FOR BANDWIDTH-LIMITED CHANNELS

In this section we discuss coding techniques for high-SNR ideal bandwidth-limited AWGN channels. In this regimenonbinary signal alphabets such as-PAM must be usedto approach capacity. Using large-alphabet approximations,we show that the total coding gain of a coded-modulationscheme for the high-SNR ideal AWGN channel is the sum ofa coding gain and a shaping gain. At high SNR’s, the codingand shaping problems are separable.

Shaping gains are obtained by using signal constellations inhigh-dimensional spaces that are bounded by a quasisphericalregion, rather than the cubical region that results from indepen-dent -PAM signaling; or, alternatively, by using points in alow-dimensional constellation with a Gaussian-like probabilitydistribution rather than equiprobably. The maximum possibleshaping gain is a factor of (1.53 dB). We briefly mentionseveral shaping methods that can easily obtain about 1 dB ofshaping gain.

In the high-SNR regime, the SNR gap between uncodedbaseline performance at and the cutoff limitwithout shaping is about 5.8 dB. This gap arises as follows: thegap to the Shannon limit is 9 dB; with shaping the cutoff limitis 1.7 dB below the Shannon limit; and without shaping thecutoff limit is 1.5 dB lower than with shaping. Coding gainsof the order of 3–5 dB at error probabilities of – canbe obtained with moderate complexity.

The two principal classes of high-SNR codes are lattices and trellis codes, which are analogous to binary block and convolutional codes, respectively. By now the principles of construction of such codes are well understood, and it seems likely that the best codes have been found. We plot the effective coding gains of these known moderate-complexity lattices and trellis codes versus the branch complexity of their minimal trellises, assuming maximum-likelihood decoding. Trellis codes are somewhat superior, due mainly to their lower error coefficients.

We briefly discuss coding schemes that can achieve coding gains beyond $R_0$, of the order of 5 to 7 dB, at error probabilities of $10^{-5}$–$10^{-6}$. Multilevel schemes with multistage decoding allow high-performance binary codes such as those described in the previous section to be used to approach the Shannon limit of high-SNR channels. Such high-performance schemes naturally involve greater decoding complexity and large delays.

We particularly note techniques that have been used in telephone-line modem standards, which have generally reflected the state of the art for high-SNR channels.

A. Lattice Constellations

It is clear from the proof of Shannon's capacity theorem that an optimal block code for a high-SNR ideal band-limited AWGN channel consists of a dense packing of signal points within a sphere in a high-dimensional Euclidean space. Most of the known densest packings are lattices [25]. In this section we briefly describe lattice constellations, and analyze their performance using the union bound estimate and large-constellation approximations.

An $N$-dimensional lattice $\Lambda$ is a discrete subgroup of $N$-space $\mathbb{R}^N$, which without essential loss of generality may be assumed to span $\mathbb{R}^N$. The points of the lattice form a uniform infinite packing of $\mathbb{R}^N$. By the group property, each point of the lattice has the same number of neighbors at each distance, and all decision regions of a minimum-distance decoder (Voronoi regions) are congruent and tessellate $\mathbb{R}^N$. These properties hold for any lattice translate $\Lambda + a$.

The key geometrical parameters of a lattice $\Lambda$ are the minimum squared distance $d^2_{\min}(\Lambda)$ between lattice points, the kissing number $K_{\min}(\Lambda)$ of nearest neighbors to any lattice point, and the volume $V(\Lambda)$ of $N$-space per lattice point, which is equal to the volume of any Voronoi region [25]. The Hermite parameter is the normalized parameter $\gamma_c(\Lambda) = d^2_{\min}(\Lambda)/V(\Lambda)^{2/N}$, which we will soon identify as the nominal coding gain of $\Lambda$.

A lattice constellation $C(\Lambda, R) = (\Lambda + a) \cap R$ is the finite set of points in a lattice translate $\Lambda + a$ that lie within a compact bounding region $R$ of $N$-space. The key geometric properties of the region $R$ are its volume $V(R)$ and the average energy $P(R)$ per dimension of a uniform probability density function over $R$:

$$P(R) = \int_R \frac{\|x\|^2}{N}\,\frac{dx}{V(R)}. \qquad (4.1)$$

The normalized second moment of $R$ is defined as $G(R) = P(R)/V(R)^{2/N}$ [25].


For example, an $M$-PAM constellation $\{\pm 1, \pm 3, \cdots, \pm(M-1)\}$ with $M$ even is a one-dimensional lattice constellation $C(2\mathbb{Z}, R)$ with $\Lambda = 2\mathbb{Z}$ and $R = [-M, M]$. The key geometrical parameters of $2\mathbb{Z}$ are

$$d^2_{\min}(2\mathbb{Z}) = 4, \quad K_{\min}(2\mathbb{Z}) = 2, \quad V(2\mathbb{Z}) = 2. \qquad (4.2)$$

The key parameters of $R = [-M, M]$ are

$$V(R) = 2M, \quad P(R) = M^2/3. \qquad (4.3)$$

Both $\gamma_c(\Lambda)$ and $G(R)$ are invariant under scaling, orthogonal transformations, and Cartesian products; i.e., $\gamma_c((\alpha U \Lambda)^K) = \gamma_c(\Lambda)$ and $G((\alpha U R)^K) = G(R)$, where $\alpha$ is any scale factor, $U$ is any orthogonal matrix, and $K$ is any positive integer. In particular, this implies that $\gamma_c(\Lambda) = 1$ for any version of an integer lattice $\mathbb{Z}^N$, and that the normalized second moment of any $N$-cube centered at the origin is $1/12$.

For large lattice constellations, one may use the following approximations, the first two of which are collectively known as the continuous approximation [48]:

• the size of the constellation is $|C(\Lambda, R)| \approx V(R)/V(\Lambda)$;

• the average energy per dimension of an equiprobable distribution over $C(\Lambda, R)$ is $P \approx P(R)$;

• for large rates $\rho = (1/N)\log_2 |C(\Lambda, R)|$ b/dim, we have $2^{2\rho} - 1 \approx 2^{2\rho}$;

• the average number of nearest neighbors to any point in $C(\Lambda, R)$ is approximately $K_{\min}(\Lambda)$.

The union bound estimate on the probability of block decoding error is

$$\Pr(E) \approx K_{\min}(\Lambda)\, Q\left(\sqrt{d^2_{\min}(\Lambda)/4\sigma^2}\right). \qquad (4.4)$$

Because

$$2^{2\rho} \approx \left(V(R)/V(\Lambda)\right)^{2/N} \qquad (4.5)$$

$$\mathrm{SNR} = P(R)/\sigma^2 \qquad (4.6)$$

$$\mathrm{SNR_{norm}} = \frac{\mathrm{SNR}}{2^{2\rho} - 1} \approx \frac{\mathrm{SNR}}{2^{2\rho}} \qquad (4.7)$$

we may write the union bound estimate as

$$\Pr(E) \approx K_{\min}(\Lambda)\, Q\left(\sqrt{\gamma_c(\Lambda)\,\gamma_s(R)\cdot 3\,\mathrm{SNR_{norm}}}\right) \qquad (4.8)$$

where the nominal coding gain of $\Lambda$ and the shaping gain of $R$ are defined, respectively, as

$$\gamma_c(\Lambda) = \frac{d^2_{\min}(\Lambda)}{V(\Lambda)^{2/N}} \qquad (4.9)$$

$$\gamma_s(R) = \frac{V(R)^{2/N}}{12\,P(R)}. \qquad (4.10)$$

For a baseline $M$-PAM constellation $C(2\mathbb{Z}, [-M, M])$, we have $\gamma_c(2\mathbb{Z}) = \gamma_s([-M, M]) = 1$, and the UBE reduces to

$$\Pr(E) \approx 2\,Q\left(\sqrt{3\,\mathrm{SNR_{norm}}}\right).$$

The nominal coding gain $\gamma_c(\Lambda)$ measures the increase in density of $\Lambda$ over a baseline lattice $\mathbb{Z}$ or $\mathbb{Z}^N$. The shaping gain $\gamma_s(R)$ measures the decrease in average energy of $R$ relative to a baseline region, namely, an interval $[-M, M]$ or an $N$-cube $[-M, M]^N$. Both contribute a multiplicative factor of gain to the argument of the $Q$ function.

As before, the effective coding gain is reduced by the error coefficient $K_{\min}(\Lambda)$. For comparability with the $M$-PAM uncoded baseline, it is appropriate to normalize by the dimension $N$ to obtain the probability of block decoding error per symbol

$$P_s(E) \approx K_s(\Lambda)\, Q\left(\sqrt{\gamma_c(\Lambda)\,\gamma_s(R)\cdot 3\,\mathrm{SNR_{norm}}}\right) \qquad (4.11)$$

in which the normalized error coefficient is $K_s(\Lambda) = K_{\min}(\Lambda)/N$.

Graphically, a curve of the form of (4.11) may be obtained simply by moving the baseline curve $P_s(E) \approx 2\,Q(\sqrt{3\,\mathrm{SNR_{norm}}})$ of Fig. 2 to the left by $\gamma_c(\Lambda)$ and $\gamma_s(R)$ (in decibels), and upward by a factor of $K_s(\Lambda)/2$.
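As a numerical illustration of the UBE (4.11) and the baseline curve, the following sketch solves for the $\mathrm{SNR_{norm}}$ required to reach $P_s(E) = 10^{-6}$ with and without coding, using the Gosset lattice $E_8$ ($\gamma_c = 2$, i.e., 3 dB; kissing number 240) as an example; the function names are ours.

```python
import math

def Q(x):
    # Gaussian tail function Q(x) = 0.5 erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_norm_at(gamma, Ks, target=1e-6):
    # Bisection on SNR_norm for Ks * Q(sqrt(3 * gamma * SNR_norm)) = target.
    lo, hi = 1e-2, 1e4
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if Ks * Q(math.sqrt(3.0 * gamma * mid)) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

base = snr_norm_at(1.0, 2.0)        # uncoded M-PAM baseline: 2 Q(sqrt(3 SNRnorm))
e8 = snr_norm_at(2.0, 240.0 / 8.0)  # E8: gamma_c = 2, K_s = K_min / N = 30
print("E8 effective coding gain: %.2f dB" % (10.0 * math.log10(base / e8)))
```

The result, about 2.2 dB, shows how the large error coefficient of $E_8$ erodes its 3-dB nominal coding gain, consistent with the rule of thumb that each factor of two in error coefficient costs roughly 0.2 dB.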

B. Shaping Gain and Shaping Techniques

Although shaping is a newer and less important topic than coding, we discuss it first because its story is quite simple.

The optimum $N$-dimensional shaping region is an $N$-sphere. The key geometrical parameters of an $N$-sphere $\otimes(N, r)$ of radius $r$ for $N$ even are [25]

$$V(\otimes(N, r)) = \frac{(\pi r^2)^{N/2}}{(N/2)!}, \qquad P(\otimes(N, r)) = \frac{r^2}{N + 2}. \qquad (4.12)$$

By Stirling's approximation, namely $m! \to (m/e)^m$ as $m$ goes to infinity [51], we have

$$\gamma_s(\otimes(N, r)) = \frac{\pi (N + 2)}{12\,[(N/2)!]^{2/N}} \to \frac{\pi e}{6} \quad (1.53\ \mathrm{dB}). \qquad (4.13)$$

The shaping gain of an $N$-sphere is plotted for dimensions $N = 1, 2, 4, \cdots, 512$ in Fig. 3. Note that the shaping gain of a 16-sphere is approximately 1 dB. However, for larger dimensions, we see from Fig. 3 that the shaping gain approaches the ultimate shaping gain $\pi e/6$ (1.53 dB) rather slowly.

The projection of a uniform probability distribution over an $N$-sphere onto one or two dimensions is a nonuniform probability distribution that approaches a Gaussian density as $N \to \infty$. The ultimate shaping gain of $\pi e/6$ (1.53 dB) may be derived alternatively as the difference between the average power of a uniform density over an interval and that of a Gaussian density with the same differential entropy [48].
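The convergence in (4.13) is easy to check numerically; the following sketch evaluates the sphere shaping gain via the log-gamma function (all names ours):

```python
import math

# gamma_s of an N-sphere over an N-cube, cf. (4.12)-(4.13), N even:
# gamma_s = pi (N + 2) / (12 ((N/2)!)^(2/N)); lgamma avoids huge factorials.
for N in (2, 4, 16, 64, 256, 512):
    g = math.pi * (N + 2) / (12.0 * math.exp((2.0 / N) * math.lgamma(N // 2 + 1)))
    print("N = %3d: %.3f dB" % (N, 10.0 * math.log10(g)))
print("ultimate: %.3f dB" % (10.0 * math.log10(math.pi * math.e / 6.0)))
```

The printed values (about 0.98 dB at $N = 16$ and 1.49 dB at $N = 512$) reproduce the slow approach to 1.53 dB visible in Fig. 3.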

Shaping therefore induces a Gaussian-like probability distribution on a one-dimensional PAM (or two-dimensional QAM) constellation, rather than an equiprobable distribution.


Fig. 3. Shaping gains of $N$-spheres over $N$-cubes for $N = 1, 2, 4, \cdots, 512$.

In principle, with spherical shaping, the lower dimensional constellation will become arbitrarily large, even with fixed average power. In practice, the lower dimensional "shaping constellation expansion" is constrained by design to a permitted peak amplitude. If the $N$-dimensional shape approximates spherical shaping subject to this constraint, then the lower dimensional probability distribution approaches a truncated Gaussian distribution [48].

A nonuniform distribution over a low-dimensional constellation may alternatively be obtained by codes over bits that specify regions of the constellation [16].

With large constellations, shaping can be implemented more or less independently of coding by operations on the "most significant bits" of PAM or QAM constellation labels, which affect the large-scale shape of the $N$-dimensional constellation. In contrast, coding affects the "least significant bits" and determines small-scale structure.

Two practical schemes that can easily obtain shaping gains of 1 dB or more while limiting two-dimensional (2-D) shaping constellation expansion to a factor of 1.5 or less are "trellis shaping" and "shell mapping."

Trellis shaping [45] is a kind of dual to trellis coding. Using a trellis, one defines a set of equivalence classes of coded sequences. The shaping operation consists of determining among equivalent sequences in the trellis the minimum-energy sequence. Shaping gains of the order of 1 dB are easily obtained.

In shell mapping [49], [66], [68], [69], [72] the signals in an $N$-dimensional lattice constellation are in principle labeled in order of increasing signal energy, and a one-to-one map is defined between possible input data sequences and an equal number of least-energy constellation points. In practice, generating-function methods are used to label points in Cartesian-product constellations in approximate increasing order of energy based on shell-mapping labelings of lower dimensional constituent constellations. The V.34 modem uses 16-dimensional shell mapping, and obtains shaping gains of the order of 0.8 dB with 2-D shaping constellation expansion limited to 25% [47]. These shaping methods are also useful for accommodating noninteger numbers of bits per symbol. The latter objective is also achieved by a mapping technique known as "modulus conversion" [3], which is an enumerative encoding technique for mapping large integers into blocks ("mapping frames") of signals in lexicographical order, where the sizes of the signal constellations do not need to be powers of two.
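The "in principle" form of shell mapping is simple enough to sketch directly; the following toy Python example (constellation sizes and names are ours) orders the points of a small Cartesian-product constellation by energy and compares the 64 least-energy points against an unshaped 64-point rectangular constellation of the same rate.

```python
import itertools, math

pam = [-3, -1, 1, 3]                          # 4-PAM per dimension
points = sorted(itertools.product(pam, repeat=4),
                key=lambda p: sum(x * x for x in p))
codebook = points[:64]                        # 64 least-energy points: 6 b / 4 dim

def shell_map(index):                         # data integer -> low-energy point
    return codebook[index]

def avg_energy(pts):
    return sum(sum(x * x for x in p) for p in pts) / len(pts)

# Unshaped reference at the same rate: a 4 x 4 x 2 x 2 rectangular constellation.
ref = list(itertools.product(pam, pam, [-1, 1], [-1, 1]))
print("shaping gain: %.2f dB" %
      (10.0 * math.log10(avg_energy(ref) / avg_energy(codebook))))
```

Even this crude 4-dimensional example yields about 0.8 dB, of the same order as the 16-dimensional shell mapper cited above; practical mappers replace the explicit sorted list with generating-function computations.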

C. Coding Gains of Dense Lattices

Finding the densest lattice packings in a given number of dimensions is a mathematical problem of long standing. A summary of the densest known packings is given in [25]. The nominal coding gains of these lattices in up to 24 dimensions are plotted in Fig. 4. Notable lattices include the eight-dimensional Gosset lattice $E_8$, whose nominal coding gain is $2$ (3 dB), and the 24-dimensional Leech lattice $\Lambda_{24}$, whose nominal coding gain is $4$ (6 dB).

In contrast to shaping gain, the nominal coding gains of dense $N$-dimensional lattices become infinite as $N \to \infty$. For example, for any integer $n$, there exists an $N = 2^n$-dimensional Barnes–Wall lattice whose nominal coding gain is $\sqrt{N/2}$ (which is by no means the densest possible for large $N$) [25].

However, effective coding gains cannot become infinite. Indeed, the Shannon limit shows that no lattice can have a combined effective coding gain and shaping gain greater than 9 dB at $P_s(E) \approx 10^{-6}$. This limits the maximum possible effective coding gain to 7.5 dB, because shaping gain can contribute up to 1.53 dB.

What limits effective coding gain is the number of near neighbors, which becomes very large for high-dimensional dense lattices. For example, the kissing number of the $N$-dimensional Barnes–Wall (BW) lattice is [25]

$$K_{\min}(\mathrm{BW}_N) = \prod_{i=1}^{\log_2 N} (2^i + 2) \qquad (4.14)$$

which yields, for example, $K_{\min}(\mathrm{BW}_{16}) = 4320$ and $K_{\min}(\mathrm{BW}_{32}) = 146\,880$.


Fig. 4. Nominal and effective coding gains of densest known lattices in dimensions $N \le 24$.

D. Trellis Codes

Trellis codes are dense packings in infinite-dimensional Euclidean sequence space. Trellis codes are to lattices as binary convolutional codes are to block codes. We will see that trellis codes give a better performance/complexity tradeoff than lattices in the high-SNR regime, just as convolutional codes give a better performance/complexity tradeoff than block codes in the low-SNR regime, although the difference is not as dramatic.

The key ideas in the invention of trellis codes [104] were

• use of minimum squared Euclidean distance as the design criterion;

• set partitioning of a constellation into subsets;

• maximizing intrasubset minimum squared distances;

• rate-$k/(k+1)$ convolutional coding over the subsets;

• selection of signals within subsets by further uncoded bits (if necessary).

Most practical trellis codes have used either $N$-dimensional lattice constellations based on versions of $\mathbb{Z}^N$ with $N = 1$, $2$, $4$, or $8$, or $N$-dimensional $M$-PSK constellations with $M = 8$ or $16$ and $N = 1$ or $2$. We will focus on lattice-type trellis codes in this section.

Set partitioning of an $N$-dimensional constellation $A$ into $2^{\ell}$ subsets is usually performed by $\ell$ two-way partitioning steps, with each two-way selection being identified by a label bit $y_i$, $0 \le i \le \ell - 1$ [104], [105]:

$$d^2_0 \le d^2_1 \le \cdots \le d^2_{\ell}. \qquad (4.15)$$

The quantities $d^2_i$ are the intrasubset minimum squared distances of the $i$th-level (sub)sets $A_{y_0 \cdots y_{i-1}}$, $0 \le i \le \ell$, with $A_{\emptyset} = A$. The objective of set partitioning is to increase these distances as much as possible at every subset level. Set partitioning is continued until $d^2_{\ell}$ is at least as large as the desired minimum squared distance between trellis code sequences. The binary $\ell$-tuples $y = (y_0, \cdots, y_{\ell-1})$ are called the subset labels of the final subsets $A_y$.

Partitioning one- or two-dimensional constellations in this way is trivial [104]. Partitioning of higher dimensional constellations was first performed starting from lower dimensional partitions [113]. Subsequently, partitioning of lattice constellations was introduced using lattice partitions [17], [42], [43] (or, more generally, group-theoretic partitions [44]), as we will discuss below.
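As a concrete instance of these partitioning steps, the following sketch (our own toy example) partitions a 16-PAM constellation along the one-dimensional chain $\mathbb{Z}/2\mathbb{Z}/4\mathbb{Z}$, using the low-order bits of the integer signal values as label bits, and verifies that each two-way split multiplies the intrasubset minimum squared distance by four.

```python
import itertools

signals = list(range(-8, 8))                 # a 16-PAM constellation, spacing 1

def subset(label_bits):
    # keep the signals whose low-order bits match the label (y0, y1, ...)
    out = signals
    for i, y in enumerate(label_bits):
        out = [s for s in out if (s >> i) & 1 == y]
    return out

def d2min(pts):
    return min((a - b) ** 2 for a, b in itertools.combinations(pts, 2))

print(d2min(subset(())))                     # d0^2 = 1   (whole constellation)
print(d2min(subset((0,))))                   # d1^2 = 4   (cosets of 2Z)
print(d2min(subset((0, 0))))                 # d2^2 = 16  (cosets of 4Z)
```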

In almost all cases of practical interest, the two first-level subsets $A_0$ and $A_1$ will be geometrically congruent to each other (see Appendix II-B); i.e., they will differ only by translation, rotation, and/or reflection.

An encoder for a trellis code then operates as shown in Fig. 5 [105]. Given $b$ input bits per $N$-dimensional symbol, $k$ input bits are encoded by a rate-$k/(k+r)$ binary convolutional encoder with $2^{\nu}$ states into a coded sequence of subset labels $\{y_n\}$. At time $n$, the label $y_n$ is used to select subset $A_{y_n}$. The remaining $b - k$ input bits are used to choose one signal from $2^{b-k}$ signals in the selected subset $A_{y_n}$. The size of $A$ must therefore be $2^{b+r}$, or a factor of $2^r$ (the "coding constellation expansion" factor per $N$ dimensions) larger than needed to send $b$ uncoded bits per symbol. (If there is any shaping, it is performed on the uncoded bits and results in further "shaping constellation expansion.")


Fig. 5. Trellis-code encoder.

A trellis code may be maximum-likelihood decoded by a Viterbi algorithm (VA) decoder as follows. Given a received point $r_n$, the receiver first finds the closest signal point to $r_n$ in each subset $A_y$. This is called subset decoding. A VA decoder then finds the code sequence $\{y_n\}$ for which the signals chosen in the subsets are closest to the entire received sequence $\{r_n\}$. The decoding complexity is dominated by the complexity of the VA decoder, which is approximately given by the branch complexity $2^{\nu + k}$ of the convolutional code, normalized by the dimension $N$.
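Subset decoding itself is a trivial slicing operation; a minimal sketch (our own example, for the four cosets of $4\mathbb{Z}$ in a 16-PAM constellation) is:

```python
signals = list(range(-8, 8))                 # 16-PAM, spacing 1
r = 1.3                                      # one received value

best = {}
for s in signals:
    c = s % 4                                # coset label of s in Z / 4Z
    if c not in best or (s - r) ** 2 < (best[c] - r) ** 2:
        best[c] = s
print(best)                                  # closest signal in each subset
```

The VA then needs only these four candidate signals and their squared-distance metrics per trellis section, regardless of the constellation size.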

In practice, the redundancy of the convolutional code is almost always one bit per $N$-dimensional symbol ($r = 1$), so that the coding constellation expansion factor is $2$ per $N$ dimensions. The lowest level ("least significant") label bit $y_0$ is then the sole parity-check bit. If the convolutional code is linear, then in $D$-transform notation the check sequence $y_0(D)$ is determined from the information bit sequences $y_1(D), \cdots, y_k(D)$ by a linear parity-check equation of the form

$$\sum_{i=0}^{k} h_i(D)\, y_i(D) = 0 \qquad (4.16)$$

where

$$\{h_i(D),\ 0 \le i \le k\}$$

is a set of relatively-prime parity-check polynomials. If the greatest degree of any of these polynomials is $\nu$, then a minimal encoder for the code has $2^{\nu}$ states.

The convolutional code is chosen primarily to maximize the minimum squared distance $d^2_{\min}(C)$ between trellis-code sequences. If the redundancy is one bit per symbol, then a good first step is to ensure that in any encoder state the set of possible next outputs is either $A_0$ or $A_1$, so that a squared distance contribution of $d^2_1$ is immediately obtained when code sequences diverge from a common state. With a linear convolutional code, this is achieved by choosing the low-order coefficients of the parity-check polynomials such that $h_0^{(0)} = 1$ and $h_i^{(0)} = 0$, $1 \le i \le k$. The same argument applies in the reverse time direction to the high-order coefficients $h_i^{(\nu)}$. This design rule guarantees that $d^2_{\min}(C) \ge 2\,d^2_1$ [104].

An important consequence of such a choice is that at time $n$ the lowest level label bit $y_{0,n}$, which selects $A_0$ or $A_1$, depends only on the encoder state. This observation leads to the idea of feedback trellis encoding [71], which was proposed during the development of the V.34 modem standard to permit an improved type of precoding (see Section V). With this technique, the sequence of encoding operations at time $n$ is as follows:

a) the check bit $y_{0,n}$ is determined from the encoder state $s_n$;

b) the input bits at time $n$ select a signal $z_n$ from $A_{y_{0,n}}$;

c) the remaining label bits $y_{1,n}, \cdots, y_{\ell-1,n}$ are determined from $z_n$;

d) the next encoder state $s_{n+1}$ is determined from $s_n$ and the label bits of $z_n$.

E. Lattice Constellation Partitioning

Partitioning of lattice constellations by means of lattice partitions is done as follows [17], [43]. One starts with an $N$-dimensional lattice constellation $C(\Lambda, R)$, where $\Lambda$ is almost always a version of an $N$-dimensional integer lattice $\mathbb{Z}^N$. The constellation is partitioned into $2^{k+r}$ subsets of equal size that are congruent to a sublattice $\Lambda'$ of index $2^{k+r}$ in $\Lambda$, so that $\Lambda$ is the union of $2^{k+r}$ cosets (translates) of $\Lambda'$. The subsets are then the points of $C(\Lambda, R)$ that lie in each such coset, which form sublattice constellations of the form $C(\Lambda' + c, R)$. The region $R$ must be chosen so that there are an equal number of points in each subset. The sublattice $\Lambda'$ is usually chosen to be as dense as possible.

Codes based on lattice partitions $\Lambda/\Lambda'$ are called coset codes. The nominal coding gain of a coset code $C$ based on a lattice partition $\Lambda/\Lambda'$ is [42]

$$\gamma_c(C) = \frac{d^2_{\min}(C)}{d^2_{\min}(\Lambda)}\, 2^{-2\rho(C)} \qquad (4.17)$$

where $\rho(C)$ is the redundancy of the convolutional encoder in bits per dimension. Again, this coding gain is the product of a distance gain of $d^2_{\min}(C)/d^2_{\min}(\Lambda)$ over $\Lambda$, and a redundancy loss $2^{-2\rho(C)}$. The effective coding gain is reduced by the amount that the error coefficient $K_s(C)$ per dimension exceeds the baseline $M$-PAM error coefficient of $2$ per dimension. Again, the rule of thumb that an increase of a factor of two costs 0.2 dB may be used.

The encoder redundancy also leads to a "coding constellation expansion ratio" of a factor of $2^{2\rho(C)}$ per two dimensions [42]; i.e., a factor of $4$, $2$, $\sqrt{2}$, or $2^{1/4}$ for 1D, 2D, 4D, or 8D codes, respectively. Minimization of coding constellation expansion has motivated the use of higher dimensional trellis codes.

F. Coding Gains of Known Trellis Codes

Fig. 6 shows the effective coding gains of important families of trellis codes for lattice-type signal constellations. The effective coding gains are plotted versus their VA decoding


Fig. 6. Effective coding gain versus complexity for Ungerboeck and Wei codes; V.32 and V.34 codes circled.

complexity, measured by a detailed operation count. The codes considered are as follows:

1) The original 1D (PAM) trellis codes of Ungerboeck [104], based on rate-1/2 convolutional codes and the four-way partition $\mathbb{Z}/4\mathbb{Z}$.

2) The 2D (QAM) trellis codes of Ungerboeck [104], based on (except for the simplest four-state code) rate-2/3 convolutional codes and the eight-way partition $\mathbb{Z}^2/2R\mathbb{Z}^2$ (where $R$ is a $2 \times 2$ Hadamard matrix).

3) The 4D trellis codes of Wei [113], based on

a) rate-2/3 convolutional codes and the eight-way partition of $\mathbb{Z}^4$;

b) a rate-3/4 convolutional code and the 16-way partition of $\mathbb{Z}^4$;

c) a rate-4/5 convolutional code and the 32-way partition of $\mathbb{Z}^4$.

4) Two families of 8D trellis codes of Wei [113].

The V.32 modem (1984) uses an eight-state 2D trellis code, also due to Wei [112]. The performance/complexity tradeoff is the same as that of the original eight-state 2D Ungerboeck code. However, the Wei code uses a nonlinear convolutional encoder to achieve 90° rotational invariance. This code has an effective coding gain of about 3.6 dB, a branch complexity of 32 per two dimensions, and a coding constellation expansion ratio of 2.

The V.34 modem (1994) specifies three 4D trellis codes, with performance and complexity equivalent to the 4D Wei codes circled on Fig. 6 [47]. All have a coding constellation expansion ratio of $\sqrt{2}$. The 16-state code, which is the only code implemented by most manufacturers, is the original 16-state 4D Wei code, which has an effective coding gain of about 4.2 dB and a branch complexity of 64 per four dimensions. The 32-state code is due to Williams [117] and is based on a 16-way partition of $\mathbb{Z}^4$ into 16 subsets congruent to $H\mathbb{Z}^4$, where $H$ is a $4 \times 4$ Hadamard matrix, to ensure that there are no minimum-distance error events whose length is only two dimensions; it has an effective coding gain of about 4.5 dB and a branch complexity of 256 per four dimensions. The 64-state code is a modification of the original 4D Wei code, modified to prevent quasicatastrophic error propagation; it has an effective coding gain of about 4.7 dB and a branch complexity of 256 per four dimensions.

It is noteworthy that no one has improved on the performance/complexity tradeoff of the original 1D and 2D trellis codes of Ungerboeck or the subsequent multidimensional codes of Wei. By this time it seems safe to predict that no one will ever do so. There have, however, been new trellis codes that feature other properties and have about the same performance and complexity, as described in the previous two paragraphs, and there may still be room for further improvements of this kind.

Finally, we see that trellis codes have a performance/complexity advantage over lattice codes, when used with maximum-likelihood decoding. Effective coding gains of 4.2–4.7 dB, better than that of the Leech lattice $\Lambda_{24}$ or of $\mathrm{BW}_{32}$, are attainable with less complexity and much less constellation expansion. With the most complex 1D or 2D trellis codes, effective coding gains of the order of 5.5 dB can be achieved. These gains are larger than the gains that can be obtained with lattice codes of far greater complexity.

On the other hand, it seems very difficult to obtain effective coding gains approaching 6 dB. This is not surprising, because at $P_s(E) \approx 10^{-6}$ the effective coding gain at the Shannon limit is about 7.5 dB, and at the cutoff rate limit it is about 5.8 dB. To approach the Shannon limit, codes and decoding methods of higher complexity are necessary.

So far in this section we have only discussed trellis codes for lattice-type signals. However, the principle of set partitioning and the general encoder structure of Fig. 5 are also valid for trellis codes with signals from constellations which are not of lattice type, such as 8-PSK and 16-PSK constellations. Code tables for 8-PSK and 16-PSK trellis codes are given in [104] and [105]. A number of multidimensional linear PSK-type trellis codes are described in [114], all fully rotationally invariant. In [87], principles of set partitioning of multidimensional $M$-PSK constellations are given, based on concepts of multilevel block coding; tables of linear $M$-PSK trellis codes are presented for $M = 8$ and $M = 16$, with the largest degree of rotational invariance given for each code; and figures similar to Fig. 6 that summarize effective coding gains versus decoding complexity are shown.

G. Sequential Decoding in the High-SNR Regime

In the high-SNR regime, the cutoff rate $R_0$ is a factor of $4/e$ (1.68 dB) away from capacity [95]. Therefore, sequential decoders should be able to achieve an effective coding gain of about 5.8 dB at $P_s(E) \approx 10^{-6}$. Experiments have confirmed that sequential decoders can indeed achieve such performance [111].

H. Multilevel Codes and Multistage Decoding

To approach the Shannon limit even more closely, it is clear that much more powerful codes must be used, together with decoding methods that are simpler than optimal maximum-likelihood (ML) decoding, but with near-ML performance. Multilevel codes and multistage decoding may be used for this purpose [60], [75]. Multilevel coding may be based on a chain of sublattices of $\Lambda$

$$\Lambda = \Lambda_0 \supseteq \Lambda_1 \supseteq \cdots \supseteq \Lambda_m \qquad (4.18)$$

which lead to a chain of lattice partitions $\Lambda_i/\Lambda_{i+1}$, $0 \le i \le m - 1$. A different trellis encoder as in Fig. 5 may be used independently on each such lattice partition.

Remarkably, it follows from the chain rule of mutual information that the capacity $C(\Lambda/\Lambda_m)$ that can be obtained by coding over the lattice partition $\Lambda/\Lambda_m$ is equal to the sum of the capacities $C(\Lambda_i/\Lambda_{i+1})$ that can be obtained by independent coding and decoding at each level [46], [58], [67], [110]. With multistage decoding, decoding is performed separately at each level; at the $i$th level the decisions at the lower levels $j < i$ are taken into account, whereas no coding is assumed for the higher levels $j > i$. If the partition $\Lambda/\Lambda_m$ is "large enough" and appropriately scaled, then $C(\Lambda/\Lambda_m)$ approaches the capacity of the ideal AWGN channel.

The lattices may be the one- or two-dimensional integer lattices $\mathbb{Z}$ or $\mathbb{Z}^2$, and the standard binary partition chains

$$\mathbb{Z}/2\mathbb{Z}/4\mathbb{Z}/\cdots \quad \text{or} \quad \mathbb{Z}^2/R\mathbb{Z}^2/2\mathbb{Z}^2/2R\mathbb{Z}^2/\cdots \qquad (4.19)$$

may be used. Then a powerful binary code with rate close to $C(\Lambda_i/\Lambda_{i+1})$ can be used at each level to approach the Shannon limit. In particular, by using turbo codes of appropriate rate at each level, it has been shown that reliable transmission can be achieved within 1 dB of the Shannon limit [109].

Powerful probabilistic coding methods such as turbo codes are really needed only at the lower levels. At the higher levels, the channels become quite clean and the capacity $C(\Lambda_i/\Lambda_{i+1})$ approaches $\log_2 |\Lambda_i/\Lambda_{i+1}|$, so that the desired redundancy approaches zero. For these levels, algebraic codes and decoding methods may be more appropriate [46], [110].

In summary, multilevel codes and multistage decoding allow the Shannon limit to be approached as closely in the high-SNR regime as it can be approached in the low-SNR regime with binary codes. The state of the art in multilevel coding is reviewed in [110].

I. Other Capacity-Approaching Nonbinary Coded-Modulation Schemes

Various capacity-approaching alternative schemes to multilevel coding have been proposed.

One scheme, called bit-interleaved coded modulation (BICM) [15], has been developed primarily for fading channels, but has turned out to be capable of approaching the capacity of high-SNR AWGN channels as well. A BICM transmitter comprises an encoder for a binary code $C$, a "bit interleaver," and a signal mapper. The output sequence of the bit interleaver is segmented into $b$-bit blocks, which are mapped into a $2^b$-point signal constellation using Gray coding. Although BICM cannot operate arbitrarily closely to capacity, good performance close to capacity can be obtained if $C$ is a long parallel or serially concatenated turbo code, and the decoder is an iterative turbo decoder.

Other schemes that are more straightforward extensions of standard binary turbo codes are called turbo trellis-coded modulation (TTCM) [89] and parallel concatenated trellis-coded modulation (PCTCM) [8]. Instead of two binary systematic convolutional encoders, two rate-$k/(k+1)$ trellis encoders are used. The interleaver is arranged so that the two encoded output bits of the two encoders are aligned with the same $k$ information bits. To avoid excessive rate loss, a special puncturing technique is used in TTCM: only one of the two coded bits is transmitted with each symbol, and the two encoder outputs are chosen alternately.

V. CODING AND EQUALIZATION FOR LINEAR GAUSSIAN CHANNELS

In this section we discuss coding and equalization methods for a general linear Gaussian channel. We first present the "water-pouring" solution for the capacity-achieving transmit spectrum. Then we describe the multicarrier approach to approaching capacity. The remaining parts of the section are devoted to serial transmission. We show how the linear Gaussian channel may be converted, without loss of optimality, to a discrete-time AWGN channel with intersymbol interference. We then give a pictorial illustration of the coding principles required to approach capacity. Finally, we review practical


Fig. 7. Optimization of transmit p.s.d. $P(f)$ by "water pouring."

precoding methods that have been developed to implement these principles.

A. Water Pouring

The general linear Gaussian channel was characterized in Section II as a real channel with input signal $x(t)$, impulse response $h(t)$, and additive Gaussian noise $n(t)$ with one-sided p.s.d. $N(f)$. The channel output signal is $y(t) = h(t) * x(t) + n(t)$ (see Fig. 8 below). In addition, the transmit signal has to satisfy a power constraint of the form

$$\int_0^{\infty} P(f)\, df \le P \qquad (5.1)$$

where $P(f)$ is the one-sided p.s.d. of $x(t)$. The channel SNR function for this channel is defined as

$$\mathrm{SNR}(f) = \frac{|H(f)|^2}{N(f)}. \qquad (5.2)$$

Intuitively, the preferred transmission band is where $\mathrm{SNR}(f)$ is largest.

The optimum $P(f)$ maximizes the mutual information between channel input and output subject to the power constraint (5.1) and $P(f) \ge 0$. A Lagrange multiplier argument shows that the largest mutual information is achieved by the "water-pouring" solution [94], illustrated in Fig. 7 for a typical $\mathrm{SNR}(f)$:

$$P(f) = \begin{cases} K - \mathrm{SNR}^{-1}(f), & f \in B\\ 0, & f \notin B \end{cases} \qquad (5.3)$$

where $B = \{f : \mathrm{SNR}^{-1}(f) < K\}$ is called the capacity-achieving band, and $K$ is a constant chosen such that (5.1) is satisfied with equality. Capacity is achieved if $x(t)$ is a Gaussian process with p.s.d. $P(f)$, and in bits per second is equal to

$$C = \int_B \log_2\left(1 + P(f)\,\mathrm{SNR}(f)\right) df. \qquad (5.4)$$

In analogy to an ideal AWGN channel, a linear Gaussian channel may be roughly characterized by two parameters, its one-sided bandwidth $W = |B|$, and its effective signal-to-noise ratio $\mathrm{SNR_{eff}}$, defined implicitly such that

$$C = W \log_2\left(1 + \mathrm{SNR_{eff}}\right). \qquad (5.5)$$

Comparison with (5.4) yields

$$1 + \mathrm{SNR_{eff}} = \exp\left(\frac{1}{W}\int_B \ln\left(1 + P(f)\,\mathrm{SNR}(f)\right) df\right). \qquad (5.6)$$

In other words, $1 + \mathrm{SNR_{eff}}$ is the geometric mean of $1 + P(f)\,\mathrm{SNR}(f)$ over $B$. In further analogy to an ideal AWGN channel, it is then also appropriate to define the normalized signal-to-noise ratio for a system operating at rate $\rho$ (in b/dim) on a general linear Gaussian channel as

$$\mathrm{SNR_{norm}} = \frac{\mathrm{SNR_{eff}}}{2^{2\rho} - 1} \qquad (5.7)$$

where $\mathrm{SNR_{norm}}$ again measures the "gap to capacity."

The capacity-achieving band $B$ is the most important feature of the water-pouring spectrum. For many practical channels, $B$ will be a single frequency interval, as illustrated in Fig. 7. However, in other cases, e.g., in the presence of severe narrowband interference, $B$ may consist of multiple frequency intervals. The dependency of mutual information on the exact choice of $P(f)$ is not as critical as its dependency on $B$. Mutual information achieved with a flat spectrum over $B$ is often nearly equal to capacity [64], [90].

B. Approaching Capacity with Multicarrier Transmission

One way of approaching capacity is suggested directly by the water-pouring argument. The capacity-achieving band $B$ may be divided into disjoint subbands of small enough width $\Delta f$ that $\mathrm{SNR}(f)$ and thus $P(f)$ are nearly constant over each subband. Each subband can then be treated as an ideal AWGN channel with bandwidth $\Delta f$. Multicarrier transmission methods that can be used for this purpose are discussed in Appendix I.

The transmit power allocated to a subband centered at $f_i$ should be approximately $P_i = P(f_i)\,\Delta f$. Then the signal-to-noise ratio in that subband will be approximately $P(f_i)\,\mathrm{SNR}(f_i)$, and the subband capacity will be approximately

$$C_i = \Delta f\, \log_2\left(1 + P(f_i)\,\mathrm{SNR}(f_i)\right). \qquad (5.8)$$

As $\Delta f$ is made sufficiently small, the aggregate power in all subbands approaches $P$ and the aggregate capacity approaches $C$ as given in (5.4). To approach this capacity, powerful coding in each subband at a rate near the subband capacity $C_i$ is needed. To minimize delay, coding should be applied "across subbands" with variable numbers of bits per symbol in each subband [14], [90].
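The water-pouring allocation (5.3) over such discrete subbands reduces to a one-dimensional search for the constant $K$; the following sketch (the channel SNR function and all parameters are hypothetical) finds $K$ by bisection and evaluates the aggregate capacity as in (5.4) and (5.8).

```python
import numpy as np

df = 0.01                                    # subband width (arbitrary units)
f = np.arange(0.0, 1.0, df)
snr = 200.0 * np.exp(-3.0 * f)               # hypothetical channel SNR function
P_budget = 1.0

def allocate(K):
    p = np.maximum(K - 1.0 / snr, 0.0)       # P(f) = K - 1/SNR(f) on band B
    return p, (p * df).sum()

lo, hi = 0.0, 1.0e3
for _ in range(100):                         # bisection on the water level K
    K = 0.5 * (lo + hi)
    p, total = allocate(K)
    lo, hi = (K, hi) if total < P_budget else (lo, K)

C = (np.log2(1.0 + p * snr) * df).sum()      # aggregate capacity
print("capacity: %.2f b per unit time" % C)
```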

Multicarrier transmission is inherently well suited for channels with a highly frequency-dependent channel SNR function, particularly for channels whose capacity-achieving band $B$ consists of multiple intervals. The symbol length in each subchannel is of the order of $1/\Delta f$, so that symbols become long, generally much longer than the response of the channel. This makes multicarrier transmission less sensitive to moderate impulsive noise than serial transmission, and simplifies equalization. However, the resulting delay is undesirable for some applications.

As the transmitted signal is the sum of many independent components, its distribution tends to be Gaussian. The resulting large peak-to-average ratio (PAR) is a potential disadvantage of multicarrier transmission. However, recently several methods of controlling the PAR of multicarrier signals have been developed [81], [82].

C. Approaching Capacity with Serial Transmission

Fig. 8. Conversion of a continuous-time linear Gaussian channel to an equivalent canonical discrete-time linear Gaussian channel.

Alternatively, the capacity of a linear Gaussian channel may be approached by serial transmission. In this subsection, we first examine the conversion of the waveform channel to an equivalent discrete-time channel with intersymbol interference. This is achieved by using a transmit filter that shapes the transmit spectrum to the optimal water-pouring form, and employing a sampled matched filter (MF) or whitened matched filter (WMF) in the receiver. With the WMF, a canonical discrete-time channel with trailing intersymbol interference (ISI) and i.i.d. Gaussian noise is obtained, as depicted in Fig. 8. We show that the contribution of ISI to channel capacity vanishes at high SNR. This suggests that combining an ISI-canceling equalization technique with ideal-channel coding and shaping will suffice to approach capacity. In the following two subsections, we will then discuss the implementation of this approach.

We assume that the capacity-achieving band $B$ is a single positive-frequency interval of width $W$. If $B$ consists of several frequency intervals, then the same approach may be used over each interval separately, although the practical attractiveness of this approach diminishes with multiple intervals. We concentrate on minimum-bandwidth passband transmission of complex symbols $\{x_k\}$ at modulation rate $1/T = W$. The transmission of complex signals over a real channel has been discussed in Section II.

In Fig. 8, the transmitted real signal is

$$x(t) = \mathrm{Re}\left[\sum_k x_k\, p(t - kT)\right] \qquad (5.9)$$

where ideally $\{x_k\}$ should be a sequence of i.i.d. complex Gaussian symbols with variance (average energy) $E_x$ per symbol, and $p(t)$ is a positive-frequency transmit symbol response with Fourier transform $\check P(f)$. The transmit filter is chosen so that the p.s.d. of $x(t)$ is the water-pouring p.s.d. $P(f)$; i.e.,

$$\frac{E_x}{2T}\,|\check P(f)|^2 = P(f). \qquad (5.10)$$

In the receiver, the received real signal is first passed through a filter which suppresses negative-frequency components and whitens the noise in the positive-frequency band. This filter may be called a noise-whitening Hilbert splitter, because in terms of one-dimensional signals it "splits" the positive-frequency component of a real signal into its real and imaginary parts. The transfer function $W(f)$ of this filter is chosen such that $|W(f)|^2\, N(f) = 1$ at least over the band $B$, and $W(f) = 0$ for $f < 0$. The resulting complex signal is then

$$r(t) = \sum_k x_k\, g(t - kT) + n(t) \qquad (5.11)$$

where $g(t)$ is a symbol response with Fourier transform $G(f) = \check P(f)\, H(f)\, W(f)$, and $n(t)$ is normalized AWGN with unit p.s.d. over the signal band $B$.

Because the set of responses $\{g(t - kT)\}$ represents a basis for the signal space, by the principles of optimum detection theory [118] the set of $T$-sampled matched-filter outputs

$$z_k = \int r(t)\, g^*(t - kT)\, dt \qquad (5.12)$$

of a matched filter (MF) with response $g^*(-t)$ is a set of sufficient statistics for the detection of the symbol sequence $\{x_k\}$. Thus no loss of mutual information or optimality occurs in the course of reducing $r(t)$ to the sequence $\{z_k\}$. The composite receive filter consisting of the noise-whitening filter and the MF has the transfer function

$$W(f)\, G^*(f). \qquad (5.13)$$

The Fourier transform of the end-to-end symbol response $s(t) = g(t) * g^*(-t)$

$$S(f) = |G(f)|^2 \qquad (5.14)$$

has the properties of a real nonnegative power spectrum that is band-limited to $B$. Moreover, $S(f)$ is equal to the noise p.s.d. at the MF output, because

$$|G^*(f)|^2 \cdot 1 = S(f). \qquad (5.15)$$

The sampled output sequence of the MF is given by

$$z_k = \sum_j x_j\, s_{k-j} + n_k \qquad (5.16)$$

where the coefficients $s_k = s(kT)$ are the sample values of the end-to-end symbol response $s(t)$. In general, the discrete-time Fourier transform of the sequence $\{s_k\}$ is the $1/T$-aliased spectrum

$$S_{1/T}(f) = \frac{1}{T}\sum_m S\left(f - \frac{m}{T}\right). \qquad (5.17)$$


In our case, because $S(f)$ is limited to a positive-frequency band of width $W = 1/T$, there is no aliasing; i.e., the $1/T$-periodic function $S_{1/T}(f)$ is equal to $S(f)/T$ within $B$. Because the noise p.s.d. at the MF output is $S(f)$, $\{n_k\}$ is a Gaussian noise sequence with autocorrelation sequence $\{s_k\}$. Note that $\{s_k\}$ is Hermitian-symmetric ($s_{-k} = s_k^*$), because $S_{1/T}(f)$ is real.

So far, we have obtained without loss of optimality an equivalent discrete-time channel model which can be written in $D$-transform notation as

$$z(D) = x(D)\, s(D) + n(D) \qquad (5.18)$$

where $s(D)$ is Hermitian-symmetric. We now proceed to develop an alternative, equivalent discrete-time channel model with a causal, monic, minimum-phase ("canonical") response $h(D)$ and white Gaussian noise $w(D)$. For this purpose, we appeal to the discrete-time spectral factorization theorem:

Spectral Factorization Theorem (Discrete-Time) [83]: Let $\{s_k\}$ be an autocorrelation sequence with $D$-transform $s(D)$, and assume its discrete-time Fourier transform $S_{1/T}(f)$ satisfies the discrete-time Paley–Wiener condition

$$\int_{1/T} \left|\log S_{1/T}(f)\right| df < \infty$$

where $\int_{1/T}$ denotes integration over any interval of width $1/T$. Then $s(D)$ can be factored as follows:

$$s(D) = A^2\, h(D)\, h^*(D^{-*}) \qquad (5.19)$$

where $h(D)$ is causal ($h_k = 0$ for $k < 0$), monic ($h_0 = 1$), and minimum-phase, and $H_{1/T}(f)$ is the discrete-time Fourier transform of $\{h_k\}$. The factor $A^2$ is the geometric mean of $S_{1/T}(f)$ over a band of width $1/T$; i.e.,

$$\log A^2 = T \int_{1/T} \log S_{1/T}(f)\, df \qquad (5.20)$$

where the logarithms may have any common base.

The discrete-time Paley–Wiener criterion implies that $S_{1/T}(f)$ can have only a discrete set of algebraic zeroes. Spectral factorization is further explained in Appendix III.

A causal, monic, minimum-phase response $h(D)$ is called canonical. The spectral factorization above is unique under the constraint that $h(D)$ be canonical.
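One classical way to compute this factorization numerically is the cepstral method: split $\log S$ into its causal and anticausal parts and exponentiate the causal part. A minimal sketch, assuming a known short response for verification (all names ours):

```python
import numpy as np

h_true = np.array([1.0, 0.6, 0.2])           # a known canonical (monic) response
n = 1024
w = np.exp(2j * np.pi * np.arange(n) / n)
S = np.abs(np.polyval(h_true[::-1], w)) ** 2 # S = |h(e^{jw})|^2 on a dense grid

c = np.fft.ifft(np.log(S)).real              # cepstrum of log S
c_plus = np.zeros(n)
c_plus[0] = 0.5 * c[0]                       # split the DC term evenly
c_plus[1:n // 2] = c[1:n // 2]               # keep only the causal part
h = np.fft.ifft(np.exp(np.fft.fft(c_plus))).real

A2 = np.exp(c[0])                            # geometric mean of S, cf. (5.20)
print(np.round(A2, 3), np.round(h[:4] / h[0], 3))   # ~1.0 and ~[1, 0.6, 0.2, 0]
```

Since the test response is already canonical, the recovered $A^2$ is $1$ and the causal factor matches the original, illustrating the uniqueness of the factorization.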

If $S_{1/T}(f)$ satisfies the Paley–Wiener condition, then (5.18) can be written in the form

$$z(D) = x(D)\, A^2\, h(D)\, h^*(D^{-*}) + w(D)\, A\, h^*(D^{-*}) \qquad (5.21)$$

in which $w(D)$ is an i.i.d. Gaussian noise sequence with symbol variance $1$. Filtering by $1/(A\, h^*(D^{-*}))$ yields the channel model of the equivalent canonical discrete-time Gaussian channel

$$y(D) = A\, x(D)\, h(D) + w(D) \qquad (5.22)$$

where $w(D)$ is an i.i.d. Gaussian noise sequence with variance $1$. This requires that $h^*(D^{-*})$ has a stable inverse, i.e., that $h^*(D^{-*})$ has a stable anti-causal inverse, which is true if $S_{1/T}(f)$ has no spectral zeroes.

However, the invertibility of $h^*(D^{-*})$ is not a serious issue because $y(D)$ can be obtained directly as the sequence of sampled outputs of a whitened matched filter (WMF) [41]. The transfer function of the composite receive filter consisting of the noise-whitening filter and the WMF is given by

$$\frac{W(f)\, G^*(f)}{A\, H^*_{1/T}(f)}. \qquad (5.23)$$

The only condition for the stability of this filter is the Paley–Wiener criterion. It is shown in [41] that the time response of this filter is always well defined.

(The channel model $y(D) = A\,x(D)\,h(D) + w(D)$ expresses the output sequence as the sum of a noise-free sequence $A\,x(D)\,h(D)$ and additive white Gaussian noise. If $x(D)$ is an uncoded sequence of symbols from a finite signal constellation, and $h(D)$ is of finite length, then the noise-free received signal is the output of a finite-state machine, and the sequence $x(D)$ may be optimally detected by maximum-likelihood sequence detection (MLSD), i.e., by the Viterbi algorithm [41]. Alternatively, as shown in [103], MLSD may be performed directly on the MF output sequence $z(D)$, using a trellis of the same complexity. However, the main point of the development of this section is that MLSD is not necessary to approach capacity; powerful coding of $x(D)$ plus tail-canceling equalization suffices!)

From the sequence of sampled WMF outputs $y(D)$ the sequence of sampled MF outputs $z(D)$ can always be obtained by filtering with the stable filter response $A\, h^*(D^{-*})$. Because $z(D)$ is a set of sufficient statistics for the estimation of $x(D)$, $y(D)$ must equally be a set of sufficient statistics. The mutual information between the channel input and the MF or WMF output sequence is therefore equal to the capacity $C$ as given in (5.4). Because

$$1 + \mathrm{SNR_{eff}} = 2^{C/W} \qquad (5.24)$$

and $W = 1/T$, the capacity and its high-SNR approximation can be written as

$$C = W \log_2\left(1 + \mathrm{SNR_{eff}}\right) \approx W \log_2 \mathrm{SNR_{eff}}. \qquad (5.25)$$

We now show that at high SNR the capacity $C$ of the linear Gaussian channel is approximately equal to the capacity of the ideal ISI-free channel that would be obtained if somehow ISI could be eliminated from the sequence $y(D)$. In other words, ISI does not contribute significantly to capacity. This observation was first made by Price [85].

Suppose that somehow the ISI-causing "tail" of the response $h(D)$ could be eliminated, so that in the receiver the ISI-free sequence $A\,x(D) + w(D)$ rather than $y(D) = A\,x(D)\,h(D) + w(D)$ could be observed. The signal-to-noise ratio of the resulting ideal AWGN channel would be $\mathrm{SNR_{ISI\text{-}free}} = A^2 E_x$. Thus the capacity of the ISI-free channel, and its high-SNR approximation, would be

$$C_{\mathrm{ISI\text{-}free}} = W \log_2\left(1 + A^2 E_x\right) \approx W \log_2\left(A^2 E_x\right). \qquad (5.26)$$


Fig. 9. Coding objectives for a linear Gaussian channel with a known channel response $H$.

Price observed that at high SNR $C_{\mathrm{ISI\text{-}free}} \approx C$; i.e.,

$$C_{\mathrm{ISI\text{-}free}} \approx W \log_2\left(A^2 E_x\right) = \int_B \log_2\left(P(f)\,\mathrm{SNR}(f)\right) df \approx C \qquad (5.27)$$

where the equality in (5.27) follows from (5.20).

An alternative "minimum-mean-squared error" (MMSE) form of this result has been obtained in [22] and [23]. This result shows that if both ISI and bias are eliminated in an MMSE-optimized receiver, then the resulting signal-to-noise ratio $\mathrm{SNR_{ISI\text{-}free}}$ is equal to $\mathrm{SNR_{eff}}$ for any linear Gaussian channel at any signal-to-noise ratio. If the interference in the MMSE-optimized unbiased receiver were Gaussian and uncorrelated with the signal, then this would imply $C_{\mathrm{ISI\text{-}free}} = C$ with equality at all SNR's; however, in general, this relation is again only approximate.

D. Coding Objectives for Linear ISI-AWGN Channels

An intuitive explanation of the coding objectives for linear Gaussian channels is presented in Fig. 9.

Signal sequences are shown here as finite-dimensional vectors $x$, $y = Hx$, $w$, and $z = y + w$. The convolution that determines the noise-free output sequence becomes a vector transformation by a channel matrix $H$. Because the canonical response is causal and monic ($h_0 = 1$), the matrix $H$ is triangular with an all-ones diagonal, so it has unit determinant: $\det H = 1$. It follows that the linear transformation from $x$ to $y = Hx$ is volume-preserving [23].

To achieve shaping gain (minimize transmit power for a given volume), at the channel input the constellation points $x$ should be uniformly distributed within a hypersphere. The channel matrix $H$ transforms the hypersphere into a hyperellipse of equal volume containing an equal number of constellation points $y = Hx$.

To achieve coding gain, the noise-free channel output vectors $y$ should be points in an infinite signal set $\Lambda$ with good distance properties (figuratively shown as an integer lattice). If the transmitter knows the channel matrix $H$, it can predistort the input vectors to be points in an appropriately predistorted signal set $H^{-1}\Lambda$ (figuratively shown as a skewed integer lattice). The volumes $V(\Lambda)$ and $V(H^{-1}\Lambda)$ per point in the two signal sets are equal.

The noise-free output vectors are observed in the presence of i.i.d. Gaussian noise $w$. A minimum-distance detector for the infinite signal set $\Lambda$ can therefore obtain the coding gain of $\Lambda$ on an ideal AWGN channel.

In summary, coding and shaping take place in two different Euclidean spaces, which are connected by a known volume-preserving linear transformation. Coding and shaping may be separately optimized, by choosing an infinite signal set with a large coding gain (density) for an ideal AWGN channel in the coding space, and by choosing a shaping scheme with a large shaping gain (small normalized second moment) in the shaping space. The channel intersymbol interference may be eliminated at the channel output by predistortion of the infinite signal set at the input, based on the known channel response.

E. Precoding and Trellis Coding for Linear Gaussian Channels

Precoding is a pre-equalization method that achieves the coding objectives set out in Section V-D, assuming that the canonical channel response $h(D)$ is known at the transmitter. A pre-equalized sequence $x(D)$ is transmitted such that

a) the output sequence $y(D) = x(D)\,h(D) + w(D)$ is the output of an apparently ISI-free ideal channel with input sequence $v(D) = x(D)\,h(D)$;

b) the noise-free output sequence $v(D)$ may therefore be a coded sequence from a code designed for the ideal AWGN channel, and may be decoded by an ideal-channel decoder;

c) redundancy in $v(D)$ may be used to minimize the average power of the input sequence $x(D) = v(D)/h(D)$, or to achieve other desirable characteristics for $x(D)$.

Precoding was first developed for uncoded $M$-PAM transmission in two independent masters' theses by Tomlinson [97] and Harashima [54]. Its application was not pursued at that time because decision-feedback equalization was, and still is, the preferred ISI-canceling method for uncoded transmission.


Fig. 10. Tomlinson–Harashima precoding for one-dimensional $M$-PAM transmission, with or without trellis coding.

With trellis coding, however, decision-feedback equalization is no longer attractive, because reliable decisions are not available from a Viterbi decoder without significant delay. As we shall see, Tomlinson–Harashima precoding can be combined with trellis coding, but not with shaping.

Trellis precoding [35] was the first technique that allowed combining trellis coding, precoding, and shaping. However, in trellis precoding, coding and shaping are coupled, which to some degree inhibits control of constellation characteristics such as peak-to-average ratio (PAR). Therefore, during the development of the V.34 modem standard techniques called "flexible precoding" were proposed independently by Eyuboglu [80], Cole and Goldstein [52], and Laroia et al. [73]. In flexible precoding, coding and shaping are decoupled. Later, Laroia [71] proposed an "ISI precoding" technique that used feedback trellis coding (see Section IV-D) to reduce "dither power." Further improvements were made by Cole and Eyuboglu [53] and by Betts [4], and the [53] scheme was ultimately adopted for the V.34 standard [62]. Recently, feedback trellis coding has been combined with Tomlinson–Harashima precoding [20].

We will describe Tomlinson–Harashima precoding and flexible precoding, with and without trellis coding. The general principle applied in these schemes is as follows. Let the canonical channel response be written as $h(D) = 1 + h'(D)$. Then the noise-free output sequence of the channel is given by

$$v(D) = x(D)\,h(D) = x(D) + x(D)\,h'(D) \qquad (5.28)$$

where $p(D) = x(D)\,h'(D)$ is the sequence of trailing intersymbol interference. In all types of precoding, the precoder determines $p_k$ from past signals and sends $x_k$ so that the channel output $v_k = x_k + p_k$ will be a desired signal.

Tomlinson–Harashima (TH) precoding is illustrated in Fig. 10 for one-dimensional transmission with $M$-PAM signal constellations as defined in Section II-C. In the language of lattice constellations introduced in Section IV-A, an $M$-PAM signal constellation with a signal spacing of $2$ is defined as

$$A = (2\mathbb{Z} + m) \cap R_V(2M\mathbb{Z})$$

where $m = 1$ for $M$ even and $m = 0$ for $M$ odd, and $R_V(2M\mathbb{Z}) = [-M, M)$ is the fundamental Voronoi region of the sublattice $2M\mathbb{Z}$. The translates of $R_V(2M\mathbb{Z})$ by integer multiples of $2M$ then cover ("tile") all of real signal space, so every real signal $u$ may be expressed uniquely as $u = a + 2Mn$ for $a \in R_V(2M\mathbb{Z})$ and some integer $n$. The unique $a$ so determined is called "$u \bmod 2M$."

Fig. 10 is drawn such that one can easily see that the noiseless channel-output sequence is $v(D) = a(D) + 2M\,m(D)$, where $a(D)$ is the sequence of $M$-PAM signals to be transmitted, and $m(D)$ is an integer sequence generated by the "$\bmod\ 2M$" unit in the precoder so that the elements of the transmitted sequence $x(D) = a(D) + 2M\,m(D) - p(D)$ are contained in $R_V(2M\mathbb{Z})$, and the elements of $v(D)$ are in $a(D) + 2M\mathbb{Z}$. The decoder operates on the noisy sequence $v(D) + w(D)$ and outputs the sequence $\hat v(D)$, which is then reduced by the "$\bmod\ 2M$" unit to the sequence $\hat a(D) = \hat v(D) \bmod 2M$. In the absence of decoding errors, $\hat v(D) = v(D)$ and therefore $\hat a(D) = a(D)$. Note that "inverse precoding" is memoryless, so no error propagation can occur. Because the channel response $h(D)$ does not need to be inverted in the receiver, $h(D)$ may exhibit spectral nulls, e.g., contain factors of the form $1 \pm D$.
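A compact simulation of TH precoding for uncoded 4-PAM, with a hypothetical canonical response and noise level of our own choosing, shows all the pieces in a few lines:

```python
import numpy as np

M, twoM = 4, 8
h = np.array([1.0, 0.5, 0.25])               # hypothetical canonical response

def mod2M(z):
    return (z + M) % twoM - M                # reduce into the region [-M, M)

rng = np.random.default_rng(7)
a = rng.choice([-3, -1, 1, 3], size=1000)    # 4-PAM data signals

x = np.zeros(len(a))
for k in range(len(a)):                      # precoder: x_k = (a_k - p_k) mod 2M
    p = sum(h[j] * x[k - j] for j in range(1, len(h)) if k - j >= 0)
    x[k] = mod2M(a[k] - p)

v = np.convolve(x, h)[:len(a)]               # noise-free output: a_k + 2M m_k
y = v + 0.1 * rng.standard_normal(len(a))    # AWGN
v_hat = 2 * np.round((y - 1) / 2) + 1        # slice to the nearest odd integer
a_hat = mod2M(v_hat)                         # memoryless inverse precoding
print("symbol errors:", int((a_hat != a).sum()))
```

The transmitted samples are confined to $[-M, M)$ by construction, and a decision error affects only its own symbol, confirming the absence of error propagation.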

The combination of trellis coding with TH precoding requires that $v(D) = c(D) + 2M\,m(D)$ is a valid code sequence. For practically all trellis codes, when $2M$ is a multiple of $4$ and $c(D)$ is a code sequence, then $c(D) + 2M\,m(D)$ is also a code sequence (in particular, this holds for "mod-2" or "mod-4" coset codes [42], [43]). Thus trellis coding and Tomlinson–Harashima precoding are easily combined.

With a more complicated constellation $A$, trellis coding can be combined with TH precoding by using the idea of feedback trellis coding [20], provided that a) the first-level subsets $A_0$ and $A_1$ of $A$ are congruent; b) the constellation shape region, namely, the union of the Voronoi regions of the signals in $A$, is a space-filling region (has the "tiling" property). Condition b) obviously holds for $M$-PAM and square $M$-QAM constellations. It also holds for a 12-QAM "cross" constellation, for example, but not for a 32-QAM cross constellation. We note that signal constellations whose sizes are not powers of two have become practically important in connection with mapping techniques such as shell mapping and modulus conversion (see Section IV-B).

In summary, Tomlinson–Harashima precoding permits the combination of trellis coding with ISI-canceling (DFE-equivalent) equalization. Its main problem is the requirement that the signal space can be "tiled" with translates of the constellation shape region. Tomlinson–Harashima precoding results in a small increase in transmit signal power, equal to the average energy of a random "dither variable" uniformly distributed over the Voronoi region around the constellation signals. If the Voronoi region is $[-1, +1)$, then the increase in signal power is a factor of $M^2/(M^2 - 1)$, which is negligible for large constellations.

Fig. 11. Flexible precoding for one-dimensional $M$-PAM transmission, with or without trellis coding.

Flexible (FL) precoding is illustrated in Fig. 11, again for one-dimensional $M$-PAM transmission. The general concept of FL precoding is to subtract from a sequence $a(D)$ of $M$-PAM signals a sequence $d(D)$ of smallest "dither" variables such that the channel-output sequence $v(D)$ is a valid uncoded or coded sequence with elements in $a(D) + 2\mathbb{Z}$. The constellation shape region of the signal constellation can be arbitrary, so any shaping scheme can be used. At time $k$, given the ISI term $p_k$, the "$\bmod\ 2$" unit in the precoder determines the integer multiple of $2$, say $2m_k$, that is closest to $p_k$. The dither variable $d_k$ is the difference $p_k - 2m_k$. Clearly, $d_k$ lies in the Voronoi interval $R_V(2\mathbb{Z}) = [-1, +1)$. It will typically be uniformly distributed over $R_V(2\mathbb{Z})$, and thus increases the average energy of the transmit signal by $1/3$.

As can be seen from Fig. 11, the sequence of noiseless channel-output signals becomes

$$v(D) = a(D) + 2\,m(D).$$

The decoder operates on the noisy sequence $v(D) + w(D)$ and produces the sequence $\hat v(D)$. To obtain the sequence $\hat a(D)$, the sequence $\hat x(D) = \hat v(D)/h(D)$ is recovered by channel inversion as shown in Fig. 11. A "bit-identical" realization of the two functional blocks shown in dashed blocks in Fig. 11 is absolutely essential.
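For comparison with TH precoding, here is an analogous sketch of flexible precoding in the uncoded, noiseless case (response and names again our own); it makes the dither bookkeeping and the channel inversion in the receiver explicit.

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25])               # hypothetical canonical response
rng = np.random.default_rng(3)
a = rng.choice([-3, -1, 1, 3], size=1000)    # 4-PAM data signals

x = np.zeros(len(a))
for k in range(len(a)):
    p = sum(h[j] * x[k - j] for j in range(1, len(h)) if k - j >= 0)
    d = p - 2 * np.round(p / 2)              # smallest dither, in [-1, 1)
    x[k] = a[k] - d                          # so that v_k = a_k + 2 m_k

v = np.convolve(x, h)[:len(a)]               # noiseless channel output
v_hat = 2 * np.round((v - 1) / 2) + 1        # decode to the coset 2Z + 1

x_hat = np.zeros(len(a))                     # receiver: channel inversion,
a_hat = np.zeros(len(a))                     # then re-derive dithers and data
for k in range(len(a)):
    p = sum(h[j] * x_hat[k - j] for j in range(1, len(h)) if k - j >= 0)
    x_hat[k] = v_hat[k] - p
    a_hat[k] = x_hat[k] + (p - 2 * np.round(p / 2))
print("symbol errors:", int((np.round(a_hat) != a).sum()))
```

With noise, an erroneous $\hat v_k$ corrupts the reconstructed $\hat x$ history and hence later symbols, which is precisely the error-propagation weakness discussed below.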

The combination of flexible precoding with feedback trellis coding works as follows. At time $k$, the transmitter knows the integer $2m_k$ and the current trellis-code state $s_k$. To continue a valid code sequence at the noiseless channel output, the transmitter selects a data symbol $a_k$ such that $v_k = a_k + 2m_k$ is in the first-level infinite subset $A_0$ or $A_1$ determined by the check bit $y_{0,k}$. The symbol $a_k$ is chosen to lie in a desired constellation shape region, or according to some other shaping scheme. The symbol $v_k$ then determines the next state $s_{k+1}$, as explained in Section IV-D.

The main advantage of flexible precoding is that any constellation-shaping method can be used, whereas with Tomlinson–Harashima precoding or trellis precoding, shaping is connected with coding. The main disadvantage is that decoding errors tend to propagate due to the channel inversion in the receiver, which is particularly troublesome on channels with spectral nulls. A technique to mitigate error propagation in the inverse precoder is described in [37]. The main application of flexible precoding so far has been in V.34 modems. The application of precoding in digital subscriber lines has been studied in [38].

In conclusion, at high SNR's, precoding in combination with powerful trellis codes and shaping schemes allows the capacity of an arbitrary linear Gaussian channel to be approached as closely as capacity can be approached on an ideal ISI-free AWGN channel, with about the same coding and shaping complexity.

VI. FINAL REMARKS

Shannon's papers established ultimate limits, but gave no constructive methods to achieve them. They thereby threw down the gauntlet for the new field of coding. In particular, they established fundamental benchmarks for linear Gaussian channels that have posed tough challenges for subsequent code inventors.

The discouraging initial progress in developing good constructive codes was captured by the folk theorem [119]: "All codes are good, except those that we know of." However, by the early 1970's, good practical codes had been developed for the low-SNR regime:

• moderate-constraint-length binary convolutional codes with Viterbi decoding for 3–6-dB coding gain with moderate complexity;

• long-constraint-length convolutional codes with sequential decoding to reach $R_0$;

• concatenation of RS outer codes with algebraic decoding and moderate-complexity inner codes to achieve very low error rates near $R_0$.

By the 1980's, the invention of trellis codes enabled similar progress to be made in the high-SNR regime, at least for the ideal band-limited AWGN channel. In the 1990's, these


gains were extended to general linear Gaussian channels by use of multicarrier modulation, or alternatively by use of single-carrier modulation with transmitter precoding.

Notably, many of these techniques were developed by practically oriented engineers in standards-setting groups and in industrial R&D environments. Moreover, as a consequence of advances in VLSI design and technology, the lag between invention and appearance in commercial products has often been very short.

By now, it certainly appears that the field of modulation and coding for Gaussian channels has reached a mature state, at least below the cutoff rate $R_0$, even though very few systems reaching $R_0$ have ever actually been implemented. In the beyond-$R_0$ regime, however, the invention of turbo codes has touched off intense activity that seems likely to continue well into the next half-century.

APPENDIX I
MULTICARRIER MODULATION

The history of multicarrier modulation began more than 40 years ago with an early system called Kineplex [30] designed for digital transmission in the HF band. Work by Holsinger [57] and others followed.

The use of the discrete Fourier transform (DFT) for modulation and demodulation was proposed in [115]. DFT-based multicarrier systems are now referred to as orthogonal frequency-division multiplexing (OFDM) or discrete multitone (DMT) systems [2], [14], [56], [90]. Basically, OFDM/DMT is a form of frequency-division multiplexing (FDM) in which modulation symbols are transmitted in individual subchannels using QAM or CAP modulation (see Section II-B). Current applications of OFDM/DMT include digital audio broadcasting (DAB) [33] and asymmetric digital subscriber lines (ADSL) [1].

Another multicarrier transmission technique of the FDM type became more recently known as discrete wavelet multitone (DWMT) modulation [91], [102]. DWMT modulation has its origin in filter banks for subband source coding. Efficient implementations of DWMT modulation and demodulation employ the discrete cosine transform (DCT).

A comprehensive treatment of multirate systems and filter banks is given in [106]. Recent reviews of theory and applications of filter banks and wavelet transforms are presented in [99].

The coding aspects of multicarrier systems have been discussed in Section IV-B. In this appendix, we give brief descriptions of DMT and DWMT modulation (together with some background information). For the purposes of this appendix, we assume transmission over a noise-free discrete-time real linear channel which is modeled by $y(D) = h(D)\,x(D)$, where $h(D)$ is the channel response.

A. DFT-Based Filter Banks and Discrete Multitone (DMT) Modulation

Before specifically addressing DMT modulation, we examine the general concept of DFT-based multicarrier modulation. These systems subdivide the channel into $N$ narrowband subchannels whose center frequencies are spaced by integer multiples of the subchannel modulation rate $1/T$. Let $\{X_{n,k}\}$ be the sequence of generally complex data symbols transmitted in the $n$th subchannel, and let $X_k = (X_{0,k}, \cdots, X_{N-1,k})$ denote the $N$-vector transmitted over these channels at time $kT$.

A DFT-based “synthesis” filter bank generates at rate N/T the transmit signals

s_m = Σ_k Σ_{n=0}^{N−1} A_n[k] g_{m−kN} e^{j2πnm/N}   (AI.1)

where {g_m} is a real causal symbol response with a baseband spectrum G(f). This (T/N)-sampled response is sometimes called the “prototype response.” The least integer K such that g_m = 0 for all m ≥ KN is called the “overlap factor.” Note that the sequence {s_m} has an (N/T)-periodic spectrum and is composed of N subchannels whose center frequencies are located at integer multiples of 1/T, as illustrated in Fig. 12.

If we express the time index as m = ℓN + i, with integer ℓ and 0 ≤ i ≤ N−1, then (AI.1) becomes

s_{ℓN+i} = N Σ_k a_i[k] g^{(i)}_{ℓ−k}   (AI.2)

where {a_i[k], 0 ≤ i ≤ N−1} is the inverse discrete Fourier transform (IDFT) of {A_n[k], 0 ≤ n ≤ N−1}, and g^{(i)}_ℓ = g_{ℓN+i}, 0 ≤ i ≤ N−1, is the T-sampled ith-phase component of the prototype response. Thus (AI.2) shows that the transmit signals may be generated by N independent convolutions of sequences of the time-domain symbols a_i[k] with the corresponding phase components g^{(i)} of the prototype response. Such a system is called a DFT-based polyphase filter bank [7], [106].

To obtain real transmit signals, the frequency-domain modulation symbols must exhibit Hermitian symmetry; i.e., A_{N−n}[k] = A_n*[k]. Then the IDFT coefficients a_i[k] are real and hence the signals s_m are real. The restriction to Hermitian symmetry allows for N real-signal degrees of freedom, i.e., it permits mappings of N real data symbols into complex symbols A_n[k], 0 ≤ n ≤ N−1. For N even, one possible mapping is

(AI.3)
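To make the Hermitian-symmetry constraint concrete, here is a minimal Python/NumPy sketch. The particular packing of the N real data symbols into the complex symbols A_n is an assumption chosen for illustration and need not coincide with the mapping of (AI.3); the point is that any Hermitian-symmetric vector yields a real IDFT output.

```python
import numpy as np

def hermitian_map(x):
    """Map N real data symbols onto N complex symbols with Hermitian
    symmetry A[N-n] = conj(A[n]), so that the IDFT output is real.
    (One possible mapping, assumed for illustration.)"""
    N = len(x)                      # N even
    A = np.zeros(N, dtype=complex)
    A[0] = x[0]                     # DC bin must be real
    A[N // 2] = x[1]                # Nyquist bin must be real
    for n in range(1, N // 2):
        A[n] = x[2 * n] + 1j * x[2 * n + 1]
        A[N - n] = np.conj(A[n])
    return A

N = 8
x = np.random.choice([-3, -1, 1, 3], size=N)   # real PAM data symbols
A = hermitian_map(x)
a = np.fft.ifft(A)                  # time-domain block
assert np.allclose(a.imag, 0)       # Hermitian symmetry => real signal
```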

In the absence of distortion, the IDFT coefficients can be recovered from {s_m} by a polyphase “analysis” filter bank, which performs the matched filter operations

(AI.4)


Fig. 12. Spectrum of DFT-based multicarrier modulation (N even).

If the orthogonality conditions

Σ_ℓ g^{(i)}_ℓ g^{(i)}_{ℓ−k} = δ_{k0},   0 ≤ i ≤ N−1   (AI.5)

are satisfied for all i and k, then the recovered coefficients equal the transmitted IDFT coefficients a_i[k]. Finally, the N-vector {A_n[k], 0 ≤ n ≤ N−1} is obtained as the DFT of {a_i[k], 0 ≤ i ≤ N−1}.

We note from (AI.5) that in an ideal system each of the N phase components of the prototype response must individually satisfy the Nyquist condition for zero-ISI transmission in that subchannel. Then, summing (AI.4) over i shows that the prototype response also satisfies the orthogonality condition for transmission at rate 1/T.

The orthogonality conditions are trivially satisfied if the prototype response is a sampled rectangular pulse of length N, i.e., g_m = 1 for 0 ≤ m ≤ N−1 and g_m = 0 otherwise (overlap factor K = 1). In this case, no polyphase filtering is required. However, because the spectra of the N subchannels will then overlap according to the shape of the prototype response spectrum, small amounts of channel distortion can be sufficient to destroy orthogonality and cause severe interchannel interference (ICI). For some applications, a higher spectral concentration of the subchannel spectra will therefore be desirable. This leads to a filter design problem, which consists of minimizing, for a given filter length, i.e., for a given overlap factor K, the spectral energy of the prototype response outside of its nominal subchannel band while approximately maintaining orthogonality.

For DMT modulation, the rectangular prototype response with K = 1 is used, and the transmitted signals could simply be the unfiltered IDFT coefficients:

s_{kN+i} = a_i[k],   0 ≤ i ≤ N−1.   (AI.6)

In the absence of channel distortion, this simple scheme would be sufficient. However, if the channel response has length μ + 1 (i.e., h_k = 0 for k < 0 and for k > μ), then the last μ IDFT coefficients of every transmitted block of N IDFT coefficients would interfere with the first μ coefficients of the next block (assuming μ < N). In principle, combinations of pre-equalization and post-equalization (before and after the DFT operation in the receiver) could be used to mitigate the effects of this interference.

Practical DMT systems employ the simpler method of “cyclic extension,” originally suggested in [86], to cope with distortion. If the channel response has at most length μ + 1, then every block of N IDFT coefficients is cyclically extended by μ coefficients. For example, with “cyclic prefix extension” the kth block is extended from length N to length N + μ as follows:

(a_{N−μ}[k], …, a_{N−1}[k], a_0[k], a_1[k], …, a_{N−1}[k]).   (AI.7)

Assume the actual length of the channel response is at most μ + 1. Then the received sequence contains, for every received block of extended length N + μ, signal segments that repeat cyclically. The receiver selects blocks of signals of length N such that the received signals at the beginning and end of these blocks are cyclically repeating. The DFT of these blocks is then

Z_n[k] = H_n A_n[k],   0 ≤ n ≤ N−1   (AI.8)

where H_n is the Fourier transform of the channel response at the center frequency of the nth subchannel. Multiplication with the estimated inverse spectral channel response 1/H_n yields the desired symbols A_n[k]. With this “one tap per symbol” equalization method, distortion is dealt with in a simple manner at the expense of a rate loss by the factor N/(N + μ). For example, the ADSL standard [1] specifies N = 512 and μ = 32, which results in a loss of 5.8%.
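The following sketch illustrates the cyclic-prefix mechanism and the resulting one-tap-per-symbol equalization of (AI.8); the channel taps and block sizes are arbitrary illustrative values. Discarding the prefix turns the linear convolution with the channel into a circular convolution, which the DFT diagonalizes.

```python
import numpy as np

N, mu = 64, 4                         # block length and cyclic-prefix length
h = np.array([1.0, 0.5, -0.2, 0.1])   # illustrative channel, length <= mu + 1

A = np.random.choice([-1, 1], N) + 1j * np.random.choice([-1, 1], N)  # QPSK
a = np.fft.ifft(A)                    # IDFT block
tx = np.concatenate([a[-mu:], a])     # cyclic prefix extension (AI.7)

y = np.convolve(tx, h)                # noise-free linear channel
rx = y[mu:mu + N]                     # discard prefix: circular convolution

H = np.fft.fft(h, N)                  # channel frequency response per tone
A_hat = np.fft.fft(rx) / H            # "one tap per symbol" equalization
assert np.allclose(A_hat, A)
```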

In practical DMT systems, one may not rely entirely on the cyclic-extension method to cope with distortion. An adaptive pre-equalizer may be employed to minimize the channel-response length prior to the DFT-based receiver operations described above. The problem is essentially similar to adjusting the forward filter in a decision-feedback equalizer or an MLSD receiver such that the channel memory is truncated to a given length [21], [36].

B. Cosine-Modulated Filter Banks and Discrete Wavelet Multitone (DWMT) Modulation

Cosine-modulated filter banks were first introduced for subband speech coding. The related concept of quadrature mirror filters (QMF) was probably first mentioned in [32]. In the field of digital communications, multicarrier modulation by means of cosine-modulated filter banks is referred to as DWMT modulation [91], [102]. The main feature of DWMT is that all signal processing is performed on real signals.


Fig. 13. Spectrum of cosine-modulated multicarrier modulation.

DWMT can be understood as a form of multiple carrierless single-sideband modulation (CSSB, see Section II-B).

The sequences {a_n[k]}, 0 ≤ n ≤ N−1, are now sequences of real modulation symbols. DWMT employs a real prototype response with a baseband spectrum G(f) that satisfies the Nyquist criterion for ISI-free transmission at symbol rate 1/T. Also, G(f) has no more than 100% spectral roll-off (i.e., G(f) = 0 for |f| ≥ 1/T). Furthermore, it is convenient to assume that G(f) has zero phase.

A cosine-modulated “synthesis” filter bank generates at rate N/T the transmit signals

(AI.9)

where the samples of the symbol response for the nth subchannel are given by

(AI.10)

The corresponding spectral symbol response is then

(AI.11)

The spectrum of the transmitted signal is shown in Fig. 13. One can verify that any two overlapping frequency-shifted prototype spectra differ in phase by π/2. This is the so-called QMF condition, which ensures orthogonality [106].
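As a concrete instance of a cosine-modulated filter bank, the sketch below builds per-subchannel symbol responses from a sine prototype using one common phase convention (that of the modulated lapped transform); the exact phase terms of (AI.10) may differ. A numerical check confirms that responses within a block and in blocks offset by N samples are mutually orthogonal.

```python
import numpy as np

N = 8                                   # number of subchannels
L = 2 * N                               # prototype length (overlap factor K = 2)
m = np.arange(L)
g = np.sin(np.pi * (m + 0.5) / L)       # sine prototype window

# Cosine-modulated symbol responses (one common phase convention,
# not necessarily identical to (AI.10)):
gn = np.array([g * np.sqrt(2.0 / N) *
               np.cos(np.pi / N * (n + 0.5) * (m + 0.5 + N / 2))
               for n in range(N)])

# Orthogonality check: responses in the same block and in a block
# shifted by N samples form an orthonormal set.
shifted = np.zeros((2 * N, 3 * N))
for k in range(2):                      # two overlapping blocks
    for n in range(N):
        shifted[k * N + n, k * N:k * N + L] = gn[n]
gram = shifted @ shifted.T
assert np.allclose(gram, np.eye(2 * N), atol=1e-12)
```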

For a DCT-based polyphase realization of cosine-modulated filter banks, see [106].

APPENDIX II
UNIFORMITY PROPERTIES OF EUCLIDEAN-SPACE CODES

A binary linear code has a useful and fundamental uniformity property: the code “looks the same” from the viewpoint of any of its sequences. The number of neighbors at each distance is the same for all code sequences, and, provided the channel has appropriate symmetry, the error probability is independent of which code sequence is transmitted. If this uniformity property holds, then one has to consider only the set of code distances from the all-zero sequence (i.e., the set of code weights), and in performance analysis one may assume that the all-zero sequence was transmitted.

Similar uniformity properties are helpful in the design and analysis of Euclidean-space codes. A variety of such properties have been introduced, from the original “quasilinearity” property of Ungerboeck [104] to geometric uniformity [44] (a property that generalizes the group property of binary linear codes), and finally rotational invariance. In this section we briefly review these properties.

A. Geometrical Uniformity

As we have seen in Section IV, for coding purposes it is useful to regard a large constellation based on a dense packing, such as a lattice translate, as an effectively infinite constellation consisting of the entire packing. This approximation eliminates boundary effects, and the resulting infinite constellation usually has greater symmetry. As we have seen, shaping methods (selection of a finite subconstellation) may be designed and analyzed separately in the high-SNR regime.

The symmetry group of a constellation is the set of isometries of Euclidean N-space (i.e., rotations, reflections, translations) that map the constellation to itself. For a lattice translate Λ + a, the symmetry group includes the infinite set of all translations by lattice elements λ ∈ Λ. It will also generally include a finite set of orthogonal transformations; e.g., the symmetry group of the QAM lattice translate Z² + (1/2, 1/2) includes the eight symmetries of the square (the dihedral group D_4). A constellation is geometrically uniform if it is the orbit of any of its points under its symmetry group. A lattice translate Λ + a is necessarily geometrically uniform, because it can be generated by the translation group. An M-PSK constellation is geometrically uniform, because it can be generated by rotations of multiples of 2π/M.
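The “looks the same from every point” property is easy to check numerically for a small geometrically uniform constellation. The sketch below verifies that every point of an 8-PSK constellation sees the same sorted profile of distances to the constellation points.

```python
import numpy as np

M = 8
pts = np.exp(2j * np.pi * np.arange(M) / M)      # 8-PSK constellation

def profile(p):
    """Sorted distances from point p to all constellation points."""
    return np.sort(np.abs(pts - pts[p]))

# Geometric uniformity: every point sees the same distance profile.
ref = profile(0)
assert all(np.allclose(profile(p), ref) for p in range(M))
```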

A fundamental region R(Λ) of a lattice Λ is a compact region that contains precisely one point from each translate (coset) of Λ. For example, any Voronoi region of Λ is a fundamental region if boundary points are appropriately divided between neighboring regions. The translates R(Λ) + λ, λ ∈ Λ, of any fundamental region tile N-space without overlap, so that every x in R^N can be uniquely expressed as x = λ + r for some λ ∈ Λ and r ∈ R(Λ), where r is the unique element of R(Λ) in the coset Λ + x. The volume of any fundamental region is V(Λ).

As discussed in Section IV, trellis codes are often based on lattice partitions Λ/Λ′. A useful viewpoint that has emerged from multilevel coding is to regard such codes as being defined on a fundamental region R(Λ′) of the sublattice Λ′. The cosets of Λ′ in Λ are represented by their unique representatives in R(Λ′). At the front end of the receiver, each received point r is mapped to the unique point of the coset r + Λ′ in R(Λ′). Although such a “mod-Λ′ map” is not information-lossless, it has been shown that it does not reduce capacity [46]. Moreover, it decouples coding and decoding implementation and performance from any coding that may occur at other levels. Conceptually, this viewpoint reduces infinite-constellation coding and decoding to coding and decoding using a finite constellation of |Λ/Λ′| points on the compact region R(Λ′).
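As an illustration of such a front-end map, the sketch below folds a received point into a fundamental region of the simple cubic sublattice 2Z^N; this choice of Λ′ is made only to keep the arithmetic trivial, whereas Λ′ is a general lattice in the text.

```python
import numpy as np

def mod_lattice(r, side=2.0):
    """Fold r into the fundamental region [-side/2, side/2)^N of the
    cubic sublattice side * Z^N (here Lambda' = 2 Z^N)."""
    return (r + side / 2) % side - side / 2

r = np.array([3.7, -2.2, 0.4])
y = mod_lattice(r)
assert np.allclose((r - y) % 2.0, 0)   # r - y is a sublattice point
print(y)                               # [-0.3, -0.2, 0.4]
```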

A geometrically uniform code based on a lattice partition Λ/Λ′ may be constructed as follows. Let G be a label group that is isomorphic to the quotient group Λ/Λ′; or, more generally, let G be a group of symmetries that can generate a set of subsets congruent to Λ′ whose union is Λ. Let I be a time axis (e.g., a finite index set for block codes, or the integers for convolutional codes), let G^I be the set of all sequences of elements of G defined on I, which form a group under the componentwise group operation of G, and let C be any subgroup of G^I. Then the orbit of any sequence of subsets of Λ under C is a geometrically uniform Euclidean-space code. In particular, for any geometrically uniform code, the set of distances from any code sequence to all other code sequences is the same for all code sequences, and the probability of error on an ideal AWGN channel does not depend on which sequence was sent.

This observation has stimulated research into group codes, namely codes that are subgroups of sequence groups G^I, and their geometrically uniform Euclidean-space images. Although the ultimate goal of this research has been to construct new classes of good codes, the main results so far have been to show that most known good codes constructed by other methods are geometrically uniform. Ungerboeck’s original 1D and 2D codes are easily shown to be geometrically uniform. Trott showed that the nonlinear Wei trellis code used in the V.32 modem standard may be represented as a group code whose state space is a non-Abelian dihedral group [100]. The 4D 16-state Wei code and the 32-state Williams code used in the V.34 modem standard are both geometrically uniform; however, Sarvis and Trott have shown that there is no geometrically uniform code with the same parameters as the 4D 64-state Wei V.34 code [92].

B. Quasilinearity

A weaker uniformity property introduced in [104] is quasilinearity. This property permits the minimum squared distance d²_min of a trellis code to be found by simply computing the weights of a set of error sequences, rather than by pairwise comparison of all pairs of possible code sequences.

Quasilinearity depends only on the use of a rate-k/(k+1) binary linear convolutional code in an encoder like that in Fig. 5, and the fact that the two subsets associated with the first-level partition are geometrically congruent to each other.

Let u and u′ be any two binary convolutional code sequences, and let their difference be the error sequence e = u ⊕ u′, which by linearity is also a code sequence. For each error term e_k, define the squared Euclidean weight

w²(e_k) = min_{z, a, a′} ‖a − a′‖²   (AII.1)

where z ranges over all binary label tuples, and a and a′ range over all signals in the subsets labeled by z and z ⊕ e_k, respectively.

If the two subsets at the first partitioning level are congruent, then it is easy to see that this minimum does not depend on z. We may then simply write w²(e_k) for this common minimum.

It then follows that the minimum squared distance d²_min between any two trellis-code sequences that correspond to different convolutional-code sequences u and u′ is

d²_min = min_{e ≠ 0} Σ_k w²(e_k).   (AII.2)

The proof is that u and u′ may differ arbitrarily in the uncoded label bits, to which the encoder adds appropriate parity-check bits, and for any e_k the minimum in (AII.1) is achievable with either choice of the parity bit.

Thus quasilinearity permits d²_min to be computed just as the minimum Hamming distance of binary linear convolutional codes is computed; one simply replaces Hamming weights by the squared Euclidean weights w²(e_k) [104], [120] in trellis searches. This property has been used extensively in searches for the trellis codes reported in [104] and by subsequent authors.
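The sketch below evaluates such squared Euclidean weights numerically for an 8-PSK constellation with one signal per label, so the minimization over subsets is trivial; the natural binary labeling is assumed here for illustration.

```python
import numpy as np

pts = np.exp(2j * np.pi * np.arange(8) / 8)   # 8-PSK, natural binary labels

# Squared Euclidean weight of each error label e: minimize over the
# reference label z the distance between the signals labeled z and z XOR e.
for e in range(8):
    w2 = min(abs(pts[z] - pts[z ^ e]) ** 2 for z in range(8))
    print(f"e={e:03b}  w2={w2:.3f}")
```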

The number of codes to be searched for a code with the highest value of d²_min can be further reduced significantly if w²(e) attains for all e the set-partitioning lower bound given by the minimum intrasubset squared distance at the first partitioning level for which the corresponding component of e is nonzero [104]. This property usually holds for large geometrically uniform constellations.

C. Rotational Invariance

In carrier-modulated complex-signal transmission systems, the absolute carrier phase of the received waveform signal is generally not known. The receiver may demodulate the received signal with a constant carrier-phase offset φ ∈ Φ, where Φ is the group of phase rotations that leave the two-dimensional signal constellation invariant. Then, if a signal sequence {a_k} was transmitted, the decoder operates on the sequence of rotated signals {a_k e^{jφ}}. For example, for M-QAM we have Φ = {0°, 90°, 180°, 270°}. Rotational invariance is the property that rotated code sequences are also valid code sequences. This is obviously the case for uncoded modulation with signal constellations that exhibit rotational symmetries. To achieve transparency of the transmitted information under phase offsets, phase-differential encoding and decoding may be employed.
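A minimal sketch of phase-differential encoding over the quadrant label group Z_4: information is carried in label differences mod 4, so a constant rotation from Φ offsets only the initial symbol and cancels in all subsequent differences.

```python
import numpy as np

def diff_encode(q):                   # q: quadrant symbols in Z_4
    out, state = [], 0
    for s in q:
        state = (state + s) % 4       # accumulate quadrant transitions
        out.append(state)
    return np.array(out)

def diff_decode(q):
    q = np.asarray(q)
    return (q - np.concatenate(([0], q[:-1]))) % 4   # successive differences

msg = np.array([1, 3, 0, 2, 1])
tx = diff_encode(msg)
rx = (tx + 2) % 4                     # constant 180-degree phase offset
# Only the first symbol absorbs the offset; the rest are transparent:
assert np.array_equal(diff_decode(rx)[1:], msg[1:])
```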

A trellis code C is fully rotationally invariant if for all coded signal sequences {a_k} ∈ C and all phase offsets φ ∈ Φ the rotated sequences are code sequences, i.e., {a_k e^{jφ}} ∈ C. Note that signals here always denote two-dimensional signals, which may be component signals of a higher dimensional signal constellation.

Trellis codes are not necessarily rotationally invariant under the same set of symmetries as their two-dimensional signal constellations. It has been shown that trellis codes based on linear binary convolutional codes with mapping into two-dimensional M-PSK or M-QAM signals can at most be invariant under 180° rotation. Fully rotationally invariant linear trellis codes exist only for higher dimensional constellations. Such codes are given for QAM in [105], [113], and for M-PSK in [87], [114].

Fully rotationally invariant trellis codes over two-dimensional PSK or QAM constellations must be nonlinear [88], [112]. We briefly describe the analytic approach presented in [88] for the design of nonlinear two-dimensional PSK and QAM trellis codes by an example. Let a QAM signal constellation be partitioned into eight subsets with subset labels in Z_8 (the ring of integers modulo 8). The labeling is chosen such that under 90° rotation each label is transformed by the addition of a fixed constant (mod 8). A fully rotationally invariant trellis code can then be found by requiring that for all label sequences that satisfy a given parity-check equation, the correspondingly rotated label sequences also satisfy this equation. This is achieved by a binary nonlinear parity-check equation of the form

(AII.3)

where some of the coefficients are binary and the others, like the label elements, are elements of Z_8; the nonlinearity enters through a term that selects the most significant bit from the binary representation of each Z_8 element. The codes defined by the parity-check equation are rotationally invariant if the equation remains satisfied when the all-ones sequence, scaled by the rotation constant, is added to the label sequences; this requires that a certain coefficient sum take a specific value in Z_8.

Code tables for fully rotationally invariant nonlinear trellis codes over two-dimensional QAM and M-PSK signal constellations are given in [88]. Alternative methods for the construction of rotationally invariant trellis codes are discussed in [10] and [101].

APPENDIX III
DISCRETE-TIME SPECTRAL FACTORIZATION

The spectral factorization S(f) = S_0 H(f) H*(f) given by (5.19) may be obtained by expressing log S(f) as a Fourier series

log S(f) = Σ_{k=−∞}^{∞} c_k e^{−j2πkfT}   (AIII.1)

whose coefficients are given by

c_k = T ∫_{−1/(2T)}^{1/(2T)} log S(f) e^{j2πkfT} df.   (AIII.2)

The Paley–Wiener condition ensures that this Fourier series exists. Grouping the terms with negative, zero, and positive indices of this Fourier series yields the desired factorization

S(f) = exp(Σ_{k=1}^{∞} c_{−k} e^{j2πkfT}) · e^{c_0} · exp(Σ_{k=1}^{∞} c_k e^{−j2πkfT}).   (AIII.3)

In particular, this yields (5.20), with S_0 = e^{c_0} and H(f) = exp(Σ_{k=1}^{∞} c_k e^{−j2πkfT}).

To obtain an explicit expression for the coefficients of the corresponding causal response h(D) = Σ_{k≥0} h_k D^k, define the formal power series

c(D) = Σ_{k=1}^{∞} c_k D^k   (AIII.4)

where D is an indeterminate. Because H(f) is the exponential of the causal part of the Fourier series, we have

h(D) = e^{c(D)}.   (AIII.5)

Taking formal derivatives yields

h′(D) = c′(D) e^{c(D)}   (AIII.6)

or, equivalently,

h′(D) = c′(D) h(D).   (AIII.7)

Repeated formal differentiation of (AIII.7) yields the recursive relation for n ≥ 1

n h_n = Σ_{k=1}^{n} k c_k h_{n−k}.   (AIII.8)

Finally, noting that

h_0 = e^{c(0)} = 1   (AIII.9)

we obtain the explicit expressions

h_1 = c_1,   h_2 = c_2 + c_1²/2,   h_3 = c_3 + c_1 c_2 + c_1³/6,  ….   (AIII.10)

This shows that h(D) is causal and monic, and that h(D) is uniquely determined by S(f). For a proof that h(D) is minimum-phase, see [83].
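The recursion (AIII.8) translates directly into a short numerical routine. In the sketch below the Fourier coefficients c_k of log S(f) are approximated by an inverse FFT of samples of log S(f); the test spectrum is an assumed example whose exact factor (1, 0.5) is known in closed form.

```python
import numpy as np

def spectral_factor(S, n_taps):
    """Given uniform samples S[i] = S(f_i) > 0 of a power spectrum over one
    period, return (S0, h) with S(f) ~ S0 * |H(f)|^2, H monic, causal,
    minimum-phase. Cepstral method: c_k = Fourier coefficients of log S."""
    c = np.fft.ifft(np.log(S)).real        # c_k (real for S real and even)
    h = np.zeros(n_taps)
    h[0] = 1.0                             # monic: h_0 = 1 (AIII.9)
    for n in range(1, n_taps):             # recursion (AIII.8)
        h[n] = sum(k * c[k] * h[n - k] for k in range(1, n + 1)) / n
    return np.exp(c[0]), h

# Illustrative example: S(f) = |1 + 0.5 e^{-j2 pi f}|^2, so h = (1, 0.5).
f = np.arange(1024) / 1024
S = np.abs(1 + 0.5 * np.exp(-2j * np.pi * f)) ** 2
S0, h = spectral_factor(S, 4)
print(S0, h)        # ~1.0 and ~(1, 0.5, 0, 0)
```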


ACKNOWLEDGMENT

G. D. Forney wishes to acknowledge the education, stimulation, and support offered by many mentors and colleagues over the past 35 years, particularly R. G. Gallager, J. L. Massey, J. M. Wozencraft, A. Kohlenberg, S. U. Qureshi, G. R. Lang, F. M. Longstaff, L.-F. Wei, and M. V. Eyuboglu.

G. Ungerboeck wishes to thank K. E. Drangeid, Director of the IBM Zurich Research Laboratory, for his essential support of this author’s early work on new modem concepts and their implementation. He also gratefully acknowledges the stimulation and encouragement that he has received throughout his career from numerous colleagues, in particular R. E. Blahut, G. D. Forney, J. L. Massey, and A. J. Viterbi.

REFERENCES

[1] American National Standards Institute, “Network and customer installation interfaces—Asymmetrical digital subscriber line (ADSL) metallic interface,” ANSI Std. T1.413-1995, Aug. 18, 1995.

[2] A. N. Akansu, P. Duhamel, X. Lin, and M. de Courville, “Orthogonal transmultiplexers in communications: A review,” IEEE Trans. Signal Processing, vol. 46, pp. 979–995, Apr. 1998.

[3] AT&T (B. Betts), “Additional details on AT&T’s candidate modem for V.fast,” Contribution D157, ITU Study Group XVII, Geneva, Switzerland, Oct. 1991.

[4] AT&T (B. Betts), “Trellis-enhanced precoder—Combined coding and precoding,” TIA TR30.1 contribution, Boston, MA, July 1993.

[5] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.

[6] L. E. Baum and T. Petrie, “Statistical inference for probabilistic functions of finite-state Markov chains,” Ann. Math. Statist., vol. 37, pp. 1554–1563, 1966.

[7] M. Bellanger, G. Bonnerot, and M. Coudreuse, “Digital filtering by polyphase network: Application to the sample rate alteration and filter banks,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 109–114, Apr. 1976.

[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Bandwidth efficient parallel concatenated coding schemes,” Electron. Lett., vol. 31, pp. 2067–2069, 1995.

[9] ——, “Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding,” IEEE Trans. Inform. Theory, vol. 44, pp. 909–926, May 1998.

[10] S. Benedetto, R. Garello, M. Mondin, and M. D. Trott, “Rotational invariance of trellis codes—Part II: Group codes and decoders,” IEEE Trans. Inform. Theory, vol. 42, pp. 766–778, May 1996.

[11] S. Benedetto and G. Montorsi, “Unveiling turbo codes: Some results on parallel concatenated coding schemes,” IEEE Trans. Inform. Theory, vol. 42, pp. 409–428, Mar. 1996.

[12] E. R. Berlekamp, “Bounded distance+1 soft-decision Reed–Solomon decoding,” IEEE Trans. Inform. Theory, vol. 42, pp. 704–720, May 1996.

[13] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” in Proc. 1993 Int. Conf. Communication (Geneva, Switzerland, May 1993), pp. 1064–1070.

[14] J. A. Bingham, “Multicarrier modulation for data transmission: An idea whose time has come,” IEEE Commun. Mag., vol. 28, pp. 5–14, May 1990.

[15] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. Inform. Theory, vol. 44, pp. 927–946, May 1998.

[16] A. R. Calderbank and L. H. Ozarow, “Nonequiprobable signaling on the Gaussian channel,” IEEE Trans. Inform. Theory, vol. 36, pp. 726–740, July 1990.

[17] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on lattices and cosets,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 177–195, Mar. 1987.

[18] M. Cedervall and R. Johannesson, “A fast algorithm for computing the distance spectrum of convolutional codes,” IEEE Trans. Inform. Theory, vol. 35, pp. 1146–1159, Nov. 1989.

[19] D. A. Chase, “A class of algorithms for decoding block codes with channel measurement information,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 170–182, Jan. 1972.

[20] G. Cherubini, S. Oelcer, and G. Ungerboeck, “Trellis precoding for channels with spectral nulls,” in Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June 1997), p. 464.

[21] J. S. Chow, J. M. Cioffi, and J. A. Bingham, “Equalizer training algorithms for multicarrier modulation systems,” in Proc. 1993 Int. Conf. Communication (Geneva, Switzerland, May 1993), pp. 761–765.

[22] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr., “MMSE decision-feedback equalizers and coding—Parts I and II,” IEEE Trans. Commun., vol. 43, pp. 2582–2604, Oct. 1995.

[23] J. M. Cioffi and G. D. Forney, Jr., “Generalized decision-feedback equalization for packet transmission with ISI and Gaussian noise,” in Communications, Computation, Control and Signal Processing, A. Paulraj et al., Eds. Boston, MA: Kluwer, 1997, pp. 79–127.

[24] Consultative Committee for Space Data Standards, “Recommendations for space data standard: Telemetry channel coding,” Blue Book, Issue 2, CCSDS 101.0-B2, Jan. 1987.

[25] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer, 1988.

[26] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, “Applications of error control coding,” this issue, pp. 0000–0000.

[27] M. C. Davey and D. J. C. MacKay, “Low-density parity-check codes over GF(q),” in Proc. 1998 Information Theory Workshop (Killarney, Ireland, June 1998).

[28] B. Dejon and E. Hamsler, “Optimum multiplexing of sampled signals on noisy channels,” IEEE Trans. Inform. Theory, vol. IT-17, pp. 257–262, May 1971.

[29] D. Divsalar, S. Dolinar, and F. Pollara, “Draft CCDS recommendation for telemetry channel coding (updated to include turbo codes),” Consultative Committee for Space Data Systems, White Book, Jan. 1997.

[30] M. L. Doeltz, E. T. Heald, and D. L. Martin, “Binary data transmission techniques for linear systems,” Proc. IRE, vol. 45, pp. 656–661, May 1957.

[31] P. Elias, “Coding for noisy channels,” in IRE Conv. Rec., Mar. 1955, vol. 3, pt. 4, pp. 37–46.

[32] D. Esteban and C. Galand, “Application of quadrature mirror filters to splitband voice coding schemes,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 1977, pp. 191–195.

[33] European Telecommunication Standards Institute, “Radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable and fixed receivers,” ETSI Std. ETS 300 401, Feb. 1995.

[34] M. V. Eyuboglu and G. D. Forney, Jr., “Combined equalization and coding using precoding,” IEEE Commun. Mag., vol. 29, no. 12, pp. 25–34, Dec. 1991.

[35] ——, “Trellis precoding: Combined coding, precoding and shaping for intersymbol interference channels,” IEEE Trans. Inform. Theory, vol. 38, pp. 301–314, Mar. 1992.

[36] D. D. Falconer and F. R. Magee, “Adaptive channel memory truncation for maximum likelihood sequence estimation,” Bell Syst. Tech. J., vol. 52, pp. 1541–1562, Nov. 1973.

[37] R. F. H. Fischer, “Using flexible precoding for channels with spectral nulls,” Electron. Lett., vol. 31, pp. 356–358, Mar. 1995.

[38] R. F. H. Fischer and J. B. Huber, “Comparison of precoding schemes for digital subscriber lines,” IEEE Trans. Commun., vol. 45, pp. 334–343, Mar. 1997.

[39] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.

[40] ——, “Burst-correcting codes for the classic bursty channel,” IEEE Trans. Commun. Technol., vol. COM-19, pp. 772–781, Oct. 1971.

[41] ——, “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 363–378, May 1972.

[42] ——, “Coset codes—Part I: Introduction and geometrical classification,” IEEE Trans. Inform. Theory, vol. 34, pp. 1123–1151, Sept. 1988.

[43] ——, “Coset codes—Part II: Binary lattices and related codes,” IEEE Trans. Inform. Theory, vol. 34, pp. 1152–1187, Sept. 1988.

[44] ——, “Geometrically uniform codes,” IEEE Trans. Inform. Theory, vol. 37, pp. 1241–1260, Sept. 1991.

[45] ——, “Trellis shaping,” IEEE Trans. Inform. Theory, vol. 38, pp. 281–300, Mar. 1992.

[46] ——, “Approaching the channel capacity of the AWGN channel with coset codes and multilevel coset codes,” Aug. 1997, submitted to IEEE Trans. Inform. Theory.

[47] G. D. Forney, Jr., L. Brown, M. V. Eyuboglu, and J. L. Moran, III, “The V.34 high-speed modem standard,” IEEE Commun. Mag., vol. 34, pp. 28–33, Dec. 1996.

[48] G. D. Forney, Jr. and L.-F. Wei, “Multidimensional constellations—Part I: Introduction, figures of merit, and generalized cross constellations,” IEEE J. Select. Areas Commun., vol. 7, pp. 877–892, Aug. 1989.


[49] P. Fortier, A. Ruiz, and J. M. Cioffi, “Multidimensional signal sets through the shell construction for parallel channels,” IEEE Trans. Commun., vol. 40, pp. 500–512, Mar. 1992.

[50] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1962.

[51] ——, Information Theory and Reliable Communication. New York: Wiley, 1968.

[52] General Datacomm (P. Cole and Y. Goldstein), “Distribution-preserving Tomlinson algorithm,” contribution D189 to CCITT Study Group XVII, Geneva, Switzerland, June 1992.

[53] General Datacomm (Y. Goldstein), Motorola (M. V. Eyuboglu), and Rockwell (S. Olafsson), “Precoding for V.fast,” TIA TR30.1 contribution, Boston, MA, July 1993.

[54] H. Harashima and H. Miyakawa, “A method of code conversion for a digital communication channel with intersymbol interference,” IEEE Trans. Commun., vol. COM-20, pp. 774–780, Aug. 1972.

[55] J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite and space communication,” IEEE Trans. Commun. Technol., vol. COM-19, pp. 835–848, Oct. 1971.

[56] B. Hirosaki, “An orthogonally multiplexed QAM system using the discrete Fourier transform,” IEEE Trans. Commun., vol. COM-29, pp. 982–989, July 1981.

[57] J. L. Holsinger, “Digital communication over fixed time-continuous channels with memory, with special application to telephone channels,” MIT Res. Lab. Electron., Tech. Rep. 430, 1964.

[58] J. Huber and U. Wachsmann, “Capacities of equivalent channels in multilevel coding schemes,” Electron. Lett., vol. 30, pp. 557–558, Mar. 1994.

[59] J. Huber, U. Wachsmann, and R. Fischer, “Coded modulation by multilevel codes: Overview and state of the art,” in Proc. ITG Fachtagung “Codierung fuer Quelle, Kanal und Uebertragung” (Aachen, Germany, Mar. 1998), pp. 255–266.

[60] H. Imai and S. Hirakawa, “A new multilevel coding method using error correcting codes,” IEEE Trans. Inform. Theory, vol. 23, pp. 371–377, May 1977.

[61] Proc. Int. Symp. Turbo Codes (Brest, France, Sept. 1997).

[62] ITU-T Recommendation V.34, “A modem operating at data signalling rates of up to 33 600 bit/s for use on the general switched telephone network and on leased point-to-point 2-wire telephone-type circuits,” Oct. 1996, replaces first version of Sept. 1994.

[63] I. M. Jacobs and E. R. Berlekamp, “A lower bound to the distribution of computation for sequential decoding,” IEEE Trans. Inform. Theory, vol. IT-13, pp. 167–174, 1967.

[64] I. Kalet, “The multitone channel,” IEEE Trans. Commun., vol. 37, pp. 119–124, Feb. 1989.

[65] S. Kallel and K. Li, “Bidirectional sequential decoding,” IEEE Trans. Inform. Theory, vol. 43, pp. 1319–1326, July 1997.

[66] A. K. Khandani and P. Kabal, “Shaping multidimensional signal spaces—Part I: Optimum shaping, shell mapping,” IEEE Trans. Inform. Theory, vol. 39, pp. 1799–1808, Nov. 1993.

[67] Y. Kofman, E. Zehavi, and S. Shamai (Shitz), “Performance analysis of a multilevel coded modulation system,” IEEE Trans. Commun., vol. 42, pp. 299–312, Feb. 1994.

[68] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Gaussian channels,” IEEE Trans. Inform. Theory, vol. 39, pp. 913–929, May 1993.

[69] G. R. Lang and F. M. Longstaff, “A Leech lattice modem,” IEEE J. Select. Areas Commun., vol. 7, pp. 968–973, Aug. 1989.

[70] P. J. Lee, “There are many good periodically-time-varying convolutional codes,” IEEE Trans. Inform. Theory, vol. 35, pp. 460–463, Mar. 1989.

[71] R. Laroia, “Coding for intersymbol interference channels—Combined coding and precoding,” IEEE Trans. Inform. Theory, vol. 42, pp. 1053–1061, July 1996.

[72] R. Laroia, N. Farvardin, and S. Tretter, “On optimal shaping of multidimensional constellations,” IEEE Trans. Inform. Theory, vol. 40, pp. 1044–1056, July 1994.

[73] R. Laroia, S. Tretter, and N. Farvardin, “A simple and effective precoding scheme for noise whitening on intersymbol interference channels,” IEEE Trans. Commun., vol. 41, pp. 1460–1463, Oct. 1993.

[74] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Boston, MA: Kluwer, 1994.

[75] J. Leech and N. J. A. Sloane, “Sphere packing and error-correcting codes,” Canad. J. Math., vol. 23, pp. 718–745, 1971.

[76] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low-density parity-check codes,” Electron. Lett., vol. 32, pp. 1645–1646, Aug. 1996, reprinted in vol. 33, pp. 457–458, Mar. 1997.

[77] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1977.

[78] J. L. Massey, “Coding and modulation in digital communications,” in Proc. 1974 Int. Zurich Seminar on Digital Communication (Zurich, Switzerland, Mar. 1974), pp. E2(1)–(4).

[79] R. J. McEliece and L. Swanson, “Reed–Solomon codes and the exploration of the solar system,” in Reed–Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, pp. 25–40.

[80] Motorola Information Systems Group (M. V. Eyuboglu), “A flexible form of precoding for V.fast,” Contribution D194, CCITT Study Group XVII, Geneva, Switzerland, June 1992.

[81] S. H. Mueller and J. B. Huber, “A comparison of peak power reduction schemes for OFDM,” in Proc. 1997 IEEE Global Telecomm. Conf. (Globecom ’97) (Phoenix, AZ, Nov. 1997), pp. 1–5.

[82] A. Narula and F. R. Kschischang, “Unified framework for PAR control in multitone systems,” ANSI T1E1.4 contribution 98-183, Huntsville, AL, June 1998.

[83] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1984.

[84] L. C. Perez, J. Seghers, and D. J. Costello, Jr., “A distance spectrum interpretation of turbo codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 1698–1709, Nov. 1996.

[85] R. Price, “Nonlinearly feedback-equalized PAM versus capacity for noisy filter channels,” in Proc. 1972 Int. Conf. Communication, June 1972, pp. 22.12–22.17.

[86] A. Peled and A. Ruiz, “Frequency domain data transmission using reduced computational complexity algorithms,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Apr. 1980, pp. 964–967.

[87] S. S. Pietrobon, R. H. Deng, A. Lafanechere, G. Ungerboeck, and D. J. Costello, Jr., “Trellis-coded multidimensional phase modulation,” IEEE Trans. Inform. Theory, vol. 36, pp. 63–89, Jan. 1990.

[88] S. S. Pietrobon, G. Ungerboeck, L. C. Perez, and D. J. Costello, Jr., “Rotationally invariant nonlinear trellis codes for two-dimensional modulation,” IEEE Trans. Inform. Theory, vol. 40, pp. 1773–1791, Nov. 1994.

[89] P. Robertson and T. Woerz, “Coded modulation scheme employing turbo codes,” Electron. Lett., vol. 31, pp. 1546–1547, Aug. 1995.

[90] A. Ruiz, J. M. Cioffi, and S. Kasturia, “Discrete multiple tone modulation with coset coding for the spectrally shaped channel,” IEEE Trans. Commun., vol. 40, pp. 1012–1029, June 1992.

[91] S. D. Sandberg and M. A. Tzannes, “Overlapped discrete multitone modulation for high speed copper wire communications,” IEEE J. Select. Areas Commun., vol. 13, pp. 1570–1585, Dec. 1995.

[92] J. P. Sarvis, “Symmetries of trellis codes,” M. Eng. thesis, Dept. Elec. Eng. and Comp. Sci., MIT, Cambridge, MA, June 1995.

[93] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423 and pp. 623–656, July and Oct. 1948.

[94] ——, “Communication in the presence of noise,” Proc. IRE, vol. 37, pp. 10–21, 1949.

[95] ——, “Probability of error for optimal codes in a Gaussian channel,” Bell Syst. Tech. J., vol. 38, pp. 611–656, May 1959.

[96] M. Sipser and D. A. Spielman, “Expander codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 1710–1722, Nov. 1996.

[97] M. Tomlinson, “New automatic equalizer employing modulo arithmetic,” Electron. Lett., vol. 7, pp. 138–139, Mar. 1971.

[98] P. Tong, “A 40 MHz encoder-decoder chip generated by a Reed–Solomon compiler,” in Proc. Cust. Int. Circuits Conf. (Boston, MA, May 1990), pp. 13.5.1–13.5.4.

[99] IEEE Trans. Signal Processing (Special Issue on Theory and Applications of Filter Banks and Wavelet Transforms), vol. 46, Apr. 1998.

[100] M. D. Trott, “The algebraic structure of trellis codes,” Ph.D. dissertation, Dept. Elec. Eng., Stanford Univ., Stanford, CA, Aug. 1992.

[101] M. D. Trott, S. Benedetto, R. Garello, and M. Mondin, “Rotational invariance of trellis codes—Part I: Encoders and precoders,” IEEE Trans. Inform. Theory, vol. 42, pp. 751–765, May 1996.

[102] M. A. Tzannes, M. C. Tzannes, J. Proakis, and P. N. Heller, “DMT systems, DWMT systems, and filter banks,” in Proc. Int. Communication Conf., 1994.

[103] G. Ungerboeck, “Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems,” IEEE Trans. Commun., vol. COM-22, pp. 1124–1136, May 1974.

[104] ——, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.

[105] ——, “Trellis-coded modulation with redundant signal sets: Part II,” IEEE Commun. Mag., vol. 25, pp. 12–21, Feb. 1987.

[106] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[107] A. Vardy, “Trellis structure of codes,” in Handbook of Coding Theory, V. S. Pless et al., Eds. Amsterdam, The Netherlands: Elsevier, 1998.

[108] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.


[109] U. Wachsmann and J. Huber, “Power and bandwidth efficient digital communication using turbo codes in multilevel codes,” Euro. Trans. Telecommun., vol. 6, pp. 557–567, Sept. 1995.

[110] U. Wachsmann, R. Fischer, and J. Huber, “Multilevel codes: Parts 1–3,” June 1997, submitted to IEEE Trans. Inform. Theory.

[111] F.-Q. Wang and D. J. Costello, Jr., “Sequential decoding of trellis codes at high spectral efficiencies,” IEEE Trans. Inform. Theory, vol. 43, pp. 2013–2019, Nov. 1997.

[112] L.-F. Wei, “Rotationally invariant convolutional channel encoding with expanded signal space, Part II: Nonlinear codes,” IEEE J. Select. Areas Commun., vol. 2, pp. 672–686, Sept. 1984.

[113] ——, “Trellis-coded modulation using multidimensional constellations,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 483–501, July 1987.

[114] ——, “Rotationally invariant trellis-coded modulations with multidimensional M-PSK,” IEEE J. Select. Areas Commun., vol. 7, pp. 1281–1295, Dec. 1989.

[115] S. B. Weinstein and P. M. Ebert, “Data transmission by frequency-division multiplexing using the discrete Fourier transform,” IEEE Trans. Commun., vol. COM-19, pp. 628–634, Oct. 1971.

[116] N. Wiberg, H.-A. Loeliger, and R. Kotter, “Codes and iterative decoding on general graphs,” Euro. Trans. Telecommun., vol. 6, pp. 513–526, Sept. 1995.

[117] R. G. C. Williams, “A trellis code for V.fast,” CCITT V.fast rapporteur meeting, Bath, U.K., Sept. 1992.

[118] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.

[119] J. M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, MA: MIT Press, 1961.

[120] E. Zehavi and J. K. Wolf, “On the performance evaluation of trellis codes,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 192–201, Jan. 1987.