
Lossy Joint Source-Channel Coding in the Finite Blocklength Regime
Victoria Kostina, Student Member, IEEE, and Sergio Verdú, Fellow, IEEE

Abstract—This paper finds new tight finite-blocklength bounds for the best achievable lossy joint source-channel code rate, and demonstrates that joint source-channel code design brings considerable performance advantage over a separate one in the nonasymptotic regime. A joint source-channel code maps a block of $k$ source symbols onto a length-$n$ channel codeword, and the fidelity of reproduction at the receiver end is measured by the probability $\epsilon$ that the distortion exceeds a given threshold $d$. For memoryless sources and channels, it is demonstrated that the parameters of the best joint source-channel code must satisfy $nC - kR(d) \approx \sqrt{nV + kV(d)}\,Q^{-1}(\epsilon)$, where $C$ and $V$ are the channel capacity and channel dispersion, respectively; $R(d)$ and $V(d)$ are the source rate-distortion and rate-dispersion functions; and $Q$ is the standard Gaussian complementary cumulative distribution function. Symbol-by-symbol (uncoded) transmission is known to achieve the Shannon limit when the source and channel satisfy a certain probabilistic matching condition. In this paper, we show that even when this condition is not satisfied, symbol-by-symbol transmission is, in some cases, the best known strategy in the nonasymptotic regime.

Index Terms—Achievability, converse, finite blocklength regime, joint source-channel coding (JSCC), lossy source coding, memoryless sources, rate-distortion theory, Shannon theory.

I. INTRODUCTION

In the limit of infinite blocklengths, the optimal achievable coding rates in channel coding and lossy data compression are characterized by the channel capacity $C$ and the source rate-distortion function $R(d)$, respectively [3]. For a large class of sources and channels, in the limit of large blocklength, the maximum achievable joint source-channel coding (JSCC) rate compatible with vanishing excess distortion probability is characterized by the ratio $C/R(d)$ [4]. A perennial question in information theory is how relevant the asymptotic fundamental limits are when the communication system is forced to operate at a given fixed blocklength.

Manuscript received August 25, 2012; accepted November 30, 2012. Date of publication January 09, 2013; date of current version April 17, 2013. This work was supported in part by the National Science Foundation (NSF) under Grant CCF-1016625 and the Center for Science of Information, an NSF Science and Technology Center, under Grant CCF-0939370. V. Kostina was supported in part by the Natural Sciences and Engineering Research Council of Canada. This paper was presented in part at the 2012 IEEE International Symposium on Information Theory [1] and in part at the 2012 IEEE Information Theory Workshop [2].
The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]; [email protected]).
Communicated by I. Kontoyiannis, Associate Editor At Large.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2013.2238657

The finite blocklength (delay) constraint is inherent to all communication scenarios. In fact, in many systems of current interest, such as real-time multimedia communication, delays are strictly constrained, while in packetized data communication, packets are frequently on the order of 1000 bits. While computable formulas for the channel capacity and the source rate-distortion function are available for a wide class of channels and sources, the luxury of being able to compute exactly (in polynomial time) the nonasymptotic fundamental limit of interest is rarely affordable. Notable exceptions where the nonasymptotic fundamental limit is indeed computable are almost-lossless source coding [5], [6], and JSCC over matched source-channel pairs [7]. In general, however, one can at most hope to obtain bounds and approximations to the information-theoretic nonasymptotic fundamental limits.

Although nonasymptotic bounds can be distilled from classical proofs of coding theorems, these bounds are rarely satisfyingly tight in the nonasymptotic regime, as studied in [8] and [9] in the contexts of channel coding and lossy source coding, respectively. For the JSCC problem, the classical converse is based on the mutual information data processing inequality, while the classical achievability scheme uses separate source/channel coding (SSCC), in which the channel coding block and the source coding block are optimized separately without knowledge of each other. These conventional approaches lead to disappointingly weak nonasymptotic bounds. In particular, SSCC can be rather suboptimal nonasymptotically. An accurate finite blocklength analysis therefore calls for novel upper and lower bounds that sandwich tightly the nonasymptotic fundamental limit. Such bounds were shown in [8] for the channel coding problem and in [9] for the source coding problem. In this paper, we derive new tight bounds for the JSCC problem, which hold in full generality, without any assumptions on the source alphabet, stationarity, or memorylessness.

While numerical evaluation of the nonasymptotic upper and lower bounds bears great practical interest (for example, to decide how suboptimal with respect to the information-theoretic limit a given blocklength-$n$ code is), such bounds usually involve cumbersome expressions that offer scant conceptual insight. Somewhat ironically, to get an elegant, insightful approximation of the nonasymptotic fundamental limit, one must resort to an asymptotic analysis of these nonasymptotic bounds. Such asymptotic analysis must be finer than that based on the law of large numbers, which suffices to obtain the asymptotic fundamental limit but fails to provide any estimate of the speed of convergence to that limit. There are two complementary approaches to a finer asymptotic analysis: the large deviations


analysis, which leads to error exponents, and the Gaussian approximation analysis, which leads to dispersion. The error exponent approximation and the Gaussian approximation to the nonasymptotic fundamental limit are tight in different operational regimes. In the former, a rate which is strictly suboptimal with respect to the asymptotic fundamental limit is fixed, and the error exponent measures the exponential decay of the error probability to 0 as the blocklength increases. The error exponent approximation is tight if the error probability a system can tolerate is extremely small. However, already for probability of error as low as to , which is the operational regime for many high data rate applications, the Gaussian approximation, which gives the optimal rate achievable at a given error probability as a function of blocklength, is tight [8], [9]. In the channel coding problem, the Gaussian approximation of $\log M^\star(n, \epsilon)$, the maximum achievable finite blocklength coding rate at blocklength $n$ and error probability $\epsilon$, is given by, for finite alphabet stationary memoryless channels [8],

$$\log M^\star(n, \epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + O(\log n) \qquad (1)$$

where $C$ and $V$ are the channel capacity and dispersion, respectively. In the lossy source coding problem, the Gaussian approximation of $R(k, d, \epsilon)$, the minimum achievable finite blocklength coding rate at blocklength $k$ and probability $\epsilon$ of exceeding fidelity $d$, is given by, for stationary memoryless sources [9],

$$R(k, d, \epsilon) = R(d) + \sqrt{\frac{V(d)}{k}}\,Q^{-1}(\epsilon) + O\!\left(\frac{\log k}{k}\right) \qquad (2)$$

where $R(d)$ and $V(d)$ are the rate-distortion and the rate-dispersion functions, respectively. For a given code, the excess distortion constraint, which is the figure of merit in this paper as well as in [9], is, in a way, more fundamental than the average distortion constraint, because varying $d$ over its entire range and evaluating the probability of exceeding $d$ gives full information about the distribution (and not just its mean) of the distortion incurred at the decoder output. Following the philosophy of [8] and [9], in this paper we perform the Gaussian approximation analysis of our new bounds to show that $k^\star(n, d, \epsilon)$, the maximum number of source symbols transmissible using a given channel blocklength $n$, must satisfy

$$nC - k^\star(n, d, \epsilon)\,R(d) = \sqrt{nV + k^\star(n, d, \epsilon)\,V(d)}\;Q^{-1}(\epsilon) + O(\log n) \qquad (3)$$

under the fidelity constraint of exceeding a given distortion level $d$ with probability $\epsilon$. In contrast, if, following the SSCC paradigm, we just concatenate the channel code in (1) and the source code in (2), we obtain

$$nC - kR(d) \approx \min_{\eta + \zeta \le \epsilon}\left\{\sqrt{nV}\,Q^{-1}(\eta) + \sqrt{kV(d)}\,Q^{-1}(\zeta)\right\} \qquad (4)$$

which is usually strictly suboptimal with respect to (3).

In addition to deriving new general achievability and converse bounds for JSCC and performing their Gaussian approximation analysis, in this paper we revisit the dilemma of whether one should or should not code when operating under delay constraints. Gastpar et al. [7] gave a set of necessary and sufficient conditions on the source, its distortion measure, the channel, and its cost function in order for symbol-by-symbol transmission to attain the minimum average distortion. In these curious cases, the source and the channel are probabilistically matched. In the absence of channel cost constraints, we show that whenever the source and the channel are probabilistically matched so that symbol-by-symbol coding achieves the minimum average distortion, it also achieves the dispersion of JSCC. Moreover, even in the absence of such a match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform in the nonasymptotic regime not only SSCC but also our random-coding achievability bound.

Prior research relating to finite blocklength analysis of JSCC

includes the work of Csiszár [10], [11], who demonstrated that the error exponent of JSCC outperforms that of SSCC. For discrete source-channel pairs with average distortion criterion, Pilc's achievability bound [12], [13] applies. For the transmission of a Gaussian source over a discrete channel under the average mean square error constraint, Wyner's achievability bound [14], [15] applies. Nonasymptotic achievability and converse bounds for a graph-theoretic model of JSCC have been obtained by Csiszár [16]. Most recently, Tauste Campo et al. [17] showed a number of finite-blocklength random-coding bounds applicable to the almost-lossless JSCC setup, while Wang et al. [18] found the dispersion of JSCC for sources and channels with finite alphabets.

The rest of this paper is organized as follows. Section II summarizes basic definitions and notation. Sections III and IV introduce the new converse and achievability bounds to the maximum achievable coding rate, respectively. A Gaussian approximation analysis of the new bounds is presented in Section V. The evaluation of the bounds and the approximation is performed for two important special cases: the transmission of a binary memoryless source (BMS) over a binary-symmetric channel (BSC) with bit error rate distortion (see Section VI) and the transmission of a Gaussian memoryless source (GMS) with mean-square error distortion over an additive white Gaussian noise (AWGN) channel with a total power constraint (see Section VII). Section VIII focuses on symbol-by-symbol transmission.

II. DEFINITIONS

A lossy source-channel code is a pair of (possibly randomized) mappings $\mathsf{f}\colon \mathcal{M} \to \mathcal{X}$ and $\mathsf{g}\colon \mathcal{Y} \to \widehat{\mathcal{M}}$. A distortion measure $\mathsf{d}\colon \mathcal{M} \times \widehat{\mathcal{M}} \to [0, +\infty]$ is used to quantify the performance of the lossy code. A cost function $\mathsf{b}\colon \mathcal{X} \to [0, +\infty]$ may be imposed on the channel inputs. The channel is used without feedback.

Definition 1: The pair $(\mathsf{f}, \mathsf{g})$ is a $(d, \epsilon)$ lossy source-channel code if $\mathbb{P}\left[\mathsf{d}(S, Z) > d\right] \le \epsilon$ and either $\mathbb{E}[\mathsf{b}(X)] \le \alpha$ (average cost constraint) or $\mathsf{b}(X) \le \alpha$ a.s. (maximal cost constraint), where $X = \mathsf{f}(S)$ and $Z = \mathsf{g}(Y)$ (see Fig. 1). In the absence of an input cost


Fig. 1. joint source-channel code.

constraint, we simplify the terminology and refer to the code as a $(d, \epsilon)$ lossy source-channel code.

The special case $d = 0$ and $\mathsf{d}(s, z) = 1\{s \ne z\}$ corresponds to almost-lossless compression. If, in addition, $S$ is equiprobable on an alphabet of cardinality $M$, a code in Definition 1 corresponds to an $(M, \epsilon, \alpha)$ channel code (i.e., a code with $M$ codewords, average error probability $\epsilon$, and cost $\alpha$). On the other hand, if the channel is an identity mapping on an alphabet of cardinality $M$ without cost constraints, a code in Definition 1 corresponds to an $(M, d, \epsilon)$ lossy compression code (as, e.g., defined in [9]).

As our bounds in Sections III and IV do not foist a Cartesian structure on the underlying alphabets, we state them in the one-shot paradigm of Definition 1. When we apply those bounds to the block coding setting, transmitted objects indeed become vectors, and Definition 2 below comes into play.

Definition 2: In the conventional fixed-to-fixed (or block) setting in which $\mathcal{M}$ and $\widehat{\mathcal{M}}$ are the $k$-fold Cartesian products of alphabets $\mathcal{S}$ and $\widehat{\mathcal{S}}$, and $\mathcal{X}$ and $\mathcal{Y}$ are the $n$-fold Cartesian products of alphabets $\mathcal{A}$ and $\mathcal{B}$, and the distortion and cost are separable, $\mathsf{d}(s^k, z^k) = \frac{1}{k}\sum_{i=1}^k \mathsf{d}(s_i, z_i)$, $\mathsf{b}(x^n) = \frac{1}{n}\sum_{i=1}^n \mathsf{b}(x_i)$, a $(d, \epsilon)$ code for the block source-channel pair is called a $(k, n, d, \epsilon, \alpha)$ code (or a $(k, n, d, \epsilon)$ code if there is no cost constraint).

Definition 3: Fix $\epsilon$, $d$, and the channel blocklength $n$. The maximum achievable source blocklength and coding rate (source symbols per channel use) are defined, respectively, by

$$k^\star(n, d, \epsilon) = \max\left\{k \colon \exists\ \text{a}\ (k, n, d, \epsilon)\ \text{code}\right\} \qquad (5)$$

$$R(n, d, \epsilon) = \frac{k^\star(n, d, \epsilon)}{n} \qquad (6)$$

Alternatively, fix $\epsilon$, the source blocklength $k$, and the channel blocklength $n$. The minimum achievable excess distortion is defined by

$$d^\star(k, n, \epsilon) = \inf\left\{d \colon \exists\ \text{a}\ (k, n, d, \epsilon)\ \text{code}\right\} \qquad (7)$$

Denote, for a given $P_{Y|X}$ and a cost function $\mathsf{b}$,

$$C(\alpha) = \sup_{P_X \colon \mathbb{E}[\mathsf{b}(X)] \le \alpha} I(X; Y) \qquad (8)$$

and for a given $P_S$ and a distortion measure $\mathsf{d}$,

$$R(d) = \inf_{P_{Z|S} \colon \mathbb{E}[\mathsf{d}(S, Z)] \le d} I(S; Z) \qquad (9)$$

We impose the following basic restrictions on $P_S$, $P_{Y|X}$, the input-cost function, and the distortion measure.
(a) $R(d)$ is finite for some $d$, i.e., $d_{\min} < \infty$, where

$$d_{\min} = \inf\left\{d \colon R(d) < \infty\right\} \qquad (10)$$

(b) The infimum in (9) is achieved by a unique $P_{Z^\star|S}$.
(c) The supremum in (8) is achieved by a unique $P_{X^\star}$.

The dispersion, which serves to quantify the penalty on the rate of the best JSCC code induced by the finite blocklength, is defined as follows.

Definition 4: Fix $d$ and $\epsilon$. The rate-dispersion function of JSCC, $\mathcal{V}(d)$, (source samples squared per channel use) is defined as

(11)

where $C(\alpha)$ and $R(d)$ are the channel capacity-cost and source rate-distortion functions, respectively.¹

The distortion-dispersion function of JSCC, $\mathcal{W}(R)$, is defined as

(12)

where $D(R)$ is the distortion-rate function of the source. If there is no cost constraint, we will simplify notation by dropping $\alpha$ from (5)–(8), (11), and (12).

Definition 5 ($d$-Tilted Information [9]): For $d > d_{\min}$, the $d$-tilted information in $s$ is defined as²

$$\jmath_S(s, d) = \log \frac{1}{\mathbb{E}\left[\exp\left(\lambda^\star d - \lambda^\star \mathsf{d}(s, Z^\star)\right)\right]} \qquad (13)$$

where the expectation is with respect to $P_{Z^\star}$, i.e., the unconditional distribution of the reproduction random variable that achieves the infimum in (9), and

$$\lambda^\star = -R'(d). \qquad (14)$$

The following properties of $d$-tilted information, proven in [19], are used in the sequel:

$$\jmath_S(s, d) = \imath_{S; Z^\star}(s; z) + \lambda^\star \mathsf{d}(s, z) - \lambda^\star d \qquad (15)$$

$$R(d) = \mathbb{E}\left[\jmath_S(S, d)\right] \qquad (16)$$

$$\mathbb{E}\left[\exp\left(\lambda^\star d - \lambda^\star \mathsf{d}(S, z) + \jmath_S(S, d)\right)\right] \le 1 \qquad (17)$$

where (15) holds for $P_{Z^\star}$-almost every $z$, while (17) holds for all $z$, and

$$\imath_{S; Z}(s; z) = \log \frac{dP_{Z|S=s}}{dP_Z}(z) \qquad (18)$$

¹While for memoryless sources and channels, $C(\alpha)$ and $R(d)$ are given by (8) and (9) evaluated with single-letter distributions, it is important to distinguish between the operational definitions and the extremal mutual information quantities, since the core results in this paper allow for memory.
²All logarithms and exponentials are in an arbitrary common base.


denotes the information density of the joint distribution $P_S P_{Z|S}$ at $(s, z)$. We can define the right-hand side of (18) for a given $P_Z$ even if there is no $P_{Z|S}$ such that the marginal of $P_S P_{Z|S}$ is $P_Z$. We use the same notation for that more general function. To extend Definition 5 to the lossless case, for discrete random variables, we define 0-tilted information as

$$\jmath_S(s, 0) = \imath_S(s) \qquad (19)$$

where

$$\imath_S(s) = \log \frac{1}{P_S(s)} \qquad (20)$$

is the information in outcome $s$. The distortion $d$-ball centered at $s$ is denoted by

$$B_d(s) = \left\{z \colon \mathsf{d}(s, z) \le d\right\}. \qquad (21)$$

Given $P_{Y|X}$, we write $P_X \to P_{Y|X} \to P_Y$ to indicate that $P_Y$ is the marginal of $P_X P_{Y|X}$, i.e., $P_Y(y) = \sum_{x} P_{Y|X}(y \mid x) P_X(x)$.³

So as not to clutter notation, in Sections III and IV, we assume that there are no cost constraints. However, all results in those sections generalize to the case of a maximal cost constraint by considering $X$ whose distribution is supported on the subset of allowable channel inputs

$$\mathcal{F}(\alpha) = \left\{x \in \mathcal{X} \colon \mathsf{b}(x) \le \alpha\right\} \qquad (22)$$

rather than the entire channel input alphabet $\mathcal{X}$.
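For concreteness, the following short numerical sketch evaluates Definition 5 for a Bernoulli source with Hamming (bit error rate) distortion and checks properties (15) and (16). The bias, distortion level, and base (nats) are illustrative choices rather than values used in the paper; the closed form $\jmath_S(s, d) = \imath_S(s) - h(d)$ used for the check is the known single-letter expression for this source.

```python
import numpy as np

# Sketch of d-tilted information (13) for a Bernoulli(p) source with Hamming
# distortion. p and d are illustrative placeholders (nats throughout).
p, d = 0.11, 0.05                                   # source bias, distortion level (d < p)
h = lambda x: -x*np.log(x) - (1 - x)*np.log(1 - x)  # binary entropy

R_d = h(p) - h(d)                                   # rate-distortion function
lam = np.log((1 - d)/d)                             # lambda* = -R'(d), see (14)
q1  = (p - d)/(1 - 2*d)                             # P_{Z*}(1): reproduction law achieving (9)

def j_tilted(s):
    """d-tilted information via (13): -log E[exp(lambda* d - lambda* d(s, Z*))]."""
    q_s = q1 if s == 1 else 1 - q1
    return -(lam*d + np.log(q_s + (1 - q_s)*np.exp(-lam)))

# Property (16): E[j_S(S, d)] = R(d); closed form: j_S(s, d) = i_S(s) - h(d).
assert np.isclose((1 - p)*j_tilted(0) + p*j_tilted(1), R_d)
assert np.isclose(j_tilted(1), -np.log(p) - h(d))
```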

III. CONVERSES

A. Converses via -Tilted Information

Our first result is a general converse bound.

Theorem 1 (Converse): The existence of a code for and requires that

(23)

(24)

where in (23), , and the conditional probability in (24) is with respect to distributed according to (independent of ), and

(25)

³We write summations over alphabets for simplicity. Unless stated otherwise, all our results hold for abstract probability spaces.

Proof: Fix and the code . Fix an arbitrary probability measure on . Let . We can write the probability in the right-hand side of (22) as

(26)

(27)

(28)

(29)

(30)

(31)

(32)

where (32) is due to (17). Optimizing over and , we get the best possible bound for a given encoder . To obtain a code-independent converse, we simply choose that gives the weakest bound, and (23) follows. To show (24), we weaken (23) as

(33)

and observe that for any

(34)

(35)

(36)

An immediate corollary to Theorem 1 is the following result.


Theorem 2 (Converse): Assume that there exists a distribution such that the distribution of (according to ) does not depend on the choice of . If a code for and exists, then

(37)

for an arbitrary . The probability measure in (37) is generated by .

Proof: Under the assumption, the conditional probability in the right-hand side of (24) is the same regardless of the choice of .

The next result generalizes Theorem 1. When we apply Theorem 3 in Section V to find the dispersion of JSCC, we will let be the number of channel input types, and let be the type of the channel input block. If , Theorem 3 reduces to Theorem 1.

Theorem 3 (Converse): The existence of a code for and requires that

(38)

(39)

where is a positive integer, the random variable takes values on , and

(40)

and in (39), the probability measure is generated by.

Proof: Fix a possibly randomized code, a positive scalar , a positive in-

teger , an auxiliary random variable that satisfies, and a conditional probability distribution

. Let , i.e.,, for all . Write

(41)

(42)

(43)

(44)

(45)

(46)

where (46) is due to (17). Optimizing over , , and the distributions of the auxiliary random variables and , we obtain the best possible bound for a given encoder . To obtain a code-independent converse, we simply choose that gives the weakest bound, and (38) follows. To show (39), we weaken (38) by restricting the to satisfying and changing the order of and as follows:

(47)

Observe that for any legitimate choice of and

(48)

(49)

(50)

which is equal to the expectation on the right-hand side of (39).

Remark 1: Theorems 1–3 still hold in the case and , which corresponds to almost-lossless data compression. Indeed, recalling (19), it is easy to see that the proof of Theorem 1 applies, skipping the now unnecessary step (31), and, therefore, (23) reduces to

(51)

A similar modification can be applied to the proof of Theorem 3.

Remark 2: Our converse for lossy source coding in [9, Th. 7] can be viewed as a particular case of the result in Theorem 2.


Indeed, if and ,, then (37) becomes

(52)

which is precisely [9, Th. 7].

B. Converses via Hypothesis Testing and List Decoding

To show a joint source-channel converse in [11], Csiszár used a list decoder, which outputs a list of elements drawn from . While traditionally list decoding has only been considered in the context of finite alphabet sources, we generalize the setting to sources with abstract alphabets. In our setup, the encoder is the random transformation , and the decoder is defined as follows.

Definition 6 (List Decoder): Let be a positive real number, and let be a measure on . An list decoder is a random transformation , where takes values on -measurable sets with -measure not exceeding

(53)

Even though we keep the standard "list" terminology, the decoder output need not be a finite or countably infinite set. The error probability with this type of list decoding is the probability that the source outcome does not belong to the decoder output list for

(54)

where is the set of all -measurable subsets of with -measure not exceeding .

Definition 7 (List Code): An list code is a pair of random transformations such that (53) holds and the list error probability (54) does not exceed .

Of course, letting , where is the counting measure on , we recover the conventional list decoder definition where the smallest scalar that satisfies (53) is an integer. The almost-lossless JSCC setting ( ) in Definition 1 corresponds to , . If the source is analog (has a continuous distribution), it is reasonable to let be the Lebesgue measure.

Any converse for list decoding implies a converse for conventional decoding. To see why, observe that any lossy code can be converted to a list code with list error probability not exceeding by feeding the lossy decoder output to a function that outputs the set of all source outcomes within distortion from the output of the original lossy decoder. In this sense, the set of all lossy codes is included in the set of all list codes with list error probability and list size

(55)

Denote by

$$\beta_\alpha(P, Q) = \min_{\substack{P_{W \mid O}\colon\\ \mathbb{P}[W = 1] \ge \alpha}} \mathbb{Q}[W = 1] \qquad (56)$$

the optimal performance achievable among all randomized tests $P_{W|O}$ between probability distributions $P$ and $Q$ on $O$ (1 indicates that the test chooses $P$).⁴ In fact, $Q$ need not be a probability measure; it just needs to be $\sigma$-finite in order for the Neyman–Pearson lemma and related results to hold.

The hypothesis testing converse for channel coding [8, Th.

27] can be generalized to JSCC with list decoding as follows.

Theorem 4 (Converse): Fix and , and let be a $\sigma$-finite measure. The existence of an list code requires that

(57)

where the supremum is over all probability measures defined on the channel output alphabet .

Proof: Fix , the encoder , and an auxiliary $\sigma$-finite conditional measure . Consider the (not necessarily optimal) test for deciding between and which chooses if belongs to the decoder output list. Note that this is a hypothetical test, which has access to both the source outcome and the decoder output.

According to , the probability measure generated by ,

the probability that the test chooses is given by

(58)

Since is the measure of the event that the test chooses when is true, and the optimal test cannot perform worse than the possibly suboptimal one that we selected, it follows that

(59)

Now, fix an arbitrary probability measure on . Choosing , the inequality in (59) can be weakened as follows:

(60)

(61)

(62)

(63)

⁴Throughout, , denote distributions, whereas , are used for the corresponding probabilities of events on the underlying probability space.


Optimizing the bound over and choosing that yields the weakest bound in order to obtain a code-independent converse, (57) follows.

Remark 3: Similar to how Wolfowitz's converse for channel coding can be obtained from the meta-converse for channel coding [8], the converse for almost-lossless JSCC in (51) can be obtained by appropriately weakening (57) with . Indeed, invoking [8]

(64)

and letting in (57), where is the counting measure on , we have

(65)

(66)

which upon rearranging yields (51).

In general, computing the infimum in (57) is challenging. However, if the channel is symmetric (in a sense formalized in the next result), is independent of .

Theorem 5 (Converse): Fix a probability measure . Assume that the distribution of does not depend on under either or . Then, the existence of an list code requires that

(67)

where is arbitrary.

Proof: The Neyman–Pearson lemma (e.g., [20]) implies that the outcome of the optimum binary hypothesis test between and only depends on the observation through . In particular, the optimum binary hypothesis test for deciding between and satisfies

(68)

For all , , we have

(69)

(70)

(71)

(72)

and

(73)

where
1) (70) is due to (68);
2) (71) uses the Markov property ;

Fig. 2. joint source-channel code.

3) (72) follows from the symmetry assumption on the distribution of ;
4) (73) is obtained similarly to (71).

Since (72) and (73) imply that the optimal test achieves the same performance (that is, the same and ) regardless of , we choose for some in the left-hand side of (57) to obtain (67).

Remark 4: In the case of finite channel input and output alphabets, the channel symmetry assumption of Theorem 5 holds, in particular, if the rows of the channel transition probability matrix are permutations of each other, and is the equiprobable distribution on the ( -dimensional) channel output alphabet, which, coincidentally, is also the capacity-achieving output distribution. For Gaussian channels with equal power constraint, which corresponds to requiring the channel inputs to lie on the power sphere, any spherically symmetric satisfies the assumption of Theorem 5.
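To make the hypothesis-testing quantity in (56) concrete, the following sketch computes $\beta_\alpha(P, Q)$ for finite alphabets directly from the Neyman–Pearson lemma; the distributions and the value of $\alpha$ in the usage line are illustrative placeholders, not cases analyzed in the paper.

```python
import numpy as np

# Minimal sketch of beta_alpha(P, Q) in (56) for finite alphabets (P, Q > 0):
# the optimal randomized test includes outcomes in decreasing order of the
# likelihood ratio P/Q and randomizes on the boundary so that the probability
# of deciding P under P equals alpha.
def beta_alpha(alpha, P, Q):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    order = np.argsort(-(P / Q))       # outcomes sorted by decreasing P/Q
    p_cum, beta = 0.0, 0.0
    for i in order:
        if p_cum + P[i] < alpha:       # include the whole outcome
            p_cum += P[i]
            beta += Q[i]
        else:                          # randomize on the boundary outcome
            return beta + (alpha - p_cum) / P[i] * Q[i]
    return beta

# Toy usage: P = two tosses of a biased coin, Q = equiprobable on 4 outcomes.
P = np.outer([0.89, 0.11], [0.89, 0.11]).ravel()
Q = np.full(4, 0.25)
print(beta_alpha(0.99, P, Q))          # smallest type-II error at type-I error 0.01
```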

IV. ACHIEVABILITY

Given a source code of size , and a channel code of size , we may concatenate them to obtain the following subclass of the source-channel codes introduced in Definition 1.

Definition 8: An source-channel code is a source-channel code such that the encoder and decoder mappings satisfy

(74)

(75)

where

(76)

(77)

(78)

(79)

(see Fig. 2). Note that an code is an code.

The conventional SSCC paradigm corresponds to the special case of Definition 8 in which the source code is chosen without knowledge of and the channel code is chosen without knowledge of and the distortion measure . A pair of source and channel codes is separation optimal if the source code is chosen so as to minimize the distortion (average or excess) when there is no channel, whereas the channel code is chosen so as to minimize the worst case (over source distributions) average error probability

(80)


where and takes values on . If both the source and the channel code are chosen separation-optimally for their given sizes, the separation principle guarantees that under certain quite general conditions (which encompass the memoryless setting, see [21]) the asymptotic fundamental limit of JSCC is achievable. In the finite blocklength regime, however, such an SSCC construction is, in general, only suboptimal. Within the SSCC paradigm, we can obtain an achievability result by further optimizing with respect to the choice of :

Theorem 6 (Achievability, SSCC): Fix , , and . Denote by the minimum achievable worst case average error probability among all transmission codes of size , and the minimum achievable probability of exceeding distortion with a source code of size by . Then, there exists a source-channel code with

(81)

Bounds on and have been obtained recently in [8] and [9], respectively.⁵
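A small numerical sketch of the kind of optimization Theorem 6 calls for follows: for fixed source and channel blocklengths, the intermediate codebook size $M$ is chosen to minimize the sum of a source excess-distortion term and a channel error term. The two bound functions used below are Gaussian-approximation surrogates, not the finite-blocklength bounds of [8] and [9] that the theorem actually invokes, and all parameter values are illustrative placeholders.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the SSCC optimization in Theorem 6: pick the intermediate code
# size M to minimize (source excess-distortion bound) + (channel error bound).
# Both bounds below are Gaussian-approximation surrogates, used only to
# illustrate the tradeoff; C, V, R_d, V_d, k, n are placeholder values (nats).
C, V, R_d, V_d = 0.5, 0.16, 0.3, 0.12
k, n = 500, 400

def eps_source(logM):   # ~ P[ j_S(S^k, d) > log M ]
    return norm.sf((logM - k*R_d) / np.sqrt(k*V_d))

def eps_channel(logM):  # ~ P[ channel decoding error with M messages ]
    return norm.sf((n*C - logM) / np.sqrt(n*V))

logM_grid = np.linspace(k*R_d, n*C, 1000)          # search between kR(d) and nC
total = eps_source(logM_grid) + eps_channel(logM_grid)
best = logM_grid[np.argmin(total)]
print(best, total.min())   # best intermediate rate and the resulting SSCC bound
```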

Definition 8 does not rule out choosing the source code based on the knowledge of or the channel code based on the knowledge of , , and . One of the interesting conclusions in this paper is that the optimal dispersion of JSCC is achievable within the class of source-channel codes introduced in Definition 8. However, the dispersion achieved by the conventional SSCC approach is in fact suboptimal.

To shed light on the reason behind the suboptimality of SSCC at finite blocklength despite its asymptotic optimality, we recall the reason SSCC achieves the asymptotic fundamental limit. The output of the optimum source encoder is, for large , approximately equiprobable over a set of roughly distinct messages, which allow us to represent most of the source outcomes within distortion . From the channel coding theorem, we know that there exists a channel code that is capable of distinguishing, with high probability, messages when equipped with the maximum likelihood (ML) decoder. Therefore, a simple concatenation of the source code and the channel code achieves vanishing probability of distortion exceeding , for any . However, at finite , the output of the optimum source encoder need not be nearly equiprobable, so there is no reason to expect that a separated scheme employing an ML channel decoder, which does not exploit unequal message probabilities, would achieve near-optimal nonasymptotic performance. Indeed, in the nonasymptotic regime, the gain afforded by taking into account the residual encoded source redundancy at the channel decoder is appreciable. The following achievability result, obtained using independent random source codes and random channel codes within the paradigm of Definition 8, capitalizes on this intuition.

⁵As the maximal (over source outputs) error probability cannot be lower than the worst case error probability, the maximal error probability achievability bounds of [8] apply to bound . Moreover, the RCU bound on average error probability of [8], although stated assuming equiprobable source, is oblivious to the distribution of the source and thus upper bounds the worst case average error probability as well.

Theorem 7 (Achievability): There exists a source-channel code with

(82)

where the expectations are with respect to defined on , where is the set of natural numbers.

Proof: Fix a positive integer . Fix a positive integer-valued random variable that depends on other random variables only through and that satisfies . We will construct a code with separate encoders for source and channel and separate decoders for source and channel as in Definition 8. We will perform a random coding analysis by choosing random independent source and channel codes, which will lead to the conclusion that there exists an code with error probability guaranteed in (82) with . Observing that increasing can only tighten the bound in (82) in which is restricted to not exceed , we will let and conclude, by invoking the bounded convergence theorem, that the support of in (82) need not be bounded.

Source Encoder: Given an ordered list of representation points , and having observed the source outcome , the (probabilistic) source encoder generates from and selects the lowest index such that is within distance of . If no such index can be found, the source encoder outputs a preselected arbitrary index, e.g., . Therefore

(83)

In a good JSCC code, would be chosen so large that with overwhelming probability, a source outcome would be encoded successfully within distortion . It might seem counterproductive to let the source encoder in (83) give up before reaching the end of the list of representation points, but in fact, such behavior helps the channel decoder by skewing the distribution of .

Channel Encoder: Given a codebook ,

the channel encoder outputs if is the output of the source encoder

(84)

Channel Decoder: Define the random variable , which is a function of , , and only

(85)

Having observed , the channel decoder chooses arbitrarily among the members of the set

(86)

A MAP decoder would multiply by . While that decoder would be too hard to analyze, the product in (86) is


a good approximation because and are related by

(87)

so the decoder in (86) differs from a MAP decoder only when either several are identical or there is no representation point among the first points within distortion of the source, both unusual events.

Source Decoder: The source decoder outputs if is the

output of the channel decoder

(88)

Error Probability Analysis: We now proceed to analyze the performance of the code described previously. If there were no source encoding error, a channel decoding error can occur if and only if

(89)

Let the channel codebook be drawn i.i.d. from , and independent of the source codebook , which is drawn i.i.d. from . Denote by the excess distortion probability attained with the source codebook and the channel codebook . Conditioned on the event (no failure at the source encoder), the probability of excess distortion is upper bounded by the probability that the channel decoder does not choose , so

(90)

We now average (90) over the source and channel codebooks. Averaging the th term of the sum in (90) with respect to the channel codebook yields

(91)

where are distributed according to

(92)

Letting be an independent copy of and applying the union bound to the probability in (91), we have that for any given

(93)

(94)

(95)

(96)

(97)

where (94) is due to .

Applying (97) to (90) and averaging with respect to the source

codebook, we may write

(98)

where for brevity, we denoted the random variable

(99)

The expectation in the right-hand side of (98) is with respect to. It is equal to

(100)

(101)

(102)

where
1) (100) applies Jensen's inequality to the concave function ;
2) (101) uses ;
3) (102) is due to , where is nonnegative.

To evaluate the probability in the right-hand side of (98), note that conditioned on , , is distributed as

(103)


where we denoted for brevity

(104)

Therefore

(105)

(106)

Applying (102) and (106) to (98) and invoking Shannon's random coding argument, (82) follows.

Remark 5: As we saw in the proof of Theorem 7, if we restrict to take values on , then the bound on the error probability in (82) is achieved in the class of codes. The code size that leads to tight achievability bounds following from Theorem 7 is in general much larger than the size that achieves the minimum in (81). In that SSCC case, is chosen so that lies between and so as to minimize the sum of source and channel decoding error probabilities without the benefit of a channel decoder that exploits residual source redundancy. In contrast, Theorem 8 is obtained with an approximate MAP decoder that allows a larger choice for , even beyond . Still, we can achieve a good tradeoff because the channel code employs unequal error protection: those codewords with higher probabilities are more reliably decoded.
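The following sketch illustrates the decoding rule in (86) in its approximate-MAP form: each message index is scored by the log of its prior (the distribution of the source encoder output) plus the channel log-likelihood of the received block, here for a BSC. The codebook, prior, and crossover probability are hypothetical placeholders rather than objects constructed in the proof.

```python
import numpy as np

# Sketch of the decoder in (86): score index j by log P_W(j) (prior on the
# source-encoder output) plus the BSC log-likelihood of the received block y
# given codeword c_j, and pick the maximizer. All inputs are placeholders.
def decode_approx_map(y, codebook, log_prior, delta=0.11):
    n = len(y)
    scores = []
    for j, c in enumerate(codebook):
        flips = int(np.count_nonzero(y != c))                   # Hamming distance
        loglik = flips*np.log(delta) + (n - flips)*np.log(1 - delta)
        scores.append(log_prior[j] + loglik)                    # unequal error protection
    return int(np.argmax(scores))

# Toy usage: 8 random codewords of length 12 with a decaying prior on indices.
rng = np.random.default_rng(0)
codebook = rng.integers(0, 2, size=(8, 12))
prior = 0.5**np.arange(1, 9); prior /= prior.sum()
y = codebook[2] ^ (rng.random(12) < 0.11).astype(int)           # index 2 sent over BSC(0.11)
print(decode_approx_map(y, codebook, np.log(prior)))
```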

Remark 6: Had we used the ML channel decoder in lieu of (86) in the proof of Theorem 7, we would conclude that a code exists with

(107)

which corresponds to the SSCC bound in (81) with the worst case average channel error probability upper bounded using the random coding union (RCU) bound of [8] and the source error probability upper bounded using the random coding achievability bound of [9].

Remark 7: Weakening (82) by letting , we obtain a slightly looser version of (107) in which in the exponent is replaced by . To get a generally tighter bound than that afforded by SSCC, a more intelligent choice of is needed, as detailed next in Theorem 8.

Theorem 8 (Achievability): There exists a source-channel code with

(108)

where the expectation is with respect to definedon .

Proof: We fix an arbitrary and choose

(109)

where is defined in (104). Observing that

(110)

(111)

(112)

we obtain (108) by weakening (82) using (109) and (112).

In the case of almost-lossless JSCC, the bound in Theorem 8 can be sharpened as shown recently by Tauste Campo et al. [17].

Theorem 9 (Achievability, Almost-Lossless JSCC [17]): There exists a code with

(113)

where the expectation is with respect to defined on.

V. GAUSSIAN APPROXIMATION

In addition to the basic conditions (a)–(c) of Section II, in this section, we impose the following restrictions.
(i) The channel is stationary and memoryless, . If the channel has an input cost function, then it satisfies .
(ii) The source is stationary and memoryless, , and the distortion measure is separable, .
(iii) The distortion level satisfies , where is defined in (10), and , where the average is with respect to the unconditional distribution of . The excess distortion probability satisfies .
(iv) , where the average is with respect to and is the output distribution corresponding to the minimizer in (9).

The technical condition (iv) ensures applicability of the Gaussian approximation in the following result.

Theorem 10 (Gaussian Approximation): Under restrictions (i)–(iv), the parameters of the optimal code satisfy

$$nC - kR(d) = \sqrt{nV + kV(d)}\;Q^{-1}(\epsilon) + \theta(n) \qquad (114)$$

where
1) $V(d)$ is the source dispersion given by

$$V(d) = \mathrm{Var}\left[\jmath_S(S, d)\right] \qquad (115)$$


2) $V$ is the channel dispersion given in the following.
a) If the channel input and output alphabets are finite and the channel has no cost constraints,

(116)

(117)

where and are the capacity-achieving input and output random variables.
b) If the channel is Gaussian with either equal or maximal power constraint,

$$V = \frac{P(P+2)}{2(P+1)^2}\log^2 e \qquad (118)$$

where $P$ is the signal-to-noise ratio.
3) The remainder term $\theta(n)$ satisfies the following.

a) If and are finite, the channel has no cost constraints and ,

(119)

(120)

where

(121)

(122)

In (122), denotes differentiation with respect to ; is defined by

(123)

(cf., Definition 5) and .
b) If and are finite, the channel has no cost constraints and , (120) still holds, while (119) is replaced with

(124)

c) If the channel is such that the (conditional) distribution of does not depend on (no cost constraint), then .⁶
d) If the channel is Gaussian with equal or maximal power constraint, (120) still holds, and (119) holds with .
e) In the almost-lossless case, , and provided that the third absolute moment of is finite, (114) and (119) still hold, while (120) strengthens to

(125)

⁶Added in proof: We have shown that the symmetricity condition is actually superfluous.

Proof:
1) Appendixes C-A and C-B show the converses in (119) and (124) for cases and , respectively, using Theorem 3.
2) Appendix C-C shows the converse for the symmetric channel (3c) using Theorem 2.
3) Appendix C-D shows the converse for the Gaussian channel (3d) using Theorem 2.
4) Appendix D-A shows the achievability result for almost-lossless coding (3e) using Theorem 9.
5) Appendix D-B shows the achievability result in (120) for the DMC using Theorem 8.
6) Appendix D-C shows the achievability result for the Gaussian channel (3d) using Theorem 8.

Remark 8: If the channel and the data compression codes are designed separately, we can invoke the channel coding [8] and lossy compression [9] results in (1) and (2) to show that (cf., (4))

(126)

Comparing (126) to (114), observe that if either the channel or the source (or both) have zero dispersion, the JSCC dispersion can be achieved by separate coding. In that special case, either the $d$-tilted information or the channel information density is so close to being deterministic that there is no need to account for the true distributions of these random variables, as a good joint source-channel code would do.

The Gaussian approximations of the JSCC and SSCC rates in (114) and (126), respectively, admit the following heuristic interpretation when $k$ is large (thus, so is $n$): since the source is stationary and memoryless, the normalized $d$-tilted information $\frac{1}{k}\jmath_{S^k}(S^k, d)$ becomes approximately Gaussian with mean $R(d)$ and variance $V(d)/k$. Likewise, the conditional normalized channel information density $\frac{1}{n}\imath_{X^n;Y^n}(x^n; Y^n)$ is, for large $n$, approximately Gaussian with mean $C$ and variance $V/n$ for all $x^n$ typical according to the capacity-achieving distribution. Since a good encoder chooses such inputs for (almost) all source realizations, and the source and the channel are independent, the random variable $\imath_{X^n;Y^n}(X^n; Y^n) - \jmath_{S^k}(S^k, d)$ is approximately Gaussian with mean $nC - kR(d)$ and variance $nV + kV(d)$, and (114) reflects the intuition that under JSCC, the source is reconstructed successfully within distortion $d$ if and only if the channel information density exceeds the source $d$-tilted information, that is, $\imath_{X^n;Y^n}(X^n; Y^n) \ge \jmath_{S^k}(S^k, d)$. In contrast, in SSCC, the source is reconstructed successfully with high probability if $(\jmath_{S^k}(S^k, d), \imath_{X^n;Y^n}(X^n; Y^n))$ falls in the intersection of half-planes $\jmath_{S^k}(S^k, d) \le \log M$ and $\imath_{X^n;Y^n}(X^n; Y^n) \ge \log M$ for some $\log M$, which is the capacity of the noiseless link between the source and the channel code blocks and can be chosen so as to maximize the probability of that intersection, as reflected in (126). Since in JSCC the successful transmission event is strictly larger than in SSCC, separate source/channel code design incurs a performance loss. It is worth pointing out that $\imath_{X^n;Y^n}(X^n; Y^n) \ge \jmath_{S^k}(S^k, d)$ leads to successful reconstruction even within the paradigm of the codes in Definition 8 because, as explained in Remark 5, unlike the SSCC case, it is not necessary that the intermediate rate lie between the source $d$-tilted information and the channel information density for successful reconstruction.
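The comparison between (114) and (126) is easy to carry out numerically. The sketch below finds the largest $k$ satisfying each approximation with the remainder terms dropped; the single-letter quantities, blocklength, and excess probability are illustrative placeholders (nats throughout).

```python
import numpy as np
from scipy.stats import norm

# Numerical sketch of Remark 8: the largest k satisfying the JSCC
# approximation (114) versus the SSCC approximation (126), ignoring the
# remainder terms. C, V, R_d, V_d, n, eps are illustrative placeholders.
C, V     = 0.5, 0.16      # channel capacity and dispersion (nats, nats^2)
R_d, V_d = 0.3, 0.12      # source rate-distortion and rate-dispersion at level d
n, eps   = 1000, 1e-2
Qinv = norm.isf           # Q^{-1}

def k_jscc():
    k = 0
    while n*C - (k + 1)*R_d >= np.sqrt(n*V + (k + 1)*V_d) * Qinv(eps):
        k += 1
    return k

def k_sscc():
    splits = np.linspace(1e-6, eps - 1e-6, 2000)   # error budget given to the channel code
    k = 0
    while True:
        kk = k + 1
        need = np.sqrt(n*V)*Qinv(splits) + np.sqrt(kk*V_d)*Qinv(eps - splits)
        if n*C - kk*R_d >= need.min():
            k = kk
        else:
            return k

print(k_jscc(), k_sscc())  # JSCC supports at least as many source symbols
```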

Remark 9: Using Theorem 10, it can be shown that

(127)

where the rate-dispersion function of JSCC is found as (recall Definition 4)

(128)

Remark 10: Under regularity conditions similar to those in [9, Th. 14], it can be shown that

(129)

where the distortion-dispersion function of JSCC is given by

(130)

Remark 11: If the basic conditions (b) and/or (c) fail so that there are several distributions and/or several that achieve the rate-distortion function and the capacity, respectively, then, for

(131)

(132)

where the minimum is taken over and , and V (resp. W) denotes (128) (resp. (130)) computed with and . The reason for possibly lower achievable dispersion in this case is that we have the freedom to map the unlikely source realizations leading to high probability of failure to those codewords resulting in the maximum variance so as to increase the probability that the channel output escapes the decoding failure region.

Remark 12: The dispersion of the Gaussian channel is given by (118), regardless of whether an equal or a maximal power constraint is imposed. An equal power constraint corresponds to the subset of allowable channel inputs being the power sphere

$$\left\{x^n \in \mathbb{R}^n \colon \sum_{i=1}^n x_i^2 = nP\sigma_N^2\right\} \qquad (133)$$

where $\sigma_N^2$ is the noise power. In a maximal power constraint, (133) is relaxed by replacing "$=$" with "$\le$." Specifying the nature of the power constraint in the subscript, we remark that the bounds for the maximal constraint can be obtained from the bounds for the equal power constraint via the following relation:

$$k^\star_{\mathrm{eq}}(n, d, \epsilon) \le k^\star_{\mathrm{max}}(n, d, \epsilon) \le k^\star_{\mathrm{eq}}(n + 1, d, \epsilon) \qquad (134)$$

Fig. 3. Rate-dispersion function for the transmission of a BMS over a BSC with as a function of in . It increases unboundedly as , and vanishes as or .

where the right-most inequality is due to the following idea dating back to Shannon: a code with a maximal power constraint can be converted to a code with an equal power constraint by appending an $(n+1)$th coordinate to each codeword to equalize its total power to . From (134), it is immediate that the channel dispersions for maximal or equal power constraints must be the same.

VI. LOSSY TRANSMISSION OF A BMS OVER A BSC

In this section, we particularize the bounds in Sections III and IV and the approximation in Section V to the transmission of a BMS with bias $p$ over a BSC with crossover probability $\delta$. The target bit error rate satisfies $0 \le d < p \le \frac{1}{2}$.

The rate-distortion function of the source and the channel capacity are given, respectively, by

$$R(d) = h(p) - h(d) \qquad (135)$$

$$C = \log 2 - h(\delta) \qquad (136)$$

where $h(\cdot)$ denotes the binary entropy function. The source and the channel dispersions are given by [8], [9]

$$V(d) = p(1-p)\log^2\frac{1-p}{p} \qquad (137)$$

$$V = \delta(1-\delta)\log^2\frac{1-\delta}{\delta} \qquad (138)$$

where note that (137) does not depend on $d$. The rate-dispersion function in (128) is plotted in Fig. 3.

Throughout this section, denotes the Hamming weight

of the binary $k$-vector, and denotes a binomial random variable with parameters and , independent of all other random variables. For convenience, we define the discrete random variable by

(139)

In particular, substituting and in (139), we observe that the terms in the right-hand side of (139) are zero-mean random variables whose variances are equal to and , respectively. Furthermore, the binomial sum is denoted by

(140)

A straightforward particularization of the $d$-tilted information converse in Theorem 2 leads to the following result.
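As a quick numerical reference, the sketch below evaluates (135)–(138) and plugs them into the approximation (114) to estimate the largest supportable source blocklength for a given channel blocklength; the parameter values are illustrative placeholders, not the ones used in the figures.

```python
import numpy as np
from scipy.stats import norm

# Single-letter quantities (135)-(138) for a BMS(p) over a BSC(delta) with
# bit-error-rate distortion d, and the resulting approximation (114).
# p, delta, d, n, eps are illustrative placeholders (bits throughout).
p, delta, d = 0.11, 0.011, 0.05
h = lambda x: -x*np.log2(x) - (1 - x)*np.log2(1 - x)    # binary entropy (bits)

R_d = h(p) - h(d)                                       # (135)
C   = 1 - h(delta)                                      # (136)
V_d = p*(1 - p)*np.log2((1 - p)/p)**2                   # (137), independent of d
V   = delta*(1 - delta)*np.log2((1 - delta)/delta)**2   # (138)

n, eps = 1000, 1e-2
k = 0                       # largest k satisfying (114) without the remainder term
while n*C - (k + 1)*R_d >= np.sqrt(n*V + (k + 1)*V_d)*norm.isf(eps):
    k += 1
print(R_d, C, V_d, V, k)
```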

Theorem 11 (Converse, BMS-BSC): Any code for transmission of a BMS with bias over a BSC with bias must satisfy

(141)

Proof: Let , which is the equiprobable distribution on . An easy exercise reveals that

(142)

(143)

(144)

Since is distributed as regardless of , and is distributed as , the condition in Theorem 2 is satisfied, and (37) becomes (141).

The hypothesis-testing converse in Theorem 4 particularizes to the following result.

Theorem 12 (Converse, BMS-BSC): Any code for transmission of a BMS with bias over a BSC with bias must satisfy

(145)

where and scalar are uniquely defined by

(146)

Proof: As in the proof of Theorem 11, we let be the equiprobable distribution on , . Since under , is distributed as , and under , is distributed as , irrespective of the choice of , the distribution of the information density in (144) does not depend on the choice of under either measure, so Theorem 5 can be applied. Further, we choose to be the equiprobable distribution on and observe that under , the random variable in (143) has the same distribution as , while under , it has the same distribution as . Therefore, the log-likelihood ratio for testing between and has the same distribution as (" " denotes equality in distribution)

(147)

(148)

so is equal to the left-hand side of (145). Finally, matching the size of the list to the fidelity of reproduction using (55), we find that is equal to the right-hand side of (145).

If the source is equiprobable, the bound in Theorem 12 becomes particularly simple, as the following result details.

Theorem 13 (Converse, EBMS-BSC): For , if there exists a joint source-channel code, then

(149)

where

(150)

and is the solution to

(151)

The achievability result in Theorem 8 is particularized as follows.

Theorem 14 (Achievability, BMS-BSC): There exists an joint source-channel code with

(152)

where

(153)

and is defined as

(154)

with

(155) (otherwise)

(156)

(157)

Proof: We weaken the infima over and in (108) by choosing them to be the product distributions generated by the capacity-achieving channel input distribution and the rate-


Fig. 4. Rate-blocklength tradeoff for the transmission of a BMS with bias over a BSC with crossover probability and , .

distortion function-achieving reproduction distribution, respectively, i.e., is equiprobable on , and , where . As shown in [9, proof of Theorem 21],

(158)

On the other hand, is distributed as , so (152) follows by substituting (144) and (158) into (108).

In the special case of the BMS-BSC, Theorem 10 can be strengthened as follows.

Theorem 15 (Gaussian Approximation, BMS-BSC): The parameters of the optimal code satisfy (114), where , , , and are given by (135), (136), (137), and (138), respectively, and the remainder term in (114) satisfies

(159)

(160)

if , and

(161)

(162)

if .

Proof: An asymptotic analysis of the converse bound in Theorem 12 akin to that found in [9, proof of Theorem 23] leads to (159) and (161). An asymptotic analysis of the achievability bound in Theorem 14 similar to the one found in [9, Appendix G] leads to (160). Finally, (162) is the same as (125).

The bounds and the Gaussian approximation (in which we take ) are plotted in Fig. 4, Fig. 5 (fair binary source, ), and Fig. 6 (biased binary source, ). A source of fair coin flips has zero dispersion, and as anticipated in Remark 8, JSCC does not afford much gain in the finite blocklength regime (see Fig. 5). Moreover, in that case, the JSCC achievability bound in Theorem 8 is worse than the SSCC achievability bound. However, the more general achievability bound in Theorem 7 with the choice , as detailed in Remark 7, nearly coincides with the SSCC curve in Fig. 5, providing an improvement over Theorem 8. The situation is different if the source is biased, with JSCC showing significant gain over SSCC (see Figs. 4 and 6).

Fig. 5. Rate-blocklength tradeoff for the transmission of a fair BMS over a BSC with crossover probability and .

Fig. 6. Rate-blocklength tradeoff for the transmission of a BMS with bias over a BSC with crossover probability and , .

VII. TRANSMISSION OF A GMS OVER AN AWGN CHANNEL

In this section, we analyze the setup where the GMS is transmitted over an AWGN channel, which, upon receiving an input , outputs , where . The encoder/decoder must satisfy two constraints: the fidelity constraint and the cost constraint.
1) The MSE distortion exceeds with probability no greater than ;
2) each channel codeword satisfies the equal power constraint in (133).⁷

The capacity-cost function and the rate-distortion function are given by

$$C(P) = \frac{1}{2}\log(1 + P) \qquad (163)$$

$$R(d) = \frac{1}{2}\log\frac{\sigma^2}{d} \qquad (164)$$

The source dispersion is given by [9]

$$V(d) = \frac{1}{2}\log^2 e \qquad (165)$$

while the channel dispersion is given by (118) [8].

In the rest of the section, denotes a noncentral chi-square distributed random variable with degrees of freedom and noncentrality parameter , independent of all other random variables, and denotes its probability density function. A straightforward particularization of the $d$-tilted information converse in Theorem 2 leads to the following result.
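For orientation, the sketch below evaluates (163)–(165) and (118) and the corresponding approximation (114); the source variance, distortion level, SNR, blocklength, and excess probability are illustrative placeholders.

```python
import numpy as np
from scipy.stats import norm

# Single-letter quantities for a GMS of variance sigma2 over an AWGN channel
# with SNR P (nats throughout): (163)-(165) and (118), plugged into (114).
# sigma2, dist, P, n, eps are illustrative placeholders.
sigma2, dist, P = 1.0, 0.25, 1.0

C   = 0.5*np.log(1 + P)                 # (163) capacity-cost function
R_d = 0.5*np.log(sigma2/dist)           # (164) rate-distortion function
V_d = 0.5                               # (165) source dispersion, (1/2) log^2 e nats^2
V   = P*(P + 2)/(2*(1 + P)**2)          # (118) channel dispersion, nats^2

n, eps = 1000, 1e-2
k = 0
while n*C - (k + 1)*R_d >= np.sqrt(n*V + (k + 1)*V_d)*norm.isf(eps):
    k += 1                              # largest k satisfying (114) without the remainder
print(C, R_d, V, k)
```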

Theorem 16 (Converse, GMS-AWGN): If there exists a code, then

(166)

where

(167)

Observe that the terms to the left of the " " sign inside the probability in (166) are zero-mean random variables whose variances are equal to and , respectively.

Proof: The spherically symmetric , where is the capacity-achieving output distribution, satisfies the symmetry assumption of Theorem 2. More precisely, it is not hard to show (see [8, (205)]) that for all , has the same distribution under as

(168)

The $d$-tilted information in is given by

(169)

Plugging (168) and (169) into (37), (166) follows.

The hypothesis testing converse in Theorem 5 is particularized as follows.

Theorem 17 (Converse, GMS-AWGN):

(170)

⁷See Remark 12 in Section V for a discussion of the close relation between an equal and a maximal power constraint.

where is the solution to

(171)

Proof: As in the proof of Theorem 16, we let . Under , the distribution of is that of (168), while under , it has the same distribution as (cf., [8, (204)])

(172)

Since the distribution of does not depend on the choice of according to either measure, Theorem 5 applies. Further, choosing to be the Lebesgue measure on , i.e., , observe that

(173)

Now, (170) and (171) are obtained by integrating

(174)

with respect to and , respectively.

The bound in Theorem 8 can be computed as follows.

Theorem 18 (Achievability, GMS-AWGN): There exists a code such that

(175)

where

(176)

(177)

and is defined by

(178)

where

otherwise

(179)


Proof: We compute an upper bound to (108) for thespecific case of the transmission of a GMS over an AWGNchannel. First, we weaken the infimum over in (108)by choosing to be the uniform distribution on the sur-face of the -dimensional sphere with center at and radius

. We showed in [9, proof of Theorem 37]

(see also [14] and [23]) that

(180)

which takes care of the source random variable in (108).We proceed to analyze the channel random variable

. Observe that since lies on thepower sphere and the noise is spherically symmetric,

has the same distribution as ,where is an arbitrary point on the surface of the powersphere. Letting , we see that

has the noncen-tral chi-squared distribution with degrees of freedom andnoncentrality parameter . To simplify calculations, weexpress the information density as

(181)

where . The distribution of is the same as (168). Further, due to the spherical symmetry of both and , as discussed previously, we have (recall that “ ” denotes equality in distribution)

(182)

which is bounded uniformly in as observed in [22, (425), (435)]; thus, (176) is finite, and (175) follows.
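The noncentral chi-square characterization invoked in this proof is easy to check numerically. The following sketch (ours, not part of the paper; unit noise variance and the SciPy parameterization are assumptions made for concreteness) verifies by Monte Carlo that the squared norm of a power-sphere codeword plus i.i.d. standard Gaussian noise has the claimed noncentral chi-squared law with n degrees of freedom and noncentrality nP.

```python
# Hedged sanity check of the distributional claim used in the proof of Theorem 18:
# if x lies on the sphere of squared radius n*P and Z is i.i.d. standard normal,
# then ||x + Z||^2 is noncentral chi-square with n degrees of freedom and
# noncentrality n*P. Unit noise variance is an assumption, not taken from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, P, trials = 30, 1.0, 200_000

x = np.zeros(n)
x[0] = np.sqrt(n * P)              # any point on the power sphere works, by spherical symmetry
samples = np.sum((x + rng.standard_normal((trials, n))) ** 2, axis=1)

grid = np.linspace(samples.min(), samples.max(), 50)
empirical = np.array([(samples <= t).mean() for t in grid])
theoretical = stats.ncx2.cdf(grid, df=n, nc=n * P)
print("max CDF discrepancy:", np.max(np.abs(empirical - theoretical)))  # small (Monte Carlo noise)
```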

The following result strengthens Theorem 10 in the special case of the GMS-AWGN.

Theorem 19 (Gaussian Approximation, GMS-AWGN): The parameters of the optimal code satisfy (114), where , , , are given by (163), (164), (165), and (118), respectively, and the remainder term in (114) satisfies

(183)

(184)

Proof: An asymptotic analysis of the converse bound in Theorem 17 similar to that found in [9, proof of Theorem 40] leads to (183). An asymptotic analysis of the achievability bound in Theorem 18 similar to [9, Appendix K] leads to (184).

Numerical evaluation of the bounds reveals that JSCC noticeably outperforms SSCC in the displayed region of blocklengths (see Fig. 7).
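To make the Gaussian approximation of Theorem 19 concrete, the sketch below (ours; the remainder term is neglected, and the standard closed-form expressions assumed here for the capacity-cost, rate-distortion, and dispersion functions of the GMS-AWGN pair, in nats, are our reading of (163)–(165) and (118)) solves the approximate relation k R(d) ≈ n C(P) − sqrt(n V(P) + k V(d)) Q⁻¹(ε) for the largest source blocklength k supported by a given channel blocklength n.

```python
# Hedged numerical sketch of the Gaussian approximation (remainder neglected).
# All formulas below are assumptions consistent with the GMS-AWGN setup; nats throughout.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def C(P):            # AWGN capacity-cost function
    return 0.5 * np.log(1 + P)

def V_ch(P):         # AWGN channel dispersion
    return P * (P + 2) / (2 * (1 + P) ** 2)

def R(d, var=1.0):   # GMS rate-distortion function, 0 < d <= var
    return 0.5 * np.log(var / d)

V_src = 0.5          # GMS rate-dispersion (independent of d), assumed value in nats^2

def max_source_blocklength(n, d, eps, P, var=1.0):
    """Largest k with k R(d) = n C(P) - sqrt(n V_ch + k V_src) Q^{-1}(eps)."""
    q = norm.isf(eps)
    f = lambda k: n * C(P) - np.sqrt(n * V_ch(P) + k * V_src) * q - k * R(d, var)
    return brentq(f, 0.0, 10 * n)   # assumes the root lies in (0, 10n)

print(max_source_blocklength(n=1000, d=0.25, eps=1e-2, P=1.0))
```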

VIII. TO CODE OR NOT TO CODE

Our goal in this section is to compare the excess distortion performance of the optimal code of rate 1 at channel blocklength with that of the optimal symbol-by-symbol code, evaluated after channel uses, leveraging the bounds in Sections III and IV and the approximation in Section V. We show certain examples in which symbol-by-symbol coding is, in fact, either optimal or very close to being optimal. A general conclusion drawn from this section is that even when no coding is asymptotically suboptimal, it can be a very attractive choice for short blocklengths [2].

Fig. 7. Rate-blocklength tradeoff for the transmission of a GMS with over an AWGN channel with , .

A. Performance of Symbol-by-Symbol Source-Channel Codes

Definition 9: An symbol-by-symbol code is an code (according to Definition 2) that satisfies

(185)

(186)

for some pair of functions and . The minimum excess distortion achievable with symbol-by-symbol codes at channel blocklength , excess probability , and cost is defined by

(187)

Definition 10: The distortion-dispersion function of symbol-by-symbol JSCC is defined as

W (188)

where is the distortion-rate function of the source. As earlier, if there is no channel input-cost constraint ( for all ), we will simplify the notation and write for and W for W. In addition to restrictions (i)–(iv) of Section V, we assume that the channel and the source are probabilistically matched in the following sense (cf., [7]).

(v) There exist , , and such that and

generated by the joint distribution


achieve the capacity-cost function and the distortion-rate function , respectively.

Condition (v) ensures that symbol-by-symbol transmission attains the minimum average (over source realizations) distortion achievable among all codes of any blocklength. The following results pertain to the full distribution of the distortion incurred at the receiver output and not just its mean.

Theorem 20 (Achievability, Symbol-by-Symbol Code): Under restrictions (i)–(v), if

(189)

where , and achieves , then there exists an symbol-by-symbol code (average cost constraint).

Proof: If (v) holds, then there exist a symbol-by-symbol encoder and decoder such that the conditional distribution of the output of the decoder given the source outcome coincides with distribution , so the excess distortion probability of this symbol-by-symbol code is given by the left-hand side of (189).

Theorem 21 (Converse, Symbol-by-Symbol Code): Under restriction (i) and separable distortion measure, the parameters of any symbol-by-symbol code (average cost constraint) must satisfy

(190)

where .

Proof: The excess distortion probability at blocklength , distortion , and cost achievable among all single-letter codes must satisfy

(191)

(192)

where (192) holds since implies by the data processing inequality. The right-hand side of (192) is lower bounded by the right-hand side of (190) because holds for all with , and the distortion measure is separable.

Theorem 22 (Gaussian Approximation, Optimal Symbol-by-Symbol Code): Assume . Under restrictions (i)–(v)

W

(193)

W (194)

where

(195)

Moreover, if there is no power constraint

(196)

W W (197)

where is that in Theorem 10. If and , are finite, then

(198)

Proof: Since the third absolute moment of is finite, the achievability part of the result, namely, (193) with the remainder satisfying (195), follows by a straightforward application of the Berry–Esseen bound to (189), provided that . If , it follows trivially from (189).

To show the converse in (196), observe that since the set of all codes includes all symbol-by-symbol codes, we have . Since is positive or negative depending on whether or , using (130), we conclude that we must necessarily have (197), which is, in fact, a consequence of conditions (b) and (c) in Section II. Now, (196) is simply the converse part of (129). The proof of the refined converse in (198) is relegated to Appendix E.

In the absence of a cost constraint, Theorem 22 shows that if the source and the channel are probabilistically matched in the sense of [7], then symbol-by-symbol transmission achieves not only the minimum average distortion but also the dispersion of JSCC (see (197)). In other words, such symbol-by-symbol codes not only attain the minimum average distortion, but the variance of the distortion at the decoder's output is also the minimum achievable among all codes operating at that average distortion. In contrast, if there is an average cost constraint, the symbol-by-symbol codes considered in Theorem 22 probably do not attain the minimum excess distortion achievable among all blocklength- codes, not even asymptotically. Indeed, as observed in [24], for the transmission of an equiprobable source over an AWGN channel under the average power constraint and the average block error probability performance criterion, the strong converse does not hold and the second-order term is of order , not , as in (193).

Two conspicuous examples that satisfy the probabilistic matching condition (v), so that symbol-by-symbol coding is optimal in terms of average distortion, are the transmission of a binary equiprobable source over a BSC, provided that the desired bit error rate is equal to the crossover probability of the channel [25, Sec. 11.8], [26, Problem 7.16], and the transmission of a Gaussian source over an AWGN channel under the mean-square error distortion criterion, provided that the tolerable source signal-to-noise ratio attainable by an estimator is equal to the signal-to-noise ratio at the output of the channel [27]. We dissect these two examples next. After that, we will discuss two additional examples where uncoded transmission is optimal.


B. Uncoded Transmission of a BMS Over a BSC

In the setup of Section VI, if the binary source is unbiased, then , , and . If the encoder and the decoder are both identity mappings (uncoded transmission), the resulting joint distribution satisfies condition (v). As is well known, regardless of the blocklength, the uncoded symbol-by-symbol scheme achieves the minimum bit error rate (averaged over source and channel). Here, we are interested instead in examining the excess distortion probability criterion. For example, consider an application where, if the fraction of erroneously received bits exceeds a certain threshold, then the entire output packet is useless.

Using (130) and (194), it is easy to verify that

W W (199)

that is, uncoded transmission is optimal in terms of dispersion, as anticipated in (197). Moreover, uncoded transmission attains the minimum bit error rate threshold achievable among all codes operating at blocklength , regardless of the allowed , as the following result demonstrates.

Theorem 23 (BMS-BSC, Symbol-by-Symbol Code): Consider the symbol-by-symbol scheme which is uncoded if and whose decoder always outputs the all-zero vector if . It achieves at blocklength and excess distortion probability , regardless of ,

(200)

Moreover, if the source is equiprobable

(201)

Proof: Direct calculation yields (200). To show (201), let us compare with the conditions imposed on by Theorem 13. Comparing (200) to (150), we see that either

1) equality in (200) is achieved, , , and (plugging into (149))

(202)

thereby implying that , or

2) , , and (149) becomes

(203)

which also implies . To see this, note that would imply since is an integer, which in turn would require (according to (203)) that , which is impossible.
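For the fair source, the uncoded distortion of Theorem 23 admits a direct numerical evaluation: the number of bit errors over the BSC is binomial, so (200) reduces, under this reading (our assumption rather than a quote of (200)), to a binomial quantile. A minimal sketch:

```python
# Sketch (assuming the natural reading of (200)): uncoded transmission of a fair
# binary source over a BSC(delta) incurs a Binomial(n, delta) number of bit errors,
# so the smallest distortion d exceeded with probability at most eps is a binomial quantile.
from scipy.stats import binom

def uncoded_bsc_distortion(n, delta, eps):
    """Smallest d such that P[fraction of bit errors > d] <= eps."""
    r = int(binom.ppf(1 - eps, n, delta))   # smallest integer r with P[errors <= r] >= 1 - eps
    return r / n

print(uncoded_bsc_distortion(n=100, delta=0.11, eps=1e-2))   # distortion exceeded w.p. <= 1%
```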

For the transmission of the fair binary source over a BSC, Fig. 8 shows the distortion achieved by the uncoded scheme, the separated scheme, and the JSCC scheme of Theorem 14 versus for a fixed excess distortion probability . The no coding/converse curve in Fig. 8 depicts one of those singular cases where the nonasymptotic fundamental limit can be computed precisely. As evidenced by this curve, the fundamental limits need not be monotonic with blocklength.

Fig. 8. Distortion-blocklength tradeoff for the transmission of a fair BMS over a BSC with crossover probability and , .

Fig. 9(a) shows the rate achieved by separate coding when is fixed, and the excess distortion probability , shown in Fig. 9(b), is set to be the one achieved by uncoded transmission, namely, (200). Fig. 9(a) highlights the fact that at short blocklengths (say ), SSCC is vastly suboptimal. As the blocklength increases, the performance of the separated scheme approaches that of the no-coding scheme, but according to Theorem 23, it can never outperform it. Had we allowed the excess distortion probability to vanish sufficiently slowly, the JSCC curve would have approached the Shannon limit as . However, in Fig. 9(a), the exponential decay in is such that there is indeed an asymptotic rate penalty as predicted in [11].

For the biased binary source with and BSC with crossover probability 0.11, Fig. 10 plots the maximum distortion achieved with probability 0.99 by the uncoded scheme, which in this case is asymptotically suboptimal. Nevertheless, uncoded transmission performs remarkably well in the displayed range of blocklengths, achieving the converse almost exactly at blocklengths less than 100, and outperforming the JSCC achievability result in Theorem 14 at blocklengths as long as 700. This example substantiates that even in the absence of a probabilistic match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform SSCC and even our random JSCC achievability bound in the finite blocklength regime.

C. Symbol-by-Symbol Coding for Lossy Transmission of a GMS Over an AWGN Channel

In the setup of Section VII, using (163) and (164), we find that

(204)


Fig. 9. (a) Rate-blocklength tradeoff for the transmission of a fair BMS over a BSC with crossover probability and . (b) The excess distortion probability, set to be the one achieved by the uncoded scheme.

Fig. 10. Distortion-blocklength tradeoff for the transmission of a BMS with over a BSC with crossover probability and , .

Fig. 11. Distortion-blocklength tradeoff for the transmission of a GMS over an AWGN channel with and , .

The next result characterizes the distribution of the distortion incurred by the symbol-by-symbol scheme that attains the minimum average distortion.

Theorem 24 (GMS-AWGN, Symbol-by-Symbol Code): The following symbol-by-symbol transmission scheme in which the encoder and the decoder are the amplifiers

(205)

(206)

is an symbol-by-symbol code (with average cost constraint) such that

(207)

where is chi-square distributed with degrees of freedom. Note that (207) is a particularization of (189). Using (207), we find that

W (208)

On the other hand, using (130), we compute

W (209)

W (210)

The difference between (210) and (197) is due to the fact that the optimal symbol-by-symbol code in Theorem 24 obeys an average power constraint, rather than the more stringent maximal power constraint of Theorem 10; so, it is not surprising that for the practically interesting case , the symbol-by-symbol code can outperform the best code obeying the maximal power constraint. Indeed, in the range of blocklengths displayed in Fig. 11, the symbol-by-symbol code even outperforms the converse for codes operating under a maximal power constraint.
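The excess-distortion performance of the amplifier code of Theorem 24 is also straightforward to evaluate numerically. The sketch below is ours and rests on the assumption, consistent with (207), that the n-letter MSE distortion equals σ²/(1+P) times an independent chi-square(n) variable divided by n; it is not a restatement of (207).

```python
# Hedged sketch of the amplifier scheme's distortion, assuming the n-letter MSE
# distortion is (var/(1+P)) * chi2(n)/n, so the smallest distortion exceeded with
# probability at most eps is a chi-square quantile.
from scipy.stats import chi2

def amplifier_distortion(n, P, eps, var=1.0):
    """Smallest d such that the distortion exceeds d with probability at most eps."""
    return var / (1 + P) * chi2.isf(eps, df=n) / n

print(amplifier_distortion(n=100, P=1.0, eps=1e-2))
```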


D. Uncoded Transmission of a Discrete Memoryless Source (DMS) Over a Discrete Erasure Channel (DEC) Under Erasure Distortion Measure

For a discrete source, the single-letter erasure distortion measure is defined as the following mapping :⁸

(211)

⁸The distortion measure in (211) is a scaled version of the erasure distortion measure found in the literature, e.g., [28].

For any , the rate-distortion function of the equiprobable source is achieved by

(212)

The rate-distortion function and the d-tilted information for the equiprobable source with the erasure distortion measure are given, respectively, by

(213)

(214)

Note that, trivially, . The channel that is matched to the equiprobable DMS with the erasure distortion measure is the DEC, whose single-letter transition probability kernel is

(215)

and whose capacity is given by , achieved by equiprobable . For equiprobable on , we find that , and

W (216)
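As an illustration (ours; the constants below are assumptions rather than quotes of (216)), the distortion of uncoded transmission over the DEC can be evaluated exactly: under the scaled erasure measure (211), each erased letter contributes log M to the distortion and the erasure count is binomial. The sketch also compares against a Gaussian approximation built with the assumed dispersion δ(1−δ) log² M.

```python
# Illustrative sketch: block distortion of uncoded transmission over a DEC with
# erasure probability delta equals (log M / n) times a Binomial(n, delta) erasure count.
# The dispersion value delta*(1-delta)*log(M)^2 is an assumption, not a quote of (216).
import numpy as np
from scipy.stats import binom, norm

def dec_uncoded_distortion(n, delta, eps, M):
    exact = np.log(M) / n * binom.ppf(1 - eps, n, delta)     # exact binomial quantile
    W = delta * (1 - delta) * np.log(M) ** 2                 # assumed distortion dispersion
    gauss = delta * np.log(M) + np.sqrt(W / n) * norm.isf(eps)
    return exact, gauss

print(dec_uncoded_distortion(n=200, delta=0.1, eps=1e-2, M=4))
```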

E. Symbol-by-Symbol Transmission of a DMS Over a DEC Under Logarithmic Loss

Let the source alphabet be finite, and let the reproduction alphabet be the set of all probability distributions on . The single-letter logarithmic loss distortion measure is defined by [29], [30]

(217)

Curiously, for any , the rate-distortion function and the d-tilted information are given respectively by (213) and (214), even if the source is not equiprobable. In fact, the rate-distortion function is achieved by

(218)

and the channel that is matched to the equiprobable source under logarithmic loss is exactly the DEC in (215). Of course, unlike Section VIII-D, the decoder we need is a simple one-to-one function that outputs if the channel output is , and otherwise, where is the output of the DEC. Finally, it is easy to verify that the distortion-dispersion function of symbol-by-symbol coding under logarithmic loss is the same as that under erasure distortion and is given by (216).

IX. CONCLUSION

In this paper, we gave a nonasymptotic analysis of JSCC, including several achievability and converse bounds, which hold in wide generality and are tight enough to determine the dispersion of JSCC for the transmission of an abstract memoryless source over either a DMC or a Gaussian channel, under an arbitrary fidelity measure. We also investigated the penalty incurred by SSCC using both the source-channel dispersion and the particularization of our new bounds to 1) the binary source and the BSC with bit error rate fidelity criterion and 2) the Gaussian source and Gaussian channel under mean-square error distortion. Finally, we showed cases where symbol-by-symbol (uncoded) transmission beats any other known scheme in the finite blocklength regime even when the source-channel matching condition is not satisfied.

The approach taken in this paper to analyze the nonasymptotic fundamental limits of lossy JSCC is two-fold. Our new achievability and converse bounds apply to abstract sources and channels and allow for memory, while the asymptotic analysis of the new bounds leading to the dispersion of JSCC is focused on the most basic scenario of transmitting a stationary memoryless source over a stationary memoryless channel.

The major results and conclusions are the following.

1) A general new converse bound (Theorem 3) leverages the concept of d-tilted information (Definition 5), a random variable which corresponds (in a sense that can be formalized [9], [31]) to the number of bits required to represent a given source outcome within distortion , and whose role in lossy compression is on a par with that of information (in (20)) in lossless compression.

2) The converse result in Theorem 4 capitalizes on two simple observations, namely, that any lossy code can be converted to a list code with list error probability , and that a binary hypothesis test between and an auxiliary distribution on the same space can be constructed by choosing when there is no list error. We have generalized the conventional notion of list to allow the decoder to output a possibly uncountable set of source realizations.

3) As evidenced by our numerical results, the converse result in Theorem 5, which applies to those channels satisfying a certain symmetry condition and which is a consequence of the hypothesis testing converse in Theorem 4, can outperform the d-tilted information converse in Theorem 3. Nevertheless, it is Theorem 3 that lends itself to analysis more easily and that leads to the JSCC dispersion for the general DMC.

4) Our random-coding-based achievability bound (Theorem 7) provides insights into the degree of separation between the source and the channel codes required for optimal performance in the finite blocklength regime. More precisely, it reveals that the dispersion of JSCC can be achieved in the class of JSCC codes (Definition 7). As in SSCC, in coding, the inner channel coding block is connected to the outer source coding block by a noiseless link of capacity , but unlike SSCC, the channel (resp. source) code can be chosen based on the knowledge of the source (resp. channel). The conventional SSCC, in which the source code is chosen without the knowledge of the channel and the channel code is chosen without the knowledge of the source, although known to achieve the asymptotic fundamental limit of JSCC under certain quite weak conditions, is in general suboptimal in the finite blocklength regime.

5) Since , bounds for average distortion can be obtained by integrating our bounds on excess distortion. Note, however, that the code that minimizes depends on . Since the distortion cumulative distribution function (cdf) of any single code does not majorize the cdfs of all possible codes, the converse bound on the average distortion obtained through this approach, although asymptotically tight, may be loose at short blocklengths. Likewise, regarding achievability bounds (e.g., (93)), the optimization over channel and source random codes, and , must be performed after the integration so that the choice of code does not depend on the distortion threshold .

6) For the transmission of a stationary memoryless source over a stationary memoryless channel, the Gaussian approximation in Theorem 10 (neglecting the remainder ) provides a simple estimate of the maximal nonasymptotically achievable JSCC rate. Appealingly, the dispersion of joint source-channel coding decomposes into two terms: the channel dispersion and the source dispersion. Thus, only two channel attributes, the capacity and dispersion, and two source attributes, the rate-distortion and rate-dispersion functions, are required to compute the Gaussian approximation to the maximal JSCC rate.

7) In those curious cases where the source and the channel are probabilistically matched so that symbol-by-symbol coding attains the minimum possible average distortion, Theorem 22 ensures that it also attains the dispersion of JSCC, that is, symbol-by-symbol coding results in the minimum variance of distortions among all codes operating at that average distortion.

8) Even in the absence of a probabilistic match between the source and the channel, symbol-by-symbol transmission, though asymptotically suboptimal, might outperform SSCC and joint source-channel random coding in the finite blocklength regime.

APPENDIX A
BERRY–ESSEEN THEOREM

The following result is an important tool in the Gaussian approximation analysis.

Theorem 25 (Berry–Esseen CLT, e.g., [32, Ch. XVI.5, Th. 2]): Fix a positive integer . Let be independent. Then, for any real

(219)

where

(220)

(221)

(222)

(223)

and ( for identically distributed ).
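As a concrete illustration of how Theorem 25 is used in the proofs, the sketch below compares a binomial tail probability with its Gaussian approximation and a corresponding Berry–Esseen slack; the numerical constant 6 is an assumption (a commonly used Berry–Esseen constant), not a quote of (223), and the resulting bound can be loose at moderate blocklengths.

```python
# Minimal sketch of the Berry-Esseen bound for a sum of n i.i.d. Bernoulli(p) variables:
# the tail probability is within berry_esseen_gap(n, p) of the Q-function evaluated at
# the standardized threshold. The constant 6 is an assumed Berry-Esseen constant.
import numpy as np
from scipy.stats import norm, binom

def berry_esseen_gap(n, p):
    """Assumed bound on the Gaussian approximation error for a Binomial(n, p) tail."""
    var = p * (1 - p)
    third = p * (1 - p) * ((1 - p) ** 2 + p ** 2)      # E|X - p|^3 for a Bernoulli(p)
    return 6 * third / (np.sqrt(n) * var ** 1.5)

n, p, t = 500, 0.11, 0.15
exact = binom.sf(np.ceil(n * t) - 1, n, p)              # P[#successes >= n*t]
gauss = norm.sf((n * t - n * p) / np.sqrt(n * p * (1 - p)))
print(exact, gauss, berry_esseen_gap(n, p))
```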

APPENDIX B
AUXILIARY RESULT ON THE MINIMIZATION OF THE INFORMATION SPECTRUM

Given a finite set , we say that has type if the number of times each letter is encountered in is . Let be the set of all distributions on , which is simply the standard simplex in . For an arbitrary subset , denote by the set of distributions in that are also -types, i.e.,

(224)

Denote by the minimum Euclidean distance approximation of in the set of -types, i.e.,

(225)
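One simple way to compute the projection in (225) is to round the scaled distribution down and assign the leftover probability mass, in units of 1/n, to the letters with the largest fractional parts. The sketch below (ours; function and variable names are illustrative) implements this.

```python
# Sketch of the minimum-Euclidean-distance n-type approximation in (225):
# floor n*P and hand the remaining unit masses to the letters with the largest
# fractional parts, which minimizes the Euclidean distance among n-types.
import numpy as np

def nearest_n_type(P, n):
    """Project a distribution P onto the set of n-types (minimum Euclidean distance)."""
    scaled = n * np.asarray(P, dtype=float)
    k = np.floor(scaled).astype(int)
    remainder = n - k.sum()                      # unassigned unit masses (at most |A|-1)
    order = np.argsort(scaled - k)[::-1]         # letters with largest fractional parts first
    k[order[:remainder]] += 1
    return k / n

print(nearest_n_type([0.2, 0.3, 0.5], n=7))      # a 7-type close to (0.2, 0.3, 0.5)
```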

Let be the set of capacity-achieving distributions⁹

(226)

⁹In this appendix, we dispose of the assumption (c) in Section II-A that the capacity-achieving input distribution is unique.

Denote the minimum (maximum) information variances achieved by the distributions in by

(227)

(228)

and let be the set of capacity-achieving distributions that achieve the minimum information variance

(229)

and analogously for the distributions in with maximal variance. Lemma 1 below allows us to show that in the memoryless case, the infimum inside the expectation in (38) with and , where is the output distribution induced by the type , is approximately attained by those sequences whose type is closest to the capacity-achieving distribution (if it is nonunique, is chosen appropriately based on the information variance it achieves). This technical result is the key to proving the converse part of Theorem 10.

Fig. 12. Example where (234) holds with equality.

Lemma 1: There exist such that for all sufficiently large , we have the following.

1) If , then there exists such that for

(230)

where (230) holds for any with for if and if .

2) If , then for all and

(231)

The information densities in the left-hand sides of (230) and (231) are computed with , where is induced by the type of , i.e., , and that in the right-hand side of (230) is computed with , where is induced by the type of , i.e., . The independent random variables in the left-hand sides of (230) and (231) have distribution , while in the right-hand side of (230) have distribution .

In order to prove Lemma 1, we first show three auxiliary lemmas. The first two deal with approximate optimization of functions.

If and approximate each other, and the minimum of is approximately attained at , then is also approximately minimized at , as the following lemma formalizes.

Lemma 2: Fix , . Let be an arbitrary set, and let and be such that

(232)

Further, assume that and attain their minima. Then

(233)

as long as satisfies

(234)

(see Fig. 12).

Proof of Lemma 2: Let be such that . Using (232) and (234), write

(235)

(236)

(237)

(238)

(239)

The following lemma is reminiscent of [8, Lemma 64].

Lemma 3: Let be a compact metric space, and let be a metric. Fix and . Let

(240)

Suppose that for some constants , we have for all

(241)

(242)

Then, for any positive scalars ,

(243)

Moreover, if, instead of (241), satisfies

(244)

then, for any positive scalars , such that

(245)

we have

(246)

Proof of Lemma 3: Let achieve the maximum on the left-hand side of (243). Using (241) and (242), we have for all

(247)

(248)

(249)

where (249) follows because the maximum of (248) is achieved at .

To show (246), observe using (244) and (242) that

(250)

(251)

(252)

where (252) follows from (245).

The following lemma deals with asymptotic behavior of the -function.


Lemma 4: Fix , . Then, there exists (explicitly computed in the proof) such that for all and all large enough

(253)

Proof of Lemma 4: is convex for , and , so for ,

(254)

while for arbitrary and

(255)

If , we use (254) to obtain

(256)

(257)

(258)

(259)

where (259) holds for large enough because the maximum of (258) is attained at .

If , we use (255) to obtain

(260)

(261)

If , we use to obtain

(262)

(263)

(264)

(265)

(266)

where (265) is due to in , and

(266) holds because the maximum of (265) is attained at .

We are now equipped to prove Lemma 1.

Proof of Lemma 1: Define the following functions:

(267)

(268)

(269)

If , then for each , there are occurrences of among the . In the sequel, we will invoke Theorem 25 with , where is a given sequence, and (220)–(222) become

(270)

(271)

(272)

(273)

(274)

Define the (Euclidean) -neighborhood of the set of capacity-achieving distributions

(275)

We split the domain of the minimization in the left-hand side of (230) into two sets, and (recall notation (224)), for an appropriately chosen .

We now show that (230) holds for all if the minimization is restricted to types in , where is arbitrary, and

(276)

By Chebyshev’s inequality, for all whose type belongs to

(277)

(278)

(279)

(280)

(281)

(282)


where in (279), we used

(283)

and

(284)

Note that by Property 1 below. Therefore

(285)

(286)

(287)

We conclude that (230) holds if the minimization is restricted to types in .

Without loss of generality, we assume that all outputs in are accessible (which implies that for all ) and choose so that for all and

(288)

where . We recall the following properties of the functions , , and from [8, Appendixes E and I].

Property 1: The functions , , and are continuous on the compact set , and therefore bounded and achieve their extrema.

Property 2: There exists such that for all

(289)

Property 3: In , the functions , and are infinitely differentiable.

Property 4: In , .

Due to Property 3, there exist nonnegative constants and such that for all

(290)

(291)

To treat the case , we will need to choose carefully and to consider the cases and separately.

A) : We decrease until, in addition to (288),

(292)

is satisfied.

We now show that (230) holds if the minimization is restricted to types in , for all , for an appropriately chosen . Using (292) and boundedness of , write

(293)

where

(294)

Therefore, for any with , the Berry–Esseen bound yields

(295)

where

(296)

We now apply Lemma 2 with and

(297)

(298)

Condition (232) of Lemma 2 holds with due to (295). As will be shown in the sequel, the following version of condition (234) holds:

(299)

where , the minimum Euclidean distance approximation of in the set of -types, is formally defined in (225), and

will be chosen later. Applying Lemma 2, we deduce from (233) that

(300)

We conclude that (230) holds if minimization is restricted to types in .

We proceed to show (299). As will be proven later, for appropriately chosen and , we can write

(301)

(302)

(303)

(304)

where if , and if . Denote

(305)

(306)

(307)


If

(308)

then , and Lemma 4 applies to . So, using (301) and (304), the fact that is monotonically decreasing and Lemma 4, we conclude that there exists such that

(309)

(310)

(311)

which is equivalent to (299).

It remains to prove (301) and (304). Observing that for

(312)

(313)

and using (291) and (292), we have for all

(314)

where

(315)

Thus, recalling (290) and denoting , we have

(316)

(317)

(318)

where

(319)

So, (301) follows by observing that for any

(320)

and letting in (316)–(318).

To show (304), we apply Lemma 3 with

(321)

(322)

(323)

(324)

(325)

(326)

We proceed to verify that the conditions of Lemma 3 are met. Function satisfies condition (242) with defined in (315). Let us now show that function satisfies condition (241) with , where and are defined in (284) and (289), respectively. For any , write

(327)

(328)

(329)

where (328) follows from (284), and (329) applies (289). So, Lemma 3 applies to , resulting in (304) with

(330)

thereby completing the proof of (300).

Combining (287) and (300), we conclude that (230) holds for all in the interval

(331)

B) : We choose so that (288) is satisfied. The case was covered in (287), so we only need to consider the minimization of the left-hand side of (231) over . Fix . If

(332)

we have

(333)

(334)

(335)

(336)

(337)


where
1) (334) is by Chebyshev’s inequality;
2) (335) uses (289), (291), and ;
3) (336) holds because the maximum of its left-hand side is attained at .

APPENDIX C
PROOF OF THE CONVERSE PART OF THEOREM 10

Note that for the converse, restriction (iv) can be replaced by the following weaker one.

(iv′) The random variable has finite absolute third moment.

To verify that (iv) implies (iv′), observe that by the concavity of the logarithm

(338)

so

(339)

We now proceed to prove the converse by showing first that we can eliminate all rates exceeding

(340)

for any . More precisely, we show that the excess distortion probability of any code having such rate converges to 1 as , and therefore, for any , there is an such that for all , no code can exist for , satisfying (340).

We weaken (23) by fixing and choosing a particular

output distribution, namely, . Due to restriction (ii) in Section V, , and the d-tilted information single-letterizes, that is, for a.e.

(341)

Theorem 1 implies that the error probability of every code must be lower bounded by

(342)

(343)

where in (343), we used (340) and . Recalling (16) and

(344)

with equality for -a.e. , we conclude using the law of large numbers that (343) tends to 1 as .

We proceed to show that for all large enough , if there is a sequence of codes such that

(345)

(346)

then .

Note that in general the bound in Theorem 1 with the choice of as above does not lead to the correct channel dispersion term. We first consider the general case, in which we apply Theorem 3, and then we show the symmetric case, in which we apply Theorem 2.

Recall that has type if the number of times each letter is encountered in is . In Theorem 3, we weaken the supremum over by letting map to its type, . Note that the total number of types satisfies (e.g., [26]) . We weaken the supremum over in (38) by fixing , where , i.e., is the output distribution induced by the type . In this way, Theorem 3 implies that the error probability of any code must be lower bounded by

(347)

Choose

(348)

At this point, we consider two cases separately, and.

A) : In order to apply Lemma 1 in Appendix B, we isolate the typical set of source sequences

(349)

Observe that

(350)

(351)

(352)

(353)


where

1) (352) follows by lower bounding

(354)

(355)

(356)

(357)

where
— (354) holds for large enough due to (345) and (346);
— (355) holds for large enough by the choice of in (348);
— (356) lower bounds using (345);
— (357) holds for a small enough .

2) (353) is by Chebyshev’s inequality.

Now, we let

(358)

where will be chosen in the sequel, and are chosen so that both (345) and the following version of (346) hold

(359)

where is defined in (291). Denote for brevity

(360)

Weakening (347) using (348) and Lemma 1, we can lower bound by

(361)

(362)

(363)

(364)

(365)

where (361) is by Lemma 1, and (363) is by the union bound. To justify (365), observe that the quantities in Theorem 25 corresponding to the sum of independent random variables in (364) are

(366)

(367)

(368)

(369)

(370)

where the functions , , , are defined in (225), (267)–(269) in Appendix B. To show (369), recall that

by Property 4 in Appendix B, and use (291) and (320). Further, is bounded uniformly in , so (223) is upper bounded by some constant . Finally, applying (367) and (369) to (359), we conclude that

(371)

which enables us to lower bound the probability in (364) invoking the Berry–Esseen bound (Theorem 25). In view of (358), the resulting bound is equal to , and the proof of (365) is complete.

B) : Fix . If , we choose as in (348), and

(372)

where is the same as in (358), and are chosen so that the following version of (346) holds:

(373)

where was defined in Lemma 1. Weakening (347) using (231), we have

(374)

(375)

(376)

(377)

(378)


where (375) uses (231) and (373), and (376) is by the Berry–Esseen bound.

If , which implies a.s., we let

(379)

and choose that satisfy

(380)

Then, plugging a.s. in (347), we have

(381)

(382)

(383)

(384)

where (382) is by the choice of in (380), (383) invokes (231), and (384) follows from the choice of in (379).

C) Symmetric Channel: We show that if the channel is such that the distribution of (according to ) does not depend on the choice , Theorem 2 leads to a tighter third-order term than (119).

If either or , let

(385)

(386)

where can be chosen as in (358), and let be such that the following version of (346) (with the remainder satisfying (119) with ) holds:

(387)

Theorem 2 and Theorem 25 imply that the error probability of every code must satisfy for an arbitrary sequence

(388)

(389)

If both and , choose to satisfy

(390)

(391)

Substituting (391) and , a.s. in (388), we conclude that the right-hand side of (388) equals , so whenever a code exists.

D) Gaussian Channel: In view of Remark 12, it suffices to consider the equal power constraint (133). The spherically symmetric , where , satisfies the symmetry assumption of Theorem 2. In fact, for all , has the same distribution under as (cf., (168))

(392)

where independent of each other. Since

is a sum of i.i.d. random variables, the mean of is equal to and its variance is equal to (118), the result

follows analogously to (385)–(389).

APPENDIX D
PROOF OF THE ACHIEVABILITY PART OF THEOREM 10

A) Almost Lossless Coding Over a DMC: The proof consists of an asymptotic analysis of the bound in Theorem 9 by means of Theorem 25. Weakening (98) by fixing , we conclude that there exists a code with

(393)

where are distributed according to . The case of equiprobable has been tackled in [8]. Here, we assume that is not a constant, that is, .

Let and be such that

(394)

where , and is the Berry–Esseen ratio (223) for the sum of independent random variables appearing in the right-hand side of (393). Note that is finite due to:
1) ;
2) the third absolute moment of is finite;
3) the third absolute moment of is finite, as observed in Appendix B.

Therefore, (394) can be written as (99) with the remainder therein satisfying (125). So, it suffices to prove that if satisfy (394), then the right-hand side of (393) is upper bounded by . Let

(395)

By the Berry–Esseen bound (Theorem 25)

(396)


We now further upper bound (393) as

(397)

(398)

(399)

where we invoked (394) and (395) to upper bound the exponent in the right-hand side of (397).

B) Lossy Coding Over a DMC: The proof consists of the asymptotic analysis of the bound in Theorem 8 using Theorem A1 and Lemma 5 below, which deals with the asymptotic behavior of distortion -balls. Note that Lemma 5 is the only step that requires finiteness of the ninth absolute moment of as required by restriction (iv) in Section V.

Lemma 5 ([9, Lemma 2]): Under restrictions (ii)–(iv), there exist constants such that for all

(400)

where is given by (122).

We weaken (108) by fixing

(401)

(402)

(403)

where , so there exists a code with error probability upper bounded by

(404)

where are distributed according to . We need to show that for satisfying (114), (404) is upper bounded by .

We apply Lemma 5 to upper bound (404) as follows:

(405)

with

(406)

We first consider the (nontrivial) case . Let and be such that

(407)

(408)

where constants and are defined in Lemma 5, and is the Berry–Esseen ratio (223) for the sum of independent random variables appearing in (405). Note that is finite because:
1) either or by the assumption;
2) the third absolute moment of is finite by restriction (iv) as spelled out in (339);

3) the third absolute moment of is finite, as observed in Appendix B.

Applying a Taylor series expansion to (407) with the choice of in (403), we conclude that (407) can be written as (114) with

the remainder term satisfying (120).

It remains to further upper bound (405) using (407). Let

(409)

By the Berry–Esseen bound (Theorem 25)

(410)

so the expectation in the right-hand side of (405) is upper bounded as

(411)

(412)

where we used (407) and (409) to upper bound the exponent in the right-hand side of (411). Putting (405) and (412) together, we conclude that .

Finally, consider the case , which implies

and almost surely, and let and be such that

(413)


where constants and are defined in Lemma 5. Then

(414)

which, together with (405), implies that , as desired.

C) Lossy or Almost Lossless Coding Over a Gaussian Channel: In view of Remark 12, it suffices to consider the equal power constraint (133). As shown in the proof of Theorem 18, for any distribution of on the power sphere

(415)

where is defined in (392) (cf., (168)) and is a (computable) constant.

Now, the proof for almost lossless coding in Appendix D-A can be modified to work for the Gaussian channel by adding to the right-hand side of (394) and replacing in (393) and (397) with , and in (395) with .

Similarly, the proof for lossy coding in Appendix D-B is adapted for the Gaussian channel by adding to the right-hand side of (407) and replacing in (404) and (406) with , and in (409) with .

APPENDIX E
PROOF OF THEOREM 22

Applying the Berry–Esseen bound to (190), we obtain

(416)

W(417)

where is the Berry–Esseen ratio, and (417) follows by the application of Lemma 3 with

(418)

(419)

(420)

(421)

(422)

Note that the mean and standard deviation of are linear and continuously differentiable in , respectively, so conditions (242) and (244) hold with the metric being the usual Euclidean distance between vectors in . So, (417) follows immediately upon observing that, by the definition of the rate-distortion function, for all such that .

REFERENCES

[1] V. Kostina and S. Verdú, “Lossy joint source-channel coding in the finite blocklength regime,” in Proc. IEEE Int. Symp. Inf. Theory, Cambridge, MA, Jul. 2012, pp. 1553–1557.

[2] V. Kostina and S. Verdú, “To code or not to code: Revisited,” in Proc. IEEE Inf. Theory Workshop, Lausanne, Switzerland, 2012, pp. 5–9.

[3] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, Oct. 1948.

[4] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Int. Conv. Rec., vol. 7, pp. 93–126, Mar. 1959.

[5] S. Verdú and I. Kontoyiannis, “Lossless data compression rate: Asymptotics and non-asymptotics,” in Proc. 46th Annu. Conf. Inf. Sci. Syst., Princeton, NJ, Mar. 2012, pp. 1–6.

[6] S. Verdú and I. Kontoyiannis, “Lossless data compression at finite blocklengths,” IEEE Trans. Inf. Theory, 2012, to be published.

[7] M. Gastpar, B. Rimoldi, and M. Vetterli, “To code, or not to code: Lossy source-channel communication revisited,” IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1147–1158, May 2003.

[8] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.

[9] V. Kostina and S. Verdú, “Fixed-length lossy compression in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 58, no. 6, pp. 3309–3338, Jun. 2012.

[10] I. Csiszár, “Joint source-channel error exponent,” Probl. Control Inf. Theory, vol. 9, no. 5, pp. 315–328, Sep. 1980.

[11] I. Csiszár, “On the error exponent of source-channel transmission with a distortion threshold,” IEEE Trans. Inf. Theory, vol. 28, no. 6, pp. 823–828, Nov. 1982.

[12] R. Pilc, “Coding theorems for discrete source-channel pairs,” Ph.D. dissertation, Dept. Elect. Eng., Massachusetts Inst. Technol., Cambridge, MA, 1967.

[13] R. Pilc, “The transmission distortion of a discrete source as a function of the encoding block length,” Bell Syst. Tech. J., vol. 47, no. 6, pp. 827–885, Jul.–Aug. 1968.

[14] A. D. Wyner, “Communication of analog data from a Gaussian source over a noisy channel,” Bell Syst. Tech. J., vol. 47, no. 5, pp. 801–812, May–Jun. 1968.

[15] A. D. Wyner, “On the transmission of correlated Gaussian data over a noisy channel with finite encoding block length,” Inf. Control, vol. 20, no. 3, pp. 193–215, Apr. 1972.

[16] I. Csiszár, “An abstract source-channel transmission problem,” Probl. Control Inf. Theory, vol. 12, no. 5, pp. 303–307, Sep. 1983.

[17] A. T. Campo, G. Vazquez-Vilar, A. Guillén i Fàbregas, and A. Martinez, “Random-coding joint source-channel bounds,” in Proc. IEEE Int. Symp. Inf. Theory, St. Petersburg, Russia, Aug. 2011, pp. 899–902.

[18] D. Wang, A. Ingber, and Y. Kochman, “The dispersion of joint source-channel coding,” in Proc. 49th Annu. Allerton Conf. Commun., Control Comput., Monticello, IL, Sep. 2011, pp. 180–187.

[19] I. Csiszár, “On an extremum problem of information theory,” Studia Scientiarum Mathematicarum Hungarica, vol. 9, no. 1, pp. 57–71, Jan. 1974.

[20] H. V. Poor, An Introduction to Signal Detection and Estimation. New York, NY, USA: Springer-Verlag, 1994.

[21] S. Vembu, S. Verdú, and Y. Steinberg, “The source-channel separation theorem revisited,” IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 44–54, Jan. 1995.

[22] M. Tomamichel and V. Y. F. Tan, A Tight Upper Bound for the Third-Order Asymptotics of Discrete Memoryless Channels, arXiv preprint arXiv:1212.3689, 2012.

[23] D. Sakrison, “A geometric treatment of the source encoding of a Gaussian random variable,” IEEE Trans. Inf. Theory, vol. 14, no. 3, pp. 481–486, May 1968.

[24] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Princeton Univ., Princeton, NJ, USA, 2010.

[25] F. Jelinek, Probabilistic Information Theory: Discrete and Memoryless Models. New York, NY, USA: McGraw-Hill, 1968.

[26] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2011.

[27] J. T. Goblick, “Theoretical limitations on the transmission of data from analog sources,” IEEE Trans. Inf. Theory, vol. 11, no. 4, pp. 558–567, Oct. 1965.

[28] R. Gallager, Information Theory and Reliable Communication. New York, NY, USA: Wiley, 1968.

[29] P. Harremoës and N. Tishby, “The information bottleneck revisited or how to choose a good distortion measure,” in Proc. IEEE Int. Symp. Inf. Theory, Nice, France, Jun. 2007, pp. 566–570.

[30] T. Courtade and R. Wesel, “Multiterminal source coding with an entropy-based distortion measure,” in Proc. IEEE Int. Symp. Inf. Theory, St. Petersburg, Russia, Aug. 2011, pp. 2040–2044.

[31] I. Kontoyiannis, “Pointwise redundancy in lossy data compression and universal lossy data compression,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 136–152, Jan. 2000.

[32] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed. New York, NY, USA: Wiley, 1971, vol. II.

Victoria Kostina (S’12) received the Bachelor’s degree with honors in applied mathematics and physics from the Moscow Institute of Physics and Technology, Russia, in 2004, where she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences, and the Master’s degree in electrical engineering from the University of Ottawa, Canada, in 2006. She is currently pursuing a Ph.D. degree in electrical engineering at Princeton University. Her research interests lie in information theory, theory of random processes, coding, and wireless communications.

Sergio Verdú (S’80–M’84–SM’88–F’93) received the Telecommunications Engineering degree from the Universitat Politècnica de Barcelona in 1980, and the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1984. Since 1984 he has been a member of the faculty of Princeton University, where he is the Eugene Higgins Professor of Electrical Engineering, and is a member of the Program in Applied and Computational Mathematics.

Sergio Verdú is the recipient of the 2007 Claude E. Shannon Award, and the 2008 IEEE Richard W. Hamming Medal. He is a member of the National Academy of Engineering, and was awarded a Doctorate Honoris Causa from the Universitat Politècnica de Catalunya in 2005. He is a recipient of several paper awards from the IEEE: the 1992 Donald Fink Paper Award, the 1998 and 2012 Information Theory Paper Awards, an Information Theory Golden Jubilee Paper Award, the 2002 Leonard Abraham Prize Award, the 2006 Joint Communications/Information Theory Paper Award, and the 2009 Stephen O. Rice Prize from the IEEE Communications Society. In 1998, Cambridge University Press published his book Multiuser Detection, for which he received the 2000 Frederick E. Terman Award from the American Society for Engineering Education.

Sergio Verdú served as President of the IEEE Information Theory Society in 1997, and on its Board of Governors (1988–1999, 2009–present). He has also served in various editorial capacities for the IEEE TRANSACTIONS ON INFORMATION THEORY: Associate Editor (Shannon Theory, 1990–1993; Book Reviews, 2002–2006), Guest Editor of the Special Fiftieth Anniversary Commemorative Issue (published by IEEE Press as “Information Theory: Fifty years of discovery”), and member of the Executive Editorial Board (2010–2013). He is the founding Editor-in-Chief of Foundations and Trends in Communications and Information Theory.