Standard Handbook of Electronic Engineering


PART 1: PRINCIPLES AND TECHNIQUES

Section 1. Information, Communication, Noise, and Interference
Section 2. Systems Engineering and Systems Management
Section 3. Reliability
Section 4. Computer-Assisted Digital System Design

On the CD-ROM: Basic Phenomena; Mathematics, Formulas, Definitions, and Theorems; Circuit Principles

SECTION 1 INFORMATION, COMMUNICATION, NOISE, AND INTERFERENCE

The telephone profoundly changed our methods of communication, thanks to Alexander Graham Bell and other pioneers (Bell, incidentally, declined to have a telephone in his home!). Communication has been at the heart of the information age. Electronic communication deals with transmitters and receivers of electromagnetic waves. Even digital communications systems rely on this phenomenon. This section of the handbook covers information sources, codes and coding, communication channels, error correction, continuous and band-limited channels, digital data transmission and pulse modulation, and noise and interference. C.A.

In This Section:

CHAPTER 1.1 COMMUNICATION SYSTEMS
  CONCEPTS
  SELF-INFORMATION AND ENTROPY
  ENTROPY OF DISCRETE RANDOM VARIABLES
  MUTUAL INFORMATION AND JOINT ENTROPY

CHAPTER 1.2 INFORMATION SOURCES, CODES, AND CHANNELS
  MESSAGE SOURCES
  MARKOV INFORMATION SOURCE
  NOISELESS CODING
  NOISELESS-CODING THEOREM
  CONSTRUCTION OF NOISELESS CODES
  CHANNEL CAPACITY
  DECISION SCHEMES
  THE NOISY-CODING THEOREM
  ERROR-CORRECTING CODES
  PARITY-CHECK CODES
  OTHER ERROR-DETECTING AND ERROR-CORRECTING CODES
  CONTINUOUS-AMPLITUDE CHANNELS
  MAXIMIZATION OF ENTROPY OF CONTINUOUS DISTRIBUTIONS
  GAUSSIAN SIGNALS AND CHANNELS
  BAND-LIMITED TRANSMISSION AND THE SAMPLING THEOREM

CHAPTER 1.3 MODULATION
  MODULATION THEORY
  ELEMENTS OF SIGNAL THEORY
  DURATION AND BANDWIDTH - UNCERTAINTY RELATIONSHIPS
  CONTINUOUS MODULATION
  LINEAR, OR AMPLITUDE, MODULATION
  DOUBLE-SIDEBAND AMPLITUDE MODULATION (DSBAM)
  DOUBLE-SIDEBAND AMPLITUDE MODULATION, SUPPRESSED CARRIER
  VESTIGIAL-SIDEBAND AMPLITUDE MODULATION (VSBAM)
  SINGLE-SIDEBAND AMPLITUDE MODULATION (SSBAM)
  BANDWIDTH AND POWER RELATIONSHIPS FOR AM
  ANGLE (FREQUENCY AND PHASE) MODULATION

CHAPTER 1.4 DIGITAL DATA TRANSMISSION AND PULSE MODULATION
  DIGITAL TRANSMISSION
  PULSE-AMPLITUDE MODULATION (PAM)
  QUANTIZING AND QUANTIZING ERROR
  SIGNAL ENCODING
  BASEBAND DIGITAL-DATA TRANSMISSIONS
  PULSE-CODE MODULATION (PCM)
  SPREAD-SPECTRUM SYSTEMS

CHAPTER 1.5 NOISE AND INTERFERENCE
  GENERAL
  RANDOM PROCESSES
  CLASSIFICATION OF RANDOM PROCESSES
  ARTIFICIAL NOISE
CHAPTER 1.1 COMMUNICATION SYSTEMS

Geoffrey C. Orsak, H. Vincent Poor, John B. Thomas

CONCEPTS

The principal problem in most communication systems is the transmission of information in the form of messages or data from an originating information source S to a destination or receiver D. The method of transmission is frequently by means of electric signals under the control of the sender. These signals are transmitted via a channel C, as shown in Fig. 1.1.1. The set of messages sent by the source will be denoted by {U}. If the channel were such that each member of {U} were received exactly, there would be no communication problem. However, because of channel limitations and noise, a corrupted version {U*} of {U} is received at the information destination.

It is generally desired that the distorting effects of channel imperfections and noise be minimized and that the number of messages sent over the channel in a given time be maximized. These two requirements are interacting, since, in general, increasing the rate of message transmission increases the distortion or error. However, some forms of message are better suited for transmission over a given channel than others, in that they can be transmitted faster or with less error.
Thus it may be desirable to modify the message set {U} by a suitable encoder E to produce a new message set {A} more suitable for a given channel. Then a decoder E^{-1} will be required at the destination to recover {U*} from the distorted set {A*}. A typical block diagram of the resulting system is shown in Fig. 1.1.2.

[FIGURE 1.1.2 Communication system with encoding and decoding.]

SELF-INFORMATION AND ENTROPY

Information theory is concerned with the quantification of the communications process. It is based on probabilistic modeling of the objects involved. In the model communication system given in Fig. 1.1.1, we assume that each member of the message set {U} is expressible by means of some combination of a finite set of symbols called an alphabet. Let this source alphabet be denoted by the set {X} with elements x1, x2, . . . , xM, where M is the size of the alphabet. The notation p(xi), i = 1, 2, . . . , M, will be used for the probability of occurrence of the ith symbol xi. In general the set of numbers {p(xi)} can be assigned arbitrarily provided that

p(x_i) \ge 0, \quad i = 1, 2, \ldots, M    (1)

and

\sum_{i=1}^{M} p(x_i) = 1    (2)

[FIGURE 1.1.1 Basic communication system.]

A measure of the amount of information contained in the ith symbol xi can be defined based solely on the probability p(xi). In particular, the self-information I(xi) of the ith symbol xi is defined as

I(x_i) = \log [1/p(x_i)] = -\log p(x_i)    (3)

This quantity is a decreasing function of p(xi), with the endpoint values of infinity for the impossible event and zero for the certain event.

It follows directly from Eq. (3) that I(xi) is a discrete random variable, i.e., a real-valued function defined on the elements xi of a probability space. Of the various statistical properties of this random variable I(xi), the most important is the expected value, or mean, given by

E\{I(x_i)\} = H(X) = \sum_{i=1}^{M} p(x_i) I(x_i) = -\sum_{i=1}^{M} p(x_i) \log p(x_i)    (4)

This quantity H(X) is called the entropy of the distribution p(xi). If p(xi) is interpreted as the probability of the ith state of a system in phase space, then this expression is identical to the entropy of statistical mechanics and thermodynamics. Furthermore, the relationship is more than a mathematical similarity. In statistical mechanics, entropy is a measure of the disorder of a system; in information theory, it is a measure of the uncertainty associated with a message source.

In the definitions of self-information and entropy, the choice of the base for the logarithm is arbitrary, but of course each choice results in a different system of units for the information measures. The most common bases used are base 2, base e (the natural logarithm), and base 10. When base 2 is used, the unit of I(·) is called the binary digit, or bit, which is a very familiar unit of information content. When base e is used, the unit is the nat; this base is often used because of its convenient analytical properties in integration, differentiation, and the like. The base 10 is encountered only rarely; the unit is the hartley.
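The definitions in Eqs. (1) through (4) are easy to check numerically. The short Python sketch below is not part of the handbook; the example distribution is arbitrary. It computes the self-information of each symbol and the entropy of the distribution in bits, nats, and hartleys.

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = -log_b p(x), Eq. (3)."""
    return -math.log(p, base)

def entropy(probs, base=2):
    """Entropy H(X) = -sum p(x) log_b p(x), Eq. (4); terms with p = 0 contribute 0."""
    assert all(p >= 0 for p in probs), "Eq. (1): probabilities must be nonnegative"
    assert abs(sum(probs) - 1.0) < 1e-9, "Eq. (2): probabilities must sum to 1"
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Arbitrary example alphabet of size M = 4
p = [0.5, 0.25, 0.125, 0.125]

for i, pi in enumerate(p, start=1):
    print(f"I(x{i}) = {self_information(pi):.3f} bits")

print("H(X) =", entropy(p, 2), "bits")       # 1.75 bits
print("H(X) =", entropy(p, math.e), "nats")  # 1.75 * ln 2
print("H(X) =", entropy(p, 10), "hartleys")  # 1.75 * log10 2
```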
ENTROPY OF DISCRETE RANDOM VARIABLES

The more elementary properties of the entropy of a discrete random variable can be illustrated with a simple example. Consider the binary case, where M = 2, so that the alphabet consists of the symbols 0 and 1 with probabilities p and 1 - p, respectively. It follows from Eq. (4) that

H_1(X) = -[p \log_2 p + (1 - p) \log_2 (1 - p)] \quad \text{(bits)}    (5)

Equation (5) can be plotted as a function of p, as shown in Fig. 1.1.3, and has the following interesting properties:

1. H_1(X) \ge 0.
2. H_1(X) is zero only for p = 0 and p = 1.
3. H_1(X) is a maximum at p = 1 - p = 1/2.

[FIGURE 1.1.3 Entropy in the binary case.]

More generally, it can be shown that the entropy H(X) has the following properties for the general case of an alphabet of size M:

1. H(X) \ge 0.    (6)
2. H(X) = 0 if and only if all of the probabilities are zero except for one, which must be unity.    (7)
3. H(X) \le \log_b M.    (8)
4. H(X) = \log_b M if and only if all the probabilities are equal, so that p(x_i) = 1/M for all i.    (9)

MUTUAL INFORMATION AND JOINT ENTROPY

The usual communication problem concerns the transfer of information from a source S through a channel C to a destination D, as shown in Fig. 1.1.1. The source has available for forming messages an alphabet X of size M. A particular symbol xi is selected from the M possible symbols and is sent over the channel C. It is the limitations of the channel that produce the need for a study of information theory.

The information destination has available an alphabet Y of size N. For each symbol xi sent from the source, a symbol yj is selected at the destination. Two probabilities serve to describe the state of knowledge at the destination. Prior to the reception of a communication, the state of knowledge of the destination about the symbol xi is the a priori probability p(xi) that xi would be selected for transmission. After reception and selection of the symbol yj, the state of knowledge concerning xi is the conditional probability p(xi | yj), which will be called the a posteriori probability of xi. It is the probability that xi was sent given that yj was received.

Ideally this a posteriori probability for each given yj should be unity for one xi and zero for all other xi. In this case an observer at the destination is able to determine exactly which symbol xi has been sent after the reception of each symbol yj. Thus the uncertainty that existed previously, and which was expressed by the a priori probability distribution of xi, has been removed completely by reception. In the general case it is not possible to remove all the uncertainty, and the best that can be hoped for is that it has been decreased. Thus the a posteriori probability p(xi | yj) is distributed over a number of xi but should be different from p(xi). If the two probabilities are the same, then no uncertainty has been removed by transmission and no information has been transferred.

Based on this discussion and on other considerations that will become clearer later, the quantity I(xi; yj) is defined as the information gained about xi by the reception of yj, where

I(x_i; y_j) = \log_b [p(x_i \mid y_j) / p(x_i)]    (10)

This measure has a number of reasonable and desirable properties.

Property 1. The information measure I(xi; yj) is symmetric in xi and yj; that is,

I(x_i; y_j) = I(y_j; x_i)    (11)

Property 2.
The mutual information I(xi; yj) is a maximum when p(xi | yj) = 1, that is, when the reception of yj completely removes the uncertainty concerning xi:

I(x_i; y_j) \le -\log p(x_i) = I(x_i)    (12)

Property 3. If two communications yj and zk concerning the same message xi are received successively, and if the observer at the destination takes the a posteriori probability of the first as the a priori probability of the second, then the total information gained about xi is the sum of the gains from both communications:

I(x_i; y_j, z_k) = I(x_i; y_j) + I(x_i; z_k \mid y_j)    (13)

Property 4. If two communications yj and yk concerning two independent messages xi and xm are received, the total information gain is the sum of the two information gains considered separately:

I(x_i, x_m; y_j, y_k) = I(x_i; y_j) + I(x_m; y_k)    (14)

These four properties of mutual information are intuitively satisfying and desirable. Moreover, if one begins by requiring these properties, it is easily shown that the logarithmic definition of Eq. (10) is the simplest form that can be obtained.

The definition of mutual information given by Eq. (10) suffers from one major disadvantage. When errors are present, an observer will not be able to calculate the information gain even after the reception of all the symbols relating to a given source symbol, since the same series of received symbols may represent several different source symbols. Thus, the observer is unable to say which source symbol has been sent and at best can only compute the information gain with respect to each possible source symbol. In many cases it would be more desirable to have a quantity that is independent of the particular symbols. A number of quantities of this nature will be obtained in the remainder of this section.

The mutual information I(xi; yj) is a random variable just as was the self-information I(xi); however, two probability spaces X and Y are involved now, and several ensemble averages are possible. The average mutual information I(X; Y) is defined as a statistical average of I(xi; yj) with respect to the joint probability p(xi, yj); that is,

I(X; Y) = E_{XY}\{I(x_i; y_j)\} = \sum_i \sum_j p(x_i, y_j) \log [p(x_i \mid y_j) / p(x_i)]    (15)

This new function I(X; Y) is the first information measure defined that does not depend on the individual symbols xi or yj. Thus, it is a property of the whole communication system and will turn out to be only the first in a series of similar quantities used as a basis for the characterization of communication systems. This quantity I(X; Y) has a number of useful properties. It is nonnegative; it is zero if and only if the ensembles X and Y are statistically independent; and it is symmetric in X and Y, so that I(X; Y) = I(Y; X).

A source entropy H(X) was given by Eq. (4). It is obvious that a similar quantity, the destination entropy H(Y), can be defined analogously by

H(Y) = -\sum_{j=1}^{N} p(y_j) \log p(y_j)    (16)

This quantity will, of course, have all the properties developed for H(X). In the same way the joint or system entropy H(X, Y) can be defined by

H(X, Y) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log p(x_i, y_j)    (17)
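These ensemble averages are straightforward to evaluate for any finite joint distribution. The Python sketch below is illustrative only; the joint probability matrix is an arbitrary example, not one from the handbook. It computes H(X), H(Y), H(X, Y), and the average mutual information of Eq. (15), and confirms the properties just listed: I(X; Y) is nonnegative and I(X; Y) = I(Y; X).

```python
import math

def entropy(probs):
    """H = -sum p log2 p over nonzero probabilities (Eqs. (4), (16), (17))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(pxy):
    """Average mutual information of Eq. (15), in bits.

    pxy[i][j] is the joint probability p(x_i, y_j).
    """
    px = [sum(row) for row in pxy]           # marginal p(x_i)
    py = [sum(col) for col in zip(*pxy)]     # marginal p(y_j)
    I = 0.0
    for i, row in enumerate(pxy):
        for j, p in enumerate(row):
            if p > 0:
                # p(x_i | y_j) / p(x_i) = p(x_i, y_j) / (p(x_i) p(y_j))
                I += p * math.log2(p / (px[i] * py[j]))
    return I

# Arbitrary example joint distribution p(x_i, y_j), M = N = 2
pxy = [[0.4, 0.1],
       [0.1, 0.4]]

px = [sum(row) for row in pxy]
py = [sum(col) for col in zip(*pxy)]

print("H(X)   =", entropy(px))
print("H(Y)   =", entropy(py))
print("H(X,Y) =", entropy(p for row in pxy for p in row))
print("I(X;Y) =", mutual_information(pxy))                             # nonnegative
print("I(Y;X) =", mutual_information([list(r) for r in zip(*pxy)]))    # symmetry
```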
If X and Y are statistically independent, so that p(xi, yj) = p(xi) p(yj) for all i and j, then Eq. (17) can be written as

H(X, Y) = H(X) + H(Y)    (18)

On the other hand, if X and Y are not independent, Eq. (17) becomes

H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)    (19)

where H(Y | X) and H(X | Y) are conditional entropies given by

H(Y \mid X) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log p(y_j \mid x_i)    (20)

and by

H(X \mid Y) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(x_i, y_j) \log p(x_i \mid y_j)    (21)

These conditional entropies each satisfy an important inequality:

0 \le H(Y \mid X) \le H(Y)    (22)

and

0 \le H(X \mid Y) \le H(X)    (23)

It follows from these last two expressions that Eq. (15) can be expanded to yield

I(X; Y) = -H(X, Y) + H(X) + H(Y) \ge 0    (24)

This equation can be rewritten in the two equivalent forms

I(X; Y) = H(Y) - H(Y \mid X) \ge 0    (25)

or

I(X; Y) = H(X) - H(X \mid Y) \ge 0    (26)

It is also clear, say from Eq. (24), that H(X, Y) satisfies the inequality

H(X, Y) \le H(X) + H(Y)    (27)

Thus, the joint entropy of two ensembles X and Y is a maximum when the ensembles are independent.

At this point it may be appropriate to comment on the meaning of the two conditional entropies H(Y | X) and H(X | Y). Let us refer first to Eq. (26). This equation expresses the fact that the average information gained about a message, when a communication is completed, is equal to the average source information less the average uncertainty that still remains about the message. From another point of view, the quantity H(X | Y) is the average additional information needed at the destination after reception to completely specify the message sent. Thus, H(X | Y) represents the information lost in the channel. It is frequently called the equivocation.

Let us now consider Eq. (25). This equation indicates that the information transmitted consists of the difference between the destination entropy and that part of the destination entropy that is not information about the source; thus the term H(Y | X) can be considered a noise entropy added in the channel.

CHAPTER 1.2 INFORMATION SOURCES, CODES, AND CHANNELS

Geoffrey C. Orsak, H. Vincent Poor, John B. Thomas

MESSAGE SOURCES

As shown in Fig. 1.1.1, an information source can be considered as emitting a given message ui from the set {U} of possible messages. In general, each message ui will be represented by a sequence of symbols xj from the source alphabet {X}, since the number of possible messages will usually exceed the size M of the source alphabet. Thus sequences of symbols replace the original messages ui, which need not be considered further. When the source alphabet {X} is of finite size M, the source will be called a finite discrete source. The problems of concern now are the interrelationships existing between symbols in the generated sequences and the classification of sources according to these interrelationships.

A random or stochastic process {x_t, t ∈ T} can be defined as an indexed set of random variables, where T is the parameter set of the process.
If the set T is a sequence, then x_t is a stochastic process with discrete parameter (also called a random sequence or series). One way to look at the output of a finite discrete source is that it is a discrete-parameter stochastic process, with each possible given sequence one of the ensemble members or realizations of the process. Thus the study of information sources can be reduced to a study of random processes.

The simplest case to consider is the memoryless source, where the successive symbols obey the same fixed probability law, so that the one distribution p(xi) determines the appearance of each indexed symbol. Such a source is called stationary. Let us consider sequences of length n, each member of the sequence being a realization of the random variable xi with fixed probability distribution p(xi). Since there are M possible realizations of the random variable and n terms in the sequence, there must be M^n distinct sequences possible of length n. Let the random variable Xi in the jth position be denoted by Xij, so that the sequence set (the message set) can be represented by

\{U\} = X^n = \{X_{i1}, X_{i2}, \ldots, X_{in}\}, \quad i = 1, 2, \ldots, M    (1)

The symbol X^n is sometimes used to represent this sequence set and is called the nth extension of the memoryless source X. The probability of occurrence of a given message ui is just the product of the probabilities of occurrence of the individual terms in the sequence, so that

p\{u_i\} = p(x_{i1}) p(x_{i2}) \cdots p(x_{in})    (2)

Now the entropy for the extended source X^n is

H(X^n) = -\sum_{X^n} p\{u_i\} \log p\{u_i\} = nH(X)    (3)

as expected. Note that, if base 2 logarithms are used, then H(X) has units of bits per symbol, n is symbols per sequence, and H(X^n) is in units of bits per sequence. For a memoryless source, all sequence averages of information measures are obtained by multiplying the corresponding symbol average by the number of symbols in the sequence.

MARKOV INFORMATION SOURCE

The memoryless source is not a general enough model in most cases. A constructive way to generalize this model is to assume that the occurrence of a given symbol depends on some number m of immediately preceding symbols. Thus the information source can be considered to produce an mth-order Markov chain and is called an mth-order Markov source.

For an mth-order Markov source, the m symbols preceding a given symbol position are called the state sj of the source at that symbol position. If there are M possible symbols xi, then the mth-order Markov source will have M^m = q possible states sj, making up the state set

S = \{s_1, s_2, \ldots, s_q\}, \quad q = M^m    (4)

At a given time corresponding to one symbol position the source will be in a given state sj. There will exist a probability p(sk | sj) = p_jk that the source will move into another state sk with the emission of the next symbol. The set of all such conditional probabilities is expressed by the transition matrix T, where

T = [p_{jk}] =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1q} \\
p_{21} & p_{22} & \cdots & p_{2q} \\
\vdots & \vdots &        & \vdots \\
p_{q1} & p_{q2} & \cdots & p_{qq}
\end{bmatrix}    (5)

A Markov matrix, or stochastic matrix, is any square matrix with nonnegative elements such that the row sums are unity. It is clear that T is such a matrix, since

\sum_{j=1}^{q} p_{ij} = \sum_{j=1}^{q} p(s_j \mid s_i) = 1, \quad i = 1, 2, \ldots, q    (6)
Conversely, any stochastic matrix is a possible transition matrix for a Markov source of order m, where q = M^m is equal to the number of rows or columns of the matrix. A Markov chain is completely specified by its transition matrix T and by an initial distribution vector p giving the probability distribution for the first state occurring.

For the memoryless source, the transition matrix reduces to a stochastic matrix where all the rows are identical and are each equal to the initial distribution vector p, which is in turn equal to the vector giving the source alphabet a priori probabilities. Thus, in this case, we have

p_{jk} = p(s_k \mid s_j) = p(s_k) = p(x_k), \quad k = 1, 2, \ldots, M    (7)

For each state si of the source an entropy H(si) can be defined by

H(s_i) = -\sum_{j=1}^{q} p(s_j \mid s_i) \log p(s_j \mid s_i) = -\sum_{k=1}^{M} p(x_k \mid s_i) \log p(x_k \mid s_i)    (8)

The source entropy H(S) in information units per symbol is the expected value of H(si); that is,

H(S) = -\sum_{i=1}^{q} \sum_{j=1}^{q} p(s_i) p(s_j \mid s_i) \log p(s_j \mid s_i) = -\sum_{i=1}^{q} \sum_{k=1}^{M} p(s_i) p(x_k \mid s_i) \log p(x_k \mid s_i)    (9)

where p(si) = pi is the stationary state probability and is the ith element of the vector P defined by

P = [p_1 \; p_2 \; \cdots \; p_q]    (10)

It is easy to show, as in Eq. (8) of Chap. 1.1, that the source entropy cannot exceed log M, where M is the size of the source alphabet {X}. For a given source, the ratio of the actual entropy H(S) to the maximum value it can have with the same alphabet is called the relative entropy of the source. The redundancy η of the source is defined as the positive difference between unity and this relative entropy:

\eta = 1 - \frac{H(S)}{\log M}    (11)

The quantity log M is sometimes called the capacity of the alphabet.
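As a numerical illustration (not from the handbook; the two-state transition matrix below is arbitrary), the following Python sketch finds the stationary state probabilities of a first-order binary Markov source by power iteration and evaluates the source entropy H(S) of Eq. (9) and the redundancy of Eq. (11).

```python
import math

# Arbitrary first-order binary Markov source (M = 2, so q = M^m = 2 states).
# T[j][k] = p(s_k | s_j); each row sums to 1 (Eq. (6)).
T = [[0.9, 0.1],
     [0.4, 0.6]]

M = 2
q = len(T)

# Stationary state probabilities p(s_i): iterate P <- P T until it converges.
P = [1.0 / q] * q
for _ in range(1000):
    P = [sum(P[i] * T[i][j] for i in range(q)) for j in range(q)]

# State entropies H(s_i), Eq. (8), and source entropy H(S), Eq. (9), in bits/symbol.
H_state = [-sum(p * math.log2(p) for p in row if p > 0) for row in T]
H_S = sum(P[i] * H_state[i] for i in range(q))

redundancy = 1 - H_S / math.log2(M)          # Eq. (11)

print("stationary probabilities:", P)        # approximately [0.8, 0.2]
print("H(S) =", H_S, "bits/symbol")
print("redundancy =", redundancy)
```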
NOISELESS CODING

The preceding discussion has emphasized the information source and its properties. We now begin to consider the properties of the communication channel of Fig. 1.1.1. In general, an arbitrary channel will not accept and transmit the sequence of xi's emitted from an arbitrary source. Instead the channel will accept a sequence of some other elements ai chosen from a code alphabet A of size D, where

A = \{a_1, a_2, \ldots, a_D\}    (12)

with D generally smaller than M. The elements ai of the code alphabet are frequently called code elements or code characters, while a given sequence of ai's may be called a code word.

The situation is now describable in terms of Fig. 1.1.2, where an encoder E has been added between the source and channel. The process of coding, or encoding, the source consists of associating with each source symbol xi a given code word, which is just a given sequence of ai's. Thus the source emits a sequence of xi's chosen from the source alphabet X, and the encoder emits a sequence of ai's chosen from the code alphabet A. It will be assumed in all subsequent discussions that the code words are distinct, i.e., that each code word corresponds to only one source symbol.

Even though each code word is required to be distinct, sequences of code words may not have this property. An example is code A of Table 1.2.1, where a source of size 4 has been encoded in binary code with characters 0 and 1. In code A the code words are distinct, but sequences of code words are not. It is clear that such a code is not uniquely decipherable. On the other hand, a given sequence of code words taken from code B will correspond to a distinct sequence of source symbols. An examination of code B shows that in no case is a code word formed by adding characters to another word. In other words, no code word is a prefix of another. It is clear that this is a sufficient (but not necessary) condition for a code to be uniquely decipherable. That it is not necessary can be seen from an examination of codes C and D of Table 1.2.1. These codes are uniquely decipherable even though many of the code words are prefixes of other words. In these cases any sequence of code words can be decoded by subdividing the sequence of 0s and 1s to the left of every 0 for code C and to the right of every 0 for code D. The character 0 is the first (or last) character of every code word and acts as a comma; therefore this type of code is called a comma code.

TABLE 1.2.1 Four Binary Coding Schemes

Source symbol    Code A    Code B    Code C    Code D
x1               0         0         0         0
x2               1         10        01        10
x3               00        110       011       110
x4               11        111       0111      1110

Note: Code A is not uniquely decipherable; codes B, C, and D are uniquely decipherable; codes B and D are instantaneous codes; and codes C and D are comma codes.

In general the channel will require a finite amount of time to transmit each code character. The code words should be as short as possible in order to maximize information transfer per unit time. The average length L of a code is given by

L = \sum_{i=1}^{M} n_i p(x_i)    (13)

where ni is the length (number of code characters) of the code word for the source symbol xi and p(xi) is the probability of occurrence of xi. Although the average code length cannot be computed unless the set {p(xi)} is given, it is obvious that codes C and D of Table 1.2.1 will have a greater average length than code B unless p(x4) = 0. Comma codes are not optimal with respect to minimum average length.

Let us encode the sequence x3 x1 x3 x2 into codes B, C, and D of Table 1.2.1 as shown below:

Code B: 110011010
Code C: 011001101
Code D: 110011010

Codes B and D are fundamentally different from code C in that codes B and D can be decoded word by word without examining subsequent code characters, while code C cannot be so treated. Codes B and D are called instantaneous codes, while code C is noninstantaneous. The instantaneous codes have the property (previously mentioned) that no code word is a prefix of another code word.

The aim of noiseless coding is to produce codes with the two properties of (1) unique decipherability and (2) minimum average length L for a given source S with alphabet X and probability set {p(xi)}. Codes which have both these properties will be called optimal. It can be shown that if, for a given source S, a code is optimal among instantaneous codes, then it is optimal among all uniquely decipherable codes. Thus it is sufficient to consider instantaneous codes. A necessary property of optimal codes is that source symbols with higher probabilities have shorter code words; i.e.,

p(x_i) > p(x_j) \;\Rightarrow\; n_i \le n_j    (14)
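The prefix condition and the average length of Eq. (13) are easy to check mechanically. The Python sketch below is illustrative; the probabilities are the ones quoted later in the section, for which code B is absolutely optimal (p(x1) = 1/2, p(x2) = 1/4, p(x3) = p(x4) = 1/8). It tests each code of Table 1.2.1 for the instantaneous (prefix-free) property and computes its average length.

```python
codes = {
    "A": ["0", "1", "00", "11"],
    "B": ["0", "10", "110", "111"],
    "C": ["0", "01", "011", "0111"],
    "D": ["0", "10", "110", "1110"],
}
p = [0.5, 0.25, 0.125, 0.125]   # example source probabilities quoted later in the text

def is_instantaneous(words):
    """True if no code word is a prefix of another (sufficient for unique decipherability)."""
    return not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

def average_length(words, probs):
    """Average code length L of Eq. (13)."""
    return sum(len(w) * pi for w, pi in zip(words, probs))

for name, words in codes.items():
    print(name, "instantaneous:", is_instantaneous(words),
          " L =", average_length(words, p))
# Code B gives L = 1.75; codes C and D give L = 1.875; code A is shortest (L = 1.25)
# but is not uniquely decipherable.
```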
The encoding procedure consists of the assignment of a code word to each of the M source symbols. The code word for the source symbol xi will be of length ni; that is, it will consist of ni code elements chosen from the code alphabet of size D. It can be shown that a necessary and sufficient condition for the construction of a uniquely decipherable code is the Kraft inequality

\sum_{i=1}^{M} D^{-n_i} \le 1    (15)

NOISELESS-CODING THEOREM

It follows from Eq. (15) that the average code length L, given by Eq. (13), satisfies the inequality

L \ge \frac{H(X)}{\log D}    (16)

Equality (and minimum code length) occurs if and only if the source-symbol probabilities obey

p(x_i) = D^{-n_i}, \quad i = 1, 2, \ldots, M    (17)

A code where this equality applies is called absolutely optimal. Since an integer number of code elements must be used for each code word, the equality in Eq. (16) does not usually hold; however, by using one more code element, the average code length L can be bounded from above to give

\frac{H(X)}{\log D} \le L < \frac{H(X)}{\log D} + 1    (18)

This last relationship is frequently called the noiseless-coding theorem.

CONSTRUCTION OF NOISELESS CODES

The easiest case to consider occurs when an absolutely optimal code exists, i.e., when the source-symbol probabilities satisfy Eq. (17). Note that code B of Table 1.2.1 is absolutely optimal if p(x1) = 1/2, p(x2) = 1/4, and p(x3) = p(x4) = 1/8. In such cases, a procedure for realizing the code for arbitrary code-alphabet size D ≥ 2 is easily constructed as follows:

1. Arrange the M source symbols in order of decreasing probability.
2. Arrange the D code elements in an arbitrary but fixed order, i.e., a1, a2, . . . , aD.
3. Divide the set of symbols xi into D groups with equal probabilities of 1/D each. This division is always possible if Eq. (17) is satisfied.
4. Assign the element a1 as the first digit for symbols in the first group, a2 for the second, and ai for the ith group.
5. After the first division each of the resulting groups contains a number of symbols equal to D raised to some integral power if Eq. (17) is satisfied.

Thus, a typical group, say group i, contains D^{k_i} symbols, where ki is an integer (which may be zero). This group of symbols can be further subdivided ki times into D parts of equal probabilities. Each division decides one additional code digit in the sequence. A typical symbol xi is isolated after q divisions. If it belongs to the i1 group after the first division, the i2 group after the second division, and so forth, then the code word for xi will be a_{i1} a_{i2} . . . a_{iq}.

An illustration of the construction of an absolutely optimal code for the case where D = 3 is given in Table 1.2.2. This procedure ensures that source symbols with high probabilities will have short code words and vice versa, since a symbol with probability D^{-n_i} will be isolated after ni divisions and thus will have ni elements in its code word, as required by Eq. (17).

TABLE 1.2.2 Construction of an Optimal Code; D = 3

Source     A priori         Step
symbol     probability      1      2      3      Final code
x1         1/3              1                    1
x2         1/9              0      1             0 1
x3         1/9              0      0             0 0
x4         1/9              0      -1            0 -1
x5         1/27             -1     1      1      -1 1 1
x6         1/27             -1     1      0      -1 1 0
x7         1/27             -1     1      -1     -1 1 -1
x8         1/27             -1     0      1      -1 0 1
x9         1/27             -1     0      0      -1 0 0
x10        1/27             -1     0      -1     -1 0 -1
x11        1/27             -1     -1     1      -1 -1 1
x12        1/27             -1     -1     0      -1 -1 0
x13        1/27             -1     -1     -1     -1 -1 -1

Note: The code alphabet is the three elements 1, 0, -1. Average code length L = 2 code elements per symbol; source entropy H(X) = 2 log2 3 bits per symbol, so that L = H(X)/log2 3 and the code is absolutely optimal.
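For the source of Table 1.2.2 the optimality claims can be verified directly. The Python sketch below (illustrative) checks the Kraft equality implied by Eq. (17) and confirms that the average length of Eq. (13) meets the lower bound of Eq. (16) with D = 3.

```python
import math

# Source of Table 1.2.2: one symbol of probability 1/3, three of 1/9, nine of 1/27.
probs = [1/3] + [1/9] * 3 + [1/27] * 9
lengths = [1] + [2] * 3 + [3] * 9           # code-word lengths n_i from the table
D = 3                                        # ternary code alphabet

kraft = sum(D ** (-n) for n in lengths)      # Eq. (15); equals 1 for an absolutely optimal code
L = sum(n * p for n, p in zip(lengths, probs))            # Eq. (13)
H = -sum(p * math.log2(p) for p in probs)                 # entropy in bits/symbol

print("Kraft sum     =", kraft)              # 1.0
print("L             =", L)                  # 2.0 code elements per symbol
print("H(X)/log2(D)  =", H / math.log2(D))   # 2.0, so Eq. (16) holds with equality
```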
The code resulting from the process just discussed is sometimes called the Shannon-Fano code. It is apparent that the same encoding procedure can be followed whether or not the source probabilities satisfy Eq. (17). The set of symbols xi is simply divided into D groups with probabilities as nearly equal as possible. The procedure is sometimes ambiguous, however, and more than one Shannon-Fano code may be possible. The ambiguity arises, of course, in the choice of approximately equiprobable subgroups.

For the general case where Eq. (17) is not satisfied, a procedure due to Huffman guarantees an optimal code, i.e., one with minimum average length. This procedure for a code alphabet of arbitrary size D is as follows:

1. As before, arrange the M source symbols in order of decreasing probability.
2. As before, arrange the code elements in an arbitrary but fixed order, that is, a1, a2, . . . , aD.
3. Combine (sum) the probabilities of the D least likely symbols and reorder the resulting M - (D - 1) probabilities; this step will be called reduction 1. Repeat as often as necessary until there are D ordered probabilities remaining. Note: For the binary case (D = 2), it will always be possible to accomplish this reduction in M - 2 steps. When the size of the code alphabet is arbitrary, the last reduction will result in exactly D ordered probabilities if and only if

M = D + n(D - 1)

where n is an integer. If this relationship is not satisfied, dummy source symbols with zero probability should be added. The entire encoding procedure is followed as before, and at the end the dummy symbols are thrown away.
4. Start the encoding with the last reduction, which consists of exactly D ordered probabilities; assign the element a1 as the first digit in the code words for all the source symbols associated with the first probability; assign a2 to the second probability; and ai to the ith probability.
5. Proceed to the next to the last reduction; this reduction consists of D + (D - 1) ordered probabilities, for a net gain of D - 1 probabilities. For the D new probabilities, the first code digit has already been assigned and is the same for all of these D probabilities; assign a1 as the second digit for all source symbols associated with the first of these D new probabilities; assign a2 as the second digit for the second of these D new probabilities, etc.
6. The encoding procedure terminates after n + 1 steps, which is one more than the number of reductions.

As an illustration of the Huffman coding procedure, a binary code is constructed in Table 1.2.3.

[TABLE 1.2.3 Construction of Huffman Code; D = 2]
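Since the body of Table 1.2.3 did not survive in this transcript, the following Python sketch is offered instead as an illustration of the binary (D = 2) Huffman procedure; the five-symbol probability set is arbitrary, not the handbook's example.

```python
import heapq
import itertools

def huffman_binary(probs):
    """Binary (D = 2) Huffman code: repeatedly merge the two least likely entries."""
    counter = itertools.count()              # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # least likely
        p2, _, c2 = heapq.heappop(heap)      # next least likely
        # prepend one more code digit: 0 for one branch, 1 for the other
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

# Arbitrary example source
probs = {"x1": 0.4, "x2": 0.2, "x3": 0.2, "x4": 0.1, "x5": 0.1}
code = huffman_binary(probs)
L = sum(len(code[s]) * p for s, p in probs.items())      # Eq. (13)
print(code)
print("average length L =", L, "binary digits per symbol")   # 2.2 for this source
```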
CHANNEL CAPACITY

The average mutual information I(X; Y) between an information source and a destination was given by Eqs. (25) and (26) of Chap. 1.1 as

I(X; Y) = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y) \ge 0    (19)

The average mutual information depends not only on the statistical characteristics of the channel but also on the distribution p(xi) of the input alphabet X. If the input distribution is varied until Eq. (19) is a maximum for a given channel, the resulting value of I(X; Y) is called the channel capacity C of that channel; i.e.,

C = \max_{p(x_i)} I(X; Y)    (20)

In general, H(X), H(Y), H(X | Y), and H(Y | X) all depend on the input distribution p(xi). Hence, in the general case, it is not a simple matter to maximize Eq. (19) with respect to p(xi).

All the measures of information that have been considered in this treatment have involved only probability distributions on X and Y. Thus, for the model of Fig. 1.1.1, the joint distribution p(xi, yj) is sufficient. Suppose the source [and hence the input distribution p(xi)] is known; then it follows from the usual conditional-probability relationship

p(x_i, y_j) = p(x_i) p(y_j \mid x_i)    (21)

that only the distribution p(yj | xi) is needed for p(xi | yj) to be determined. This conditional probability p(yj | xi) can then be taken as a description of the information channel connecting the source X and the destination Y. Thus, a discrete memoryless channel can be defined as the probability distribution

p(y_j \mid x_i), \quad x_i \in X, \; y_j \in Y    (22)

or, equivalently, by the channel matrix D, where

D = [p(y_j \mid x_i)] =
\begin{bmatrix}
p(y_1 \mid x_1) & p(y_2 \mid x_1) & \cdots & p(y_N \mid x_1) \\
p(y_1 \mid x_2) & p(y_2 \mid x_2) & \cdots & p(y_N \mid x_2) \\
\vdots          &                 &        & \vdots          \\
p(y_1 \mid x_M) & p(y_2 \mid x_M) & \cdots & p(y_N \mid x_M)
\end{bmatrix}    (23)

A number of special types of channels are readily distinguished. Some of the simplest and/or most interesting are listed as follows:

(a) Lossless Channel. Here H(X | Y) = 0 for all input distributions p(xi), and Eq. (20) becomes

C = \max_{p(x_i)} H(X) = \log M    (24)

This maximum is obtained when the xi are equally likely, so that p(xi) = 1/M for all i. The channel capacity is equal to the source entropy, and no source information is lost in transmission.

(b) Deterministic Channel. Here H(Y | X) = 0 for all input distributions p(xi), and Eq. (20) becomes

C = \max_{p(x_i)} H(Y) = \log N    (25)

This maximum is obtained when the yj are equally likely, so that p(yj) = 1/N for all j. Each member of the X set is uniquely associated with one, and only one, member of the destination alphabet Y.

(c) Symmetric Channel. Here the rows of the channel matrix D are identical except for permutations, and the columns are identical except for permutations. If D is square, rows and columns are identical except for permutations. In the symmetric channel, the conditional entropy H(Y | X) is independent of the input distribution p(xi) and depends only on the channel matrix D. As a consequence, the determination of channel capacity is greatly simplified and can be written

C = \log N + \sum_{j=1}^{N} p(y_j \mid x_i) \log p(y_j \mid x_i)    (26)

This capacity is obtained when the yj are equally likely, so that p(yj) = 1/N for all j.
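Equation (26) needs only a single row of the channel matrix. The Python sketch below evaluates it in bits; the 3-by-3 symmetric channel matrix is an arbitrary illustration, not an example from the handbook.

```python
import math

# Arbitrary symmetric channel: rows (and columns) are permutations of one another.
D = [[0.7, 0.2, 0.1],
     [0.1, 0.7, 0.2],
     [0.2, 0.1, 0.7]]

N = len(D[0])
row = D[0]                           # any row gives the same conditional entropy H(Y|X)

# Eq. (26): C = log N + sum_j p(y_j|x_i) log p(y_j|x_i), here in bits
C = math.log2(N) + sum(p * math.log2(p) for p in row if p > 0)
print("C =", C, "bits per symbol")   # log2(3) - H(0.7, 0.2, 0.1) ≈ 0.428
```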
(d) Binary Symmetric Channel (BSC). This is the special case of a symmetric channel where M = N = 2. Here the channel matrix can be written

D = \begin{bmatrix} p & 1-p \\ 1-p & p \end{bmatrix}    (27)

and the channel capacity is

C = \log 2 - G(p)    (28)

where the function G(p) is defined as

G(p) = -[p \log p + (1 - p) \log (1 - p)]    (29)

This expression is mathematically identical to the entropy of a binary source as given in Eq. (5) of Chap. 1.1 and is plotted in Fig. 1.1.3 using base 2 logarithms. For the same base, Eq. (28) is shown as a function of p in Fig. 1.2.1. As expected, the channel capacity is large if p, the probability of correct transmission, is either close to unity or to zero. If p = 1/2, there is no statistical evidence which symbol was sent and the channel capacity is zero.

[FIGURE 1.2.1 Capacity of the binary symmetric channel.]

DECISION SCHEMES

A decision scheme or decoding scheme B is a partitioning of the Y set into M disjoint and exhaustive sets B1, B2, . . . , BM such that when a destination symbol yk falls into set Bi, it is decided that symbol xi was sent. Implicit in this definition is a decision rule d(yj), which is a function specifying uniquely a source symbol for each destination symbol. Let p(e | yj) be the probability of error when the symbol yj has been received. Then the total error probability p(e) is

p(e) = \sum_{j=1}^{N} p(y_j) p(e \mid y_j)    (30)

For a given decision scheme B, the conditional error probability p(e | yj) can be written

p(e \mid y_j) = 1 - p[d(y_j) \mid y_j]    (31)

where p[d(yj) | yj] is the conditional probability p(xi | yj) with xi assigned by the decision rule; i.e., for a given decision scheme, d(yj) = xi. The probability p(yj) is determined only by the source a priori probability p(xi) and by the channel matrix D = [p(yj | xi)]. Hence, only the term p(e | yj) in Eq. (30) is a function of the decision scheme. Since Eq. (30) is a sum of nonnegative terms, the error probability is a minimum when each summand is a minimum. Thus, the term p(e | yj) should be a minimum for each yj. It follows from Eq. (31) that the minimum-error scheme is the scheme which assigns the decision rule

d(y_j) = x^*, \quad j = 1, 2, \ldots, N    (32)

where x* is defined by

p(x^* \mid y_j) \ge p(x_i \mid y_j), \quad i = 1, 2, \ldots, M    (33)

In other words, each yj is decoded as the a posteriori most likely xi. This scheme, which minimizes the probability of error p(e), is usually called the ideal observer.

The ideal observer is not always a completely satisfactory decision scheme. It suffers from two major disadvantages: (1) For a given channel D, the scheme is defined only for a given input distribution p(xi). It might be preferable to have a scheme that was insensitive to input distributions. (2) The scheme minimizes average error but does not bound certain errors. For example, some symbols may always be received incorrectly. Despite these disadvantages, the ideal observer is a straightforward scheme which does minimize average error. It is also widely used as a standard with which other decision schemes may be compared.
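The ideal observer is simple to implement once p(xi) and the channel matrix are specified. The Python sketch below is illustrative (the input distribution and the 2-by-3 channel matrix are arbitrary): it forms the a posteriori probabilities, applies the decision rule of Eqs. (32) and (33), and evaluates the resulting error probability from Eqs. (30) and (31).

```python
# A priori source probabilities p(x_i) and channel matrix p(y_j | x_i); both illustrative.
px = [0.7, 0.3]
pyx = [[0.8, 0.1, 0.1],    # p(y_j | x_1)
       [0.1, 0.1, 0.8]]    # p(y_j | x_2)

M, N = len(px), len(pyx[0])

# Joint probabilities p(x_i, y_j), Eq. (21), and destination probabilities p(y_j)
pxy = [[px[i] * pyx[i][j] for j in range(N)] for i in range(M)]
py = [sum(pxy[i][j] for i in range(M)) for j in range(N)]

p_error = 0.0
for j in range(N):
    # a posteriori probabilities p(x_i | y_j)
    posterior = [pxy[i][j] / py[j] for i in range(M)]
    decision = max(range(M), key=lambda i: posterior[i])     # Eqs. (32), (33)
    p_error += py[j] * (1 - posterior[decision])              # Eqs. (30), (31)
    print(f"receive y{j+1}: decide x{decision+1}, p(e | y{j+1}) = {1 - posterior[decision]:.3f}")

print("overall p(e) =", round(p_error, 4))
```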
Consider the special case where the input distribution is p(xi) = 1/M for all i, so that all xi are equally likely. Now the conditional likelihood p(xi | yj) is

p(x_i \mid y_j) = \frac{p(x_i) p(y_j \mid x_i)}{p(y_j)} = \frac{p(y_j \mid x_i)}{M p(y_j)}    (34)

For a given yj, that input xi is chosen which makes p(yj | xi) a maximum, and the decision rule is

d(y_j) = \hat{x}, \quad j = 1, 2, \ldots, N    (35)

where \hat{x} is defined by

p(y_j \mid \hat{x}) \ge p(y_j \mid x_i), \quad i = 1, 2, \ldots, M    (36)

The probability of error becomes

p(e) = \sum_{j=1}^{N} p(y_j) \left[ 1 - \frac{p(y_j \mid \hat{x})}{M p(y_j)} \right]    (37)

This decoder is sometimes called the maximum-likelihood decoder or decision scheme.

It would appear that a relationship should exist between the error probability p(e) and the channel capacity C. One such relationship is the Fano bound, given by

H(X \mid Y) \le G[p(e)] + p(e) \log (M - 1)    (38)

and relating error probability to channel capacity through Eq. (20). Here G(·) is the function already defined by Eq. (29). The three terms in Eq. (38) can be interpreted as follows:

H(X | Y) is the equivocation. It is the average additional information needed at the destination after reception to completely determine the symbol that was sent.

G[p(e)] is the entropy of the binary system with probabilities p(e) and 1 - p(e). In other words, it is the average amount of information needed to determine whether the decision rule resulted in an error.

log (M - 1) is the maximum amount of information needed to determine which among the remaining M - 1 symbols was sent if the decision rule was incorrect; this information is needed with probability p(e).

THE NOISY-CODING THEOREM

The concept of channel capacity was discussed earlier. Capacity is a fundamental property of an information channel in the sense that it is possible to transmit information through the channel at any rate less than the channel capacity with arbitrarily small probability of error. This result is called the noisy-coding theorem, or Shannon's fundamental theorem for a noisy channel.

The noisy-coding theorem can be stated more precisely as follows: Consider a discrete memoryless channel with nonzero capacity C; fix two numbers H and ε such that

0 < H < C    (39)

\varepsilon > 0    (40)

Let us transmit m messages u1, u2, . . . , um by code words each of length n binary digits. The positive integer n can be chosen so that

m \le 2^{nH}    (41)

In addition, at the destination the m sent messages can be associated with a set V = {v1, v2, . . . , vm} of received messages and with a decision rule d(vj) = uj such that

p[d(v_j) = u_j] \ge 1 - \varepsilon    (42)

i.e., decoding can be accomplished with a probability of error that does not exceed ε.

There is a converse to the noisy-coding theorem, which states that it is not possible to produce an encoding procedure which allows transmission at a rate greater than channel capacity with arbitrarily small error.
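The theorem guarantees only the existence of good codes. As a concrete (and deliberately crude) illustration that is not from the handbook, the Python sketch below computes the exact word-error probability of an n-fold repetition code with majority decoding on a BSC: reliability improves as n grows, but the rate 1/n shrinks toward zero, which is exactly the trade-off the noisy-coding theorem says better codes can avoid.

```python
from math import comb

def repetition_error(n, eps):
    """Majority-vote word-error probability for an n-fold repetition code (n odd)
    on a BSC whose digit-error probability is eps."""
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(n // 2 + 1, n + 1))

eps = 0.1   # BSC digit-error probability (illustrative)
for n in [1, 3, 5, 7, 9, 11]:
    print(f"n = {n:2d}  rate = {1/n:.3f}  word error = {repetition_error(n, eps):.5f}")
# The error probability falls toward zero, but only because the rate 1/n also falls;
# the noisy-coding theorem asserts that rates up to C can be had with small error.
```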
ERROR-CORRECTING CODES

The codes considered earlier were designed for minimum length in the noiseless-transmission case. For noisy channels, the noisy-coding theorem guarantees the existence of a code which will allow transmission at any rate less than channel capacity and with arbitrarily small probability of error; however, the theorem does not provide a constructive procedure to devise such codes. Indeed, it implies that very long sequences of source symbols may have to be considered if reliable transmission at rates near channel capacity is to be obtained. In this section, we consider some of the elementary properties of simple error-correcting codes, i.e., codes which can be used to increase reliability in the transmission of information through noisy channels by correcting at least some of the errors that occur, so that overall probability of error is reduced.

The discussion will be restricted to the BSC, and the noisy-coding theorem notation will be used. Thus, a source alphabet X = {x1, x2, . . . , xM} of M symbols will be used to form a message set U of m messages uk, where U = {u1, u2, . . . , um}. Each uk will consist of a sequence of the xi's. Each message uk will be encoded into a sequence of n binary digits for transmission over the BSC. At the destination, there exists a set V = {v1, v2, . . . , v_{2^n}} of all possible binary sequences of length n. The inequality m \le 2^n must hold. The problem is to associate with each sent message uk a received message vj so that p(e), the overall probability of error, is reduced.

In the discussion of the noisy-coding theorem, a decoding scheme was used that examined the received message vj and identified it with the sent message uk which differed from it in the least number of binary digits. In all the discussions here it will be assumed that this decoder is used. Let us define the Hamming distance d(vj, vk) between two binary sequences vj and vk of length n as the number of digits in which vj and vk disagree. Thus, if the distance between two sequences is zero, the two sequences are identical. It is easily seen that this distance measure has the following four elementary properties:

d(v_j, v_k) \ge 0, \text{ with equality if and only if } v_j = v_k    (43)

d(v_j, v_k) = d(v_k, v_j)    (44)

d(v_j, v_l) \le d(v_j, v_k) + d(v_k, v_l)    (45)

d(v_j, v_k) \le n    (46)

The decoder we use is a minimum-distance decoder. As mentioned earlier, the ideal-observer decoding scheme is a minimum-distance scheme for the BSC.

It is intuitively apparent that the sent messages should be represented by code words that all have the greatest possible distances between them. Let us investigate this matter in more detail by considering all binary sequences of length n = 3; there are 2^n = 2^3 = 8 such sequences, viz.,

000  001  011  111  010  110  100  101

It is convenient to represent these as the eight corners of a unit cube, as shown in Fig. 1.2.2a, where the x axis corresponds to the first digit, the y axis to the second, and the z axis to the third. Although direct pictorial representation is not possible, it is clear that binary sequences of length n greater than 3 can be considered as the corners of the corresponding n-cube.

[FIGURE 1.2.2 Representation of binary sequences as the corners of an n-cube, n = 3: (a) the eight binary sequences of length 3; (b) shift in sequences 000 and 111 from a single error.]

Suppose that all eight binary sequences are used as code words to encode a source. If any binary digit is changed in transmission, an error will result at the destination, since the sent message will be interpreted incorrectly as one of the three possible messages that differ in one code digit from the sent message. This situation is illustrated in Fig. 1.2.2 for the code words 000 and 111. A change of one digit in each of these code words produces one of three possible other code words.
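A Hamming-distance routine makes this geometry easy to explore. The short Python sketch below (illustrative) defines the distance and checks the four properties of Eqs. (43) through (46) over all pairs of length-3 sequences.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance: number of digit positions in which a and b disagree."""
    return sum(x != y for x, y in zip(a, b))

n = 3
seqs = ["".join(bits) for bits in product("01", repeat=n)]   # the 8 corners of the cube

for a in seqs:
    for b in seqs:
        d = hamming(a, b)
        assert d >= 0 and (d == 0) == (a == b)               # Eq. (43)
        assert d == hamming(b, a)                            # Eq. (44)
        assert d <= n                                        # Eq. (46)
        for c in seqs:
            assert hamming(a, c) <= d + hamming(b, c)        # Eq. (45), triangle inequality

print("distance between 000 and 111:", hamming("000", "111"))   # 3
```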
Figure 1.2.2 suggests that only two code words, say 000 and 111, should be used. The distance between these two words, or any other two words on opposite corners of the cube, is 3. If only one digit is changed in the transmission of each of these two code words, they can be correctly distinguished at the destination by a minimum-distance decoder. If two digits are changed in each word in transmission, it will not be possible to make this distinction.

This reasoning can be extended to sequences containing more than three binary digits. For any n \ge 3, single errors in each code word can be corrected. If double errors are to be corrected without fail, there must be at least two code words with a minimum distance between them of 5; thus, for this case, binary code words of length 5 or greater must be used.

Note that the error-correcting properties of a code depend on the distance d(vj, vk) between the code words. Specifically, single errors can be corrected if all code words employed are at least a distance of 3 apart, double errors if the words are at a distance of 5 or more from each other, and, in general, q-fold errors can be corrected if

d(v_j, v_k) \ge 2q + 1, \quad j \ne k    (47)

Errors involving fewer than q digits per code word can also be corrected if Eq. (47) is satisfied. If the distance between two code words is 2q, there will always be a group of binary sequences which are in the middle, i.e., a distance q from each of the two words. Thus, by the proper choice of code words, q-fold errors can be detected but not corrected if

d(v_j, v_k) = 2q, \quad j \ne k    (48)

Now consider the maximum number of code words r that can be selected from the set of 2^n possible binary sequences of length n to form a code that will correct all single, double, . . . , q-fold errors. In the example of Fig. 1.2.2, the number of code words selected was 2. In fact, it can be shown that there is no single-error-correcting code for n = 3, 4 containing more than two words. Suppose we consider a given code consisting of the words . . . , uk, uj, . . . . All binary sequences of distance q or less from uk must belong to uk, and to uk only, if q-fold errors are to be corrected. Thus, associated with uk are all binary sequences of distance 0, 1, 2, . . . , q from uk. The number of such sequences is given by

\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{q} = \sum_{i=0}^{q} \binom{n}{i}    (49)

Since there are r of the code words, the total number of sequences associated with all the code words is

r \sum_{i=0}^{q} \binom{n}{i}

This number can be no larger than 2^n, the total number of distinct binary sequences of length n. Therefore the following inequality must hold:

r \sum_{i=0}^{q} \binom{n}{i} \le 2^n \quad \text{or} \quad r \le \frac{2^n}{\sum_{i=0}^{q} \binom{n}{i}}    (50)

This is a necessary upper bound on the number of code words that can be used to correct all errors up to and including q-fold errors. It can be shown that it is not sufficient.
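The bound of Eq. (50) is a one-line computation. The Python sketch below (illustrative) tabulates it for a few word lengths with q = 1; for n = 3 and n = 4 it gives 2 and 3.2 respectively, consistent with the statement above that no single-error-correcting code of those lengths contains more than two words (the bound is necessary but not sufficient).

```python
from math import comb

def max_codewords_bound(n, q):
    """Right side of Eq. (50): upper bound on the number r of code words of
    length n in a code that corrects all errors up to q-fold."""
    return 2**n / sum(comb(n, i) for i in range(q + 1))

for n in (3, 4, 5, 7):
    print(f"n = {n}: r <= {max_codewords_bound(n, q=1):.2f}")
# n = 3 -> 2.00, n = 4 -> 3.20, n = 5 -> 5.33, n = 7 -> 16.00
# (the n = 7 value happens to be attained, by the well-known (7,4) Hamming code)
```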
Consider the eight possible distinct binary sequences of length 3. Suppose we add one binary digit to each sequence in such a way that the total number of 1s in the sequence is even (or odd, if you wish). The result is shown in Table 1.2.4. Note that all the word sequences of length 4 differ from each other by a distance of at least 2. In accordance with Eq. (48), it should be possible now to detect single errors in all eight sequences.

TABLE 1.2.4 Parity-Check Code for Single-Error Detection

Message digits    Check digit    Word
000               0              0000
100               1              1001
010               1              0101
001               1              0011
110               0              1100
101               0              1010
011               0              0110
111               1              1111

The detection method is straightforward. At the receiver, count the number of 1s in the sequence; if the number is odd, a single error (or, more precisely, an odd number of errors) has occurred; if the number is even, no error (or an even number of errors) has occurred. This particular scheme is a good one if only single errors are likely to occur and if detection only (rather than correction) is desired. Such is often the case, for example, in closed digital systems such as computers. The added digit is called a parity-check digit, and the scheme is a very simple example of a parity-check code.

PARITY-CHECK CODES

More generally, in parity-check codes, the encoded sequence consists of n binary digits of which only k < n are information digits, while the remaining l = n - k digits are used for error detection and correction and are called check digits or parity checks. The example of Table 1.2.4 is a single-error-detecting code, but, in general, q-fold errors can be detected and/or corrected. As the number of errors to be detected and/or corrected increases, the number l of check digits must increase. Thus, for fixed word length n, the number of information digits k = n - l will decrease as more and more errors are to be detected and/or corrected. Also the total number of words in the code cannot exceed the right side of Eq. (50) or the number 2^k.

Parity-check codes are relatively easy to implement. The simple example given of a single-error-detecting code requires only that the number of 1s in each code word be counted. In this light, it is of considerable importance to note that these codes satisfy the noisy-coding theorem. In other words, it is possible to encode a source by parity-check coding for transmission over a BSC at a rate approaching channel capacity and with arbitrarily small probability of error. Then, from Eq. (41), we have

2^{nH} = 2^k    (51)

or H, the rate of transmission, is given by

H = k/n    (52)

As n → ∞, the probability of error p(e) approaches zero. Thus, in a certain sense, it is sufficient to limit a study of error-correcting codes to the parity-check codes.
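The single-parity-check scheme of Table 1.2.4 can be written in a few lines. The Python sketch below (illustrative) appends an even-parity digit to each 3-digit message and confirms that any single digit error is detected at the receiver.

```python
from itertools import product

def add_even_parity(msg):
    """Append a check digit so the total number of 1s in the word is even (Table 1.2.4)."""
    return msg + str(msg.count("1") % 2)

def error_detected(word):
    """Receiver test: an odd number of 1s means an odd number of digit errors occurred."""
    return word.count("1") % 2 == 1

for msg in ("".join(b) for b in product("01", repeat=3)):
    word = add_even_parity(msg)
    # flip each single digit in turn; every such error must be detected
    for i in range(len(word)):
        corrupted = word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]
        assert error_detected(corrupted)
    print(msg, "->", word)
```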
PARITY-CHECK CODES

More generally, in parity-check codes, the encoded sequence consists of n binary digits of which only k < n are information digits while the remaining l = n - k digits are used for error detection and correction and are called check digits or parity checks. The example of Table 1.2.4 is a single-error-detecting code, but, in general, q-fold errors can be detected and/or corrected. As the number of errors to be detected and/or corrected increases, the number l of check digits must increase. Thus, for fixed word length n, the number of information digits k = n - l will decrease as more and more errors are to be detected and/or corrected. Also the total number of words in the code cannot exceed the right side of Eq. (50) or the number 2^k.

Parity-check codes are relatively easy to implement. The simple example given of a single-error-detecting code requires only that the number of 1s in each code word be counted. In this light, it is of considerable importance to note that these codes satisfy the noisy-coding theorem. In other words, it is possible to encode a source by parity-check coding for transmission over a BSC at a rate approaching channel capacity and with arbitrarily small probability of error. Then, from Eq. (41), we have

    2^{nH} = 2^k        (51)

or H, the rate of transmission, is given by

    H = k/n        (52)

As n → ∞, the probability of error p(e) approaches zero. Thus, in a certain sense, it is sufficient to limit a study of error-correcting codes to the parity-check codes.

As an example of a parity-check code, consider the simplest nondegenerate case where l, the number of check digits, is 2 and k, the number of information digits, is 1. This system is capable of single-error detection and correction, as we have already decided from geometric considerations. Since l + k = 3, each encoded word will be three digits long. Let us denote this word by a1a2a3, where each ai is either 0 or 1. Let a1 represent the information digit and a2 and a3 represent the check digits.

Checking for errors is done by forming two independent equations from the three ai, each equation being of the form of a modulo-2 sum, i.e., of the form

    a_i \oplus a_j = \begin{cases} 0 & a_i = a_j \\ 1 & a_i \ne a_j \end{cases}

Take the two independent equations to be

    a_2 \oplus a_3 = 0        and        a_1 \oplus a_3 = 0

for an even-parity check. For an odd-parity check, let the right sides of both of these equations be unity. If these two equations are to be satisfied, the only possible code words that can be sent are 000 and 111. The other six words of length 3 violate one or both of the equations.

Now suppose that 000 is sent and 100 is received. A solution of the two independent equations gives, for the received word,

    a_2 \oplus a_3 = 0 \oplus 0 = 0
    a_1 \oplus a_3 = 1 \oplus 0 = 1

The check yields the binary check number 01, or 1, indicating that the error is in the first digit a1, as indeed it is. If 111 is sent and 101 received, then

    a_2 \oplus a_3 = 0 \oplus 1 = 1
    a_1 \oplus a_3 = 1 \oplus 1 = 0

and the binary check number is 10, or 2, indicating that the error is in a2. In the general case, a set of l independent linear equations is set up in order to derive a binary checking number whose value indicates the position of the error in the binary word. If more than one error is to be detected and corrected, the number l of check digits must increase, as discussed previously.

In the example just treated, the l check digits were used only to check the k information digits immediately preceding them. Such a code is called a block code, since all the information digits and all the check digits are contained in the block (code word) of length n = k + l. In some encoding procedures, the l check digits may also be used to check information digits appearing in preceding words. Such codes are called convolutional or recurrent codes. A parity-check code (either block or convolutional) where the word length is n and the number of information digits is k is usually called an (n, k) code.

OTHER ERROR-DETECTING AND ERROR-CORRECTING CODES

Unfortunately, a general treatment of error-detecting and error-correcting codes requires that the code structure be cast in a relatively sophisticated mathematical form. The commonest procedure is to identify the code letters with the elements of a finite (algebraic) field. The code words are then taken to form a vector subspace of n-tuples over the field. Such codes are called linear codes or, sometimes, group codes. Both the block codes and the convolutional codes mentioned in the previous paragraph fall in this category.

An additional constraint often imposed on linear codes is that they be cyclic. Let a code word a be represented by

    a = (a_0, a_1, a_2, \ldots, a_{n-1})

Then the ith cyclic permutation a^i is given by a^i = (a_i, a_{i+1}, . . . , a_{n-1}, a_0, a_1, . . . , a_{i-1}). A linear code is cyclic if, and only if, for every word a in the code, there is also a word a^i in the code. The permutations need not be distinct and, in fact, generally will not be. The eight code words

    0000   0110   1001   1010   0011   1100   0101   1111

constitute a cyclic set.
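That these eight words are closed under cyclic permutation can be confirmed with a few lines of Python; the sketch below, which is purely illustrative, implements the permutation a^i = (a_i, . . . , a_{n-1}, a_0, . . . , a_{i-1}) directly on the words written as strings.

    def cyclic_permutations(word: str) -> set:
        """All cyclic permutations a^i of a code word a, written as strings."""
        return {word[i:] + word[:i] for i in range(len(word))}

    code = {"0000", "0110", "1001", "1010", "0011", "1100", "0101", "1111"}

    # The code is cyclic: every permutation of every code word is again a code word.
    print(all(cyclic_permutations(word) <= code for word in code))   # True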
Included in the cyclic codes are some of those most commonly encountered, such as the Bose and Ray-Chaudhuri (BCH) codes and shortened Reed-Muller codes.

CONTINUOUS-AMPLITUDE CHANNELS

The preceding discussion has concerned discrete message distributions and channels. Further, it has been assumed, either implicitly or explicitly, that the time parameter is discrete, i.e., that a certain number of messages, symbols, code digits, and so forth, are transmitted per unit time. Thus, we have been concerned with discrete-amplitude, discrete-time channels and with messages which can be modeled as discrete random processes with discrete parameter. There are three other possibilities, depending on whether the process amplitude and the time parameter have discrete or continuous distributions.

We now consider the continuous-amplitude, discrete-time channel, where the input messages can be modeled as continuous random processes with discrete parameter. It will be shown later that continuous-time cases of engineering interest can be treated by techniques which amount to the replacement of the continuous parameter by a discrete parameter. The most straightforward method involves the application of the sampling theorem to band-limited processes. In this case the process is sampled at equispaced intervals of length 1/2W, where W is the highest frequency of the process. Thus the continuous parameter t is replaced by the discrete parameter tk = k/2W, k = . . . , -1, 0, 1, . . . .

Let us restrict our attention for the moment to continuous-amplitude, discrete-time situations. The discrete density p(xi), i = 1, 2, . . . , M, of the source-message set is replaced by the continuous density fx(x), where, in general, -∞ < x < ∞, although the range of x may be restricted in particular cases. In the same way, other discrete densities are replaced by continuous densities. For example, the destination distribution p(yj), j = 1, 2, . . . , N, becomes fy(y), and the joint distribution p(xi, yj) will be called f2(x, y). In analogy with the discrete-amplitude case [Eq. (4)], the entropy of a continuous distribution fx(x) can be defined as

    H(X) = -\int_{-\infty}^{\infty} f_x(x) \log f_x(x)\, dx        (53)

This definition is not completely satisfactory because of some of the properties of this new H(X). For example, it can be negative and it can depend on the coordinate system used to represent the message.
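A small numerical experiment makes the point. The illustrative Python sketch below evaluates Eq. (53) for a uniform density on (0, M), for which H(X) = ln M nats: the entropy is positive for M > 1, zero for M = 1, and negative for M < 1, and since rescaling the coordinate x turns one such density into another, the result also depends on the coordinate system chosen.

    import numpy as np

    def differential_entropy(density, a, b, n=200_000):
        """Numerical evaluation of Eq. (53), H(X) = -integral of f ln f, in nats."""
        x = np.linspace(a, b, n)
        fx = density(x)
        integrand = np.where(fx > 0, fx * np.log(fx), 0.0)
        return -np.sum(integrand) * (x[1] - x[0])

    # Uniform density on (0, M): H(X) = ln M, which can be of either sign.
    for M in (2.0, 1.0, 0.5):
        print(M, differential_entropy(lambda x: np.full_like(x, 1.0 / M), 0.0, M))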
Joint and conditional entropies can also be defined in exact analogy to the discrete case discussed in Chap. 1.1. If the joint density f2(x, y) exists, then the joint entropy H(X, Y) is given by

    H(X, Y) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(x, y) \log f_2(x, y)\, dx\, dy        (54)

and the conditional entropies H(X|Y) and H(Y|X) are

    H(X|Y) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(x, y) \log \frac{f_2(x, y)}{f_y(y)}\, dx\, dy        (55)

and

    H(Y|X) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(x, y) \log \frac{f_2(x, y)}{f_x(x)}\, dx\, dy        (56)

where

    f_x(x) = \int_{-\infty}^{\infty} f_2(x, y)\, dy        and        f_y(y) = \int_{-\infty}^{\infty} f_2(x, y)\, dx

The average mutual information follows from Eq. (15) and is

    I(X; Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(x, y) \log \frac{f_2(x, y)}{f_x(x) f_y(y)}\, dx\, dy        (57)

Although the entropy of a continuous distribution can be negative, positive, or zero, the average mutual information I(X; Y) ≥ 0, with equality when x and y are statistically independent, i.e., when f2(x, y) = fx(x)fy(y).

MAXIMIZATION OF ENTROPY OF CONTINUOUS DISTRIBUTIONS

The entropy of a discrete distribution is a maximum when the distribution is uniform, i.e., when all outcomes are equally likely. In the continuous case, the entropy depends on the coordinate system, and it is possible to maximize this entropy subject to various constraints on the associated density function.

The Maximization of H(X) for a Fixed Variance of x. Maximizing H(X) subject to the constraint that

    \int_{-\infty}^{\infty} x^2 f_x(x)\, dx = \sigma^2        (58)

yields the gaussian density

    f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}        -\infty < x < \infty        (59)

Thus, for fixed variance, the normal distribution has the largest entropy. The entropy in this case is

    H(X) = \tfrac{1}{2} \ln 2\pi\sigma^2 + \tfrac{1}{2} \ln e = \tfrac{1}{2} \ln 2\pi e\sigma^2        (60)

This last result will be of considerable use later. For convenience, the natural logarithm has been used, and the units of H are nats.

The Maximization of H(X) for a Limited Peak Value of x. In this case, the single constraint is

    \int_{-M}^{M} f_x(x)\, dx = 1        (61)

One obtains the uniform distribution

    f_x(x) = \begin{cases} 1/2M & |x| \le M \\ 0 & |x| > M \end{cases}

and the associated entropy is

    H(X) = -\int_{-M}^{M} \frac{1}{2M} \log \frac{1}{2M}\, dx = \log 2M        (62)

The Maximization of H(X) for x Limited to Nonnegative Values and a Given Average Value. The constraints

    \int_{0}^{\infty} f_x(x)\, dx = 1        and        \int_{0}^{\infty} x f_x(x)\, dx = \alpha        (63)

lead to the exponential distribution fx(x) = (1/α)e^{-x/α} for x ≥ 0 and fx(x) = 0 for x < 0.

BAND-LIMITED TRANSMISSION AND THE SAMPLING THEOREM

Consider a signal f(t) that is strictly band-limited (-2πW, 2πW) rad/s; i.e., its Fourier transform F(ω) satisfies

    F(\omega) = 0        |\omega| > 2\pi W        (80)

Such a signal can be represented in terms of its samples taken at the Nyquist sampling times tk = k/2W, k = 0, ±1, . . . , via the sampling representation

    f(t) = \sum_{k=-\infty}^{\infty} f\left(\frac{k}{2W}\right) \frac{\sin \pi(2Wt - k)}{\pi(2Wt - k)}        (81)

This expression is sometimes called the Cardinal series or Shannon's sampling theorem. It relates the discrete time domain {k/2W}, with sample values f(k/2W), to the continuous time domain {t} of the function f(t). The interpolation function

    k(t) = \frac{\sin 2\pi Wt}{2\pi Wt}        (82)

has a Fourier transform K(ω) given by

    K(\omega) = \begin{cases} 1/2W & |\omega| < 2\pi W \\ 0 & |\omega| > 2\pi W \end{cases}        (83)

Also the shifted function k(t - k/2W) has the Fourier transform

    F\{k(t - k/2W)\} = K(\omega)\, e^{-j\omega k/2W}        (84)

Therefore, each term on the right side of Eq. (81) is a time function which is strictly band-limited (-2πW, 2πW). Note also that

    k\left(t - \frac{k}{2W}\right) = \frac{\sin \pi(2Wt - k)}{\pi(2Wt - k)} = \begin{cases} 1 & t = t_k = k/2W \\ 0 & t = t_n,\; n \ne k \end{cases}        (85)

Thus, this sampling function k(t - k/2W) is zero at all Nyquist instants except tk, where it equals unity.
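As an illustration of Eq. (81), the short NumPy sketch below reconstructs a 3-Hz tone from samples taken at the Nyquist rate 2W = 10 samples/s; np.sinc(x) is sin πx/πx, so np.sinc(2Wt - k) plays the role of the interpolation function k(t - k/2W). The particular tone, record length, and evaluation interval are arbitrary choices, and because only a finite number of terms of the series is kept, a small truncation error remains; it shrinks as more samples are retained.

    import numpy as np

    def cardinal_series(samples, W, t):
        """Truncated form of Eq. (81): sum of f(k/2W) sinc(2Wt - k) over the stored samples."""
        k = np.arange(len(samples))
        return np.sum(samples[:, None] * np.sinc(2 * W * t[None, :] - k[:, None]), axis=0)

    W = 5.0                                    # band limit, Hz
    t_k = np.arange(0.0, 10.0, 1.0 / (2 * W))  # Nyquist instants t_k = k/2W
    samples = np.cos(2 * np.pi * 3.0 * t_k)    # a 3-Hz tone, well inside the band
    t = np.linspace(4.0, 6.0, 201)             # evaluate away from the truncated ends
    error = np.abs(cardinal_series(samples, W, t) - np.cos(2 * np.pi * 3.0 * t))
    print(error.max())                         # small residual truncation error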
Suppose that a function h(t) is not strictly band-limited to at least (-2πW, 2πW) rad/s and an attempt is made to reconstruct the function using Eq. (81) with sample values spaced 1/2W s apart. It is apparent that the reconstructed signal [which is strictly band-limited (-2πW, 2πW), as already mentioned] will differ from the original. Moreover, a given set of sample values {f(k/2W)} could have been obtained from a whole class of different signals. Thus, it should be emphasized that the reconstruction of Eq. (81) is unambiguous only for signals strictly band-limited to at least (-2πW, 2πW) rad/s. The set of different possible signals with the same set of sample values {f(k/2W)} is called the aliases of the band-limited signal f(t).

Let us now consider a signal (random process) X(t) with autocorrelation function given by

    R_x(\tau) = E\{X(t) X(t + \tau)\}        (86)

and power spectral density

    \phi_x(\omega) = \int_{-\infty}^{\infty} R_x(\tau)\, e^{-j\omega\tau}\, d\tau        (87)

which is just the Fourier transform of Rx(τ). The process will be assumed to have zero mean and to be strictly band-limited (-2πW, 2πW) in the sense that the power spectral density φx(ω) vanishes outside this interval; i.e.,

    \phi_x(\omega) = 0        |\omega| > 2\pi W        (88)

It has been noted that a deterministic signal f(t) band-limited (-2πW, 2πW) admits the sampling representation of Eq. (81). It can also be shown that the random process X(t) admits the same expansion; i.e.,

    X(t) = \sum_{k=-\infty}^{\infty} X\left(\frac{k}{2W}\right) \frac{\sin \pi(2Wt - k)}{\pi(2Wt - k)}        (89)

The right side of this expression is a random variable for each value of t. The infinite sum means that

    \lim_{N \to \infty} E\{|X(t) - X_N(t)|^2\} = 0

where

    X_N(t) = \sum_{k=-N}^{N} X\left(\frac{k}{2W}\right) \frac{\sin \pi(2Wt - k)}{\pi(2Wt - k)}

Thus, the process X(t) with continuous time parameter t can be represented by the process X(k/2W), k = . . . , -2, -1, 0, 1, 2, . . . , with discrete time parameter tk = k/2W. For band-limited signals or channels it is sufficient, therefore, to consider the discrete-time case and to relate the results to continuous time through Eq. (89).

Suppose the continuous-time process X(t) has a spectrum φx(ω) which is flat and band-limited so that

    \phi_x(\omega) = \begin{cases} N & |\omega| \le 2\pi W \\ 0 & |\omega| > 2\pi W \end{cases}        (90)

Then the autocorrelation function passes through zero at intervals of 1/2W, so that

    R_x(k/2W) = 0        k = \ldots, -2, -1, 1, 2, \ldots        (91)

Thus, samples spaced k/2W apart are uncorrelated if the power spectral density is flat and band-limited (-2πW, 2πW). If the process is gaussian, the samples are independent. This implies that continuous-time band-limited (-2πW, 2πW) gaussian channels, where the noise has a flat spectrum, have a capacity C given by Eq. (79) as

    C = \tfrac{1}{2} \ln (1 + S_p/N_p)        (nats/sample)        (92)

Here Np is the variance of the additive, flat, band-limited gaussian noise and Sp is Rx(0), the fixed variance of the input signal. The units of Eq. (92) are on a per sample basis. Since there are 2W samples per unit time, the capacity C per unit time can be written as

    C = W \ln (1 + S_p/N_p)        (nats/s)        (93)
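As a numerical illustration of Eq. (93), the fragment below converts the capacity from nats/s to bits/s by dividing by ln 2; the 3-kHz bandwidth and 30-dB signal-to-noise ratio are merely convenient, hypothetical values.

    import math

    def capacity_nats_per_second(W, snr):
        """Eq. (93): capacity of a flat, band-limited gaussian channel."""
        return W * math.log(1.0 + snr)

    W = 3000.0                    # bandwidth in Hz (hypothetical)
    snr = 10 ** (30.0 / 10.0)     # 30-dB signal-to-noise ratio (hypothetical)
    C = capacity_nats_per_second(W, snr)
    print(C, "nats/s =", C / math.log(2.0), "bits/s")   # about 2.07e4 nats/s, 2.99e4 bits/s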
The ideas developed thus far in this section have been somewhat abstract notions involving information sources and channels, channel capacity, and the various coding theorems. We now look more closely at conventional channels. Many aspects of these topics fall into the area often called modulation theory.

CHAPTER 1.3 MODULATION

Geoffrey C. Orsak, H. Vincent Poor, John B. Thomas

MODULATION THEORY

As discussed in Chap. 1.1 and shown in Fig. 1.1.1, the central problem in most communication systems is the transfer of information originating in some source to a destination by means of a channel. It will be convenient in this section to call the sent message or intelligence a(t) and to denote the received message by a*(t), a distorted or corrupted version of a(t).

The message signals used in communication and control systems are usually limited in frequency range to some maximum frequency fm = ωm/2π Hz. This frequency is typically in the range of a few hertz for control systems and moves upward to a few megahertz for television video signals. In addition, the bandwidth of the signal is often of the order of this maximum frequency, so that the signal spectrum is approximately low-pass in character. Such signals are often called video signals or baseband signals. It frequently happens that the transmission of such a spectrum through a given communication channel is inefficient or impossible. In this light, the problem may be looked upon as the one shown in Fig. 1.1.2, where an encoder E has been added between the source and the channel; however, in this case, the encoder acts to modulate the signal a(t), producing at its output the modulated wave or signal m(t).

Modulation can be defined as the modification of one signal, called the carrier, by another, called the modulating signal. The result of the modulation process is a modulated wave or signal. In most cases a frequency shift is one of the results. There are a number of reasons for producing modulated waves. The following list gives some of the major ones.

(a) Frequency Translation for Efficient Antenna Design. It may be necessary to transmit the modulating signal through space as electromagnetic radiation. If the antenna used is to radiate an appreciable amount of power, it must be large compared with the signal wavelength. Thus translation to higher frequencies (and hence to smaller wavelengths) will permit antenna structures of reasonable size and cost at both transmitter and receiver.

(b) Frequency Translation for Ease of Signal Processing. It may be easier to amplify and/or shape a signal in one frequency range than in another. For example, a dc signal may be converted to ac, amplified, and converted back again.

(c) Frequency Translation to Assigned Location. A signal may be translated to an assigned frequency band for transmission or radiation, e.g., in commercial radio broadcasting.

(d) Changing Bandwidth. The bandwidth of the original message signal may be increased or decreased by the modulation process. In general, decreased bandwidth will result in channel economies at the cost of fidelity. On the other hand, increased bandwidth will be accompanied by increased immunity to channel disturbances, as in wide-band frequency modulation or in spread-spectrum systems, for example.

(e) Multiplexing. It may be necessary or desirable to transmit several signals occupying the same frequency range or the same time range over a single channel. Various modulation techniques allow the signals to share the same channel and yet be recovered separately. Such techniques are given the generic name of multiplexing.
As will be discussed later, multiplexing is possible in either the frequency domain (frequency-domain multiplexing, FDM) or in the time domain (time-domain multiplexing, TDM). As a simple example, the signals may be translated in frequency so that they occupy separate and distinct frequency ranges, as mentioned in item (b).

Thus, the process of modulation can be considered as a form of encoding used to match the message signal arising from the information source to the communication channel. At the same time it is generally true that the channel itself has certain undesirable characteristics resulting in distortion of the signal during transmission. A part of such distortion can frequently be accounted for by postulating noise disturbances in the channel. These noises may be additive and may also affect the modulated wave in a more complicated fashion, although it is usually sufficient (and much simpler) to assume additive noise only. Also, the received signal must be decoded (demodulated) to recover the original signal.

In view of this discussion, it is convenient to change the block diagram of Fig. 1.1.2 to that shown in Fig. 1.3.1. The waveform received at the demodulator (receiver) will be denoted by r(t), where

    r(t) = \tilde{m}[t, a(t), p(t)] + n(t)        (1)

where a(t) is the original message signal, m[t, a(t)] is the modulated wave, m̃[t, a(t), p(t)] is a corrupted version of m[t, a(t)], and p(t) and n(t) are noises whose characteristics depend on the channel. Unless it is absolutely necessary for an accurate characterization of the channel, we will assume that p(t) ≡ 0 to avoid the otherwise complicated analysis that results. The aim is to find modulators M and demodulators M⁻¹ that make a*(t) a good estimate of the message signal a(t). It should be emphasized that M⁻¹ is not uniquely specified by M; for example, it is not intended to imply that MM⁻¹ = 1. The form of the demodulator, for a given modulator, will depend on the characteristics of the message a(t) and the channel as well as on the criterion of goodness of estimation used.

FIGURE 1.3.1 Communication system involving modulation and demodulation.

We now take up a study of the various forms of modulation and demodulation, their principal characteristics, their behavior in conjunction with noisy channels, and their advantages and disadvantages. We begin with some preliminary material on signals and their properties.

ELEMENTS OF SIGNAL THEORY

A real time function f(t) and its Fourier transform F(ω) form a Fourier transform pair given by

    F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt        (2)

and

    f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega        (3)

It follows directly from Eq. (2) that the transform F(ω) of a real-time function has an even-symmetric real part and an odd-symmetric imaginary part.
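This symmetry is easy to check numerically. The illustrative NumPy sketch below approximates Eq. (2) by a Riemann sum for a rectangular pulse f(t) = 1, |t| < 1 (an arbitrary choice), whose transform is known to be 2 sin ω/ω; the real part matches that result closely and the imaginary part is essentially zero, as expected for a real, even time function.

    import numpy as np

    t = np.linspace(-10.0, 10.0, 4001)
    dt = t[1] - t[0]
    f = np.where(np.abs(t) < 1.0, 1.0, 0.0)          # rectangular pulse of width 2

    w = np.linspace(-20.0, 20.0, 401)
    F = np.array([np.sum(f * np.exp(-1j * wi * t)) * dt for wi in w])   # Eq. (2), approximated

    print(np.max(np.abs(F.real - 2.0 * np.sinc(w / np.pi))))  # near 0: F(w) = 2 sin(w)/w
    print(np.max(np.abs(F.imag)))                             # near 0: real, even f(t)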
Consider the function f(t) shown in Fig. 1.3.2a. This might be a pulsed signal or the impulse response of a linear system, for example. The time T over which f(t) is appreciably different from zero is called the duration of f(t), and some measure, such as td, of the center of the pulse is called the delay of f(t). In system terms, the quantity T is the system response time or rise time, and td is the system delay. The integral of f(t), shown in Fig. 1.3.2b, corresponds to the step-function response of a system with impulse response f(t).

FIGURE 1.3.2 Duration and delay: (a) typical pulse; (b) integral of pulse.

If the function f(t) of Fig. 1.3.2 is nonnegative, the new function

    \frac{f(t)}{\int_{-\infty}^{\infty} f(t)\, dt}

is nonnegative with unit area. We now seek measures of duration and delay that are both meaningful in terms of communication problems and mathematically tractable. It will be clear that some of the results we obtain will not be universally applicable and, in particular, must be used with care when the function f(t) can be negative for some values of t; however, the results will be useful for wide classes of problems.

Consider now a frequency function F(ω), which will be assumed to be real. If F(ω) is not real, either |F(ω)|² = F(ω)F*(ω) or |F(ω)| can be used. Such a function might be similar to that shown in Fig. 1.3.3a. The radian frequency range W (or the frequency range F) over which F(ω) is appreciably different from zero is called the bandwidth of the function. Of course, if the function is a bandpass function, such as that shown in Fig. 1.3.3b, the bandwidth will usually be taken to be some measure of the width of the positive-frequency (or negative-frequency) part of the function only.

FIGURE 1.3.3 Illustrations of bandwidth: (a) typical low-pass frequency function; (b) typical bandpass frequency function.

As in the case of the time function previously discussed, we may normalize to unit area and consider

    \frac{F(\omega)}{\int_{-\infty}^{\infty} F(\omega)\, d\omega}

Again this new function is nonnegative with unit area.

Consider now the Fourier pair f(t) and F(ω) and let us change the time scale by the factor a, replacing f(t) by af(at) so that both the old and the new signal have the same area, i.e.,

    \int_{-\infty}^{\infty} f(t)\, dt = \int_{-\infty}^{\infty} a f(at)\, dt        (4)

For a < 1, the new signal af(at) is stretched in time and reduced in height; its duration has been increased. For a > 1, af(at) has been compressed in time and increased in height; its duration has been decreased. The transform of this new function is

    \int_{-\infty}^{\infty} a f(at)\, e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} f(x)\, e^{-j(\omega/a)x}\, dx = F\left(\frac{\omega}{a}\right)        (5)

The effect on the bandwidth of F(ω) has been the opposite of the effect on the duration of f(t). When the signal duration is increased (decreased), the bandwidth is decreased (increased) in the same proportion. From the discussion, we might suspect that more fundamental relationships hold between properly defined durations and bandwidths of signals.
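Eq. (5) is easy to verify numerically. The illustrative NumPy sketch below compresses a gaussian pulse f(t) = exp(-t²) by a factor a = 2 and compares the numerically computed transform of af(at) with F(ω/a), using the known transform F(ω) = √π exp(-ω²/4); the two agree, showing that compression in time widens the spectrum by the same factor. The pulse and the factor a are arbitrary choices.

    import numpy as np

    def transform(g, w, t):
        """Riemann-sum approximation of Eq. (2) for the sampled values g over the grid t."""
        dt = t[1] - t[0]
        return np.array([np.sum(g * np.exp(-1j * wi * t)) * dt for wi in w])

    t = np.linspace(-30.0, 30.0, 6001)
    w = np.linspace(-8.0, 8.0, 161)
    a = 2.0

    F_scaled = transform(a * np.exp(-(a * t) ** 2), w, t)        # transform of a f(at)
    F_shifted = np.sqrt(np.pi) * np.exp(-((w / a) ** 2) / 4.0)   # F(w/a) for f(t) = exp(-t^2)
    print(np.max(np.abs(F_scaled - F_shifted)))                  # ~0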
DURATION AND BANDWIDTH–UNCERTAINTY RELATIONSHIPS

It is apparent from the discussion above that treatments of duration and bandwidth are mathematically similar, although one is defined in the time domain and the other in the frequency domain. Several specific measures of these two quantities will now be found, and it will be shown that they are intimately related to each other through various uncertainty relationships. The term uncertainty arises from the Heisenberg uncertainty principle of quantum mechanics, which states that it is not possible to determine simultaneously and exactly the position and momentum coordinates of a particle. More specifically, if Δx and Δp are the uncertainties in position and momentum, then

    \Delta x\, \Delta p \ge h        (6)

where h is a constant. A number of inequalities of the form of Eq. (6) can be developed relating the duration ΔT of a signal to its (radian) bandwidth ΔW. The value of the constant h will depend on the definitions of duration and bandwidth.

Equivalent Rectangular Bandwidth ΔW1 and Duration ΔT1. The equivalent rectangular bandwidth ΔW1 of a frequency function F(ω) is defined as

    \Delta W_1 = \frac{1}{F(\omega_0)} \int_{-\infty}^{\infty} F(\omega)\, d\omega        (7)

where ω0 is some characteristic center frequency of the function F(ω). It is clear from this definition that the original function F(ω) has been replaced by a rectangular function of equal area, width ΔW1, and height F(ω0). For the low-pass case (ω0 ≈ 0), it follows from Eqs. (2) and (3) that Eq. (7) can be rewritten

    \Delta W_1 = \frac{2\pi f(0)}{\int_{-\infty}^{\infty} f(t)\, dt}        (8)

where f(t) is the time function which is the inverse Fourier transform of F(ω).

The same procedure can be followed in the time domain, and the equivalent rectangular duration ΔT1 of the signal f(t) can be defined by

    \Delta T_1 = \frac{1}{f(t_0)} \int_{-\infty}^{\infty} f(t)\, dt        (9)

where t0 is some characteristic time denoting the center of the pulse. For the case where t0 ≈ 0, it is clear, then, from Eqs. (8) and (9) that equivalent rectangular duration and bandwidth are connected by the uncertainty relationship

    \Delta T_1\, \Delta W_1 = 2\pi        (10)
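Eq. (10) can be checked numerically for a particular pulse. The illustrative NumPy sketch below computes ΔT1 and ΔW1 for the gaussian pulse f(t) = exp(-t²/2), taking t0 and ω0 at the maxima of f(t) and F(ω); the product comes out equal to 2π to within the accuracy of the numerical integration. The choice of pulse is arbitrary.

    import numpy as np

    t = np.linspace(-20.0, 20.0, 8001)
    dt = t[1] - t[0]
    f = np.exp(-t ** 2 / 2.0)                     # gaussian pulse

    dT1 = np.sum(f) * dt / f.max()                # Eq. (9), t0 at the pulse center

    w = np.linspace(-20.0, 20.0, 2001)
    dw = w[1] - w[0]
    F = np.array([np.sum(f * np.exp(-1j * wi * t)) * dt for wi in w]).real
    dW1 = np.sum(F) * dw / F.max()                # Eq. (7), w0 at the spectral peak

    print(dT1 * dW1, 2 * np.pi)                   # both approximately 6.283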
As pointed out earlier a more general model of the transmission medium would allow corruption of the modulated waveform itself so that the received signal was of the form of Eq. (1). For example, in wire- less systems, multiplicative disturbances can result because of multipath transmission or fading so that the received signal is of the formr1(t) = p(t)m[t, a(t)] + n(t) (15)FIGURE 1.3.4 Communication-system model for continuous modulation and demodulation.Downloaded from Digital Engineering Library @ McGraw-Hill (www.digitalengineeringlibrary.com)Copyright 2004 The McGraw-Hill Companies. All rights reserved. Any use is subject to the Terms of Use as given at the website. 38. Christiansen_Sec_01.qxd10/27/0410:19 AMPage 1.38 MODULATION 1.38INFORMATION, COMMUNICATION, NOISE, AND INTERFERENCEwhere both p(t) and n(t) are noises. However, we shall not treat such systems, confining ourselves to the sim-pler additive-noise model of Fig. 1.3.4. LINEAR, OR AMPLITUDE, MODULATIONIn a general way, linear (or amplitude) modulation (AM) can be defined as a system where a carrier wave c(t)has its amplitude varied linearly by some message signal a(t). More precisely, a waveform is linearly modu-lated (or amplitude-modulated) by a given message a(t) if the partial derivative of that waveform with respectto a(t) is independent of a(t). In other words, the modulated m[t, a(t)] can be written in the form m[t, a(t)] = a(t)c(t) + d(t)(16)where c(t) and d(t) are independent of a(t). Now we havem[t , a(t )]= c(t )(17)a(t )and c(t) will be called the carrier. In most of the cases we will treat, the waveform d(t) will either be zero orwill be linearly related to c(t). It will be more convenient, therefore, to write Eq. (16) asm[t, a(t)] = b(t)c(t)(18)where b(t) will be eitherb1(t) 1 + a(t) (19)or b2(t) a(t)(20)Also, at present it will be sufficient to allow the carrier c(t) to be of the form c(t) = C cos (w0t + q)(21)where C and w0 are constants and q is either a constant or a random variable uniformly distributed on (0, 2p). Whenever Eq. (19) applies, it will be convenient to assume that b1(t) is nearly always nonnegative. Thisimplies that if a(t) is a deterministic signal, then a(t) 1 (22)It also implies that if a(t) is a random process, the probability that a(t) is less than 1 in any finite interval(T, T) is arbitrarily small; i.e., p[a(t) < 1] < 1 < T t T(23)The purpose of these restrictions on b1(t) is to ensure that the carrier is not overmodulated and that the mes-sage signal a(t) is easily recovered by simple receivers. W