review of probability theory - guceee.guc.edu.eg/courses/communications/comm502 communication...
TRANSCRIPT
Digital Communication Systems
Source of Information
User of Information
Source Encoder
Channel Encoder
Modulator
Source Decoder
Channel Decoder
De-Modulator
Channel

Communication systems are designed to transmit the information generated by a source to some destination.
Types of Information Sources
• Analog: The output is analog signals (examples: TV and radio broadcasting).
• Discrete: The output is discrete (a sequence of letters or symbols; examples: computers, storage devices, ...).
Source Encoder
• Whether a source is analog or discrete, a digital communication system is designed to transmit information in digital form.
• Consequently, the output of the source must be converted to a format that can be transmitted digitally.
• This conversion of the source output to a digital form is generally performed by the source encoder, whose output is a sequence of binary digits.
• Ex: ASCII code which converts characters to binary bits.
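As a minimal sketch of this conversion (the sample text and the 7-bit width are my own illustration, not from the slides), each character can be mapped to its ASCII code and then to a binary sequence:

```python
# Sketch: source encoding of characters into binary digits via 7-bit ASCII.
text = "Hi"
bits = "".join(format(ord(ch), "07b") for ch in text)  # 7 bits per character
print(bits)  # "10010001101001"
```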
Why Source Coding is Important
It enables us to determine:
- The amount of information from a given source.
- The minimum storage and bandwidth needed to transfer data from a given source.
- The limit on the transmission rate of information over a noisy channel.
- The achievable data compression.
Fixed Length Codes
• The standard character codes are of fixed length, such as 5, 6, or 7 bits. The length is usually chosen so that there are enough binary sequences to assign a unique one to each input alphabet character.
• Fixed length codes have the property that character boundaries are separated by a fixed bit count.
• This allows the conversion of a serial data stream to a parallel data stream by a simple bit counter.
Variable Length Codes
• Data compression codes are often variable length codes. We expect the length of the binary sequence assigned to each alphabet symbol to be inversely related to the probability of that symbol.
• A significant amount of data compression can be realized when there are wide differences in the probabilities of the symbols. To achieve this compression, there must also be a sufficiently large number of symbols.
Discrete Information Source
• Assume:
– The information source generates symbols from a given alphabet

S = \{s_0, s_1, \ldots, s_{K-1}\}

– Each symbol has a probability

\Pr(s_k) = p_k, \quad k = 0, 1, \ldots, K-1, \qquad \sum_{k=0}^{K-1} p_k = 1

– Symbols are independent.
Measure of Information
• If p_k = 1:
– The occurrence of the event does not correspond to any gain of information (i.e., there is no uncertainty). In this case there is no need for communication, because the receiver already knows everything.
• As p_k decreases:
– The uncertainty increases.
– The reception of s_k corresponds to some gain in information. BUT HOW MUCH?
Measure of Information (Cont.)
• Information is measured by:
(1) Self-information
(2) Entropy
• We use probability theory to quantify and measure "information".

(1) Self-Information
• The amount of information (in bits) carried by a symbol is closely related to its probability of occurrence.
• A function which measures the amount of information gained after observing the symbol s_k is the self-information:

I(s_k) = \log_2 \frac{1}{p_k} = -\log_2 p_k \quad [\text{bits}]
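This definition can be sketched in a few lines of Python (the function name is mine, not from the slides):

```python
import math

def self_information(p: float) -> float:
    """Self-information I(s) = -log2(p), in bits, of a symbol with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

print(self_information(0.5))   # a symbol with p = 1/2 carries 1 bit
print(self_information(1.0))   # a certain event carries no information
```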
Properties of Self Information
Properties of I(s):
1) I(s) \ge 0 (a real, nonnegative measure).
2) I(s_k) > I(s_i) \text{ if } P_k < P_i
3) I(s) is a continuous function of p.

[Plot: I(s_k) versus P(s_k) for 0 \le P(s_k) \le 1]
(2) Entropy
• Entropy is the average amount of information of a finite discrete source.
• More precisely, it is the average number of bits per symbol required to describe that source.
• For a source containing N independent symbols, its entropy is defined as

H = E[I(s_i)] = \sum_{i=1}^{N} p_i I(s_i)

Since I(s_i) = -\log_2 p_i, then

H = -\sum_{i=1}^{N} p_i \log_2 p_i

• Unit of entropy: bits/symbol (information bits per symbol).
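A minimal numeric sketch of this formula (the function name and the four-symbol example source are my own illustration):

```python
import math

def entropy(probs):
    """Entropy H = -sum(p * log2(p)) in bits/symbol; terms with p == 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four-symbol source with probabilities 1/2, 1/4, 1/8, 1/8
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```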
Properties of Entropy
• H is a positive quantity: H \ge 0.
• The unit "bit" here is a measure of information content and is not to be confused with the term "bit" meaning "binary digit".
• If all a priori probabilities are equally likely (p_i = 1/N for all N symbols), then the entropy is maximum and given by H = \log_2 N.
• Then 0 \le H \le \log_2 N.
Proof
If all a priori probabilities are equally likely, p_i = 1/N for all symbols, then

H = -\sum_{i=1}^{N} p_i \log_2 p_i = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = -N \cdot \frac{1}{N} \log_2 \frac{1}{N} = \log_2 N

Hence 0 \le H \le \log_2 N.
Example
A source puts out one of five possible messages during each message interval. The probabilities of these messages are

p_1 = 1/2, \quad p_2 = 1/4, \quad p_3 = 1/8, \quad p_4 = 1/16, \quad p_5 = 1/16

What is the information content of these messages?

I(m_1) = -\log_2 (1/2) = 1 \text{ bit}
I(m_2) = -\log_2 (1/4) = 2 \text{ bits}
I(m_3) = -\log_2 (1/8) = 3 \text{ bits}
I(m_4) = -\log_2 (1/16) = 4 \text{ bits}
I(m_5) = -\log_2 (1/16) = 4 \text{ bits}
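The five values can be checked numerically; this is a small sketch assuming the probabilities given in the example:

```python
import math

# Message probabilities from the example
probs = {"m1": 1/2, "m2": 1/4, "m3": 1/8, "m4": 1/16, "m5": 1/16}

# Self-information I(m) = -log2(p) for each message
info = {m: -math.log2(p) for m, p in probs.items()}
print(info)  # 1, 2, 3, 4 and 4 bits
```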
Entropy Example
• Find and plot the entropy of the binary code in which the probability of occurrence for the symbol 1 is p and for the symbol 0 is 1 - p.

H = -\sum_{i} p_i \log_2 p_i = -p \log_2 p - (1-p) \log_2 (1-p)

p = 1/2: \quad H = -\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{2}\log_2 \tfrac{1}{2} = 1 \text{ bit/symbol}

p = 1/4: \quad H = -\tfrac{1}{4}\log_2 \tfrac{1}{4} - \tfrac{3}{4}\log_2 \tfrac{3}{4} = 0.8113 \text{ bits/symbol}

p = 0: \quad H = 0 \text{ bits/symbol}; \qquad p = 1: \quad H = 0 \text{ bits/symbol}

[Plot: H versus p on [0, 1], peaking at H = 1 for p = 1/2; note that v \log v \to 0 as v \to 0.]
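A sketch of the binary entropy function defined above (the function name is mine; the endpoint convention follows the note that v log v goes to 0 as v goes to 0):

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, round(binary_entropy(p), 4))
```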
Average Information Content in English Language
• Calculate the average information in bits/character in English, assuming each letter is equally likely:

H = \sum_{i=1}^{26} \frac{1}{26} \log_2 26 = \log_2 26 \approx 4.7 \text{ bits/char}

• Since characters do not appear with the same frequency in English, repeat the calculation using the probabilities:
– P = 0.10 for a, e, o, t
– P = 0.07 for h, i, n, r, s
– P = 0.02 for c, d, f, l, m, p, u, y
– P = 0.01 for b, g, j, k, q, v, w, x, z

Solve this problem.
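The equally-likely figure can be verified numerically; this is a sketch whose summation mirrors the formula for 26 equiprobable letters:

```python
import math

# 26 equally likely letters: H = sum over i of (1/26) * log2(26) = log2(26)
H_uniform = sum((1 / 26) * math.log2(26) for _ in range(26))
print(round(H_uniform, 1))  # ~4.7 bits/char
```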
Source Coding – Objective
• Efficient representation of data generated by an information source.

What Does the Word EFFICIENT Mean?
• Efficient source coding means:
– Minimum average number of bits per source symbol.

How Could We Be EFFICIENT in Source Coding?
• By using knowledge of the statistics of the source.
• Clearly:
– Frequent source symbols should be assigned SHORT CODEWORDS.
– Rare source symbols should be assigned LONGER CODEWORDS.
• Example
– Morse Code
• E is represented by: “.”
• Q is represented by: “__ __ . __”
Morse Code
• Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment.
• Letters are transmitted as standardized sequences of short and long signals called "dots" and "dashes".
• The duration of a dash is three times the duration of a dot. Each dot or dash is followed by a short silence, equal to the dot duration.
• The letters of a word are separated by a space equal to three dots (one dash), and the words are separated by a space equal to seven dots.
• The dot duration is the basic unit of time measurement in code transmission.
• For efficiency, the length of each character in Morse is approximately inversely proportional to its frequency of occurrence in English. Thus, the most common letter in English, the letter "E," has the shortest code, a single dot.
Average Code Length
• Source has K symbols.
• Each symbol s_k has probability p_k.
• Each symbol s_k is represented by a codeword c_k of length v_k bits.

[Diagram: Information Source → Source Encoder; input s_k, output c_k]

• Average codeword length:

\bar{L} = \sum_{k=1}^{K} p_k v_k

• Variance of the code length:

\sigma^2 = \sum_{k=1}^{K} p_k (v_k - \bar{L})^2
Example: Average Codeword Length

Symbol   p(S)   Code
A        0.25   11
B        0.30   00
C        0.12   010
D        0.15   011
E        0.18   10

\bar{L} = \sum_{k=1}^{K} p_k v_k = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2) = 2.27 \text{ bits}

It does not mean that we have to find a way to transmit a noninteger number of bits. Rather, it means that on average the length of the code is 2.27 bits.
Calculate the variance of the code length
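As a numeric check of the table above (a sketch, not part of the slides; variable names are mine), both formulas can be evaluated directly:

```python
# Probabilities and codeword lengths from the table above
probs = [0.25, 0.30, 0.12, 0.15, 0.18]   # symbols A, B, C, D, E
lengths = [2, 2, 3, 3, 2]                # |11|, |00|, |010|, |011|, |10|

# Average codeword length: L-bar = sum of p_k * v_k
avg_len = sum(p * v for p, v in zip(probs, lengths))

# Variance: sigma^2 = sum of p_k * (v_k - L-bar)^2
variance = sum(p * (v - avg_len) ** 2 for p, v in zip(probs, lengths))

print(round(avg_len, 2))    # 2.27 bits, as in the example
print(round(variance, 4))
```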
Code Efficiency
• \bar{L} represents the average number of bits per source symbol used in the source encoding process.
• If L_{min} denotes the minimum possible codeword length, the coding efficiency of the source encoder is defined as

\eta = \frac{L_{min}}{\bar{L}}

• An efficient code means \eta \to 1.
• What is L_{min}?
Shannon’s First Theorem: The Source Coding Theorem

\bar{L} \ge H(S), \qquad L_{min} = H(S)

The outputs of an information source cannot be represented by a source code whose average length is less than the source entropy.
Compression Ratio
• We define the compression ratio as:

CR = \frac{\text{Number of bits of the fixed code that represents the symbols}}{\text{Average code length of the variable length code}}

Code Efficiency
• We define the code efficiency as:

\eta = \frac{L_{min}}{\bar{L}} = \frac{H(S)}{\bar{L}} = \frac{\text{Entropy}}{\text{Average code length}}

• It measures how much the code achieves of the possible compression ratio.
Example
• Source entropy:

H(S) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{8}\log_2 8 + \tfrac{1}{8}\log_2 8 = 1.75 \text{ bits/symbol} = L_{min}

Symbol   Probability   Code I:        Code I:      Code II:       Code II:
s_k      p_k           codeword c_k   length v_k   codeword c_k   length v_k
s_0      1/2           00             2            0              1
s_1      1/4           01             2            10             2
s_2      1/8           10             2            110            3
s_3      1/8           11             2            1111           4

Code I:

\bar{L} = 2(\tfrac{1}{2}) + 2(\tfrac{1}{4}) + 2(\tfrac{1}{8}) + 2(\tfrac{1}{8}) = 2 \text{ bits}, \qquad \eta = \frac{H(S)}{\bar{L}} = \frac{7/4}{2} = 0.875

Code II:

\bar{L} = 1(\tfrac{1}{2}) + 2(\tfrac{1}{4}) + 3(\tfrac{1}{8}) + 4(\tfrac{1}{8}) = 1.875 \text{ bits}, \qquad \eta = \frac{H(S)}{\bar{L}} = \frac{1.75}{1.875} = 0.9333

CR = \frac{2}{1.875} = 1.067

Recall: H = \sum_{i=1}^{N} p_i \log_2 (1/p_i), \quad \bar{L} = \sum_{k=1}^{K} p_k v_k, \quad \eta = H(S)/\bar{L}.

A code whose average length reached L_{min} = H(S) = 1.75 bits would give \eta = 1.75/1.75 = 1.
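The entropy and both code efficiencies can be verified with a short script; this is a sketch using the probabilities and codeword lengths from the table:

```python
import math

probs = [1/2, 1/4, 1/8, 1/8]        # p(s_0) .. p(s_3)
code_I_lengths = [2, 2, 2, 2]       # fixed-length Code I
code_II_lengths = [1, 2, 3, 4]      # variable-length Code II from the table

# Source entropy H(S) = -sum of p * log2(p)
H = -sum(p * math.log2(p) for p in probs)

# Average codeword lengths L-bar = sum of p_k * v_k
avg_I = sum(p * v for p, v in zip(probs, code_I_lengths))
avg_II = sum(p * v for p, v in zip(probs, code_II_lengths))

print(H, avg_I, avg_II)       # 1.75, 2.0, 1.875
print(H / avg_I, H / avg_II)  # efficiencies 0.875 and ~0.9333
```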