TRANSCRIPT
Computing and Communications
2. Information Theory - Entropy
Ying Cui
Department of Electronic Engineering
Shanghai Jiao Tong University, China
2017, Autumn
Outline
• Entropy
• Joint entropy and conditional entropy
• Relative entropy and mutual information
• Relationship between entropy and mutual information
• Chain rules for entropy, relative entropy and mutual information
• Jensen’s inequality and its consequences
Information Theory
• Information theory answers two fundamental questions in communication theory
– what is the ultimate data compression? -- entropy H
– what is the ultimate transmission rate of communication? -- channel capacity C
• Information theory is considered a subset of communication theory
A Mathematical Theory of Commun.
• In 1948, Shannon published “A Mathematical Theory of Communication”, founding the field of information theory
• Shannon made two major modifications that have had a huge impact on communication design
– the source and channel are modeled probabilistically
– bits became the common currency of communication
A Mathematical Theory of Commun.
• Shannon proved the following three theorems
– Theorem 1. The minimum compression rate of a source is its entropy rate H
– Theorem 2. The maximum reliable transmission rate over a channel is its mutual information I
– Theorem 3. End-to-end reliable communication is possible if and only if H < I, i.e., there is no loss in performance from using a digital interface between source and channel coding
• Impacts of Shannon’s results
– after almost 70 years, all communication systems are designed based on the principles of information theory
– the limits not only serve as benchmarks for evaluating communication schemes, but also provide insights on designing good ones
– basic information theoretic limits in Shannon’s theorems have now been successfully achieved using efficient algorithms and codes
Definition
• Entropy is a measure of the uncertainty of a r.v.
• Consider a discrete r.v. X with alphabet 𝒳 and p.m.f. p(x) = Pr[X = x], x ∈ 𝒳
• The entropy of X is H(X) = −∑_{x∈𝒳} p(x) log p(x)
– log is to the base 2, and entropy is expressed in bits
• e.g., the entropy of a fair coin toss is 1 bit
– define 0 log 0 = 0, since x log x → 0 as x → 0
• adding terms of zero probability does not change the entropy
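(Illustration, not from the slides: a minimal Python sketch of the definition above, computing H(X) in bits from a p.m.f.; the function name entropy_bits and the example p.m.f.s are assumptions made here for illustration.)

    import math

    def entropy_bits(pmf):
        # H(X) = -sum_x p(x) log2 p(x), using the convention 0 log 0 = 0
        return -sum(p * math.log2(p) for p in pmf if p > 0)

    print(entropy_bits([0.5, 0.5]))       # fair coin toss: 1.0 bit
    print(entropy_bits([0.5, 0.5, 0.0]))  # zero-probability term changes nothing: 1.0 bit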
Example
• Consider a binary r.v. X with Pr[X = 1] = p and Pr[X = 0] = 1 − p, so H(X) = −p log p − (1 − p) log(1 − p)
– H(X) = 1 bit when p = 0.5
• maximum uncertainty
– H(X) = 0 bits when p = 0 or 1
• minimum uncertainty
– H(X) is a concave function of p
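(Illustration, not from the slides: the same entropy computation specialized to the binary case above; binary_entropy is a hypothetical helper name.)

    import math

    def binary_entropy(p):
        # H(X) = -p log2 p - (1-p) log2 (1-p), with 0 log 0 = 0
        return -sum(t * math.log2(t) for t in (p, 1.0 - p) if t > 0)

    print(binary_entropy(0.5))  # 1.0 bit, maximum uncertainty
    print(binary_entropy(1.0))  # 0.0 bits, minimum uncertainty
    print(binary_entropy(0.9))  # about 0.469 bits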
Joint Entropy
• Joint entropy is a measure of the uncertainty of a pair of r.v.s
• Consider a pair of discrete r.v.s (X, Y) with alphabets 𝒳, 𝒴 and p.m.f.s p(x) = Pr[X = x], x ∈ 𝒳 and p(y) = Pr[Y = y], y ∈ 𝒴
• The joint entropy is H(X, Y) = −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(x, y), where p(x, y) = Pr[X = x, Y = y] is the joint p.m.f.
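(Illustration, not from the slides: a Python sketch of H(X, Y) for a joint p.m.f. given as a dictionary; the function name and the example joint p.m.f. are assumptions.)

    import math

    def joint_entropy_bits(joint_pmf):
        # H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), with 0 log 0 = 0
        return -sum(p * math.log2(p) for p in joint_pmf.values() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}  # hypothetical example
    print(joint_entropy_bits(p_xy))  # 1.75 bits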
Conditional Entropy
• The conditional entropy of a r.v. Y given another r.v. X is H(Y|X) = ∑_{x∈𝒳} p(x) H(Y|X = x) = −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(y|x)
– the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.
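(Illustration, not from the slides: a sketch of H(Y|X) computed as −∑_{x,y} p(x,y) log2 p(y|x); the function name and the example joint p.m.f. are assumptions.)

    import math
    from collections import defaultdict

    def conditional_entropy_bits(joint_pmf):
        # H(Y|X) = sum_x p(x) H(Y|X=x) = -sum_{x,y} p(x,y) log2 p(y|x)
        p_x = defaultdict(float)
        for (x, _), p in joint_pmf.items():
            p_x[x] += p
        return -sum(p * math.log2(p / p_x[x])
                    for (x, _), p in joint_pmf.items() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(conditional_entropy_bits(p_xy))  # about 0.939 bits; equals H(X,Y) - H(X)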
Relative Entropy
• Relative entropy is a measure of the “distance” between two distributions
• The relative entropy between two p.m.f.s p(x) and q(x) is D(p||q) = ∑_{x∈𝒳} p(x) log( p(x) / q(x) )
– conventions: 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞
– if there is any x such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞
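(Illustration, not from the slides: a sketch of D(p||q) in bits over a common alphabet, following the conventions above; the function name and example p.m.f.s are assumptions.)

    import math

    def relative_entropy_bits(p, q):
        # D(p||q) = sum_x p(x) log2( p(x) / q(x) )
        d = 0.0
        for px, qx in zip(p, q):
            if px == 0:
                continue         # 0 log(0/q) = 0
            if qx == 0:
                return math.inf  # some x with p(x) > 0 and q(x) = 0
            d += px * math.log2(px / qx)
        return d

    print(relative_entropy_bits([0.5, 0.5], [0.25, 0.75]))  # about 0.208 bits
    print(relative_entropy_bits([0.25, 0.75], [0.5, 0.5]))  # about 0.189 bits (not symmetric)
    print(relative_entropy_bits([0.5, 0.5], [0.0, 1.0]))    # inf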
Mutual Information
• Mutual information is a measure of the amount of information that one r.v. contains about another r.v.
• The mutual information between X and Y is I(X; Y) = ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) )
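(Illustration, not from the slides: a sketch of I(X;Y) computed directly from a joint p.m.f.; the function name and the example joint p.m.f. are assumptions.)

    import math
    from collections import defaultdict

    def mutual_information_bits(joint_pmf):
        # I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
        p_x, p_y = defaultdict(float), defaultdict(float)
        for (x, y), p in joint_pmf.items():
            p_x[x] += p
            p_y[y] += p
        return sum(p * math.log2(p / (p_x[x] * p_y[y]))
                   for (x, y), p in joint_pmf.items() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(mutual_information_bits(p_xy))  # about 0.016 bits; I(X;Y) = D( p(x,y) || p(x)p(y) )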
Convex & Concave Functions
• Examples:
– convex functions: x², |x|, eˣ, x log x (for x ≥ 0)
– concave functions: log x and √x (for x ≥ 0)
– linear functions ax + b are both convex and concave
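(Illustration, not from the slides: a numerical check of Jensen's inequality E[f(X)] ≥ f(E[X]) for the convex example functions above; jensen_gap and the test distribution are assumptions.)

    import math

    def jensen_gap(f, xs, probs):
        # E[f(X)] - f(E[X]); nonnegative when f is convex
        e_x = sum(p * x for p, x in zip(probs, xs))
        e_fx = sum(p * f(x) for p, x in zip(probs, xs))
        return e_fx - f(e_x)

    xs, probs = [1.0, 2.0, 4.0], [0.2, 0.5, 0.3]
    print(jensen_gap(lambda x: x * x, xs, probs))             # >= 0, x^2 is convex
    print(jensen_gap(lambda x: x * math.log2(x), xs, probs))  # >= 0, x log x is convex for x >= 0
    print(jensen_gap(lambda x: math.log2(x), xs, probs))      # <= 0, log x is concave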