TRANSCRIPT
Computing and Communications
2. Information Theory - Entropy
Ying Cui
Department of Electronic Engineering
Shanghai Jiao Tong University, China
2017, Autumn
Outline
• Entropy
• Joint entropy and conditional entropy
• Relative entropy and mutual information
• Relationship between entropy and mutual information
• Chain rules for entropy, relative entropy and mutual information
• Jensen’s inequality and its consequences
Information Theory
• Information theory answers two fundamental questions in communication theory
– what is the ultimate data compression? -- entropy H
– what is the ultimate transmission rate of communication? -- channel capacity C
• Information theory is considered a subset of communication theory
A Mathematical Theory of Commun.
• In 1948, Shannon published “A Mathematical Theory of Communication”, founding the field of information theory
• Shannon made two major modifications that have had a huge impact on communication design
– the source and channel are modeled probabilistically
– bits became the common currency of communication
A Mathematical Theory of Commun.
• Shannon proved the following three theorems
– Theorem 1. The minimum compression rate of a source is its entropy rate H
– Theorem 2. The maximum reliable transmission rate over a channel is its mutual information I
– Theorem 3. End-to-end reliable communication is possible if and only if H < I, i.e., there is no loss in performance from using a digital interface between source and channel coding
• Impacts of Shannon’s results
– after almost 70 years, all communication systems are designed based on the principles of information theory
– the limits not only serve as benchmarks for evaluating communication schemes, but also provide insights on designing good ones
– basic information theoretic limits in Shannon’s theorems have now been successfully achieved using efficient algorithms and codes
Definition
• Entropy is a measure of the uncertainty of a r.v.
• Consider a discrete r.v. X with alphabet 𝒳 and p.m.f. p(x) = Pr[X = x], x ∈ 𝒳
• The entropy of X is H(X) = −∑_{x∈𝒳} p(x) log p(x)
– log is to the base 2, and entropy is expressed in bits
• e.g., the entropy of a fair coin toss is 1 bit
– define 0 log 0 = 0, since x log x → 0 as x → 0
• adding terms of zero probability does not change the entropy
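(Illustration, not from the slides: a minimal Python sketch of the definition above, computing H(X) in bits from a p.m.f.; the function name entropy_bits and the example p.m.f.s are assumptions made here for illustration.)

    import math

    def entropy_bits(pmf):
        # H(X) = -sum_x p(x) log2 p(x), using the convention 0 log 0 = 0
        return -sum(p * math.log2(p) for p in pmf if p > 0)

    print(entropy_bits([0.5, 0.5]))       # fair coin toss: 1.0 bit
    print(entropy_bits([0.5, 0.5, 0.0]))  # zero-probability term changes nothing: 1.0 bit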
Example
• Consider a binary r.v. X with Pr[X = 1] = p and Pr[X = 0] = 1 − p, so H(X) = −p log p − (1 − p) log(1 − p)
– H(X) = 1 bit when p = 0.5
• maximum uncertainty
– H(X) = 0 bits when p = 0 or 1
• minimum uncertainty
– H(X) is a concave function of p
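(Illustration, not from the slides: the same entropy computation specialized to the binary case above; binary_entropy is a hypothetical helper name.)

    import math

    def binary_entropy(p):
        # H(X) = -p log2 p - (1-p) log2 (1-p), with 0 log 0 = 0
        return -sum(t * math.log2(t) for t in (p, 1.0 - p) if t > 0)

    print(binary_entropy(0.5))  # 1.0 bit, maximum uncertainty
    print(binary_entropy(1.0))  # 0.0 bits, minimum uncertainty
    print(binary_entropy(0.9))  # about 0.469 bits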
Joint Entropy
• Joint entropy is a measure of the uncertainty of a pair of r.v.s
• Consider a pair of discrete r.v.s (X, Y) with alphabets 𝒳, 𝒴 and p.m.f.s p(x) = Pr[X = x], x ∈ 𝒳 and p(y) = Pr[Y = y], y ∈ 𝒴
• The joint entropy is H(X, Y) = −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(x, y), where p(x, y) = Pr[X = x, Y = y] is the joint p.m.f.
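(Illustration, not from the slides: a Python sketch of H(X, Y) for a joint p.m.f. given as a dictionary; the function name and the example joint p.m.f. are assumptions.)

    import math

    def joint_entropy_bits(joint_pmf):
        # H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), with 0 log 0 = 0
        return -sum(p * math.log2(p) for p in joint_pmf.values() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}  # hypothetical example
    print(joint_entropy_bits(p_xy))  # 1.75 bits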
Conditional Entropy
• The conditional entropy of a r.v. Y given another r.v. X is H(Y|X) = ∑_{x∈𝒳} p(x) H(Y|X = x) = −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(y|x)
– the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.
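(Illustration, not from the slides: a sketch of H(Y|X) computed as −∑_{x,y} p(x,y) log2 p(y|x); the function name and the example joint p.m.f. are assumptions.)

    import math
    from collections import defaultdict

    def conditional_entropy_bits(joint_pmf):
        # H(Y|X) = sum_x p(x) H(Y|X=x) = -sum_{x,y} p(x,y) log2 p(y|x)
        p_x = defaultdict(float)
        for (x, _), p in joint_pmf.items():
            p_x[x] += p
        return -sum(p * math.log2(p / p_x[x])
                    for (x, _), p in joint_pmf.items() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(conditional_entropy_bits(p_xy))  # about 0.939 bits; equals H(X,Y) - H(X)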
Relative Entropy
• Relative entropy is a measure of the “distance” between two distributions
• The relative entropy between two p.m.f.s p(x) and q(x) is D(p||q) = ∑_{x∈𝒳} p(x) log( p(x) / q(x) )
– conventions: 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞
– if there is any x such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞
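(Illustration, not from the slides: a sketch of D(p||q) in bits over a common alphabet, following the conventions above; the function name and example p.m.f.s are assumptions.)

    import math

    def relative_entropy_bits(p, q):
        # D(p||q) = sum_x p(x) log2( p(x) / q(x) )
        d = 0.0
        for px, qx in zip(p, q):
            if px == 0:
                continue         # 0 log(0/q) = 0
            if qx == 0:
                return math.inf  # some x with p(x) > 0 and q(x) = 0
            d += px * math.log2(px / qx)
        return d

    print(relative_entropy_bits([0.5, 0.5], [0.25, 0.75]))  # about 0.208 bits
    print(relative_entropy_bits([0.25, 0.75], [0.5, 0.5]))  # about 0.189 bits (not symmetric)
    print(relative_entropy_bits([0.5, 0.5], [0.0, 1.0]))    # inf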
Mutual Information
• Mutual information is a measure of the amount of information that one r.v. contains about another r.v.
• The mutual information between X and Y is I(X; Y) = ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) )
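(Illustration, not from the slides: a sketch of I(X;Y) computed directly from a joint p.m.f.; the function name and the example joint p.m.f. are assumptions.)

    import math
    from collections import defaultdict

    def mutual_information_bits(joint_pmf):
        # I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
        p_x, p_y = defaultdict(float), defaultdict(float)
        for (x, y), p in joint_pmf.items():
            p_x[x] += p
            p_y[y] += p
        return sum(p * math.log2(p / (p_x[x] * p_y[y]))
                   for (x, y), p in joint_pmf.items() if p > 0)

    p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(mutual_information_bits(p_xy))  # about 0.016 bits; I(X;Y) = D( p(x,y) || p(x)p(y) )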
Convex & Concave Functions
• Examples:
– convex functions: x², |x|, eˣ, x log x (for x ≥ 0)
– concave functions: log x and √x (for x ≥ 0)
– linear functions ax + b are both convex and concave
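(Illustration, not from the slides: a numerical check of Jensen's inequality E[f(X)] ≥ f(E[X]) for the convex example functions above; jensen_gap and the test distribution are assumptions.)

    import math

    def jensen_gap(f, xs, probs):
        # E[f(X)] - f(E[X]); nonnegative when f is convex
        e_x = sum(p * x for p, x in zip(probs, xs))
        e_fx = sum(p * f(x) for p, x in zip(probs, xs))
        return e_fx - f(e_x)

    xs, probs = [1.0, 2.0, 4.0], [0.2, 0.5, 0.3]
    print(jensen_gap(lambda x: x * x, xs, probs))             # >= 0, x^2 is convex
    print(jensen_gap(lambda x: x * math.log2(x), xs, probs))  # >= 0, x log x is convex for x >= 0
    print(jensen_gap(lambda x: math.log2(x), xs, probs))      # <= 0, log x is concave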