Information & Entropy
TRANSCRIPT
Shannon Information Axioms
Small-probability events should carry more information than high-probability events.
– “the nice person” (common words, lower information)
– “philanthropist” (less used, more information)
Information from two independent events should add (see the sketch below).
– “engineer”: information I1
– “stuttering”: information I2
– “stuttering engineer”: information I1 + I2
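Both axioms are satisfied by $I(p) = \log_2(1/p)$. A minimal sketch in Python (the word probabilities are made-up illustration values):

```python
import math

def information(p: float) -> float:
    """Shannon information of an event with probability p, in bits."""
    return -math.log2(p)

# Axiom 1: rarer events carry more information (illustrative probabilities).
p_common, p_rare = 0.01, 0.0001          # "nice person" vs. "philanthropist"
assert information(p_rare) > information(p_common)

# Axiom 2: information from independent events adds.
p1, p2 = 0.05, 0.02                      # "engineer", "stuttering"
joint = information(p1 * p2)             # "stuttering engineer"
assert math.isclose(joint, information(p1) + information(p2))
print(information(p1), information(p2), joint)
```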
Information Units
log2 – bits
loge – nats
log10 – bans or hartleys
Ralph Vinton Lyon Hartley (1888–1970), inventor of the electronic oscillator circuit that bears his name and a pioneer in the field of Information Theory.
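The units differ only in the base of the logarithm, so converting between them is a single multiplication. A quick sketch:

```python
import math

p = 1 / 1024                      # probability of the event
bits     = math.log2(1 / p)       # base-2  -> bits
nats     = math.log(1 / p)        # base-e  -> nats
hartleys = math.log10(1 / p)      # base-10 -> bans/hartleys

# One bit = ln(2) nats = log10(2) hartleys.
assert math.isclose(nats, bits * math.log(2))
assert math.isclose(hartleys, bits * math.log10(2))
print(bits, nats, hartleys)       # 10.0, ~6.93, ~3.01
```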
Illustration
Q: We flip a coin 10 times. What is the probability we get the sequence
0 0 1 1 0 1 1 1 0 1?

A: $p = \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024}$

How much information do we have?

$$I = \log_2 \frac{1}{p} = \log_2 2^{10} = 10 \text{ bits}$$
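A direct check of the arithmetic:

```python
import math

p = (1 / 2) ** 10                 # probability of one specific 10-flip sequence
info = math.log2(1 / p)           # I = log2(1/p)
print(p, info)                    # 0.0009765625 (= 1/1024), 10.0 bits
```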
Entropy
Bernoulli trial with parameter p:
– Information from a success = $\log_2 \frac{1}{p}$
– Information from a failure = $\log_2 \frac{1}{1-p}$
(Weighted) average information = entropy:

$$H(p) = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p}$$
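This weighted average is the binary entropy function. A minimal sketch (the helper name `binary_entropy` is mine):

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), in bits."""
    if p in (0.0, 1.0):
        return 0.0                # limit of x*log2(1/x) as x -> 0 is 0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

print(binary_entropy(0.5))        # 1.0 bit: a fair coin is maximally uncertain
print(binary_entropy(0.1))        # ~0.469 bits: a biased coin is more predictable
```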
Entropy of a Geometric RV

$$p_X(x) = \begin{cases} p(1-p)^k, & x = k;\ k = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$

then, writing $q = 1-p$,

$$H = -\sum_{n=0}^{\infty} pq^n \log_2(pq^n) = -\sum_{n=0}^{\infty} pq^n \left(\log_2 p + n \log_2 q\right) = \frac{-p\log_2 p - q\log_2 q}{p} = \frac{h(p)}{p}$$

where $h(p)$ is the binary entropy above. H = 2 bits when p = 0.5.
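A numeric check that the direct sum matches the closed form $h(p)/p$ (the truncation length is an arbitrary choice):

```python
import math

def geometric_entropy_exact(p: float) -> float:
    """H = h(p)/p for a geometric RV with pmf p*(1-p)^k, k = 0, 1, 2, ..."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return h / p

def geometric_entropy_sum(p: float, terms: int = 300) -> float:
    """Direct evaluation of -sum p*q^k * log2(p*q^k), truncated."""
    q = 1 - p
    total = 0.0
    for k in range(terms):
        pk = p * q**k
        if pk == 0.0:             # underflow: remaining terms are negligible
            break
        total -= pk * math.log2(pk)
    return total

print(geometric_entropy_exact(0.5))   # 2.0 bits, as stated above
print(geometric_entropy_sum(0.5))     # agrees with the closed form
```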
Relative Entropy

For two distributions on the same K outcomes,

$$p = [p_1, p_2, p_3, \ldots, p_K], \qquad q = [q_1, q_2, q_3, \ldots, q_K],$$

the relative entropy is

$$H(p, q) = \sum_{k=1}^{K} p_k \log_2 \frac{p_k}{q_k} = \sum_{k=1}^{K} p_k \log_2 \frac{1}{q_k} - H(p).$$

A useful inequality: $\ln x \le x - 1$.
Relative Entropy Property Proof
Since $\ln x \le x - 1$,

$$-H(p, q) = \sum_{k=1}^{K} p_k \ln \frac{q_k}{p_k} \le \sum_{k=1}^{K} p_k \left(\frac{q_k}{p_k} - 1\right) = \sum_{k=1}^{K} q_k - \sum_{k=1}^{K} p_k = 1 - 1 = 0,$$

so $H(p, q) \ge 0$, with equality only when $p = q$.
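A numeric check of the definition and the property (the example distributions are arbitrary):

```python
import math

def relative_entropy(p, q):
    """H(p, q) = sum p_k * log2(p_k / q_k), in bits."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q))

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

print(relative_entropy(p, q))     # 0.25 > 0, since p != q
print(relative_entropy(p, p))     # exactly 0 when the distributions match
```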
Uniform Probability is Maximum Entropy
Relative to uniform ($q_k = 1/K$):

$$H(p, q) = \sum_{k=1}^{K} p_k \ln(K p_k) \ge 0$$

Thus, for K fixed,

$$H \le \log_2 K \qquad \text{(maximum entropy)}$$

How does this relate to thermodynamic entropy?
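A quick sketch checking the bound on a random distribution and the equality at uniform:

```python
import math, random

def entropy(p):
    """H = -sum p_k * log2(p_k), in bits."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

K = 8
# Any distribution on K outcomes has entropy at most log2(K)...
weights = [random.random() for _ in range(K)]
p = [w / sum(weights) for w in weights]
assert entropy(p) <= math.log2(K) + 1e-12

# ...with equality for the uniform distribution.
uniform = [1 / K] * K
print(entropy(uniform), math.log2(K))   # both 3.0
```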
Entropy as an Information Measure: Like 20 Questions
16 Balls
Bill Chooses One
[Figure: 16 balls, labeled so that 1 and 2 each appear four times, 3 and 4 each appear twice, and 5, 6, 7, 8 appear once each.]
You must find which ball Bill chose using binary (yes/no) questions. Minimize the expected number of questions.
One Method...
[Decision tree: ask “Is X 1?”, “Is X 2?”, ..., “Is X 7?” in sequence, stopping at the first “yes”; a “no” to “Is X 7?” identifies ball 8.]

$$E[Q] = \tfrac{1}{4}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(4) + \tfrac{1}{16}(5) + \tfrac{1}{16}(6) + \tfrac{1}{16}(7) + \tfrac{1}{16}(7) = 3.1875 \text{ bits}$$
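Checking the arithmetic for this sequential strategy:

```python
# Probabilities of balls 1..8 and the number of questions each answer costs
# when asking "Is X 1?", "Is X 2?", ... in order (ball 8 needs no 8th question).
probs     = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
questions = [1,   2,   3,   4,   5,    6,    7,    7]

expected_q = sum(p * q for p, q in zip(probs, questions))
print(expected_q)                  # 3.1875
```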
Another (Better) Method...
[Decision tree: the yes/no questions are arranged so that balls 1 and 2 are identified after 2 questions, balls 3 and 4 after 3 questions, and balls 5, 6, 7, 8 after 4 questions.]
Longer paths have smaller probabilities.
$$E[Q] = \tfrac{1}{4}(2) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) + \tfrac{1}{16}(4) + \tfrac{1}{16}(4) + \tfrac{1}{16}(4) + \tfrac{1}{16}(4) = 2.75 \text{ bits}$$
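The same check, with question counts given by the depths in the better tree:

```python
# Same probabilities, but the question count is each ball's depth in the tree.
probs  = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
depths = [2,   2,   3,   3,   4,    4,    4,    4]

expected_q = sum(p * d for p, d in zip(probs, depths))
print(expected_q)                  # 2.75
```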
Relation to Entropy...
The Problem’s Entropy is...
$$H = -2\left(\tfrac{1}{4}\right)\log_2 \tfrac{1}{4} - 2\left(\tfrac{1}{8}\right)\log_2 \tfrac{1}{8} - 4\left(\tfrac{1}{16}\right)\log_2 \tfrac{1}{16} = 2.75 \text{ bits}$$
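Computing the entropy directly confirms it matches the better method's expected question count:

```python
import math

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
H = -sum(p * math.log2(p) for p in probs)
print(H)                           # 2.75 bits -- the better method meets it exactly
```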
Principle...
• The expected number of questions will equal or exceed the entropy. There can be equality only if all probabilities are powers of ½.
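The transcript doesn't name it, but an optimal question tree for a known distribution can be built with Huffman's algorithm; since every probability here is a power of ½, the expected number of questions meets the entropy exactly. A sketch (the helper `huffman_code_lengths` is mine):

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    """Return the code length (= number of questions) for each outcome."""
    counter = itertools.count()              # tie-breaker so heapq never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)    # merge the two least likely groups
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:                # everything under this merge gets 1 deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), ids1 + ids2))
    return lengths

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
lengths = huffman_code_lengths(probs)
print(lengths)                               # [2, 2, 3, 3, 4, 4, 4, 4]
print(sum(p * l for p, l in zip(probs, lengths)))  # 2.75 = the entropy
```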
Principle Proof
Lemma: If there are K solutions and the length of the path to the k-th solution is $\ell_k$, then

$$\sum_{k=1}^{K} 2^{-\ell_k} \le 1.$$
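A quick check of the lemma (this is Kraft's inequality) against the two question trees above:

```python
# Path lengths to each of the 8 solutions in the two question trees.
sequential = [1, 2, 3, 4, 5, 6, 7, 7]
balanced   = [2, 2, 3, 3, 4, 4, 4, 4]

for lengths in (sequential, balanced):
    print(sum(2 ** -l for l in lengths))   # both sum to exactly 1.0
```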