Information theory (part II)
• From the form g(R) + g(S) = g(RS), one can expect that the function g( ) shall be a logarithm function.
• In a general form, the function can be written as g(x) = A*ln(x) + C, where A and C are constants.
• From the earlier transformation g(x) = x*f(1/x), one gets that the uncertainty quantity H shall satisfy (1/p)*f(p) = A*ln(p) + C, where p = 1/n (n is the total number of events) and the sign of ln(1/p) has been absorbed into A.
• Therefore, f(p) = A*p*ln(p) + C*p.
• Given that the uncertainty H must be zero if the probability is 1, the constant C should be equal to ZERO.
• Thus, f(p) = A*p*ln(p).
• Since p is smaller than 1, ln(p) is negative, and thus the constant A is inherently negative.
• Following conventional notation, we write f(p) = –K*p*ln(p), where K is a positive coefficient.
• The uncertainty quantity H(p1, p2, …, pn) = Σ f(pi)
• Thus, H(p1, p2, …, pn) = Σ –K*pi*ln(pi) = –K*Σ pi*ln(pi)
• Example:
  H(1/2, 1/3, 1/6) = –K*[1/2*ln(1/2) + 1/3*ln(1/3) + 1/6*ln(1/6)]
                   = –K*(–0.346 – 0.366 – 0.299) = 1.01K
  From the decomposed procedure,
  H(1/2, 1/2) + 1/2*H(2/3, 1/3) = –K*[1/2*ln(1/2) + 1/2*ln(1/2)] – 1/2*K*[2/3*ln(2/3) + 1/3*ln(1/3)]
                                = –K*(–0.346 – 0.346) – K/2*(–0.270 – 0.366)
                                = 1.01K
• For equally probable events, pi = 1/n and H = K*ln(n)
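The decomposition in the example can be verified numerically. A minimal sketch in Python (the helper H below is our own, not from the slides):

```python
import math

def H(*probs, K=1.0):
    """Uncertainty H = -K * sum(p * ln p) over the given probabilities."""
    return -K * sum(p * math.log(p) for p in probs)

# Direct evaluation versus the decomposed (grouped) procedure
direct = H(1/2, 1/3, 1/6)
decomposed = H(1/2, 1/2) + 1/2 * H(2/3, 1/3)
print(round(direct, 3), round(decomposed, 3))  # 1.011 1.011
```

Both routes give H ≈ 1.01K with K = 1, confirming the grouping property used above.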
• In a binary case there are two possible outcomes of an experiment, with probabilities p1 and p2 where p1 + p2 = 1:
  H = –K*[p1*ln(p1) + p2*ln(p2)]
• To determine the H value when p1 is 0 or 1, one needs L'Hôpital's rule:
  lim[u(x)/v(x)] = lim[u'(x)/v'(x)]
• Therefore, writing p1*ln(p1) = ln(p1)/(1/p1), as p1 approaches 0,
  lim[p1*ln(p1)] = lim[(1/p1)/(–1/p1^2)] = lim(–p1) = 0
• The uncertainty is therefore 0 when either p1 or p2 is zero!
• At what value of p1 does H reach its maximum? Differentiate H = –K*[p1*ln(p1) + (1–p1)*ln(1–p1)] with respect to p1 and set the derivative equal to 0:
  dH/dp1 = –K*[ln(p1) + p1/p1 – ln(1–p1) – (1–p1)/(1–p1)] = –K*ln[p1/(1–p1)] = 0
  which leads to p1 = 1/2
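The endpoint limit and the maximum at p1 = 1/2 can be checked numerically; a small sketch (the function name binary_H is our own):

```python
import math

def binary_H(p1, K=1.0):
    """H = -K*[p1 ln p1 + p2 ln p2] with p2 = 1 - p1."""
    if p1 in (0.0, 1.0):   # lim p*ln(p) = 0 as p -> 0 (L'Hopital's rule)
        return 0.0
    p2 = 1.0 - p1
    return -K * (p1 * math.log(p1) + p2 * math.log(p2))

# Scan a grid of p1 values: the maximum sits at p1 = 1/2, where H = K*ln(2)
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=binary_H)
print(best, round(binary_H(0.5), 4))  # 0.5 0.6931
```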
Unit of Information
• Choosing 2 as the base of the logarithm and taking K = 1, one gets H = log2(2) = 1 for a binary event with two equally likely outcomes.
• We call this unit of information a bit.
• For a decimal digit, H = log2(10) = 3.32; thus a decimal digit contains about 3 1/3 bits of information.
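These unit conversions are one-liners with base-2 logarithms and K = 1:

```python
import math

# With K = 1 and base-2 logarithms, H is measured in bits
binary_digit = math.log2(2)    # a binary digit carries exactly 1 bit
decimal_digit = math.log2(10)  # a decimal digit carries about 3.32 bits
print(binary_digit, round(decimal_digit, 2))  # 1.0 3.32
```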
Linguistics
• A more refined analysis works in terms of component syllables. One can test what is significant in a syllable in speech by swapping syllables and seeing if meaning or tense is changed or lost. The table gives some examples of the application of this statistical approach to some works of literature.
• The types of interesting results that arise from such studies include:
  (a) English has the lowest entropy of any major language, and
  (b) Shakespeare's work has the lowest entropy of any author studied.
• These ideas are now progressing beyond the scientific level and are impinging on new ideas of criticism. Here, as in biology, the thermodynamic notions can be helpful, though they must be applied with caution because concepts such as 'quality' cannot be measured: they are purely subjective.
Maximum entropy
• The amount of uncertainty:
  H = –K*Σ pi*ln(pi)   (sum over i = 1, …, n)
• Examples of the connection between entropy and uncertainty (gases in a partitioned container)
• The determination of the probability distribution that has maximum entropy
• Suppose one knows the mean value of some particular variable x:
  <x> = Σ xi*pi   (sum over i = 1, …, n)
• where the unknown probabilities satisfy the condition:
  Σ pi = 1
• In general there will be a large number of probability distributions consistent with the above information.
• We will determine the one distribution which yields the largest uncertainty (i.e. information).
• We need the method of Lagrange multipliers to carry out the analysis.
• Introduce Lagrange multipliers λ and μ for the two constraints and require, for each pj,
  ∂/∂pj [ H/K – λ*Σ pi – μ*Σ xi*pi ] = 0
• where
  H/K = –Σ pi*ln(pi)
• Then,
  –ln(pj) – 1 – λ – μ*xj = 0
• Solving for ln(pj):
  ln(pj) = –(1 + λ) – μ*xj,  i.e.  pj = e^(–(1+λ))*e^(–μ*xj)
Determine the new Lagrange multipliers λ and μ
• Absorbing the constant 1 into a new multiplier λ gives pj = e^(–λ)*e^(–μ*xj), so that the normalization Σ pj = 1 requires
  e^(–λ)*Σ e^(–μ*xj) = 1
• We define the partition function
  Z(μ) = Σ e^(–μ*xj)
• Then
  e^(λ) = Z,  i.e.  λ = ln(Z)  and  pj = e^(–μ*xj)/Z
Determine multiplier μ
• We have
  <x> = Σ xj*e^(–μ*xj)/Z = –(1/Z)*dZ/dμ = –d[ln(Z)]/dμ
• Therefore, substituting pj = e^(–μ*xj)/Z into H,
  Hmax = –K*Σ pj*ln(pj) = –K*Σ pj*[–μ*xj – ln(Z)] = K*[μ*<x> + ln(Z)]
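The whole maximum-entropy procedure can be illustrated numerically. The sketch below is our own code (the bisection bounds and iteration count are arbitrary choices): it finds the distribution over x = 1, …, 6 with mean 4.5, then checks that Hmax = μ*<x> + ln(Z) with K = 1.

```python
import math

def maxent_dist(xs, mean, lo=-50.0, hi=50.0):
    """Maximum-entropy p_j = exp(-mu*x_j)/Z subject to sum(p_j*x_j) = mean.
    mu is found by bisection; <x>(mu) decreases monotonically with mu."""
    xs = list(xs)
    def avg(mu):
        ws = [math.exp(-mu * x) for x in xs]
        return sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    for _ in range(200):
        mid = (lo + hi) / 2
        if avg(mid) > mean:   # mean too high -> need a larger mu
            lo = mid
        else:
            hi = mid
    mu = (lo + hi) / 2
    ws = [math.exp(-mu * x) for x in xs]
    Z = sum(ws)
    return [w / Z for w in ws], mu, Z

# A die (x = 1..6) constrained to have mean 4.5: weight shifts to high faces
p, mu, Z = maxent_dist(range(1, 7), 4.5)
Hmax = -sum(q * math.log(q) for q in p)             # K = 1
print([round(q, 3) for q in p])
print(abs(Hmax - (mu * 4.5 + math.log(Z))) < 1e-9)  # True: Hmax = mu*<x> + ln Z
```

This is the classic "loaded die" illustration of the method derived above.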
The connection to statistical thermodynamics
• The entropy is defined as
  S = k*ln(W),  where W = N!/(N1!*N2!*…*Nn!)
  is the number of ways of distributing N systems with Nj of them in state j.
• Then, using Stirling's approximation ln(N!) ≈ N*ln(N) – N,
  S/k = ln(N!) – Σ ln(Nj!)
      ≈ N*ln(N) – N – Σ [Nj*ln(Nj) – Nj]
      = –Σ Nj*ln(Nj/N)   (since Σ Nj = N)
  so that
  S = –k*N*Σ (Nj/N)*ln(Nj/N)   (sum over j = 1, …, n)
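The Stirling step can be sanity-checked for moderate N; the occupation numbers below are hypothetical, chosen only for illustration:

```python
import math

# Occupation numbers Nj (hypothetical), N = 100
Ns = [50, 30, 20]
N = sum(Ns)
# Exact S/k = ln(N!/prod(Nj!)) versus the Stirling form -N*sum((Nj/N)*ln(Nj/N))
exact = math.log(math.factorial(N)) - sum(math.log(math.factorial(n)) for n in Ns)
stirling = -N * sum((n / N) * math.log(n / N) for n in Ns)
print(round(exact, 2), round(stirling, 2))  # within a few percent of each other
```

The crude Stirling form slightly overestimates at this N; the agreement improves as N grows.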
• A disordered system is likely to be in any number of different quantum states. If Nj = 1 for N different states and Nj = 0 for all other available states, then W = N! and
  S = k*ln(N!) ≈ k*N*[ln(N) – 1]
• The above function is positive and increases with increasing N.
• Associating Nj/N with the probability pj,
  S = –k*N*Σ pj*ln(pj) = (k*N/K)*H
• The expected amount of information we would gain is a measure of our lack of knowledge of the state of the system.
• Information is therefore sometimes described as negative entropy (negentropy).
• The Boltzmann distribution for non-degenerate energy states is
  Nj/N = e^(–εj/kT)/Z
• where
  Z = Σ e^(–εj/kT)   (sum over j)
• With μ = 1/(kT), the internal energy U = Σ Nj*εj, and pj = Nj/N, the maximum entropy becomes
  Smax = –k*N*Σ (Nj/N)*ln(Nj/N) = U/T + k*N*ln(Z)
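The identity Smax = U/T + k*N*ln(Z) can be checked directly for a small hypothetical level scheme (k = 1 and the energies and temperature below are arbitrary illustrative values):

```python
import math

k, T, N = 1.0, 2.0, 1000.0
eps = [0.0, 1.0, 3.0]                        # hypothetical energy levels eps_j
w = [math.exp(-e / (k * T)) for e in eps]
Z = sum(w)
p = [wj / Z for wj in w]                     # Nj/N = exp(-eps_j/kT)/Z
U = N * sum(pj * ej for pj, ej in zip(p, eps))
S_direct = -k * N * sum(pj * math.log(pj) for pj in p)
S_formula = U / T + k * N * math.log(Z)
print(abs(S_direct - S_formula) < 1e-6)      # True
```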
Summary
• Information theory is an extension of thermodynamics and probability theory. Much of the subject is associated with the names of Brillouin and Shannon. It was originally concerned with passing messages on telecommunication systems and with assessing the efficiency of codes. Today it is applied to a wide range of problems, ranging from the analysis of language to the design of computers.
• In this theory the word 'information' is used in a special sense. Suppose that we are initially faced with a problem about which we have no 'information' and that there are P possible answers. When we are given some 'information' this has the effect of reducing the number of possible answers, and if we are given enough 'information' we may get to a unique answer. The effect of increased information is thus to reduce the uncertainty about a situation. In a sense, therefore, information is the antithesis of entropy, since entropy is a measure of the randomness or disorder of a system. This contrast led to the coining of the word negentropy to describe information.
• The basic unit of information theory is the bit, a shortened form of 'binary digit'.
• For example, if one is given a playing card face down without any information, it could be any one of 52; if one is then told that it is an ace, it could be any one of 4; if told that it is also a spade, one knows for certain which card one has. As we are given more information, the situation becomes more certain. In general, to determine which of the P possible outcomes is realized, the required information I is defined as
  I = K*ln(P)
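The playing-card example can be restated in bits (choosing K = 1/ln 2, i.e. base-2 logarithms):

```python
import math

# I = K*ln(P); with K = 1/ln(2) this is log2(P), measured in bits
full_deck = math.log2(52)      # bits needed to identify one card out of 52
ace = math.log2(52 / 4)        # "it is an ace": 52 -> 4 possibilities
spade = math.log2(4 / 1)       # "it is the spade": 4 -> 1, i.e. 2 more bits
print(round(full_deck, 2), round(ace + spade, 2))  # 5.7 5.7
```

The two pieces of partial information add up to exactly the information content of the full answer.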