
The Origin of Entropy
Rick Chang


Agenda
• Introduction
• References
• What is information?
• A straightforward way to derive the form of entropy
• A mathematical way to derive the form of entropy
• Conclusion


Introduction
• We use entropy matrices to measure dependencies between pairs of genes, but why?
• What is entropy?


Introduction – cont.
• I will: try to explain what information and entropy are
• I will not: tell you how entropy is related to GA. I don't know (maybe a future work).


References
• "A Mathematical Theory of Communication," C. E. Shannon, 1949, Part I, Appendix 2
• "Information Theory, Inference, and Learning Algorithms," David J. C. MacKay, 2003, Chapters 1 and 4
• "Information Theory and Reliable Communication," Robert G. Gallager, 1976, Chapter 2


Claude E. Shannon, 1916 ~ 2001


What is information?
• Ensemble: the outcome x is the value of a random variable, which takes on one of a set of possible values $A_X = \{a_1, \ldots, a_I\}$, having probabilities $P_X = \{p_1, \ldots, p_I\}$, with $p_i \geq 0$ and

  $\sum_{a_i \in A_X} P(x = a_i) = 1$
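A minimal sketch of an ensemble in Python (my own illustration, not part of the slides); the outcome labels and probabilities are made up:

```python
import random

# Assumed ensemble: outcomes A_X mapped to probabilities P_X (illustrative values).
ensemble = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Every probability must be non-negative and the probabilities must sum to 1.
assert all(p >= 0 for p in ensemble.values())
assert abs(sum(ensemble.values()) - 1.0) < 1e-12

# The outcome x is a draw from the ensemble.
x = random.choices(list(ensemble), weights=list(ensemble.values()))[0]
print("outcome:", x)
```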



What is information?
• Hartley, R. V. L., "Transmission of Information": If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely.


A straightforward way
• When we try to measure the influence of event y on event x, we may consider the ratio

  $\frac{p(x \mid y)}{p(x)}$

  > 1 : the occurrence of event y increases our belief in event x
  = 1 : events x and y are independent
  < 1 : the occurrence of event y decreases our belief in event x


A straightforward way – cont.
• We define the information provided about event x by the occurrence of event y as

  $I(x; y) = \log \frac{p(x \mid y)}{p(x)}$

  > 0 : the occurrence of event y increases our belief in event x
  = 0 : events x and y are independent
  < 0 : the occurrence of event y decreases our belief in event x
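A small numeric sketch (assumed joint probabilities, base-2 logarithm) showing how the ratio p(x|y)/p(x) and its logarithm behave as described above:

```python
import math

# Assumed joint probabilities p(x, y) for two binary events (illustrative numbers).
p_xy = {("x", "y"): 0.40, ("x", "not_y"): 0.10,
        ("not_x", "y"): 0.10, ("not_x", "not_y"): 0.40}

p_x = sum(v for (a, _), v in p_xy.items() if a == "x")   # p(x)   = 0.5
p_y = sum(v for (_, b), v in p_xy.items() if b == "y")   # p(y)   = 0.5
p_x_given_y = p_xy[("x", "y")] / p_y                     # p(x|y) = 0.8

ratio = p_x_given_y / p_x   # > 1 here: the occurrence of y increases our belief in x
info = math.log2(ratio)     # > 0: information y provides about x, in bits
print(ratio, info)          # 1.6  ~0.678
```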


Why use the logarithm?
• It is more convenient:
  1. Practically more useful
  2. Nearer to our intuitive feeling: we intuitively measure entities by linear comparison
  3. Mathematically more suitable: many of the limiting operations are simple in terms of the logarithm


Mutual information
• The mutual information between event x and event y:

  $I(x; y) = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(y \mid x)}{p(y)} = I(y; x)$
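A quick check, with assumed probabilities, that the event-level mutual information is symmetric, I(x; y) = I(y; x):

```python
import math

p_x, p_y, p_joint = 0.5, 0.25, 0.2        # assumed p(x), p(y), p(x, y)

i_xy = math.log2((p_joint / p_y) / p_x)   # log p(x|y) / p(x)
i_yx = math.log2((p_joint / p_x) / p_y)   # log p(y|x) / p(y)
assert abs(i_xy - i_yx) < 1e-12           # both equal log p(x,y) / (p(x) p(y))
print(i_xy)                               # ~0.678 bits
```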


Mutual information – cont.
• Mutual information uses the logarithm to quantify the difference between our belief in event x given event y and our prior belief in event x
• It is the amount of uncertainty about event x that we can resolve after the occurrence of event y


Self-information
• Consider an event y such that p(x | y) = 1. Then I(x; y) is the amount of uncertainty about event x that we resolve once we know event x will certainly occur, i.e., the prior uncertainty of event x.
• Define the self-information of event x as

  $I(x) = \log \frac{1}{p(x)} = -\log p(x)$
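A tiny sketch of self-information in bits (base-2 logarithm); the example events are mine, not the slides':

```python
import math

def self_information(p: float) -> float:
    """I(x) = -log2 p(x): the prior uncertainty of event x, in bits."""
    return -math.log2(p)

print(self_information(0.5))     # a fair coin landing heads: 1 bit
print(self_information(1 / 6))   # a fair die showing a given face: ~2.585 bits
print(self_information(1.0))     # a certain event carries no information: 0 bits
```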


Intuitively
[Figure with labels: "Information about the system"; "Our prior knowledge about event x"; "We know everything about the system"]


Intuitively – cont.
[Figure with labels: "Information about the system"; "Our prior knowledge about event x"; "After we know event x will certainly occur"; "We know everything about the system"]


Intuitively – cont.
[Figure with labels: "Information about the system"; "Information of event x"; "Uncertainty of event x"]


Conditional self-information
• Similarly, define the conditional self-information of event x, given the occurrence of event y, as

  $I(x \mid y) = \log \frac{1}{p(x \mid y)} = -\log p(x \mid y)$

• We now have

  $I(x; y) = \log \frac{p(x \mid y)}{p(x)} = \log p(x \mid y) - \log p(x) = I(x) - I(x \mid y)$
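A numeric sanity check (assumed probabilities, base-2 logs) of the identity I(x; y) = I(x) - I(x | y) just derived:

```python
import math

p_x, p_y, p_joint = 0.5, 0.25, 0.2       # assumed p(x), p(y), p(x, y)
p_x_given_y = p_joint / p_y              # p(x | y) = 0.8

I_x = -math.log2(p_x)                    # self-information I(x)
I_x_given_y = -math.log2(p_x_given_y)    # conditional self-information I(x | y)
I_mutual = math.log2(p_x_given_y / p_x)  # mutual information I(x; y)

assert abs(I_mutual - (I_x - I_x_given_y)) < 1e-12
print(I_x, I_x_given_y, I_mutual)        # 1.0  ~0.322  ~0.678
```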


Intuitively – cont.
[Figure with labels: "Information about event x"; "Our prior knowledge about event x"; "After the occurrence of event y"; "We know everything about event x (we know event x will certainly occur)"]


Intuitively – cont.
[Figure with labels: "Information about event x"; "Mutual information between event x and event y"]


A straightforward way – cont.
• As above, define the joint self-information of events x and y as $I(x, y) = -\log p(x, y)$
• We now have, since $p(y \mid x) = \frac{p(x, y)}{p(x)}$,

  $I(x, y) = I(x) + I(y \mid x)$
  $I(x, y) = I(y \mid x) + I(x) = I(x) + I(y) - I(x; y)$
  $I(x; y) = I(y) - I(y \mid x)$
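The same assumed numbers also verify the identities on this slide, I(x, y) = I(x) + I(y | x) and I(x, y) = I(x) + I(y) - I(x; y):

```python
import math

p_x, p_y, p_joint = 0.5, 0.25, 0.2            # assumed p(x), p(y), p(x, y)

I_x = -math.log2(p_x)
I_y = -math.log2(p_y)
I_joint = -math.log2(p_joint)                 # joint self-information I(x, y)
I_y_given_x = -math.log2(p_joint / p_x)       # I(y | x)
I_mutual = math.log2(p_joint / (p_x * p_y))   # I(x; y)

assert abs(I_joint - (I_x + I_y_given_x)) < 1e-12
assert abs(I_joint - (I_x + I_y - I_mutual)) < 1e-12
```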


A straightforward way – cont.
• The uncertainty of event y is never increased by knowledge of x:

  $I(x) + I(y) \geq I(x, y) = I(x) + I(y \mid x) \;\Rightarrow\; I(y) \geq I(y \mid x)$

  (Strictly, this holds on average over the ensemble; for an individual pair of events I(x; y) can be negative.)


From instance to expectation
Taking the average over the ensemble:
• I(x; y)  →  I(X; Y)
• I(x)  →  H(X)
• I(x | y)  →  H(X | Y)
• I(x, y)  →  H(X, Y)
• I(x; y) = I(x) - I(x | y)  →  I(X; Y) = H(X) - H(X | Y)
• I(x, y) = I(x) + I(y) - I(x; y)  →  H(X, Y) = H(X) + H(Y) - I(X; Y)
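A sketch, with an assumed 2x2 joint distribution, that averages the event-level quantities and checks the ensemble-level identities listed above:

```python
import math

# Assumed joint distribution P(X, Y) over two binary variables (illustrative).
P = {("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
     ("x1", "y0"): 0.10, ("x1", "y1"): 0.40}

PX, PY = {}, {}
for (x, y), p in P.items():
    PX[x] = PX.get(x, 0.0) + p
    PY[y] = PY.get(y, 0.0) + p

H_X = -sum(p * math.log2(p) for p in PX.values())
H_Y = -sum(p * math.log2(p) for p in PY.values())
H_XY = -sum(p * math.log2(p) for p in P.values())
H_X_given_Y = -sum(p * math.log2(p / PY[y]) for (x, y), p in P.items())
I_XY = sum(p * math.log2(p / (PX[x] * PY[y])) for (x, y), p in P.items())

assert abs(I_XY - (H_X - H_X_given_Y)) < 1e-9    # I(X;Y) = H(X) - H(X|Y)
assert abs(H_XY - (H_X + H_Y - I_XY)) < 1e-9     # H(X,Y) = H(X) + H(Y) - I(X;Y)
print(round(H_X, 3), round(H_X_given_Y, 3), round(I_XY, 3))   # 1.0 0.722 0.278
```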


Relationship
[Figure: diagram relating H(X,Y), H(X), H(Y), H(X|Y), I(X;Y), and H(Y|X)]


Entropy
• The entropy of an ensemble is defined to be the average value of the self-information over all events x:

  $H(X) = \sum_{i=1}^{n} p(x_i) \log \frac{1}{p(x_i)}$

• It is the average prior uncertainty of an ensemble.
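A direct transcription of the definition in Python (base-2 logarithm, so the result is in bits); the example distributions are mine:

```python
import math

def entropy(probs):
    """H(X) = sum_i p_i * log2(1 / p_i); zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                  # fair coin: 1.0 bit
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
print(entropy([1.0, 0.0]))                  # certain outcome: 0.0 bits
```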


Interesting properties of H(X)
• H = 0 if and only if all the $p_i$ but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.
• For a given n, H is a maximum and equal to log(n) when all the $p_i$ are equal, i.e., $p_i = 1/n$. This is also intuitively the most uncertain situation.
• Any change toward equalization of the probabilities $p_1, \ldots, p_n$ increases H.
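These properties can be spot-checked numerically (the entropy helper is restated so the snippet runs on its own; the test distributions are mine):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# H vanishes only when one outcome is certain.
assert entropy([1.0, 0.0, 0.0]) == 0.0

# For fixed n, the uniform distribution maximizes H, with H = log2(n).
n = 4
assert abs(entropy([1 / n] * n) - math.log2(n)) < 1e-9
assert entropy([0.7, 0.1, 0.1, 0.1]) < math.log2(n)

# Moving the probabilities toward equalization increases H.
assert entropy([0.6, 0.4]) > entropy([0.9, 0.1])
```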


A mathematical way
• Can we find a measure of how uncertain we are of an ensemble?
• If there is such a measure, say $H(p_1, \ldots, p_n)$, it is reasonable to require of it the following properties:
  1. H should be continuous in the $p_i$.
  2. If all the $p_i$ are equal, $p_i = 1/n$, then H should be a monotonic increasing function of n.
  3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.


A mathematical way – cont.
3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. For example,

  $H(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}) = H(\tfrac{1}{2}, \tfrac{1}{2}) + \tfrac{1}{2} H(\tfrac{2}{3}, \tfrac{1}{3})$

  The coefficient 1/2 appears because the second choice only occurs half the time.
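The decomposition example above can be verified numerically; this is only a check of the stated property, nothing new:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

lhs = entropy([1 / 2, 1 / 3, 1 / 6])
rhs = entropy([1 / 2, 1 / 2]) + (1 / 2) * entropy([2 / 3, 1 / 3])
assert abs(lhs - rhs) < 1e-9
print(lhs)   # ~1.459 bits
```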


A mathematical way – cont.
• Theorem: The only H satisfying the three above properties is of the form

  $H = K \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$


A mathematical way – cont.
• Proof: Let

  $A(n) = H(\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n})$

  From property (3) we can decompose a choice from $s^m$ equally likely possibilities into a series of m choices from s equally likely possibilities and obtain

  $A(s^m) = m\,A(s)$


A mathematical way – cont.
• Similarly, $A(t^n) = n\,A(t)$
• We can choose n arbitrarily large and find an m to satisfy

  $s^m \leq t^n < s^{m+1}$

  Taking logarithms and dividing by $n \log s$:

  $\frac{m}{n} \leq \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n} \;\Rightarrow\; \left| \frac{m}{n} - \frac{\log t}{\log s} \right| < \epsilon, \ \text{where } \epsilon \text{ is arbitrarily small} \quad (1)$


A mathematical way – cont.
• From the monotonic property of A(n):

  $A(s^m) \leq A(t^n) \leq A(s^{m+1})$
  $m\,A(s) \leq n\,A(t) \leq (m+1)\,A(s)$

  Dividing by $n\,A(s)$:

  $\frac{m}{n} \leq \frac{A(t)}{A(s)} \leq \frac{m}{n} + \frac{1}{n} \;\Rightarrow\; \left| \frac{m}{n} - \frac{A(t)}{A(s)} \right| < \epsilon, \ \text{where } \epsilon \text{ is arbitrarily small} \quad (2)$


A mathematical way – cont.
• From equations (1) and (2):

  $\left| \frac{A(t)}{A(s)} - \frac{\log t}{\log s} \right| < 2\epsilon, \ \text{where } \epsilon \text{ is arbitrarily small}$

• We get A(t) = K log(t); K must be positive to satisfy property (2).


A mathematical way – cont.
• Now suppose we have a choice from n possibilities with commensurable probabilities

  $p_i = \frac{n_i}{\sum_i n_i}$

  where all the $n_i$ are integers.
• We can break down a choice from $\sum_i n_i$ possibilities into a choice from n possibilities with probabilities $p_1, \ldots, p_n$ and then, if the i-th was chosen, a choice from $n_i$ possibilities with equal probabilities.


A mathematical way – cont.
• Using property (3) again, we equate the total choice from $\sum_i n_i$ possibilities as computed by the two methods:

  $K \log \sum_i n_i = H(p_1, \ldots, p_n) + K \sum_i p_i \log n_i$
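A numeric sketch of this equality with assumed integer counts n_i, taking K = 1 and base-2 logarithms:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n_counts = [3, 2, 1]                        # assumed integers n_i
N = sum(n_counts)
p = [n_i / N for n_i in n_counts]           # commensurable probabilities p_i = n_i / N

lhs = math.log2(N)                          # one choice from N equal possibilities
rhs = entropy(p) + sum(pi * math.log2(ni) for pi, ni in zip(p, n_counts))
assert abs(lhs - rhs) < 1e-9                # K log(sum n_i) = H(p) + K sum p_i log(n_i)
```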


A mathematical way – cont.
• Hence

  $H(p_1, \ldots, p_n) = K\left[\sum_i p_i \log \sum_i n_i - \sum_i p_i \log n_i\right] = -K \sum_i p_i \log \frac{n_i}{\sum_i n_i} = K \sum_i p_i \log \frac{1}{p_i}$

• If the $p_i$ are not commensurable, they may be approximated by rationals and the same expression must hold by our continuity assumption (property (1)).
• The choice of the coefficient K is a matter of convenience and amounts to the choice of a unit of measure.


Conclusion
• We first used an intuitive method to measure the information content of an event or an ensemble
• We explained intuitively why we choose the logarithm
• Mutual information and entropy were introduced
• We showed the relationship between information content and uncertainty
• Finally, we set out three assumptions and derived the only measure of information content satisfying them, showing that the logarithm must be adopted.


Thanks
